diff --git a/containers/062325_slr_ocr_job.log b/containers/062325_slr_ocr_job.log
new file mode 100644
index 0000000..ab5a3a3
--- /dev/null
+++ b/containers/062325_slr_ocr_job.log
@@ -0,0 +1,2615 @@
+setting up the environment by loading singularity at Mon Jun 23 22:01:45 CDT 2025
+singularity loaded, running the job command at Mon Jun 23 22:01:50 CDT 2025
+WARNING: While bind mounting '/home/nws8519/git/adaptation-slr:/home/nws8519/git/adaptation-slr/': destination is already in the mount point list
+
+==========
+== CUDA ==
+==========
+
+CUDA Version 12.8.1
+
+Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+This container image and its contents are governed by the NVIDIA Deep Learning Container License.
+By pulling and using the container, you accept the terms and conditions of this license:
+https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
+
+A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
+
+INFO:olmocr.check:pdftoppm is installed and working.
+2025-06-23 22:02:07,072 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
+2025-06-23 22:02:07,096 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf as PDF document
+2025-06-23 22:02:07,114 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/002-barcomb.pdf as PDF document
+2025-06-23 22:02:07,140 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf as PDF document
+2025-06-23 22:02:07,152 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf as PDF document
+2025-06-23 22:02:07,167 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/005-crowston-shamshurin.pdf as PDF document
+2025-06-23 22:02:07,184 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/006-franke.pdf as PDF document
+2025-06-23 22:02:07,205 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf as PDF document
+2025-06-23 22:02:07,221 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/008-geiger.pdf as PDF document
+2025-06-23 22:02:07,237 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/009-hsieh.pdf as PDF document
+2025-06-23 22:02:07,254 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/010-hu.pdf as PDF document
+2025-06-23 22:02:07,300 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf as PDF document
+2025-06-23 22:02:07,322 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/012-jensen-scacchi.pdf as PDF document
+2025-06-23 22:02:07,349 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/013-klug.pdf as PDF document
+2025-06-23 22:02:07,388 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf as PDF document
+2025-06-23 22:02:07,409 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf as PDF document
+2025-06-23 22:02:07,426 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf as PDF document
+2025-06-23 22:02:07,441 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/017-wessel.pdf as PDF document
+2025-06-23 22:02:07,457 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/018-yin.pdf as PDF document
+2025-06-23 22:02:07,470 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/019_ding.pdf as PDF document
+2025-06-23 22:02:07,489 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf as PDF document
+2025-06-23 22:02:07,520 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf as PDF document
+2025-06-23 22:02:07,543 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf as PDF document
+2025-06-23 22:02:07,578 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf as PDF document
+2025-06-23 22:02:07,602 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/024_zhou.pdf as PDF document
+2025-06-23 22:02:07,625 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf as PDF document
+2025-06-23 22:02:07,644 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/026_vendome.pdf as PDF document
+2025-06-23 22:02:07,667 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/027_vendome.pdf as PDF document
+2025-06-23 22:02:07,680 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/028_meloca.pdf as PDF document
+2025-06-23 22:02:07,691 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/029_heinemann.pdf as PDF document
+2025-06-23 22:02:07,715 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/030_abdalkareem.pdf as PDF document
+2025-06-23 22:02:07,744 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf as PDF document
+2025-06-23 22:02:07,769 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf as PDF document
+2025-06-23 22:02:07,787 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/033_businger.pdf as PDF document
+2025-06-23 22:02:07,804 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/034_zhang.pdf as PDF document
+2025-06-23 22:02:07,804 - __main__ - INFO - Found 34 total pdf paths to add
+ Sampling PDFs to calculate optimal length: 0%| | 0/34 [00:00
+2025-06-23 22:05:08,639 - __main__ - INFO - INFO 06-23 22:05:08 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
+2025-06-23 22:05:08,641 - __main__ - WARNING - Attempt 42: Please wait for vllm server to become ready...
+2025-06-23 22:05:09,363 - __main__ - INFO - Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
+2025-06-23 22:05:09,694 - __main__ - WARNING - Attempt 43: Please wait for vllm server to become ready...
+2025-06-23 22:05:10,340 - __main__ - INFO - You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
+2025-06-23 22:05:10,753 - __main__ - WARNING - Attempt 44: Please wait for vllm server to become ready...
+2025-06-23 22:05:11,807 - __main__ - WARNING - Attempt 45: Please wait for vllm server to become ready...
+2025-06-23 22:05:12,861 - __main__ - WARNING - Attempt 46: Please wait for vllm server to become ready...
+2025-06-23 22:05:13,914 - __main__ - WARNING - Attempt 47: Please wait for vllm server to become ready...
+2025-06-23 22:05:14,172 - __main__ - INFO - Unused or unrecognized kwargs: return_tensors.
+2025-06-23 22:05:14,675 - __main__ - INFO - INFO 06-23 22:05:14 [topk_topp_sampler.py:49] Using FlashInfer for top-p & top-k sampling.
+2025-06-23 22:05:14,783 - __main__ - INFO - INFO 06-23 22:05:14 [gpu_model_runner.py:1595] Starting to load model allenai/olmOCR-7B-0225-preview...
+2025-06-23 22:05:14,968 - __main__ - WARNING - Attempt 48: Please wait for vllm server to become ready...
+2025-06-23 22:05:15,062 - __main__ - INFO - INFO 06-23 22:05:15 [gpu_model_runner.py:1600] Loading model from scratch...
+2025-06-23 22:05:15,806 - __main__ - INFO - WARNING 06-23 22:05:15 [vision.py:91] Current `vllm-flash-attn` has a bug inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend.
+2025-06-23 22:05:16,025 - __main__ - WARNING - Attempt 49: Please wait for vllm server to become ready...
+2025-06-23 22:05:16,059 - __main__ - INFO - INFO 06-23 22:05:16 [cuda.py:252] Using Flash Attention backend on V1 engine.
+2025-06-23 22:05:16,451 - __main__ - INFO - INFO 06-23 22:05:16 [weight_utils.py:292] Using model weights format ['*.safetensors']
+2025-06-23 22:05:16,774 - __main__ - INFO - Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00 [Errno 104] Connection reset by peer
+2025-06-23 22:08:33,587 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-7 to allow server restart
+2025-06-23 22:08:33,587 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-9: [Errno 104] Connection reset by peer
+2025-06-23 22:08:33,587 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-9 to allow server restart
+2025-06-23 22:08:34,131 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:08:34,131 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+2025-06-23 22:08:34,131 - __main__ - INFO -
+Worker ID | started
+----------+--------
+0 | 533
+1 | 276
+2025-06-23 22:08:37,686 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-35: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,686 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-35 to allow server restart
+2025-06-23 22:08:37,686 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-23: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,687 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-23 to allow server restart
+2025-06-23 22:08:37,687 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-42: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,687 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-42 to allow server restart
+2025-06-23 22:08:37,687 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-44: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,687 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-44 to allow server restart
+2025-06-23 22:08:37,688 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-27: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,688 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-27 to allow server restart
+2025-06-23 22:08:37,688 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-41: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,688 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-41 to allow server restart
+2025-06-23 22:08:37,689 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-43: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,689 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-43 to allow server restart
+2025-06-23 22:08:37,689 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-38: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-38 to allow server restart
+2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-45: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-45 to allow server restart
+2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-38: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-38 to allow server restart
+2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-36: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-36 to allow server restart
+2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-37: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-37 to allow server restart
+2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-29: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-29 to allow server restart
+2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-7: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-7 to allow server restart
+2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-25: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-25 to allow server restart
+2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-32: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-32 to allow server restart
+2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-34: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-34 to allow server restart
+2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-24: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-24 to allow server restart
+2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-31: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,692 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-31 to allow server restart
+2025-06-23 22:08:37,692 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-11: [Errno 104] Connection reset by peer
+2025-06-23 22:08:37,692 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-11 to allow server restart
+2025-06-23 22:08:41,816 - __main__ - INFO - vllm running req: 60 queue req: 151
+2025-06-23 22:08:43,932 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-7
+2025-06-23 22:08:43,942 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-9
+2025-06-23 22:08:44,132 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:08:44,132 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+2025-06-23 22:08:44,133 - __main__ - INFO -
+Worker ID | started
+----------+--------
+0 | 533
+1 | 276
+2025-06-23 22:08:48,490 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-45
+2025-06-23 22:08:48,725 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-24
+2025-06-23 22:08:48,775 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-35
+2025-06-23 22:08:48,785 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-41
+2025-06-23 22:08:48,808 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-38
+2025-06-23 22:08:48,809 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-34
+2025-06-23 22:08:48,829 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-44
+2025-06-23 22:08:48,856 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-29
+2025-06-23 22:08:48,882 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-25
+2025-06-23 22:08:48,906 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-32
+2025-06-23 22:08:48,907 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-31
+2025-06-23 22:08:48,909 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-23
+2025-06-23 22:08:48,938 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-36
+2025-06-23 22:08:48,981 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-11
+2025-06-23 22:08:48,989 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-43
+2025-06-23 22:08:49,007 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-37
+2025-06-23 22:08:49,026 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-38
+2025-06-23 22:08:49,026 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-42
+2025-06-23 22:08:49,062 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-27
+2025-06-23 22:08:49,096 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-7
+2025-06-23 22:08:49,459 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-24: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,459 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-24 to allow server restart
+2025-06-23 22:08:49,459 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-6: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,459 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-6 to allow server restart
+2025-06-23 22:08:49,980 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-1: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,980 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-1 to allow server restart
+2025-06-23 22:08:49,980 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-14: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,980 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-14 to allow server restart
+2025-06-23 22:08:49,980 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-15: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,980 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-15 to allow server restart
+2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-11: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-11 to allow server restart
+2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-8: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-8 to allow server restart
+2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-4: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-4 to allow server restart
+2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-5: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-5 to allow server restart
+2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-31: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-31 to allow server restart
+2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-11: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-11 to allow server restart
+2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-34: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-34 to allow server restart
+2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-36: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-36 to allow server restart
+2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-37: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-37 to allow server restart
+2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-40: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-40 to allow server restart
+2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-2: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-2 to allow server restart
+2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-33: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-33 to allow server restart
+2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-41: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-41 to allow server restart
+2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-35: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-35 to allow server restart
+2025-06-23 22:08:49,984 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-42: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,984 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-42 to allow server restart
+2025-06-23 22:08:49,984 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-39: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,984 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-39 to allow server restart
+2025-06-23 22:08:49,984 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-3: [Errno 104] Connection reset by peer
+2025-06-23 22:08:49,984 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-3 to allow server restart
+2025-06-23 22:08:52,006 - __main__ - INFO - vllm running req: 75 queue req: 261
+2025-06-23 22:08:52,530 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-24: [Errno 104] Connection reset by peer
+2025-06-23 22:08:52,530 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-24 to allow server restart
+2025-06-23 22:08:52,530 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-23: [Errno 104] Connection reset by peer
+2025-06-23 22:08:52,530 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-23 to allow server restart
+2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-10: [Errno 104] Connection reset by peer
+2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-10 to allow server restart
+2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-28: [Errno 104] Connection reset by peer
+2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-28 to allow server restart
+2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-8: [Errno 104] Connection reset by peer
+2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-8 to allow server restart
+2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-27: [Errno 104] Connection reset by peer
+2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-27 to allow server restart
+2025-06-23 22:08:53,043 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-5: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,043 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-5 to allow server restart
+2025-06-23 22:08:53,043 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-4: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,043 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-4 to allow server restart
+2025-06-23 22:08:53,044 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-5: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,044 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-5 to allow server restart
+2025-06-23 22:08:53,044 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-6: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,044 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-6 to allow server restart
+2025-06-23 22:08:53,044 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-2: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,044 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-2 to allow server restart
+2025-06-23 22:08:53,045 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-1: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,045 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-1 to allow server restart
+2025-06-23 22:08:53,045 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf-12: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,045 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf-12 to allow server restart
+2025-06-23 22:08:53,045 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf-46: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,045 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf-46 to allow server restart
+2025-06-23 22:08:54,134 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:08:54,134 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+2025-06-23 22:08:54,134 - __main__ - INFO -
+Worker ID | started
+----------+--------
+0 | 533
+1 | 276
+2025-06-23 22:08:56,626 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-36: [Errno 104] Connection reset by peer
+2025-06-23 22:08:56,627 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-36 to allow server restart
+2025-06-23 22:08:56,627 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-47: [Errno 104] Connection reset by peer
+2025-06-23 22:08:56,627 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-47 to allow server restart
+2025-06-23 22:08:56,627 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-39: [Errno 104] Connection reset by peer
+2025-06-23 22:08:56,627 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-39 to allow server restart
+2025-06-23 22:08:59,800 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-24
+2025-06-23 22:08:59,806 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-6
+2025-06-23 22:09:00,593 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-3
+2025-06-23 22:09:00,636 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-2
+2025-06-23 22:09:00,663 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-4
+2025-06-23 22:09:00,794 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-41
+2025-06-23 22:09:00,830 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-1
+2025-06-23 22:09:00,945 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-37
+2025-06-23 22:09:00,971 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-39
+2025-06-23 22:09:00,975 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-5
+2025-06-23 22:09:00,975 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-42
+2025-06-23 22:09:00,992 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-11
+2025-06-23 22:09:01,003 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-8
+2025-06-23 22:09:01,003 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-15
+2025-06-23 22:09:01,005 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-14
+2025-06-23 22:09:01,044 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-40
+2025-06-23 22:09:01,069 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-35
+2025-06-23 22:09:01,075 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-33
+2025-06-23 22:09:01,162 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-31
+2025-06-23 22:09:01,223 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-11
+2025-06-23 22:09:01,243 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-36
+2025-06-23 22:09:01,247 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-34
+2025-06-23 22:09:02,117 - __main__ - INFO - vllm running req: 68 queue req: 383
+2025-06-23 22:09:03,397 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-28
+2025-06-23 22:09:03,497 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-24
+2025-06-23 22:09:03,516 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-23
+2025-06-23 22:09:03,564 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-10
+2025-06-23 22:09:03,614 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-8
+2025-06-23 22:09:03,688 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-27
+2025-06-23 22:09:03,736 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-4
+2025-06-23 22:09:03,910 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-5
+2025-06-23 22:09:03,964 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf-12
+2025-06-23 22:09:03,973 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf-46
+2025-06-23 22:09:03,977 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-2
+2025-06-23 22:09:03,986 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-1
+2025-06-23 22:09:03,996 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-5
+2025-06-23 22:09:04,135 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:04,135 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.00 0.00
+server_input_tokens 4.16 5.79
+server_output_tokens 0.92 1.28
+2025-06-23 22:09:04,136 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 1 | 533
+1 | 0 | 276
+2025-06-23 22:09:04,192 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-6
+2025-06-23 22:09:04,307 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-4: [Errno 104] Connection reset by peer
+2025-06-23 22:09:04,307 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-4 to allow server restart
+2025-06-23 22:09:04,307 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-9: [Errno 104] Connection reset by peer
+2025-06-23 22:09:04,307 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-9 to allow server restart
+2025-06-23 22:09:04,307 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-51: [Errno 104] Connection reset by peer
+2025-06-23 22:09:04,308 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-51 to allow server restart
+2025-06-23 22:09:04,308 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-50: [Errno 104] Connection reset by peer
+2025-06-23 22:09:04,308 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-50 to allow server restart
+2025-06-23 22:09:04,308 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-49: [Errno 104] Connection reset by peer
+2025-06-23 22:09:04,308 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-49 to allow server restart
+2025-06-23 22:09:06,960 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-36
+2025-06-23 22:09:06,963 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-47
+2025-06-23 22:09:06,973 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-39
+2025-06-23 22:09:12,139 - __main__ - INFO - vllm running req: 62 queue req: 489
+2025-06-23 22:09:14,137 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:14,137 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.03 0.05
+server_input_tokens 72.62 103.42
+server_output_tokens 19.92 28.37
+2025-06-23 22:09:14,137 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 14 | 533
+1 | 0 | 276
+2025-06-23 22:09:14,721 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-49
+2025-06-23 22:09:14,787 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-51
+2025-06-23 22:09:14,793 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-50
+2025-06-23 22:09:14,890 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-9
+2025-06-23 22:09:14,945 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-4
+2025-06-23 22:09:22,266 - __main__ - INFO - vllm running req: 60 queue req: 610
+2025-06-23 22:09:24,139 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:24,139 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.06 0.08
+server_input_tokens 135.92 198.10
+server_output_tokens 38.88 56.67
+2025-06-23 22:09:24,139 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 25 | 533
+1 | 0 | 276
+2025-06-23 22:09:32,267 - __main__ - INFO - vllm running req: 64 queue req: 709
+2025-06-23 22:09:34,141 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:34,142 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.08 0.13
+server_input_tokens 215.97 321.97
+server_output_tokens 62.79 93.60
+2025-06-23 22:09:34,142 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 38 | 533
+1 | 0 | 276
+2025-06-23 22:09:42,269 - __main__ - INFO - vllm running req: 68 queue req: 694
+2025-06-23 22:09:44,143 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:44,144 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.11 0.16
+server_input_tokens 277.54 423.01
+server_output_tokens 81.61 124.38
+2025-06-23 22:09:44,144 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 49 | 533
+1 | 0 | 276
+2025-06-23 22:09:52,269 - __main__ - INFO - vllm running req: 65 queue req: 683
+2025-06-23 22:09:54,145 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:54,146 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.13 0.21
+server_input_tokens 353.36 550.36
+server_output_tokens 106.73 166.23
+2025-06-23 22:09:54,146 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 63 | 533
+1 | 0 | 276
+2025-06-23 22:10:02,271 - __main__ - INFO - vllm running req: 65 queue req: 674
+2025-06-23 22:10:04,147 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:04,147 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.15 0.24
+server_input_tokens 389.51 619.66
+server_output_tokens 118.93 189.19
+2025-06-23 22:10:04,147 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 71 | 533
+1 | 0 | 276
+2025-06-23 22:10:12,273 - __main__ - INFO - vllm running req: 61 queue req: 671
+2025-06-23 22:10:14,148 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:14,148 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.16 0.27
+server_input_tokens 428.79 696.43
+server_output_tokens 135.56 220.17
+2025-06-23 22:10:14,148 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 80 | 533
+1 | 0 | 276
+2025-06-23 22:10:22,273 - __main__ - INFO - vllm running req: 63 queue req: 654
+2025-06-23 22:10:24,149 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:24,149 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.19 0.31
+server_input_tokens 490.36 812.79
+server_output_tokens 160.71 266.37
+2025-06-23 22:10:24,150 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 94 | 533
+1 | 0 | 276
+2025-06-23 22:10:32,274 - __main__ - INFO - vllm running req: 62 queue req: 643
+2025-06-23 22:10:34,150 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:34,151 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.21 0.36
+server_input_tokens 543.42 918.85
+server_output_tokens 179.77 303.97
+2025-06-23 22:10:34,151 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 105 | 533
+1 | 2 | 276
+2025-06-23 22:10:42,274 - __main__ - INFO - vllm running req: 62 queue req: 630
+2025-06-23 22:10:44,154 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:44,154 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.23 0.40
+server_input_tokens 596.17 1027.93
+server_output_tokens 200.04 344.92
+2025-06-23 22:10:44,154 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 117 | 533
+1 | 3 | 276
+2025-06-23 22:10:52,275 - __main__ - INFO - vllm running req: 61 queue req: 623
+2025-06-23 22:10:54,155 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:54,155 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.24 0.42
+server_input_tokens 632.42 1111.50
+server_output_tokens 217.21 381.76
+2025-06-23 22:10:54,156 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 122 | 533
+1 | 5 | 276
+2025-06-23 22:11:02,276 - __main__ - INFO - vllm running req: 63 queue req: 612
+2025-06-23 22:11:04,156 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:04,157 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.25 0.45
+server_input_tokens 663.06 1187.46
+server_output_tokens 229.91 411.74
+2025-06-23 22:11:04,157 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 126 | 533
+1 | 9 | 276
+2025-06-23 22:11:12,277 - __main__ - INFO - vllm running req: 63 queue req: 602
+2025-06-23 22:11:14,158 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:14,159 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.26 0.48
+server_input_tokens 703.41 1283.18
+server_output_tokens 244.87 446.69
+2025-06-23 22:11:14,159 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 132 | 533
+1 | 13 | 276
+2025-06-23 22:11:22,279 - __main__ - INFO - vllm running req: 62 queue req: 590
+2025-06-23 22:11:24,160 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:24,160 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.28 0.53
+server_input_tokens 754.96 1402.39
+server_output_tokens 261.06 484.94
+2025-06-23 22:11:24,160 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 140 | 533
+1 | 18 | 276
+2025-06-23 22:11:32,281 - __main__ - INFO - vllm running req: 64 queue req: 577
+2025-06-23 22:11:34,161 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:34,161 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.30 0.57
+server_input_tokens 799.80 1512.33
+server_output_tokens 279.24 528.01
+2025-06-23 22:11:34,161 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 147 | 533
+1 | 23 | 276
+2025-06-23 22:11:42,281 - __main__ - INFO - vllm running req: 64 queue req: 565
+2025-06-23 22:11:44,162 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:44,163 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.31 0.60
+server_input_tokens 833.40 1603.66
+server_output_tokens 293.39 564.55
+2025-06-23 22:11:44,163 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 152 | 533
+1 | 28 | 276
+2025-06-23 22:11:52,281 - __main__ - INFO - vllm running req: 63 queue req: 556
+2025-06-23 22:11:54,163 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:54,164 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.33 0.64
+server_input_tokens 882.61 1727.78
+server_output_tokens 313.58 613.85
+2025-06-23 22:11:54,164 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 158 | 533
+1 | 34 | 276
+2025-06-23 22:12:02,281 - __main__ - INFO - vllm running req: 61 queue req: 550
+2025-06-23 22:12:04,165 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:04,165 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.34 0.67
+server_input_tokens 908.52 1808.77
+server_output_tokens 322.85 642.77
+2025-06-23 22:12:04,165 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 161 | 533
+1 | 40 | 276
+2025-06-23 22:12:12,282 - __main__ - INFO - vllm running req: 62 queue req: 537
+2025-06-23 22:12:14,166 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:14,167 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.35 0.71
+server_input_tokens 944.56 1912.03
+server_output_tokens 333.03 674.14
+2025-06-23 22:12:14,167 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 162 | 533
+1 | 50 | 276
+2025-06-23 22:12:22,284 - __main__ - INFO - vllm running req: 63 queue req: 526
+2025-06-23 22:12:24,167 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:24,168 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.36 0.74
+server_input_tokens 978.24 2012.82
+server_output_tokens 345.07 710.01
+2025-06-23 22:12:24,168 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 162 | 533
+1 | 61 | 276
+2025-06-23 22:12:32,284 - __main__ - INFO - vllm running req: 66 queue req: 510
+2025-06-23 22:12:34,169 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:34,169 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.37 0.78
+server_input_tokens 1021.46 2135.79
+server_output_tokens 365.30 763.81
+2025-06-23 22:12:34,169 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 167 | 533
+1 | 68 | 276
+2025-06-23 22:12:42,286 - __main__ - INFO - vllm running req: 66 queue req: 501
+2025-06-23 22:12:44,170 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:44,170 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.38 0.82
+server_input_tokens 1057.70 2246.84
+server_output_tokens 383.96 815.63
+2025-06-23 22:12:44,170 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 171 | 533
+1 | 74 | 276
+2025-06-23 22:12:52,287 - __main__ - INFO - vllm running req: 64 queue req: 490
+2025-06-23 22:12:54,172 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:54,172 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.40 0.86
+server_input_tokens 1088.49 2348.52
+server_output_tokens 392.48 846.82
+2025-06-23 22:12:54,172 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 175 | 533
+1 | 83 | 276
+2025-06-23 22:13:02,288 - __main__ - INFO - vllm running req: 61 queue req: 481
+2025-06-23 22:13:04,173 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:04,174 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.41 0.90
+server_input_tokens 1123.77 2462.11
+server_output_tokens 402.62 882.11
+2025-06-23 22:13:04,174 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 177 | 533
+1 | 92 | 276
+2025-06-23 22:13:12,289 - __main__ - INFO - vllm running req: 58 queue req: 472
+2025-06-23 22:13:14,175 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:14,175 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.42 0.94
+server_input_tokens 1152.91 2564.40
+server_output_tokens 411.53 915.35
+2025-06-23 22:13:14,175 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 185 | 533
+1 | 97 | 276
+2025-06-23 22:13:22,292 - __main__ - INFO - vllm running req: 61 queue req: 456
+2025-06-23 22:13:24,176 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:24,176 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.43 0.98
+server_input_tokens 1189.02 2684.34
+server_output_tokens 424.32 957.96
+2025-06-23 22:13:24,177 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 192 | 533
+1 | 102 | 276
+2025-06-23 22:13:32,293 - __main__ - INFO - vllm running req: 59 queue req: 449
+2025-06-23 22:13:34,177 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:34,177 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.44 1.01
+server_input_tokens 1207.97 2767.41
+server_output_tokens 430.83 987.00
+2025-06-23 22:13:34,178 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 199 | 533
+1 | 104 | 276
+2025-06-23 22:13:42,293 - __main__ - INFO - vllm running req: 62 queue req: 436
+2025-06-23 22:13:44,179 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:44,179 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.45 1.04
+server_input_tokens 1232.57 2864.85
+server_output_tokens 438.51 1019.23
+2025-06-23 22:13:44,180 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 206 | 533
+1 | 106 | 276
+2025-06-23 22:13:52,293 - __main__ - INFO - vllm running req: 59 queue req: 427
+2025-06-23 22:13:54,181 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:54,181 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.46 1.08
+server_input_tokens 1269.62 2993.30
+server_output_tokens 449.30 1059.29
+2025-06-23 22:13:54,181 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 218 | 533
+1 | 107 | 276
+2025-06-23 22:14:02,294 - __main__ - INFO - vllm running req: 65 queue req: 412
+2025-06-23 22:14:04,182 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:04,182 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.47 1.11
+server_input_tokens 1296.63 3094.40
+server_output_tokens 457.29 1092.08
+2025-06-23 22:14:04,182 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 227 | 533
+1 | 107 | 276
+2025-06-23 22:14:12,294 - __main__ - INFO - vllm running req: 62 queue req: 402
+2025-06-23 22:14:14,183 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:14,184 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.48 1.11
+server_input_tokens 1322.77 3103.36
+server_output_tokens 466.04 1101.46
+2025-06-23 22:14:14,184 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 240 | 533
+1 | 107 | 276
+2025-06-23 22:14:22,297 - __main__ - INFO - vllm running req: 60 queue req: 398
+2025-06-23 22:14:24,185 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:24,185 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.48 1.10
+server_input_tokens 1332.38 3084.51
+server_output_tokens 468.30 1097.13
+2025-06-23 22:14:24,185 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 246 | 533
+1 | 108 | 276
+2025-06-23 22:14:32,297 - __main__ - INFO - vllm running req: 61 queue req: 385
+2025-06-23 22:14:34,186 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:34,186 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.49 1.09
+server_input_tokens 1364.44 3076.82
+server_output_tokens 479.98 1102.01
+2025-06-23 22:14:34,186 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 252 | 533
+1 | 113 | 276
+2025-06-23 22:14:42,298 - __main__ - INFO - vllm running req: 64 queue req: 371
+2025-06-23 22:14:44,187 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:44,188 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.50 1.09
+server_input_tokens 1391.13 3088.64
+server_output_tokens 489.22 1110.56
+2025-06-23 22:14:44,188 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 259 | 533
+1 | 118 | 276
+2025-06-23 22:14:52,298 - __main__ - INFO - vllm running req: 65 queue req: 359
+2025-06-23 22:14:52,928 - __main__ - WARNING - JSON decode error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124)
+2025-06-23 22:14:52,929 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56
+2025-06-23 22:14:53,075 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56
+2025-06-23 22:14:54,190 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:54,190 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.50 1.08
+server_input_tokens 1412.24 3073.89
+server_output_tokens 496.47 1107.94
+2025-06-23 22:14:54,190 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 262 | 533
+1 | 123 | 276
+2025-06-23 22:15:02,300 - __main__ - INFO - vllm running req: 62 queue req: 354
+2025-06-23 22:15:04,191 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:04,191 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.51 1.08
+server_input_tokens 1425.85 3074.71
+server_output_tokens 500.91 1108.67
+2025-06-23 22:15:04,192 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 269 | 533
+1 | 126 | 276
+2025-06-23 22:15:12,302 - __main__ - INFO - vllm running req: 63 queue req: 343
+2025-06-23 22:15:14,193 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:14,193 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.52 1.09
+server_input_tokens 1450.64 3118.13
+server_output_tokens 508.76 1117.77
+2025-06-23 22:15:14,193 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 275 | 533
+1 | 131 | 276
+2025-06-23 22:15:22,303 - __main__ - INFO - vllm running req: 62 queue req: 331
+2025-06-23 22:15:22,740 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-14
+2025-06-23 22:15:22,740 - __main__ - WARNING - ValueError on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-14: - Response exceeded model_max_context, cannot use this response
+2025-06-23 22:15:23,013 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-14
+2025-06-23 22:15:24,194 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:24,194 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.52 1.08
+server_input_tokens 1466.97 3085.92
+server_output_tokens 514.14 1100.03
+2025-06-23 22:15:24,194 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 279 | 533
+1 | 138 | 276
+2025-06-23 22:15:30,120 - __main__ - WARNING - JSON decode error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-5: Invalid \escape: line 1 column 1851 (char 1850)
+2025-06-23 22:15:30,121 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-5
+2025-06-23 22:15:30,428 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-5
+2025-06-23 22:15:32,305 - __main__ - INFO - vllm running req: 64 queue req: 323
+2025-06-23 22:15:34,195 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:34,196 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.53 1.06
+server_input_tokens 1479.70 3070.82
+server_output_tokens 518.29 1093.48
+2025-06-23 22:15:34,196 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 282 | 533
+1 | 142 | 276
+2025-06-23 22:15:42,305 - __main__ - INFO - vllm running req: 62 queue req: 314
+2025-06-23 22:15:44,198 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:44,198 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.53 1.05
+server_input_tokens 1496.43 3056.93
+server_output_tokens 525.01 1088.33
+2025-06-23 22:15:44,198 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 289 | 533
+1 | 145 | 276
+2025-06-23 22:15:52,307 - __main__ - INFO - vllm running req: 64 queue req: 301
+2025-06-23 22:15:54,200 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:54,200 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.54 1.07
+server_input_tokens 1520.12 3093.15
+server_output_tokens 533.95 1098.93
+2025-06-23 22:15:54,201 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 294 | 533
+1 | 152 | 276
+2025-06-23 22:16:02,308 - __main__ - INFO - vllm running req: 64 queue req: 289
+2025-06-23 22:16:04,203 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:16:04,203 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.55 1.08
+server_input_tokens 1536.45 3107.65
+server_output_tokens 539.99 1097.50
+2025-06-23 22:16:04,203 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 304 | 533
+1 | 154 | 276
+2025-06-23 22:16:12,309 - __main__ - INFO - vllm running req: 60 queue req: 276
+2025-06-23 22:16:14,205 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:16:14,205 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.56 1.10
+server_input_tokens 1565.93 3147.35
+server_output_tokens 550.10 1109.60
+2025-06-23 22:16:14,205 - __main__ - INFO -
+Worker ID | finished | started
+----------+----------+--------
+0 | 319 | 533
+1 | 155 | 276
+2025-06-23 22:16:22,309 - __main__ - INFO - vllm running req: 61 queue req: 263
+2025-06-23 22:16:24,207 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:16:24,208 - __main__ - INFO -
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+----------------------------------------------------------------------------------
+completed_pages 0.57 1.10
+server_input_tokens 1588.32 3144.31
+server_output_tokens 556.64 1108.31
+2025-06-23 22:16:24,208 - __main__ - INFO -
+Worker ID | finished
| started +----------+----------+-------- +0 | 331 | 533 +1 | 156 | 276 +2025-06-23 22:16:26,616 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-18 +2025-06-23 22:16:26,616 - __main__ - WARNING - ValueError on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-18: - Response exceeded model_max_context, cannot use this response +2025-06-23 22:16:26,943 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-18 +2025-06-23 22:16:32,312 - __main__ - INFO - vllm running req: 60 queue req: 257 +2025-06-23 22:16:34,209 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:16:34,210 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.57 1.08 +server_input_tokens 1594.86 3105.04 +server_output_tokens 557.18 1084.24 +2025-06-23 22:16:34,210 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 337 | 533 +1 | 157 | 276 +2025-06-23 22:16:42,313 - __main__ - INFO - vllm running req: 60 queue req: 250 +2025-06-23 22:16:44,212 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:16:44,213 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.57 1.07 +server_input_tokens 1591.23 3069.89 +server_output_tokens 555.73 1069.02 +2025-06-23 22:16:44,213 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 340 | 533 +1 | 159 | 276 +2025-06-23 22:16:52,313 - __main__ - INFO - vllm running req: 58 queue req: 243 +2025-06-23 22:16:54,214 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:16:54,214 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) 
+---------------------------------------------------------------------------------- +completed_pages 0.57 1.06 +server_input_tokens 1609.73 3033.38 +server_output_tokens 564.13 1054.70 +2025-06-23 22:16:54,214 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 348 | 533 +1 | 162 | 276 +2025-06-23 22:17:02,316 - __main__ - INFO - vllm running req: 62 queue req: 228 +2025-06-23 22:17:04,216 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:17:04,217 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.58 1.07 +server_input_tokens 1630.51 3077.18 +server_output_tokens 576.79 1085.11 +2025-06-23 22:17:04,217 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 357 | 533 +1 | 164 | 276 +2025-06-23 22:17:12,317 - __main__ - INFO - vllm running req: 61 queue req: 218 +2025-06-23 22:17:14,219 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:17:14,219 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.59 1.07 +server_input_tokens 1647.83 3080.03 +server_output_tokens 581.99 1088.56 +2025-06-23 22:17:14,219 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 367 | 533 +1 | 166 | 276 +2025-06-23 22:17:22,318 - __main__ - INFO - vllm running req: 64 queue req: 204 +2025-06-23 22:17:24,221 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:17:24,221 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.59 1.07 +server_input_tokens 1663.26 3073.03 +server_output_tokens 588.56 1089.67 +2025-06-23 22:17:24,221 - __main__ - INFO - +Worker ID | finished | started 
+----------+----------+-------- +0 | 375 | 533 +1 | 168 | 276 +2025-06-23 22:17:32,321 - __main__ - INFO - vllm running req: 69 queue req: 189 +2025-06-23 22:17:34,222 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:17:34,222 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.60 1.07 +server_input_tokens 1685.07 3072.93 +server_output_tokens 594.50 1073.86 +2025-06-23 22:17:34,223 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 385 | 533 +1 | 170 | 276 +2025-06-23 22:17:42,321 - __main__ - INFO - vllm running req: 69 queue req: 173 +2025-06-23 22:17:44,224 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:17:44,224 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.61 1.08 +server_input_tokens 1706.87 3096.57 +server_output_tokens 600.92 1067.50 +2025-06-23 22:17:44,224 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 396 | 533 +1 | 172 | 276 +2025-06-23 22:17:52,322 - __main__ - INFO - vllm running req: 60 queue req: 166 +2025-06-23 22:17:54,225 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:17:54,225 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.62 1.09 +server_input_tokens 1731.66 3125.79 +server_output_tokens 604.26 1062.50 +2025-06-23 22:17:54,226 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 412 | 533 +1 | 173 | 276 +2025-06-23 22:18:02,324 - __main__ - INFO - vllm running req: 63 queue req: 152 +2025-06-23 22:18:04,227 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:18:04,228 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) 
Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.62 1.10 +server_input_tokens 1737.98 3124.56 +server_output_tokens 605.55 1061.80 +2025-06-23 22:18:04,228 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 421 | 533 +1 | 173 | 276 +2025-06-23 22:18:12,324 - __main__ - INFO - vllm running req: 63 queue req: 144 +2025-06-23 22:18:14,230 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:18:14,230 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.62 1.08 +server_input_tokens 1748.20 3089.42 +server_output_tokens 609.36 1053.60 +2025-06-23 22:18:14,230 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 427 | 533 +1 | 176 | 276 +2025-06-23 22:18:22,325 - __main__ - INFO - vllm running req: 63 queue req: 131 +2025-06-23 22:18:24,232 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:18:24,232 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.63 1.08 +server_input_tokens 1768.02 3093.29 +server_output_tokens 615.63 1050.90 +2025-06-23 22:18:24,232 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 442 | 533 +1 | 176 | 276 +2025-06-23 22:18:32,326 - __main__ - INFO - vllm running req: 62 queue req: 116 +2025-06-23 22:18:34,234 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:18:34,234 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.64 1.10 +server_input_tokens 1788.03 3117.25 +server_output_tokens 621.54 1058.56 +2025-06-23 22:18:34,235 - __main__ - INFO - +Worker ID | finished | started 
+----------+----------+-------- +0 | 456 | 533 +1 | 176 | 276 +2025-06-23 22:18:42,328 - __main__ - INFO - vllm running req: 64 queue req: 106 +2025-06-23 22:18:44,236 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:18:44,237 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.64 1.10 +server_input_tokens 1796.36 3113.99 +server_output_tokens 625.59 1062.35 +2025-06-23 22:18:44,237 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 464 | 533 +1 | 177 | 276 +2025-06-23 22:18:52,329 - __main__ - INFO - vllm running req: 64 queue req: 95 +2025-06-23 22:18:54,238 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:18:54,238 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.65 1.09 +server_input_tokens 1813.96 3097.64 +server_output_tokens 631.77 1062.07 +2025-06-23 22:18:54,238 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 476 | 533 +1 | 177 | 276 +2025-06-23 22:19:02,330 - __main__ - INFO - vllm running req: 66 queue req: 83 +2025-06-23 22:19:04,239 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:19:04,239 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.65 1.09 +server_input_tokens 1824.70 3087.65 +server_output_tokens 632.50 1051.53 +2025-06-23 22:19:04,239 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 485 | 533 +1 | 177 | 276 +2025-06-23 22:19:12,332 - __main__ - INFO - vllm running req: 66 queue req: 71 +2025-06-23 22:19:14,241 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:19:14,241 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) 
Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.66 1.09 +server_input_tokens 1839.87 3109.67 +server_output_tokens 636.03 1053.52 +2025-06-23 22:19:14,241 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 496 | 533 +1 | 177 | 276 +2025-06-23 22:19:22,334 - __main__ - INFO - vllm running req: 67 queue req: 58 +2025-06-23 22:19:24,243 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:19:24,243 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.66 1.11 +server_input_tokens 1864.18 3171.49 +server_output_tokens 643.90 1075.60 +2025-06-23 22:19:24,244 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 510 | 533 +1 | 178 | 276 +2025-06-23 22:19:32,335 - __main__ - INFO - vllm running req: 67 queue req: 48 +2025-06-23 22:19:34,245 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:19:34,245 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.67 1.11 +server_input_tokens 1870.93 3132.95 +server_output_tokens 646.43 1061.19 +2025-06-23 22:19:34,245 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 517 | 533 +1 | 180 | 276 +2025-06-23 22:19:38,303 - __main__ - WARNING - JSON decode error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf-8: Expecting ',' delimiter: line 1 column 2504 (char 2503) +2025-06-23 22:19:38,303 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf-8 +2025-06-23 22:19:38,539 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf-8 +2025-06-23 22:19:42,337 - 
__main__ - INFO - vllm running req: 67 queue req: 35 +2025-06-23 22:19:44,246 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:19:44,246 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.67 1.11 +server_input_tokens 1885.46 3133.66 +server_output_tokens 650.63 1058.22 +2025-06-23 22:19:44,246 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 518 | 533 +1 | 192 | 276 +2025-06-23 22:19:52,337 - __main__ - INFO - vllm running req: 63 queue req: 28 +2025-06-23 22:19:54,249 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:19:54,249 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.68 1.12 +server_input_tokens 1893.00 3122.97 +server_output_tokens 652.22 1050.72 +2025-06-23 22:19:54,249 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 519 | 533 +1 | 202 | 276 +2025-06-23 22:20:02,338 - __main__ - INFO - vllm running req: 64 queue req: 12 +2025-06-23 22:20:04,250 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:20:04,250 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.68 1.14 +server_input_tokens 1913.56 3177.61 +server_output_tokens 659.94 1072.11 +2025-06-23 22:20:04,250 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 521 | 533 +1 | 216 | 276 +2025-06-23 22:20:12,339 - __main__ - INFO - vllm running req: 63 queue req: 1 +2025-06-23 22:20:14,251 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:20:14,252 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) 
+---------------------------------------------------------------------------------- +completed_pages 0.69 1.14 +server_input_tokens 1927.76 3180.25 +server_output_tokens 664.76 1074.28 +2025-06-23 22:20:14,252 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 523 | 533 +1 | 225 | 276 +2025-06-23 22:20:21,974 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-13 +2025-06-23 22:20:21,975 - __main__ - WARNING - ValueError on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-13: - Response exceeded model_max_context, cannot use this response +2025-06-23 22:20:22,274 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-13 +2025-06-23 22:20:22,340 - __main__ - INFO - vllm running req: 42 queue req: 0 +2025-06-23 22:20:24,253 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:20:24,253 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.70 1.17 +server_input_tokens 1961.53 3284.33 +server_output_tokens 676.07 1109.23 +2025-06-23 22:20:24,253 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 527 | 533 +1 | 241 | 276 +2025-06-23 22:20:32,341 - __main__ - INFO - vllm running req: 22 queue req: 0 +2025-06-23 22:20:34,254 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:20:34,255 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.71 1.22 +server_input_tokens 2004.27 3416.28 +server_output_tokens 690.33 1153.40 +2025-06-23 22:20:34,255 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 528 | 533 +1 | 261 | 276 +2025-06-23 22:20:42,342 - __main__ - INFO - vllm 
running req: 9 queue req: 0 +2025-06-23 22:20:44,256 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:20:44,256 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.72 1.23 +server_input_tokens 2020.84 3488.51 +server_output_tokens 697.63 1182.37 +2025-06-23 22:20:44,256 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 529 | 533 +1 | 271 | 276 +2025-06-23 22:20:46,223 - __main__ - WARNING - JSON decode error on attempt 1 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) +2025-06-23 22:20:46,223 - __main__ - INFO - Reducing anchor text len to 1500 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:20:46,372 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:20:52,343 - __main__ - INFO - vllm running req: 6 queue req: 0 +2025-06-23 22:20:54,257 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:20:54,258 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.71 1.19 +server_input_tokens 2019.62 3397.47 +server_output_tokens 699.92 1157.77 +2025-06-23 22:20:54,258 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 529 | 533 +1 | 275 | 276 +2025-06-23 22:21:02,344 - __main__ - INFO - vllm running req: 5 queue req: 0 +2025-06-23 22:21:04,260 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:21:04,260 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.71 1.19 +server_input_tokens 2001.86 3397.47 +server_output_tokens 693.77 
1157.77 +2025-06-23 22:21:04,260 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 529 | 533 +1 | 275 | 276 +2025-06-23 22:21:06,731 - __main__ - INFO - Finished TaskGroup for worker on a33f691ea15b24c747ed3f2369ced021b03cea55 +2025-06-23 22:21:06,732 - __main__ - INFO - Got 13 docs for a33f691ea15b24c747ed3f2369ced021b03cea55 +2025-06-23 22:21:06,749 - __main__ - INFO - Writing 13 markdown files for a33f691ea15b24c747ed3f2369ced021b03cea55 +2025-06-23 22:21:06,758 - __main__ - INFO - Worker 1 exiting due to empty queue +2025-06-23 22:21:11,889 - __main__ - WARNING - JSON decode error on attempt 2 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) +2025-06-23 22:21:11,889 - __main__ - INFO - Reducing anchor text len to 750 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:21:12,037 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:21:12,345 - __main__ - INFO - vllm running req: 3 queue req: 0 +2025-06-23 22:21:14,261 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:21:14,262 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.70 1.11 +finished_input_tokens 671.32 2567.49 +finished_output_tokens 234.75 897.82 +server_input_tokens 1992.10 3210.25 +server_output_tokens 694.44 1106.62 +2025-06-23 22:21:14,262 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 530 | 533 +1 | 276 | 276 +2025-06-23 22:21:22,346 - __main__ - INFO - vllm running req: 1 queue req: 0 +2025-06-23 22:21:24,265 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:21:24,265 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) 
+---------------------------------------------------------------------------------- +completed_pages 0.70 1.10 +finished_input_tokens 665.51 2567.49 +finished_output_tokens 232.72 897.82 +server_input_tokens 1980.54 3184.50 +server_output_tokens 694.17 1112.84 +2025-06-23 22:21:24,265 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 532 | 533 +1 | 276 | 276 +2025-06-23 22:21:31,565 - __main__ - WARNING - JSON decode error on attempt 3 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) +2025-06-23 22:21:31,565 - __main__ - INFO - Reducing anchor text len to 375 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:21:31,720 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:21:32,347 - __main__ - INFO - vllm running req: 1 queue req: 0 +2025-06-23 22:21:34,267 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:21:34,268 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.69 1.05 +finished_input_tokens 659.81 2567.49 +finished_output_tokens 230.73 897.82 +server_input_tokens 1964.80 3064.29 +server_output_tokens 689.40 1079.47 +2025-06-23 22:21:34,268 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 532 | 533 +1 | 276 | 276 +2025-06-23 22:21:42,348 - __main__ - INFO - vllm running req: 1 queue req: 0 +2025-06-23 22:21:44,269 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:21:44,269 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.69 1.05 +finished_input_tokens 654.21 2567.49 +finished_output_tokens 228.77 897.82 +server_input_tokens 1948.11 3064.29 
+server_output_tokens 683.55 1079.47 +2025-06-23 22:21:44,269 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 532 | 533 +1 | 276 | 276 +2025-06-23 22:21:48,089 - __main__ - WARNING - JSON decode error on attempt 4 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) +2025-06-23 22:21:48,089 - __main__ - INFO - Reducing anchor text len to 187 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:21:48,235 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:21:52,349 - __main__ - INFO - vllm running req: 1 queue req: 0 +2025-06-23 22:21:54,271 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:21:54,272 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.68 1.01 +finished_input_tokens 648.70 2567.49 +finished_output_tokens 226.84 897.82 +server_input_tokens 1932.75 2953.05 +server_output_tokens 678.78 1041.06 +2025-06-23 22:21:54,272 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 532 | 533 +1 | 276 | 276 +2025-06-23 22:22:02,349 - __main__ - INFO - vllm running req: 1 queue req: 0 +2025-06-23 22:22:04,273 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:22:04,273 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.67 1.01 +finished_input_tokens 643.28 2567.49 +finished_output_tokens 224.95 897.82 +server_input_tokens 1916.61 2953.05 +server_output_tokens 673.11 1041.06 +2025-06-23 22:22:04,273 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 532 | 533 +1 | 276 | 276 +2025-06-23 22:22:10,051 - __main__ - WARNING - 
JSON decode error on attempt 5 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) +2025-06-23 22:22:10,052 - __main__ - INFO - Reducing anchor text len to 93 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:22:10,198 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:22:12,351 - __main__ - INFO - vllm running req: 1 queue req: 0 +2025-06-23 22:22:14,274 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:22:14,274 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.67 0.94 +finished_input_tokens 637.95 2567.49 +finished_output_tokens 223.08 897.82 +server_input_tokens 1901.66 2717.47 +server_output_tokens 668.83 948.51 +2025-06-23 22:22:14,274 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 532 | 533 +1 | 276 | 276 +2025-06-23 22:22:22,351 - __main__ - INFO - vllm running req: 1 queue req: 0 +2025-06-23 22:22:24,276 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:22:24,276 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.66 0.94 +finished_input_tokens 632.71 2567.49 +finished_output_tokens 221.25 897.82 +server_input_tokens 1886.04 2717.47 +server_output_tokens 663.34 948.51 +2025-06-23 22:22:24,277 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 532 | 533 +1 | 276 | 276 +2025-06-23 22:22:32,353 - __main__ - INFO - vllm running req: 1 queue req: 0 +2025-06-23 22:22:32,624 - __main__ - WARNING - JSON decode error on attempt 6 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 
125 (char 124) +2025-06-23 22:22:32,624 - __main__ - INFO - Reducing anchor text len to 46 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:22:32,779 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:22:34,278 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:22:34,278 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.66 0.85 +finished_input_tokens 627.55 2567.49 +finished_output_tokens 219.45 897.82 +server_input_tokens 1871.54 2475.65 +server_output_tokens 659.25 864.54 +2025-06-23 22:22:34,278 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 532 | 533 +1 | 276 | 276 +2025-06-23 22:22:42,354 - __main__ - INFO - vllm running req: 1 queue req: 0 +2025-06-23 22:22:44,280 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:22:44,280 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.65 0.85 +finished_input_tokens 622.48 2567.49 +finished_output_tokens 217.67 897.82 +server_input_tokens 1856.41 2475.65 +server_output_tokens 653.92 864.54 +2025-06-23 22:22:44,280 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 532 | 533 +1 | 276 | 276 +2025-06-23 22:22:52,355 - __main__ - INFO - vllm running req: 1 queue req: 0 +2025-06-23 22:22:54,281 - __main__ - INFO - Queue remaining: 0 +2025-06-23 22:22:54,282 - __main__ - INFO - +Metric Name Lifetime (tokens/sec) Recently (tokens/sec) +---------------------------------------------------------------------------------- +completed_pages 0.65 0.85 +finished_input_tokens 617.49 2567.49 +finished_output_tokens 215.93 897.82 +server_input_tokens 1841.53 2475.65 +server_output_tokens 648.67 
864.54 +2025-06-23 22:22:54,282 - __main__ - INFO - +Worker ID | finished | started +----------+----------+-------- +0 | 532 | 533 +1 | 276 | 276 +2025-06-23 22:22:55,207 - __main__ - WARNING - JSON decode error on attempt 7 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) +2025-06-23 22:22:55,208 - __main__ - INFO - Reducing anchor text len to 23 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 +2025-06-23 22:22:55,208 - __main__ - ERROR - Failed to process /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 after 8 attempts. +2025-06-23 22:22:55,229 - __main__ - ERROR - Document /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf has 1 fallback pages out of 58 exceeding max_page_error_rate of 0.004, discarding document. +2025-06-23 22:22:55,230 - __main__ - INFO - Finished TaskGroup for worker on 2e60af4aea64f23cc30c38b01b3cf7f0b1c0a024 +2025-06-23 22:22:55,230 - __main__ - INFO - Got 20 docs for 2e60af4aea64f23cc30c38b01b3cf7f0b1c0a024 +2025-06-23 22:22:55,254 - __main__ - INFO - Writing 20 markdown files for 2e60af4aea64f23cc30c38b01b3cf7f0b1c0a024 +2025-06-23 22:22:55,274 - __main__ - INFO - Worker 0 exiting due to empty queue +2025-06-23 22:22:55,274 - __main__ - INFO - ================================================================================ +2025-06-23 22:22:55,275 - __main__ - INFO - FINAL METRICS SUMMARY +2025-06-23 22:22:55,275 - __main__ - INFO - ================================================================================ +2025-06-23 22:22:55,275 - __main__ - INFO - Total elapsed time: 1248.38 seconds +2025-06-23 22:22:55,275 - __main__ - INFO - Total Server Input tokens: 2,298,171 +2025-06-23 22:22:55,275 - __main__ - INFO - Total Server Output tokens: 810,762 +2025-06-23 22:22:55,275 - __main__ - INFO - Finished input tokens: 2,143,438 +2025-06-23 22:22:55,276 - __main__ - INFO - Finished output tokens: 772,257 
+2025-06-23 22:22:55,276 - __main__ - INFO - Completed pages: 808
+2025-06-23 22:22:55,276 - __main__ - INFO - Failed pages: 1
+2025-06-23 22:22:55,276 - __main__ - INFO - Page Failure rate: 0.12%
+2025-06-23 22:22:55,276 - __main__ - INFO - Server Input tokens/sec rate: 1840.92
+2025-06-23 22:22:55,276 - __main__ - INFO - Server Output tokens/sec rate: 649.45
+2025-06-23 22:22:55,276 - __main__ - INFO - Finished Input tokens/sec rate: 1716.97
+2025-06-23 22:22:55,276 - __main__ - INFO - Finished Output tokens/sec rate: 618.61
+2025-06-23 22:22:55,276 - __main__ - INFO - ================================================================================
+2025-06-23 22:22:55,277 - __main__ - INFO - Work done
+2025-06-23 22:22:55,277 - __main__ - INFO - Got cancellation request for VLLM server
+job pau at: Mon Jun 23 22:23:00 CDT 2025
diff --git a/containers/ocr_job_script.sh b/containers/ocr_job_script.sh
index 4e811fc..9794157 100644
--- a/containers/ocr_job_script.sh
+++ b/containers/ocr_job_script.sh
@@ -18,6 +18,6 @@ module load singularity
 echo "singularity loaded, running the job command at $(date)"
-singularity run --nv -B /home/nws8519/git/adaptation-slr/ /home/nws8519/git/adaptation-slr/containers/new_olmocr_container.sif python -m olmocr.pipeline /home/nws8519/git/adaptation-slr/ocr_studies_text/ --markdown --pdfs /home/nws8519/git/adaptation-slr/studies_pdfs/*.pdf
+singularity run --nv -B /home/nws8519/git/adaptation-slr/ /home/nws8519/git/adaptation-slr/containers/new_olmocr_container.sif python -m olmocr.pipeline /home/nws8519/git/adaptation-slr/ocr_studies_text/ --markdown --pdfs /home/nws8519/git/adaptation-slr/studies/014-norskov.pdf
 echo "job pau at: $(date)"
diff --git a/containers/olmocr-pipeline-debug.log b/containers/olmocr-pipeline-debug.log
index 2d29df2..3e4e288 100644
--- a/containers/olmocr-pipeline-debug.log
+++ b/containers/olmocr-pipeline-debug.log
@@ -2748,3 +2748,4 @@ Worker ID | finished | started
 2025-06-23 22:22:55,276 - __main__ - INFO - ================================================================================
 2025-06-23 22:22:55,277 - __main__ - INFO - Work done
 2025-06-23 22:22:55,277 - __main__ - INFO - Got cancellation request for VLLM server
+2025-06-24 08:30:17,234 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
diff --git a/containers/slr_ocr_job.log b/containers/slr_ocr_job.log
index ab5a3a3..5fa4a57 100644
--- a/containers/slr_ocr_job.log
+++ b/containers/slr_ocr_job.log
@@ -1,5 +1,5 @@
-setting up the environment by loading singularity at Mon Jun 23 22:01:45 CDT 2025
-singularity loaded, running the job command at Mon Jun 23 22:01:50 CDT 2025
+setting up the environment by loading singularity at Tue Jun 24 08:29:57 CDT 2025
+singularity loaded, running the job command at Tue Jun 24 08:30:01 CDT 2025
 WARNING: While bind mounting '/home/nws8519/git/adaptation-slr:/home/nws8519/git/adaptation-slr/': destination is already in the mount point list

 ==========
@@ -17,2599 +17,22 @@ https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

 A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

 INFO:olmocr.check:pdftoppm is installed and working.
-2025-06-23 22:02:07,072 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
-2025-06-23 22:02:07,096 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf as PDF document
-2025-06-23 22:02:07,114 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/002-barcomb.pdf as PDF document
-2025-06-23 22:02:07,140 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf as PDF document
-2025-06-23 22:02:07,152 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf as PDF document
-2025-06-23 22:02:07,167 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/005-crowston-shamshurin.pdf as PDF document
-2025-06-23 22:02:07,184 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/006-franke.pdf as PDF document
-2025-06-23 22:02:07,205 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf as PDF document
-2025-06-23 22:02:07,221 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/008-geiger.pdf as PDF document
-2025-06-23 22:02:07,237 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/009-hsieh.pdf as PDF document
-2025-06-23 22:02:07,254 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/010-hu.pdf as PDF document
-2025-06-23 22:02:07,300 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf as PDF document
-2025-06-23 22:02:07,322 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/012-jensen-scacchi.pdf as PDF document
-2025-06-23 22:02:07,349 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/013-klug.pdf as PDF document
-2025-06-23 22:02:07,388 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf as PDF document
-2025-06-23 22:02:07,409 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf as PDF document
-2025-06-23 22:02:07,426 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf as PDF document
-2025-06-23 22:02:07,441 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/017-wessel.pdf as PDF document
-2025-06-23 22:02:07,457 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/018-yin.pdf as PDF document
-2025-06-23 22:02:07,470 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/019_ding.pdf as PDF document
-2025-06-23 22:02:07,489 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf as PDF document
-2025-06-23 22:02:07,520 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf as PDF document
-2025-06-23 22:02:07,543 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf as PDF document
-2025-06-23 22:02:07,578 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf as PDF document
-2025-06-23 22:02:07,602 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/024_zhou.pdf as PDF document
-2025-06-23 22:02:07,625 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf as PDF document
-2025-06-23 22:02:07,644 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/026_vendome.pdf as PDF document
-2025-06-23 22:02:07,667 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/027_vendome.pdf as PDF document
-2025-06-23 22:02:07,680 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/028_meloca.pdf as PDF document
-2025-06-23 22:02:07,691 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/029_heinemann.pdf as PDF document
-2025-06-23 22:02:07,715 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/030_abdalkareem.pdf as PDF document
-2025-06-23 22:02:07,744 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf as PDF document
-2025-06-23 22:02:07,769 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf as PDF document
-2025-06-23 22:02:07,787 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/033_businger.pdf as PDF document
-2025-06-23 22:02:07,804 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/034_zhang.pdf as PDF document
-2025-06-23 22:02:07,804 - __main__ - INFO - Found 34 total pdf paths to add
- Sampling PDFs to calculate optimal length: 0%| | 0/34 [00:00
-2025-06-23 22:05:08,639 - __main__ - INFO - INFO 06-23 22:05:08 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
-2025-06-23 22:05:08,641 - __main__ - WARNING - Attempt 42: Please wait for vllm server to become ready...
-2025-06-23 22:05:09,363 - __main__ - INFO - Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
-2025-06-23 22:05:09,694 - __main__ - WARNING - Attempt 43: Please wait for vllm server to become ready...
-2025-06-23 22:05:10,340 - __main__ - INFO - You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
-2025-06-23 22:05:10,753 - __main__ - WARNING - Attempt 44: Please wait for vllm server to become ready...
-2025-06-23 22:05:11,807 - __main__ - WARNING - Attempt 45: Please wait for vllm server to become ready...
-2025-06-23 22:05:12,861 - __main__ - WARNING - Attempt 46: Please wait for vllm server to become ready...
-2025-06-23 22:05:13,914 - __main__ - WARNING - Attempt 47: Please wait for vllm server to become ready...
-2025-06-23 22:05:14,172 - __main__ - INFO - Unused or unrecognized kwargs: return_tensors.
-2025-06-23 22:05:14,675 - __main__ - INFO - INFO 06-23 22:05:14 [topk_topp_sampler.py:49] Using FlashInfer for top-p & top-k sampling.
-2025-06-23 22:05:14,783 - __main__ - INFO - INFO 06-23 22:05:14 [gpu_model_runner.py:1595] Starting to load model allenai/olmOCR-7B-0225-preview...
-2025-06-23 22:05:14,968 - __main__ - WARNING - Attempt 48: Please wait for vllm server to become ready...
-2025-06-23 22:05:15,062 - __main__ - INFO - INFO 06-23 22:05:15 [gpu_model_runner.py:1600] Loading model from scratch...
-2025-06-23 22:05:15,806 - __main__ - INFO - WARNING 06-23 22:05:15 [vision.py:91] Current `vllm-flash-attn` has a bug inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend.
-2025-06-23 22:05:16,025 - __main__ - WARNING - Attempt 49: Please wait for vllm server to become ready...
-2025-06-23 22:05:16,059 - __main__ - INFO - INFO 06-23 22:05:16 [cuda.py:252] Using Flash Attention backend on V1 engine.
-2025-06-23 22:05:16,451 - __main__ - INFO - INFO 06-23 22:05:16 [weight_utils.py:292] Using model weights format ['*.safetensors'] -2025-06-23 22:05:16,774 - __main__ - INFO - Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00 [Errno 104] Connection reset by peer -2025-06-23 22:08:33,587 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-7 to allow server restart -2025-06-23 22:08:33,587 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-9: [Errno 104] Connection reset by peer -2025-06-23 22:08:33,587 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-9 to allow server restart -2025-06-23 22:08:34,131 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:08:34,131 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -2025-06-23 22:08:34,131 - __main__ - INFO - -Worker ID | started -----------+-------- -0 | 533 -1 | 276 -2025-06-23 22:08:37,686 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-35: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,686 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-35 to allow server restart -2025-06-23 22:08:37,686 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-23: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,687 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-23 to allow server restart -2025-06-23 22:08:37,687 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-42: [Errno 
104] Connection reset by peer -2025-06-23 22:08:37,687 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-42 to allow server restart -2025-06-23 22:08:37,687 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-44: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,687 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-44 to allow server restart -2025-06-23 22:08:37,688 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-27: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,688 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-27 to allow server restart -2025-06-23 22:08:37,688 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-41: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,688 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-41 to allow server restart -2025-06-23 22:08:37,689 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-43: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,689 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-43 to allow server restart -2025-06-23 22:08:37,689 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-38: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-38 to allow server restart -2025-06-23 22:08:37,690 - __main__ - WARNING - Client 
error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-45: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-45 to allow server restart -2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-38: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-38 to allow server restart -2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-36: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-36 to allow server restart -2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-37: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-37 to allow server restart -2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-29: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-29 to allow server restart -2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-7: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on 
/home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-7 to allow server restart -2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-25: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-25 to allow server restart -2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-32: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-32 to allow server restart -2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-34: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-34 to allow server restart -2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-24: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-24 to allow server restart -2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-31: [Errno 104] Connection reset by peer -2025-06-23 22:08:37,692 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-31 to allow server restart -2025-06-23 22:08:37,692 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-11: [Errno 
104] Connection reset by peer -2025-06-23 22:08:37,692 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-11 to allow server restart -2025-06-23 22:08:41,816 - __main__ - INFO - vllm running req: 60 queue req: 151 -2025-06-23 22:08:43,932 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-7 -2025-06-23 22:08:43,942 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-9 -2025-06-23 22:08:44,132 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:08:44,132 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -2025-06-23 22:08:44,133 - __main__ - INFO - -Worker ID | started -----------+-------- -0 | 533 -1 | 276 -2025-06-23 22:08:48,490 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-45 -2025-06-23 22:08:48,725 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-24 -2025-06-23 22:08:48,775 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-35 -2025-06-23 22:08:48,785 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-41 -2025-06-23 22:08:48,808 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-38 -2025-06-23 22:08:48,809 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-34 -2025-06-23 22:08:48,829 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-44 -2025-06-23 22:08:48,856 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-29 -2025-06-23 
22:08:48,882 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-25 -2025-06-23 22:08:48,906 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-32 -2025-06-23 22:08:48,907 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-31 -2025-06-23 22:08:48,909 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-23 -2025-06-23 22:08:48,938 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-36 -2025-06-23 22:08:48,981 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-11 -2025-06-23 22:08:48,989 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-43 -2025-06-23 22:08:49,007 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-37 -2025-06-23 22:08:49,026 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-38 -2025-06-23 22:08:49,026 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-42 -2025-06-23 22:08:49,062 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-27 -2025-06-23 22:08:49,096 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-7 -2025-06-23 22:08:49,459 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-24: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,459 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-24 to allow server restart -2025-06-23 22:08:49,459 - __main__ - WARNING - Client error 
on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-6: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,459 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-6 to allow server restart -2025-06-23 22:08:49,980 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-1: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,980 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-1 to allow server restart -2025-06-23 22:08:49,980 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-14: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,980 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-14 to allow server restart -2025-06-23 22:08:49,980 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-15: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,980 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-15 to allow server restart -2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-11: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-11 to allow server restart -2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-8: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-8 to allow server restart -2025-06-23 
22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-4: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-4 to allow server restart -2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-5: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-5 to allow server restart -2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-31: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-31 to allow server restart -2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-11: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-11 to allow server restart -2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-34: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-34 to allow server restart -2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-36: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-36 
to allow server restart -2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-37: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-37 to allow server restart -2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-40: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-40 to allow server restart -2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-2: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-2 to allow server restart -2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-33: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-33 to allow server restart -2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-41: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-41 to allow server restart -2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-35: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on 
/home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-35 to allow server restart -2025-06-23 22:08:49,984 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-42: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,984 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-42 to allow server restart -2025-06-23 22:08:49,984 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-39: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,984 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-39 to allow server restart -2025-06-23 22:08:49,984 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-3: [Errno 104] Connection reset by peer -2025-06-23 22:08:49,984 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-3 to allow server restart -2025-06-23 22:08:52,006 - __main__ - INFO - vllm running req: 75 queue req: 261 -2025-06-23 22:08:52,530 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-24: [Errno 104] Connection reset by peer -2025-06-23 22:08:52,530 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-24 to allow server restart -2025-06-23 22:08:52,530 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-23: [Errno 104] Connection reset by peer -2025-06-23 22:08:52,530 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-23 to allow server restart -2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for 
/home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-10: [Errno 104] Connection reset by peer -2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-10 to allow server restart -2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-28: [Errno 104] Connection reset by peer -2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-28 to allow server restart -2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-8: [Errno 104] Connection reset by peer -2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-8 to allow server restart -2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-27: [Errno 104] Connection reset by peer -2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-27 to allow server restart -2025-06-23 22:08:53,043 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-5: [Errno 104] Connection reset by peer -2025-06-23 22:08:53,043 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-5 to allow server restart -2025-06-23 22:08:53,043 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-4: [Errno 104] Connection reset by peer -2025-06-23 22:08:53,043 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-4 to allow server restart 
+2025-06-23 22:08:53,044 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-5: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,044 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-5 to allow server restart
+2025-06-23 22:08:53,044 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-6: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,044 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-6 to allow server restart
+2025-06-23 22:08:53,044 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-2: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,044 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-2 to allow server restart
+2025-06-23 22:08:53,045 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-1: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,045 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-1 to allow server restart
+2025-06-23 22:08:53,045 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf-12: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,045 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf-12 to allow server restart
+2025-06-23 22:08:53,045 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf-46: [Errno 104] Connection reset by peer
+2025-06-23 22:08:53,045 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf-46 to allow server restart
+2025-06-23 22:08:54,134 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:08:54,134 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+2025-06-23 22:08:54,134 - __main__ - INFO - 
+Worker ID | started
+----------+--------
+0 | 533
+1 | 276
+2025-06-23 22:08:56,626 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-36: [Errno 104] Connection reset by peer
+2025-06-23 22:08:56,627 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-36 to allow server restart
+2025-06-23 22:08:56,627 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-47: [Errno 104] Connection reset by peer
+2025-06-23 22:08:56,627 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-47 to allow server restart
+2025-06-23 22:08:56,627 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-39: [Errno 104] Connection reset by peer
+2025-06-23 22:08:56,627 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-39 to allow server restart
+2025-06-23 22:08:59,800 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-24
+2025-06-23 22:08:59,806 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-6
+2025-06-23 22:09:00,593 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-3
+2025-06-23 22:09:00,636 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-2
+2025-06-23 22:09:00,663 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-4
+2025-06-23 22:09:00,794 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-41
+2025-06-23 22:09:00,830 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-1
+2025-06-23 22:09:00,945 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-37
+2025-06-23 22:09:00,971 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-39
+2025-06-23 22:09:00,975 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-5
+2025-06-23 22:09:00,975 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-42
+2025-06-23 22:09:00,992 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-11
+2025-06-23 22:09:01,003 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-8
+2025-06-23 22:09:01,003 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-15
+2025-06-23 22:09:01,005 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-14
+2025-06-23 22:09:01,044 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-40
+2025-06-23 22:09:01,069 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-35
+2025-06-23 22:09:01,075 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-33
+2025-06-23 22:09:01,162 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-31
+2025-06-23 22:09:01,223 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-11
+2025-06-23 22:09:01,243 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-36
+2025-06-23 22:09:01,247 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-34
+2025-06-23 22:09:02,117 - __main__ - INFO - vllm running req: 68 queue req: 383
+2025-06-23 22:09:03,397 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-28
+2025-06-23 22:09:03,497 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-24
+2025-06-23 22:09:03,516 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-23
+2025-06-23 22:09:03,564 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-10
+2025-06-23 22:09:03,614 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-8
+2025-06-23 22:09:03,688 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-27
+2025-06-23 22:09:03,736 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-4
+2025-06-23 22:09:03,910 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-5
+2025-06-23 22:09:03,964 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf-12
+2025-06-23 22:09:03,973 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf-46
+2025-06-23 22:09:03,977 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-2
+2025-06-23 22:09:03,986 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-1
+2025-06-23 22:09:03,996 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-5
+2025-06-23 22:09:04,135 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:04,135 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.00 0.00
+server_input_tokens 4.16 5.79
+server_output_tokens 0.92 1.28
+2025-06-23 22:09:04,136 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 1 | 533
+1 | 0 | 276
+2025-06-23 22:09:04,192 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-6
+2025-06-23 22:09:04,307 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-4: [Errno 104] Connection reset by peer
+2025-06-23 22:09:04,307 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-4 to allow server restart
+2025-06-23 22:09:04,307 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-9: [Errno 104] Connection reset by peer
+2025-06-23 22:09:04,307 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-9 to allow server restart
+2025-06-23 22:09:04,307 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-51: [Errno 104] Connection reset by peer
+2025-06-23 22:09:04,308 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-51 to allow server restart
+2025-06-23 22:09:04,308 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-50: [Errno 104] Connection reset by peer
+2025-06-23 22:09:04,308 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-50 to allow server restart
+2025-06-23 22:09:04,308 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-49: [Errno 104] Connection reset by peer
+2025-06-23 22:09:04,308 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-49 to allow server restart
+2025-06-23 22:09:06,960 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-36
+2025-06-23 22:09:06,963 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-47
+2025-06-23 22:09:06,973 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-39
+2025-06-23 22:09:12,139 - __main__ - INFO - vllm running req: 62 queue req: 489
+2025-06-23 22:09:14,137 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:14,137 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.03 0.05
+server_input_tokens 72.62 103.42
+server_output_tokens 19.92 28.37
+2025-06-23 22:09:14,137 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 14 | 533
+1 | 0 | 276
+2025-06-23 22:09:14,721 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-49
+2025-06-23 22:09:14,787 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-51
+2025-06-23 22:09:14,793 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-50
+2025-06-23 22:09:14,890 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-9
+2025-06-23 22:09:14,945 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-4
+2025-06-23 22:09:22,266 - __main__ - INFO - vllm running req: 60 queue req: 610
+2025-06-23 22:09:24,139 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:24,139 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.06 0.08
+server_input_tokens 135.92 198.10
+server_output_tokens 38.88 56.67
+2025-06-23 22:09:24,139 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 25 | 533
+1 | 0 | 276
+2025-06-23 22:09:32,267 - __main__ - INFO - vllm running req: 64 queue req: 709
+2025-06-23 22:09:34,141 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:34,142 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.08 0.13
+server_input_tokens 215.97 321.97
+server_output_tokens 62.79 93.60
+2025-06-23 22:09:34,142 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 38 | 533
+1 | 0 | 276
+2025-06-23 22:09:42,269 - __main__ - INFO - vllm running req: 68 queue req: 694
+2025-06-23 22:09:44,143 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:44,144 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.11 0.16
+server_input_tokens 277.54 423.01
+server_output_tokens 81.61 124.38
+2025-06-23 22:09:44,144 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 49 | 533
+1 | 0 | 276
+2025-06-23 22:09:52,269 - __main__ - INFO - vllm running req: 65 queue req: 683
+2025-06-23 22:09:54,145 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:09:54,146 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.13 0.21
+server_input_tokens 353.36 550.36
+server_output_tokens 106.73 166.23
+2025-06-23 22:09:54,146 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 63 | 533
+1 | 0 | 276
+2025-06-23 22:10:02,271 - __main__ - INFO - vllm running req: 65 queue req: 674
+2025-06-23 22:10:04,147 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:04,147 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.15 0.24
+server_input_tokens 389.51 619.66
+server_output_tokens 118.93 189.19
+2025-06-23 22:10:04,147 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 71 | 533
+1 | 0 | 276
+2025-06-23 22:10:12,273 - __main__ - INFO - vllm running req: 61 queue req: 671
+2025-06-23 22:10:14,148 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:14,148 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.16 0.27
+server_input_tokens 428.79 696.43
+server_output_tokens 135.56 220.17
+2025-06-23 22:10:14,148 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 80 | 533
+1 | 0 | 276
+2025-06-23 22:10:22,273 - __main__ - INFO - vllm running req: 63 queue req: 654
+2025-06-23 22:10:24,149 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:24,149 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.19 0.31
+server_input_tokens 490.36 812.79
+server_output_tokens 160.71 266.37
+2025-06-23 22:10:24,150 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 94 | 533
+1 | 0 | 276
+2025-06-23 22:10:32,274 - __main__ - INFO - vllm running req: 62 queue req: 643
+2025-06-23 22:10:34,150 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:34,151 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.21 0.36
+server_input_tokens 543.42 918.85
+server_output_tokens 179.77 303.97
+2025-06-23 22:10:34,151 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 105 | 533
+1 | 2 | 276
+2025-06-23 22:10:42,274 - __main__ - INFO - vllm running req: 62 queue req: 630
+2025-06-23 22:10:44,154 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:44,154 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.23 0.40
+server_input_tokens 596.17 1027.93
+server_output_tokens 200.04 344.92
+2025-06-23 22:10:44,154 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 117 | 533
+1 | 3 | 276
+2025-06-23 22:10:52,275 - __main__ - INFO - vllm running req: 61 queue req: 623
+2025-06-23 22:10:54,155 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:10:54,155 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.24 0.42
+server_input_tokens 632.42 1111.50
+server_output_tokens 217.21 381.76
+2025-06-23 22:10:54,156 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 122 | 533
+1 | 5 | 276
+2025-06-23 22:11:02,276 - __main__ - INFO - vllm running req: 63 queue req: 612
+2025-06-23 22:11:04,156 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:04,157 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.25 0.45
+server_input_tokens 663.06 1187.46
+server_output_tokens 229.91 411.74
+2025-06-23 22:11:04,157 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 126 | 533
+1 | 9 | 276
+2025-06-23 22:11:12,277 - __main__ - INFO - vllm running req: 63 queue req: 602
+2025-06-23 22:11:14,158 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:14,159 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.26 0.48
+server_input_tokens 703.41 1283.18
+server_output_tokens 244.87 446.69
+2025-06-23 22:11:14,159 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 132 | 533
+1 | 13 | 276
+2025-06-23 22:11:22,279 - __main__ - INFO - vllm running req: 62 queue req: 590
+2025-06-23 22:11:24,160 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:24,160 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.28 0.53
+server_input_tokens 754.96 1402.39
+server_output_tokens 261.06 484.94
+2025-06-23 22:11:24,160 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 140 | 533
+1 | 18 | 276
+2025-06-23 22:11:32,281 - __main__ - INFO - vllm running req: 64 queue req: 577
+2025-06-23 22:11:34,161 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:34,161 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.30 0.57
+server_input_tokens 799.80 1512.33
+server_output_tokens 279.24 528.01
+2025-06-23 22:11:34,161 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 147 | 533
+1 | 23 | 276
+2025-06-23 22:11:42,281 - __main__ - INFO - vllm running req: 64 queue req: 565
+2025-06-23 22:11:44,162 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:44,163 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.31 0.60
+server_input_tokens 833.40 1603.66
+server_output_tokens 293.39 564.55
+2025-06-23 22:11:44,163 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 152 | 533
+1 | 28 | 276
+2025-06-23 22:11:52,281 - __main__ - INFO - vllm running req: 63 queue req: 556
+2025-06-23 22:11:54,163 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:11:54,164 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.33 0.64
+server_input_tokens 882.61 1727.78
+server_output_tokens 313.58 613.85
+2025-06-23 22:11:54,164 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 158 | 533
+1 | 34 | 276
+2025-06-23 22:12:02,281 - __main__ - INFO - vllm running req: 61 queue req: 550
+2025-06-23 22:12:04,165 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:04,165 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.34 0.67
+server_input_tokens 908.52 1808.77
+server_output_tokens 322.85 642.77
+2025-06-23 22:12:04,165 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 161 | 533
+1 | 40 | 276
+2025-06-23 22:12:12,282 - __main__ - INFO - vllm running req: 62 queue req: 537
+2025-06-23 22:12:14,166 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:14,167 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.35 0.71
+server_input_tokens 944.56 1912.03
+server_output_tokens 333.03 674.14
+2025-06-23 22:12:14,167 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 162 | 533
+1 | 50 | 276
+2025-06-23 22:12:22,284 - __main__ - INFO - vllm running req: 63 queue req: 526
+2025-06-23 22:12:24,167 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:24,168 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.36 0.74
+server_input_tokens 978.24 2012.82
+server_output_tokens 345.07 710.01
+2025-06-23 22:12:24,168 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 162 | 533
+1 | 61 | 276
+2025-06-23 22:12:32,284 - __main__ - INFO - vllm running req: 66 queue req: 510
+2025-06-23 22:12:34,169 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:34,169 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.37 0.78
+server_input_tokens 1021.46 2135.79
+server_output_tokens 365.30 763.81
+2025-06-23 22:12:34,169 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 167 | 533
+1 | 68 | 276
+2025-06-23 22:12:42,286 - __main__ - INFO - vllm running req: 66 queue req: 501
+2025-06-23 22:12:44,170 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:44,170 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.38 0.82
+server_input_tokens 1057.70 2246.84
+server_output_tokens 383.96 815.63
+2025-06-23 22:12:44,170 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 171 | 533
+1 | 74 | 276
+2025-06-23 22:12:52,287 - __main__ - INFO - vllm running req: 64 queue req: 490
+2025-06-23 22:12:54,172 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:12:54,172 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.40 0.86
+server_input_tokens 1088.49 2348.52
+server_output_tokens 392.48 846.82
+2025-06-23 22:12:54,172 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 175 | 533
+1 | 83 | 276
+2025-06-23 22:13:02,288 - __main__ - INFO - vllm running req: 61 queue req: 481
+2025-06-23 22:13:04,173 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:04,174 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.41 0.90
+server_input_tokens 1123.77 2462.11
+server_output_tokens 402.62 882.11
+2025-06-23 22:13:04,174 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 177 | 533
+1 | 92 | 276
+2025-06-23 22:13:12,289 - __main__ - INFO - vllm running req: 58 queue req: 472
+2025-06-23 22:13:14,175 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:14,175 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.42 0.94
+server_input_tokens 1152.91 2564.40
+server_output_tokens 411.53 915.35
+2025-06-23 22:13:14,175 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 185 | 533
+1 | 97 | 276
+2025-06-23 22:13:22,292 - __main__ - INFO - vllm running req: 61 queue req: 456
+2025-06-23 22:13:24,176 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:24,176 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.43 0.98
+server_input_tokens 1189.02 2684.34
+server_output_tokens 424.32 957.96
+2025-06-23 22:13:24,177 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 192 | 533
+1 | 102 | 276
+2025-06-23 22:13:32,293 - __main__ - INFO - vllm running req: 59 queue req: 449
+2025-06-23 22:13:34,177 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:34,177 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.44 1.01
+server_input_tokens 1207.97 2767.41
+server_output_tokens 430.83 987.00
+2025-06-23 22:13:34,178 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 199 | 533
+1 | 104 | 276
+2025-06-23 22:13:42,293 - __main__ - INFO - vllm running req: 62 queue req: 436
+2025-06-23 22:13:44,179 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:44,179 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.45 1.04
+server_input_tokens 1232.57 2864.85
+server_output_tokens 438.51 1019.23
+2025-06-23 22:13:44,180 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 206 | 533
+1 | 106 | 276
+2025-06-23 22:13:52,293 - __main__ - INFO - vllm running req: 59 queue req: 427
+2025-06-23 22:13:54,181 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:13:54,181 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.46 1.08
+server_input_tokens 1269.62 2993.30
+server_output_tokens 449.30 1059.29
+2025-06-23 22:13:54,181 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 218 | 533
+1 | 107 | 276
+2025-06-23 22:14:02,294 - __main__ - INFO - vllm running req: 65 queue req: 412
+2025-06-23 22:14:04,182 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:04,182 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.47 1.11
+server_input_tokens 1296.63 3094.40
+server_output_tokens 457.29 1092.08
+2025-06-23 22:14:04,182 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 227 | 533
+1 | 107 | 276
+2025-06-23 22:14:12,294 - __main__ - INFO - vllm running req: 62 queue req: 402
+2025-06-23 22:14:14,183 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:14,184 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.48 1.11
+server_input_tokens 1322.77 3103.36
+server_output_tokens 466.04 1101.46
+2025-06-23 22:14:14,184 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 240 | 533
+1 | 107 | 276
+2025-06-23 22:14:22,297 - __main__ - INFO - vllm running req: 60 queue req: 398
+2025-06-23 22:14:24,185 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:24,185 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.48 1.10
+server_input_tokens 1332.38 3084.51
+server_output_tokens 468.30 1097.13
+2025-06-23 22:14:24,185 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 246 | 533
+1 | 108 | 276
+2025-06-23 22:14:32,297 - __main__ - INFO - vllm running req: 61 queue req: 385
+2025-06-23 22:14:34,186 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:34,186 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.49 1.09
+server_input_tokens 1364.44 3076.82
+server_output_tokens 479.98 1102.01
+2025-06-23 22:14:34,186 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 252 | 533
+1 | 113 | 276
+2025-06-23 22:14:42,298 - __main__ - INFO - vllm running req: 64 queue req: 371
+2025-06-23 22:14:44,187 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:44,188 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.50 1.09
+server_input_tokens 1391.13 3088.64
+server_output_tokens 489.22 1110.56
+2025-06-23 22:14:44,188 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 259 | 533
+1 | 118 | 276
+2025-06-23 22:14:52,298 - __main__ - INFO - vllm running req: 65 queue req: 359
+2025-06-23 22:14:52,928 - __main__ - WARNING - JSON decode error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124)
+2025-06-23 22:14:52,929 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56
+2025-06-23 22:14:53,075 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56
+2025-06-23 22:14:54,190 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:14:54,190 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.50 1.08
+server_input_tokens 1412.24 3073.89
+server_output_tokens 496.47 1107.94
+2025-06-23 22:14:54,190 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 262 | 533
+1 | 123 | 276
+2025-06-23 22:15:02,300 - __main__ - INFO - vllm running req: 62 queue req: 354
+2025-06-23 22:15:04,191 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:04,191 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.51 1.08
+server_input_tokens 1425.85 3074.71
+server_output_tokens 500.91 1108.67
+2025-06-23 22:15:04,192 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 269 | 533
+1 | 126 | 276
+2025-06-23 22:15:12,302 - __main__ - INFO - vllm running req: 63 queue req: 343
+2025-06-23 22:15:14,193 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:14,193 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.52 1.09
+server_input_tokens 1450.64 3118.13
+server_output_tokens 508.76 1117.77
+2025-06-23 22:15:14,193 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 275 | 533
+1 | 131 | 276
+2025-06-23 22:15:22,303 - __main__ - INFO - vllm running req: 62 queue req: 331
+2025-06-23 22:15:22,740 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-14
+2025-06-23 22:15:22,740 - __main__ - WARNING - ValueError on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-14: 
+Response exceeded model_max_context, cannot use this response
+2025-06-23 22:15:23,013 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-14
+2025-06-23 22:15:24,194 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:24,194 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.52 1.08
+server_input_tokens 1466.97 3085.92
+server_output_tokens 514.14 1100.03
+2025-06-23 22:15:24,194 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 279 | 533
+1 | 138 | 276
+2025-06-23 22:15:30,120 - __main__ - WARNING - JSON decode error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-5: Invalid \escape: line 1 column 1851 (char 1850)
+2025-06-23 22:15:30,121 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-5
+2025-06-23 22:15:30,428 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-5
+2025-06-23 22:15:32,305 - __main__ - INFO - vllm running req: 64 queue req: 323
+2025-06-23 22:15:34,195 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:34,196 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.53 1.06
+server_input_tokens 1479.70 3070.82
+server_output_tokens 518.29 1093.48
+2025-06-23 22:15:34,196 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 282 | 533
+1 | 142 | 276
+2025-06-23 22:15:42,305 - __main__ - INFO - vllm running req: 62 queue req: 314
+2025-06-23 22:15:44,198 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:44,198 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.53 1.05
+server_input_tokens 1496.43 3056.93
+server_output_tokens 525.01 1088.33
+2025-06-23 22:15:44,198 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 289 | 533
+1 | 145 | 276
+2025-06-23 22:15:52,307 - __main__ - INFO - vllm running req: 64 queue req: 301
+2025-06-23 22:15:54,200 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:15:54,200 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.54 1.07
+server_input_tokens 1520.12 3093.15
+server_output_tokens 533.95 1098.93
+2025-06-23 22:15:54,201 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 294 | 533
+1 | 152 | 276
+2025-06-23 22:16:02,308 - __main__ - INFO - vllm running req: 64 queue req: 289
+2025-06-23 22:16:04,203 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:16:04,203 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.55 1.08
+server_input_tokens 1536.45 3107.65
+server_output_tokens 539.99 1097.50
+2025-06-23 22:16:04,203 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 304 | 533
+1 | 154 | 276
+2025-06-23 22:16:12,309 - __main__ - INFO - vllm running req: 60 queue req: 276
+2025-06-23 22:16:14,205 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:16:14,205 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.56 1.10
+server_input_tokens 1565.93 3147.35
+server_output_tokens 550.10 1109.60
+2025-06-23 22:16:14,205 - __main__ - INFO - 
+Worker ID | finished | started
+----------+----------+--------
+0 | 319 | 533
+1 | 155 | 276
+2025-06-23 22:16:22,309 - __main__ - INFO - vllm running req: 61 queue req: 263
+2025-06-23 22:16:24,207 - __main__ - INFO - Queue remaining: 0
+2025-06-23 22:16:24,208 - __main__ - INFO - 
+Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
+-----------------------------------------------------------------------------------
+completed_pages 0.57 1.10
+server_input_tokens 1588.32 3144.31
+server_output_tokens 556.64 1108.31
+2025-06-23 22:16:24,208 - __main__ - INFO - 
+Worker ID | finished 
| started -----------+----------+-------- -0 | 331 | 533 -1 | 156 | 276 -2025-06-23 22:16:26,616 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-18 -2025-06-23 22:16:26,616 - __main__ - WARNING - ValueError on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-18: - Response exceeded model_max_context, cannot use this response -2025-06-23 22:16:26,943 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-18 -2025-06-23 22:16:32,312 - __main__ - INFO - vllm running req: 60 queue req: 257 -2025-06-23 22:16:34,209 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:16:34,210 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.57 1.08 -server_input_tokens 1594.86 3105.04 -server_output_tokens 557.18 1084.24 -2025-06-23 22:16:34,210 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 337 | 533 -1 | 157 | 276 -2025-06-23 22:16:42,313 - __main__ - INFO - vllm running req: 60 queue req: 250 -2025-06-23 22:16:44,212 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:16:44,213 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.57 1.07 -server_input_tokens 1591.23 3069.89 -server_output_tokens 555.73 1069.02 -2025-06-23 22:16:44,213 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 340 | 533 -1 | 159 | 276 -2025-06-23 22:16:52,313 - __main__ - INFO - vllm running req: 58 queue req: 243 -2025-06-23 22:16:54,214 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:16:54,214 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) 
----------------------------------------------------------------------------------- -completed_pages 0.57 1.06 -server_input_tokens 1609.73 3033.38 -server_output_tokens 564.13 1054.70 -2025-06-23 22:16:54,214 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 348 | 533 -1 | 162 | 276 -2025-06-23 22:17:02,316 - __main__ - INFO - vllm running req: 62 queue req: 228 -2025-06-23 22:17:04,216 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:17:04,217 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.58 1.07 -server_input_tokens 1630.51 3077.18 -server_output_tokens 576.79 1085.11 -2025-06-23 22:17:04,217 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 357 | 533 -1 | 164 | 276 -2025-06-23 22:17:12,317 - __main__ - INFO - vllm running req: 61 queue req: 218 -2025-06-23 22:17:14,219 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:17:14,219 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.59 1.07 -server_input_tokens 1647.83 3080.03 -server_output_tokens 581.99 1088.56 -2025-06-23 22:17:14,219 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 367 | 533 -1 | 166 | 276 -2025-06-23 22:17:22,318 - __main__ - INFO - vllm running req: 64 queue req: 204 -2025-06-23 22:17:24,221 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:17:24,221 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.59 1.07 -server_input_tokens 1663.26 3073.03 -server_output_tokens 588.56 1089.67 -2025-06-23 22:17:24,221 - __main__ - INFO - -Worker ID | finished | started 
-----------+----------+-------- -0 | 375 | 533 -1 | 168 | 276 -2025-06-23 22:17:32,321 - __main__ - INFO - vllm running req: 69 queue req: 189 -2025-06-23 22:17:34,222 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:17:34,222 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.60 1.07 -server_input_tokens 1685.07 3072.93 -server_output_tokens 594.50 1073.86 -2025-06-23 22:17:34,223 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 385 | 533 -1 | 170 | 276 -2025-06-23 22:17:42,321 - __main__ - INFO - vllm running req: 69 queue req: 173 -2025-06-23 22:17:44,224 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:17:44,224 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.61 1.08 -server_input_tokens 1706.87 3096.57 -server_output_tokens 600.92 1067.50 -2025-06-23 22:17:44,224 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 396 | 533 -1 | 172 | 276 -2025-06-23 22:17:52,322 - __main__ - INFO - vllm running req: 60 queue req: 166 -2025-06-23 22:17:54,225 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:17:54,225 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.62 1.09 -server_input_tokens 1731.66 3125.79 -server_output_tokens 604.26 1062.50 -2025-06-23 22:17:54,226 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 412 | 533 -1 | 173 | 276 -2025-06-23 22:18:02,324 - __main__ - INFO - vllm running req: 63 queue req: 152 -2025-06-23 22:18:04,227 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:18:04,228 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) 
Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.62 1.10 -server_input_tokens 1737.98 3124.56 -server_output_tokens 605.55 1061.80 -2025-06-23 22:18:04,228 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 421 | 533 -1 | 173 | 276 -2025-06-23 22:18:12,324 - __main__ - INFO - vllm running req: 63 queue req: 144 -2025-06-23 22:18:14,230 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:18:14,230 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.62 1.08 -server_input_tokens 1748.20 3089.42 -server_output_tokens 609.36 1053.60 -2025-06-23 22:18:14,230 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 427 | 533 -1 | 176 | 276 -2025-06-23 22:18:22,325 - __main__ - INFO - vllm running req: 63 queue req: 131 -2025-06-23 22:18:24,232 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:18:24,232 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.63 1.08 -server_input_tokens 1768.02 3093.29 -server_output_tokens 615.63 1050.90 -2025-06-23 22:18:24,232 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 442 | 533 -1 | 176 | 276 -2025-06-23 22:18:32,326 - __main__ - INFO - vllm running req: 62 queue req: 116 -2025-06-23 22:18:34,234 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:18:34,234 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.64 1.10 -server_input_tokens 1788.03 3117.25 -server_output_tokens 621.54 1058.56 -2025-06-23 22:18:34,235 - __main__ - INFO - -Worker ID | finished | started 
-----------+----------+-------- -0 | 456 | 533 -1 | 176 | 276 -2025-06-23 22:18:42,328 - __main__ - INFO - vllm running req: 64 queue req: 106 -2025-06-23 22:18:44,236 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:18:44,237 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.64 1.10 -server_input_tokens 1796.36 3113.99 -server_output_tokens 625.59 1062.35 -2025-06-23 22:18:44,237 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 464 | 533 -1 | 177 | 276 -2025-06-23 22:18:52,329 - __main__ - INFO - vllm running req: 64 queue req: 95 -2025-06-23 22:18:54,238 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:18:54,238 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.65 1.09 -server_input_tokens 1813.96 3097.64 -server_output_tokens 631.77 1062.07 -2025-06-23 22:18:54,238 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 476 | 533 -1 | 177 | 276 -2025-06-23 22:19:02,330 - __main__ - INFO - vllm running req: 66 queue req: 83 -2025-06-23 22:19:04,239 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:19:04,239 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.65 1.09 -server_input_tokens 1824.70 3087.65 -server_output_tokens 632.50 1051.53 -2025-06-23 22:19:04,239 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 485 | 533 -1 | 177 | 276 -2025-06-23 22:19:12,332 - __main__ - INFO - vllm running req: 66 queue req: 71 -2025-06-23 22:19:14,241 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:19:14,241 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) 
Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.66 1.09 -server_input_tokens 1839.87 3109.67 -server_output_tokens 636.03 1053.52 -2025-06-23 22:19:14,241 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 496 | 533 -1 | 177 | 276 -2025-06-23 22:19:22,334 - __main__ - INFO - vllm running req: 67 queue req: 58 -2025-06-23 22:19:24,243 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:19:24,243 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.66 1.11 -server_input_tokens 1864.18 3171.49 -server_output_tokens 643.90 1075.60 -2025-06-23 22:19:24,244 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 510 | 533 -1 | 178 | 276 -2025-06-23 22:19:32,335 - __main__ - INFO - vllm running req: 67 queue req: 48 -2025-06-23 22:19:34,245 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:19:34,245 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.67 1.11 -server_input_tokens 1870.93 3132.95 -server_output_tokens 646.43 1061.19 -2025-06-23 22:19:34,245 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 517 | 533 -1 | 180 | 276 -2025-06-23 22:19:38,303 - __main__ - WARNING - JSON decode error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf-8: Expecting ',' delimiter: line 1 column 2504 (char 2503) -2025-06-23 22:19:38,303 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf-8 -2025-06-23 22:19:38,539 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf-8 -2025-06-23 22:19:42,337 - 
__main__ - INFO - vllm running req: 67 queue req: 35 -2025-06-23 22:19:44,246 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:19:44,246 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.67 1.11 -server_input_tokens 1885.46 3133.66 -server_output_tokens 650.63 1058.22 -2025-06-23 22:19:44,246 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 518 | 533 -1 | 192 | 276 -2025-06-23 22:19:52,337 - __main__ - INFO - vllm running req: 63 queue req: 28 -2025-06-23 22:19:54,249 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:19:54,249 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.68 1.12 -server_input_tokens 1893.00 3122.97 -server_output_tokens 652.22 1050.72 -2025-06-23 22:19:54,249 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 519 | 533 -1 | 202 | 276 -2025-06-23 22:20:02,338 - __main__ - INFO - vllm running req: 64 queue req: 12 -2025-06-23 22:20:04,250 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:20:04,250 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.68 1.14 -server_input_tokens 1913.56 3177.61 -server_output_tokens 659.94 1072.11 -2025-06-23 22:20:04,250 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 521 | 533 -1 | 216 | 276 -2025-06-23 22:20:12,339 - __main__ - INFO - vllm running req: 63 queue req: 1 -2025-06-23 22:20:14,251 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:20:14,252 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) 
----------------------------------------------------------------------------------- -completed_pages 0.69 1.14 -server_input_tokens 1927.76 3180.25 -server_output_tokens 664.76 1074.28 -2025-06-23 22:20:14,252 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 523 | 533 -1 | 225 | 276 -2025-06-23 22:20:21,974 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-13 -2025-06-23 22:20:21,975 - __main__ - WARNING - ValueError on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-13: - Response exceeded model_max_context, cannot use this response -2025-06-23 22:20:22,274 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-13 -2025-06-23 22:20:22,340 - __main__ - INFO - vllm running req: 42 queue req: 0 -2025-06-23 22:20:24,253 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:20:24,253 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.70 1.17 -server_input_tokens 1961.53 3284.33 -server_output_tokens 676.07 1109.23 -2025-06-23 22:20:24,253 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 527 | 533 -1 | 241 | 276 -2025-06-23 22:20:32,341 - __main__ - INFO - vllm running req: 22 queue req: 0 -2025-06-23 22:20:34,254 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:20:34,255 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.71 1.22 -server_input_tokens 2004.27 3416.28 -server_output_tokens 690.33 1153.40 -2025-06-23 22:20:34,255 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 528 | 533 -1 | 261 | 276 -2025-06-23 22:20:42,342 - __main__ - INFO - vllm 
running req: 9 queue req: 0 -2025-06-23 22:20:44,256 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:20:44,256 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.72 1.23 -server_input_tokens 2020.84 3488.51 -server_output_tokens 697.63 1182.37 -2025-06-23 22:20:44,256 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 529 | 533 -1 | 271 | 276 -2025-06-23 22:20:46,223 - __main__ - WARNING - JSON decode error on attempt 1 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) -2025-06-23 22:20:46,223 - __main__ - INFO - Reducing anchor text len to 1500 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:20:46,372 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:20:52,343 - __main__ - INFO - vllm running req: 6 queue req: 0 -2025-06-23 22:20:54,257 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:20:54,258 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.71 1.19 -server_input_tokens 2019.62 3397.47 -server_output_tokens 699.92 1157.77 -2025-06-23 22:20:54,258 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 529 | 533 -1 | 275 | 276 -2025-06-23 22:21:02,344 - __main__ - INFO - vllm running req: 5 queue req: 0 -2025-06-23 22:21:04,260 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:21:04,260 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.71 1.19 -server_input_tokens 2001.86 3397.47 -server_output_tokens 693.77 
1157.77 -2025-06-23 22:21:04,260 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 529 | 533 -1 | 275 | 276 -2025-06-23 22:21:06,731 - __main__ - INFO - Finished TaskGroup for worker on a33f691ea15b24c747ed3f2369ced021b03cea55 -2025-06-23 22:21:06,732 - __main__ - INFO - Got 13 docs for a33f691ea15b24c747ed3f2369ced021b03cea55 -2025-06-23 22:21:06,749 - __main__ - INFO - Writing 13 markdown files for a33f691ea15b24c747ed3f2369ced021b03cea55 -2025-06-23 22:21:06,758 - __main__ - INFO - Worker 1 exiting due to empty queue -2025-06-23 22:21:11,889 - __main__ - WARNING - JSON decode error on attempt 2 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) -2025-06-23 22:21:11,889 - __main__ - INFO - Reducing anchor text len to 750 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:21:12,037 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:21:12,345 - __main__ - INFO - vllm running req: 3 queue req: 0 -2025-06-23 22:21:14,261 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:21:14,262 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.70 1.11 -finished_input_tokens 671.32 2567.49 -finished_output_tokens 234.75 897.82 -server_input_tokens 1992.10 3210.25 -server_output_tokens 694.44 1106.62 -2025-06-23 22:21:14,262 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 530 | 533 -1 | 276 | 276 -2025-06-23 22:21:22,346 - __main__ - INFO - vllm running req: 1 queue req: 0 -2025-06-23 22:21:24,265 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:21:24,265 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) 
----------------------------------------------------------------------------------- -completed_pages 0.70 1.10 -finished_input_tokens 665.51 2567.49 -finished_output_tokens 232.72 897.82 -server_input_tokens 1980.54 3184.50 -server_output_tokens 694.17 1112.84 -2025-06-23 22:21:24,265 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 532 | 533 -1 | 276 | 276 -2025-06-23 22:21:31,565 - __main__ - WARNING - JSON decode error on attempt 3 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) -2025-06-23 22:21:31,565 - __main__ - INFO - Reducing anchor text len to 375 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:21:31,720 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:21:32,347 - __main__ - INFO - vllm running req: 1 queue req: 0 -2025-06-23 22:21:34,267 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:21:34,268 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.69 1.05 -finished_input_tokens 659.81 2567.49 -finished_output_tokens 230.73 897.82 -server_input_tokens 1964.80 3064.29 -server_output_tokens 689.40 1079.47 -2025-06-23 22:21:34,268 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 532 | 533 -1 | 276 | 276 -2025-06-23 22:21:42,348 - __main__ - INFO - vllm running req: 1 queue req: 0 -2025-06-23 22:21:44,269 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:21:44,269 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.69 1.05 -finished_input_tokens 654.21 2567.49 -finished_output_tokens 228.77 897.82 -server_input_tokens 1948.11 3064.29 
-server_output_tokens 683.55 1079.47 -2025-06-23 22:21:44,269 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 532 | 533 -1 | 276 | 276 -2025-06-23 22:21:48,089 - __main__ - WARNING - JSON decode error on attempt 4 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) -2025-06-23 22:21:48,089 - __main__ - INFO - Reducing anchor text len to 187 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:21:48,235 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:21:52,349 - __main__ - INFO - vllm running req: 1 queue req: 0 -2025-06-23 22:21:54,271 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:21:54,272 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.68 1.01 -finished_input_tokens 648.70 2567.49 -finished_output_tokens 226.84 897.82 -server_input_tokens 1932.75 2953.05 -server_output_tokens 678.78 1041.06 -2025-06-23 22:21:54,272 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 532 | 533 -1 | 276 | 276 -2025-06-23 22:22:02,349 - __main__ - INFO - vllm running req: 1 queue req: 0 -2025-06-23 22:22:04,273 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:22:04,273 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.67 1.01 -finished_input_tokens 643.28 2567.49 -finished_output_tokens 224.95 897.82 -server_input_tokens 1916.61 2953.05 -server_output_tokens 673.11 1041.06 -2025-06-23 22:22:04,273 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 532 | 533 -1 | 276 | 276 -2025-06-23 22:22:10,051 - __main__ - WARNING - 
JSON decode error on attempt 5 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) -2025-06-23 22:22:10,052 - __main__ - INFO - Reducing anchor text len to 93 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:22:10,198 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:22:12,351 - __main__ - INFO - vllm running req: 1 queue req: 0 -2025-06-23 22:22:14,274 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:22:14,274 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.67 0.94 -finished_input_tokens 637.95 2567.49 -finished_output_tokens 223.08 897.82 -server_input_tokens 1901.66 2717.47 -server_output_tokens 668.83 948.51 -2025-06-23 22:22:14,274 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 532 | 533 -1 | 276 | 276 -2025-06-23 22:22:22,351 - __main__ - INFO - vllm running req: 1 queue req: 0 -2025-06-23 22:22:24,276 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:22:24,276 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.66 0.94 -finished_input_tokens 632.71 2567.49 -finished_output_tokens 221.25 897.82 -server_input_tokens 1886.04 2717.47 -server_output_tokens 663.34 948.51 -2025-06-23 22:22:24,277 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 532 | 533 -1 | 276 | 276 -2025-06-23 22:22:32,353 - __main__ - INFO - vllm running req: 1 queue req: 0 -2025-06-23 22:22:32,624 - __main__ - WARNING - JSON decode error on attempt 6 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 
125 (char 124) -2025-06-23 22:22:32,624 - __main__ - INFO - Reducing anchor text len to 46 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:22:32,779 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:22:34,278 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:22:34,278 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.66 0.85 -finished_input_tokens 627.55 2567.49 -finished_output_tokens 219.45 897.82 -server_input_tokens 1871.54 2475.65 -server_output_tokens 659.25 864.54 -2025-06-23 22:22:34,278 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 532 | 533 -1 | 276 | 276 -2025-06-23 22:22:42,354 - __main__ - INFO - vllm running req: 1 queue req: 0 -2025-06-23 22:22:44,280 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:22:44,280 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.65 0.85 -finished_input_tokens 622.48 2567.49 -finished_output_tokens 217.67 897.82 -server_input_tokens 1856.41 2475.65 -server_output_tokens 653.92 864.54 -2025-06-23 22:22:44,280 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 532 | 533 -1 | 276 | 276 -2025-06-23 22:22:52,355 - __main__ - INFO - vllm running req: 1 queue req: 0 -2025-06-23 22:22:54,281 - __main__ - INFO - Queue remaining: 0 -2025-06-23 22:22:54,282 - __main__ - INFO - -Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ----------------------------------------------------------------------------------- -completed_pages 0.65 0.85 -finished_input_tokens 617.49 2567.49 -finished_output_tokens 215.93 897.82 -server_input_tokens 1841.53 2475.65 -server_output_tokens 648.67 
864.54 -2025-06-23 22:22:54,282 - __main__ - INFO - -Worker ID | finished | started -----------+----------+-------- -0 | 532 | 533 -1 | 276 | 276 -2025-06-23 22:22:55,207 - __main__ - WARNING - JSON decode error on attempt 7 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) -2025-06-23 22:22:55,208 - __main__ - INFO - Reducing anchor text len to 23 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 -2025-06-23 22:22:55,208 - __main__ - ERROR - Failed to process /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 after 8 attempts. -2025-06-23 22:22:55,229 - __main__ - ERROR - Document /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf has 1 fallback pages out of 58 exceeding max_page_error_rate of 0.004, discarding document. -2025-06-23 22:22:55,230 - __main__ - INFO - Finished TaskGroup for worker on 2e60af4aea64f23cc30c38b01b3cf7f0b1c0a024 -2025-06-23 22:22:55,230 - __main__ - INFO - Got 20 docs for 2e60af4aea64f23cc30c38b01b3cf7f0b1c0a024 -2025-06-23 22:22:55,254 - __main__ - INFO - Writing 20 markdown files for 2e60af4aea64f23cc30c38b01b3cf7f0b1c0a024 -2025-06-23 22:22:55,274 - __main__ - INFO - Worker 0 exiting due to empty queue -2025-06-23 22:22:55,274 - __main__ - INFO - ================================================================================ -2025-06-23 22:22:55,275 - __main__ - INFO - FINAL METRICS SUMMARY -2025-06-23 22:22:55,275 - __main__ - INFO - ================================================================================ -2025-06-23 22:22:55,275 - __main__ - INFO - Total elapsed time: 1248.38 seconds -2025-06-23 22:22:55,275 - __main__ - INFO - Total Server Input tokens: 2,298,171 -2025-06-23 22:22:55,275 - __main__ - INFO - Total Server Output tokens: 810,762 -2025-06-23 22:22:55,275 - __main__ - INFO - Finished input tokens: 2,143,438 -2025-06-23 22:22:55,276 - __main__ - INFO - Finished output tokens: 772,257 
-2025-06-23 22:22:55,276 - __main__ - INFO - Completed pages: 808 -2025-06-23 22:22:55,276 - __main__ - INFO - Failed pages: 1 -2025-06-23 22:22:55,276 - __main__ - INFO - Page Failure rate: 0.12% -2025-06-23 22:22:55,276 - __main__ - INFO - Server Input tokens/sec rate: 1840.92 -2025-06-23 22:22:55,276 - __main__ - INFO - Server Output tokens/sec rate: 649.45 -2025-06-23 22:22:55,276 - __main__ - INFO - Finished Input tokens/sec rate: 1716.97 -2025-06-23 22:22:55,276 - __main__ - INFO - Finished Output tokens/sec rate: 618.61 -2025-06-23 22:22:55,276 - __main__ - INFO - ================================================================================ -2025-06-23 22:22:55,277 - __main__ - INFO - Work done -2025-06-23 22:22:55,277 - __main__ - INFO - Got cancellation request for VLLM server -job pau at: Mon Jun 23 22:23:00 CDT 2025 +2025-06-24 08:30:17,234 - __main__ - INFO - Got --pdfs argument, going to add to the work queue +Traceback (most recent call last): + File "", line 198, in _run_module_as_main + File "", line 88, in _run_code + File "/usr/local/lib/python3.12/dist-packages/olmocr/pipeline.py", line 1228, in + asyncio.run(main()) + File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run + return runner.run(main) + ^^^^^^^^^^^^^^^^ + File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run + return self._loop.run_until_complete(task) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete + return future.result() + ^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/olmocr/pipeline.py", line 1105, in main + raise ValueError("pdfs argument needs to be either a local path, an s3 path, or an s3 glob pattern...") +ValueError: pdfs argument needs to be either a local path, an s3 path, or an s3 glob pattern... 
+job pau at: Tue Jun 24 08:30:18 CDT 2025
diff --git a/models/bertopic_job.sh b/models/bertopic_job.sh
new file mode 100644
index 0000000..7f94d3f
--- /dev/null
+++ b/models/bertopic_job.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+#SBATCH -A p32852
+#SBATCH -p gengpu
+#SBATCH --gres=gpu:a100:1
+#SBATCH --nodes=2
+#SBATCH --ntasks-per-node=1
+#SBATCH --time=24:00:00
+#SBATCH --mem=64G
+#SBATCH --cpus-per-task=4
+#SBATCH --job-name=SLR_OCR
+#SBATCH --output=slr_ocr_job.log
+#SBATCH --mail-type=BEGIN,END,FAIL
+#SBATCH --mail-user=gaughan@u.northwestern.edu
+
+echo "setting up the environment by loading in conda environment at $(date)"
+
+# batch shells are non-interactive, so expose conda's activate hook first
+eval "$(conda shell.bash hook)"
+conda activate bertopic-env
+
+echo "running the bertopic job at $(date)"
+
+python /home/nws8519/git/adaptation-slr/models/bertopic_modeling.py
+
+
diff --git a/models/bertopic_modeling.py b/models/bertopic_modeling.py
new file mode 100644
index 0000000..f78aa98
--- /dev/null
+++ b/models/bertopic_modeling.py
@@ -0,0 +1,84 @@
+from bertopic import BERTopic
+
+import os
+import re
+from markdown import markdown
+from bs4 import BeautifulSoup
+
+#function generated by GitHub CoPilot
+def strip_markdown(md_text):
+    # Convert markdown to HTML, then extract plaintext
+    html = markdown(md_text)
+    soup = BeautifulSoup(html, "html.parser")
+    return soup.get_text(separator="\n").strip()
+
+#function generated by GitHub CoPilot
+def split_omd_sections(md_content):
+    # Use headings (lines starting with #) as section delimiters
+    sections = []
+    current_section = []
+    lines = md_content.splitlines()
+    for line in lines:
+        if re.match(r'^#{1,6} ', line):  # Heading line
+            if current_section:
+                sections.append('\n'.join(current_section))
+                current_section = []
+        current_section.append(line)
+    if current_section:
+        sections.append('\n'.join(current_section))
+    return sections
+
+#function generated by GitHub CoPilot
+def split_md_sections(md_content):
+    sections = []
+    current_section = []
+    lines = md_content.splitlines()
+    num_lines = len(lines)
+
+    def is_heading(line):
+        return re.match(r'^#{1,6} ', line)
+
+    def is_title_line(idx):
+        # A title line is surrounded by blank lines and is not itself blank or a heading
+        if is_heading(lines[idx]) or not lines[idx].strip():
+            return False
+        before_blank = (idx == 0) or not lines[idx-1].strip()
+        after_blank = (idx == num_lines-1) or not lines[idx+1].strip()
+        # Only count numbered titles such as "3 STUDY DESIGN" (digits, space, then a non-digit)
+        line = lines[idx].strip()
+        substantial = bool(re.match(r'^\d+ [^\d\.].*', line))
+        return before_blank and after_blank and substantial
+
+    for i, line in enumerate(lines):
+        if is_heading(line) or is_title_line(i):
+            if current_section:
+                sections.append('\n'.join(current_section))
+                current_section = []
+        current_section.append(line)
+    if current_section:
+        sections.append('\n'.join(current_section))
+    return sections
+
+
+#function generated by GitHub CoPilot
+def get_all_md_sections(directory):
+    all_sections = []
+    for filename in os.listdir(directory):
+        if filename.endswith('.md'):
+            filepath = os.path.join(directory, filename)
+            with open(filepath, encoding="utf-8") as f:
+                content = f.read()
+            sections = split_md_sections(content)
+            clean_sections = [strip_markdown(section) for section in sections if section.strip()]
+            all_sections.extend(clean_sections)
+    return all_sections
+
+if __name__ == "__main__":
+    directory = "/home/nws8519/git/adaptation-slr/studies/"
+    docs = get_all_md_sections(directory)
+    topic_model = BERTopic()
+    topics, probabilities = topic_model.fit_transform(docs)
+    print(topic_model.get_topic_info())
+    print(topic_model.get_document_info(docs))
+    # pickle serialization writes a single file, so save to a file path rather than a directory
+    topic_model.save("/home/nws8519/git/adaptation-slr/models/bertopic_model.pkl", serialization="pickle")
diff --git a/models/slr_ocr_job.log b/models/slr_ocr_job.log
new file mode 100644
index 0000000..5274b3c
--- /dev/null
+++ b/models/slr_ocr_job.log
@@ -0,0 +1,33385 @@
+setting up the environment by loading in conda environment at Tue Jun 24 10:00:23 CDT 2025
+
+CommandNotFoundError: Your shell has not been
properly configured to use 'conda activate'. +To initialize your shell, run + + $ conda init + +Currently supported shells are: + - bash + - fish + - tcsh + - xonsh + - zsh + - powershell + +See 'conda init --help' for more information and options. + +IMPORTANT: You may need to close and restart your shell after running 'conda init'. + + +running the bertopic job at Tue Jun 24 10:00:24 CDT 2025 +------------------------------- +Section 1: +I Depended on You and You Broke Me: An Empirical Study of Manifesting Breaking Changes in Client Packages + + +DANIEL VENTURINI, Federal University of Technology (UTFPR), Brazil +FILIPE ROSEIRO COGO, Huawei Technologies, Canada +IVANILTON POLATO, Federal University of Technology (UTFPR), Brazil +MARCO A. GEROSA, Northern Arizona University (NAU), United States +IGOR SCALIANTE WIESE, Federal University of Technology (UTFPR), Brazil + + +Complex software systems have a network of dependencies. Developers often configure package managers (e.g., npm) to automatically update dependencies with each publication of new releases containing bug fixes and new features. When a dependency release introduces backward-incompatible changes, commonly known as breaking changes, dependent packages may not build anymore. This may indirectly impact downstream packages, but the impact of breaking changes and how dependent packages recover from these breaking changes remain unclear. To close this gap, we investigated the manifestation of breaking changes in the npm ecosystem, focusing on cases where packages’ builds are impacted by breaking changes from their dependencies. We measured the extent to which breaking changes affect dependent packages. Our analyses show that around 12% of the dependent packages and 14% of their releases were impacted by a breaking change during updates of non-major releases of their dependencies. 
We observed that, from all of the manifesting breaking changes, 44% were introduced in both minor and patch releases, which in principle should be backward compatible. Clients recovered themselves from these breaking changes in half of the cases, most frequently by upgrading or downgrading the provider’s version without changing the versioning configuration in the package manager. We expect that these results help developers understand the potential impact of such changes and recover from them. + + +CCS Concepts: • Software and its engineering → Software evolution; + + +Additional Key Words and Phrases: Breaking changes, Semantic Version, npm, dependency management, change impact + + +ACM Reference format: +Daniel Venturini, Filipe Roseiro Cogo, Ivanilton Polato, Marco A. Gerosa, and Igor Scaliante Wiese. 2023. I Depended on You and You Broke Me: An Empirical Study of Manifesting Breaking Changes in Client Packages. ACM Trans. Softw. Eng. Methodol. 32, 4, Article 94 (May 2023), 26 pages. https://doi.org/10.1145/3576037 + + +This work is partially supported by the National Science Foundation under Grant Number IIS-1815503, CNPq/MCTI/FNDCT (grant #408812/2021-4), and MCTIC/CGI/FAPESP (grant #2021/06662-1). + + +Authors’ addresses: D. Venturini, I. Polato, and I. S. Wiese, Federal University of Technology (UTFPR), Campo Mourão, Paraná, Brazil; emails: danielventurini@alunos.utfpr.edu.br, {ipolato,igor}@utfpr.edu.br; F. R. Cogo, Huawei Technologies, Kingston, Canada; email: filipe.cogo@gmail.com; M. A. Gerosa, Northern Arizona University (NAU), Flagstaff, AZ; email: Marco.Gerosa@nau.edu. + + +Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. 
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. + + +© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM. 1049-331X/2023/05-ART94 $15.00 https://doi.org/10.1145/3576037 +1 INTRODUCTION + + +Complex software systems are commonly built upon dependency relationships in which a client package reuses the functionalities of provider packages, which in turn depend on other packages. To automate the process of installing, upgrading, configuring, and removing dependencies, package managers such as npm, Maven, pip, and Cargo are widely adopted. Despite the many benefits brought by the reuse of provider packages, one of the main risks client packages face is breaking changes [21]. Breaking changes are backward-incompatible changes performed by the provider package that renders the client package build defective (e.g., a change in a provider’s API). When client packages configure package managers to automatically accept updates on a range of provider package versions, the breaking change will have the serious consequence of catching clients off guard. For example, in npm, where most of the packages follow the Semantic Versioning specification [23], clients adopt configurations that automatically update minor and patch releases of their providers. In principle, these release types should not contain any breaking changes, as the semantic version posits that only major updates should contain breaking changes. However, minor or patch releases occasionally introduce breaking changes and generate unexpected errors in the client packages when these breaking changes manifest on clients. 
Due to the transitive nature of the dependencies in package managers, unexpected breaking changes can potentially impact a large proportion of the dependency network, preventing several packages from performing a successful build. + + +Research has shown that providers occasionally incorrectly use the Semantic Versioning specification [15]. In the npm ecosystem, prior research has shown that provider packages indeed publish releases containing breaking changes [14, 15, 18, 19]. However, such studies provide limited information regarding the prevalence of these breaking changes, focusing on API breaking changes without clarifying how the client packages solve the problems they cause. In this article, we fill this gap by conducting an empirical study of npm projects hosted on GitHub, verifying the frequency and types of the breaking changes that manifest as defects in client packages and how clients recover from them. npm is the main package manager for the JavaScript programming language, with more than 1 million packages. An estimated 97% of web applications come from npm [1], making it the most extensive dependency network [9]. We employed mixed methods to identify and analyze the types of manifesting breaking changes—changes in a provider release that render the client’s build defective—and how client packages deal with them in their projects. This article does not study cases in which a breaking change does not manifest itself in other projects. Our research answers the following questions: + + +RQ1. To what extent do breaking changes manifest themselves in client packages? +We analyzed 384 packages selected using a random sampling approach (95% confidence level and ±5% confidence interval) to select client packages with at least one provider. We found that manifesting breaking changes impacted 11.7% of all client packages (regardless of their releases) and 13.9% of their releases. In addition, 2.6% of providers introduced manifesting breaking changes. + + +RQ2. 
What changes in the provider packages manifest a breaking change? +The main causes of manifesting breaking changes were feature modifications, change propagation among dependencies, and data type modifications. We also verified that an equal proportion of manifesting breaking changes was introduced in minor and patch releases (approximately 44% in each release type). Providers fixed most of the manifesting breaking change cases introduced in minor and patch releases (46.4% and 61.5%, respectively). Finally, manifesting breaking changes were documented in issue reports, pull requests, or changelogs in 78.1% of cases. +RQ3. How do client packages recover from manifesting breaking changes? + + +Client packages recovered from manifesting breaking changes in 39.1% of the cases, and their recovery took about 134 days when providers did not fix the break or when clients recovered first. When providers released a fix to a manifesting breaking change, they took a median of 7 days. Upgrading the provider is the most frequent way client packages recover from a manifesting breaking change. + + +This article contributes to the literature by providing quantitative and qualitative empirical evidence about the phenomenon of manifesting breaking changes in the npm ecosystem. Our qualitative study may help developers understand the types of changes that manifest defects in client packages and which strategies are used to recover from breaking changes. We also provide several suggestions about how clients and providers can enhance the quality of their release processes. As an additional contribution, we created pull requests for real manifesting breaking change cases that had not yet been resolved, half of which were merged. +---------------------------------------- +------------------------------- +Section 2: +2 DEFINITIONS, SCOPE, AND MOTIVATING EXAMPLES + + +This section defines terms used in this article and describes motivating examples for our research. 
+ + +2.1 Glossary Definitions + + +In the following, we describe the terms and definitions that we use in the article, based on related work [7, 11, 17]. + + + + + + +Provider package release + is the package release that provides features and resources for use by other package releases. In Figure 1, the package express is a provider of ember-cli, body-parser is a provider of express, and so on. We refer to a provider package $P$ as a transitive provider when we want to emphasize that $P$ has other provider packages. For instance, in Figure 1, body-parser is a provider of express; body-parser also has bytes as a provider. In this scenario, we consider body-parser to be a transitive provider. + + + + + + +Client package release + is the package release that uses features and resources exposed by provider package releases. In Figure 1, express is a client of body-parser, body-parser is a client of bytes, and so on. + + + + + + +Direct provider release + is the one directly used by its client, that is, the package that the client explicitly declares as a dependency. In Figure 1, express is a direct provider of ember-cli, and bytes is a direct provider of body-parser. + + + + + + +Indirect provider release + is a package release that at least one of its providers uses. In other words, it is a provider of at least one of the direct client’s providers. In Figure 1, both body-parser and bytes are indirect providers of ember-cli, and bytes is an indirect provider of express. + + + + + + +Transitive provider release + is the package release between the one that introduced a breaking change and the client. For example, if a breaking change is introduced by bytes, in Figure 1, and affects client ember-cli, both packages express and body-parser are transitive providers. This is because the breaking change transited through these packages (body-parser and express) to arrive at client ember-cli. The transitive providers are all also impacted by the breaking change. 
+• +Version statement: + A client can specify its provider’s versions on +package.json +, a metadata file used by npm to specify providers and their versions, among other purposes. The version statement contains the accepted version of a provider. For example, the version statement in the following metadata +{"dependencies": {"express": "^4.10.6"}} + defines that the client requires express on version +^4.10.6 +. + + + + + + +• +Version range: + On the version statement a client can specify a range of versions/releases accepted by its provider. There are three types of ranges: + - +All (>=, or *): + Using this range, the client specifies that all new provider releases are supported/accepted and downloadable, even the ones with breaking changes. + - +Caret (^): + With this range, the client specifies that all new provider releases that contain new features and bug fixes are supported/accepted and downloadable; breaking changes must be avoided. This is the default range used by npm when a dependency is installed. + - +Tilde range (~): + This range specifies that all new provider releases that only contain bug fixes are supported/accepted and downloadable; breaking changes and new features must be avoided. + - +Steady range: + This range always resolves to a specific version and is also known as specific range. That is, the versioning statement has no range on it but rather a specific version. npm allows installation with a steady range using the command line option +--save-exact +. + + +• +Implicit and explicit update: + An implicit update happens when the client receives a new provider version due to the range version in the +package.json +. For a version statement defined with a range of versions, for example, +^4.10.6 +, an implicit update happens when npm installs a version 4.10.9 that matches the range. An explicit update takes place when the client manually updates the versioning statement directly in the +package.json +. 
+
+
+• Manifesting breaking changes are provider changes that manifest as a fault on the client package, ultimately breaking the client's build. The adopted definition of breaking change in the prior literature [3–6, 8, 15, 19, 21] includes cases that are not considered breaking changes (e.g., a change in an API that is not effectively used by a client package). Conversely, manifesting breaking changes include cases that are not covered by the prior definitions of breaking change (e.g., because the provider package is used in a way that is not intended by the provider developer, a semantic-version-compliant change introduced by a new release of this provider causes an unexpected error in the client package).
+----------------------------------------
+-------------------------------
+Section 3:
+2.2 Motivating Examples
+
+We found the following two examples of manifesting breaking changes in our manual analysis (on each of the following Listings, red lines have been removed from the source code, whereas blue lines have been inserted into the source code). Our manual analysis (Section 3.2.1) consists of executing the client's test suite for its releases and analyzing all executions that run into an error.
+
+The client assetgraph-builder@7.0.0 has a provider assetgraph@6.0.0 that has a provider terser@^4.0.0, but, due to a range of versions, npm installed terser@4.6.10. Release 4.3.0 of terser introduces a change that, by default, enables the wrapping of functions on parsing, as shown in Listing 1.
+
+```javascript
+// terser@4.2.1 without default wrapping behavior
+foo(function(){});
+
+// terser@4.3.0 default wrapping behavior
+foo((function(){}));
+```
+
+Listing 1. Diff between terser@4.2.1 and terser@4.3.0 default behavior.
+
+[1] https://github.com/terser/terser/compare/v4.2.1..v4.3.0.
+This change breaks the assetgraph-builder@7.0.0's tests.
+Once this feature is turned into a default behavior, the client assetgraph-builder@8.0.0 adapts its tests to make them compatible with terser's behavior, as shown in Listing 2.
+
+```javascript
+expect(
+  javaScriptAssets[0].text,
+  'to match',
+-  /SockJS=[\s\S]*define\("main",function\(\)\{\}\);/
++  /SockJS=[\s\S]*define\("main",\(?function\(\)\{\}\) ?\);/
+);
+```
+
+Listing 2. Diff with assetgraph-builder@8.0.0 client's tests adjusting to the breaking change.
+
+Sometimes, provider changes can break a client long after their introduction. This occurred in the client package ember-cli-chartjs@2.1.1. In Figure 2, the release 1.0.4 of ember-cli-qunit (left-tree) introduced a change that did not lead to a breaking change. However, almost 3 years later, ember-cli-qunit was used together with release 1.3.1 of the provider broccoli-plugin (middle-tree), and a breaking change manifested.
+
+In November 2015, the provider ember-cli-qunit@1.0.4 fixed an error in its code, changing the returned object type of function lintTree, as shown in Listing 3. Despite being a type change, it did not break the client when it was released, and this fix was retained in further releases of ember-cli-qunit.
+
+```javascript
+lintTree: function(type, tree) {
+  // Skip if useLintTree === false.
+  if (this.options['ember-cli-qunit'] && ... ) {
+    return tree;
+
+  // Fakes an empty broccoli tree
+  return { inputTree: tree, rebuild: function() { return []; } };}
+```
+
+Listing 3. ember-cli-qunit@1.0.4 object type change.
+
+Almost 3 years later, in August 2018, the provider broccoli-plugin@1.3.1 was released (middle-tree in Figure 2) to fix a bug, as in Listing 4.
+
+```javascript
+function isPossibleNode(node) {
+-  return typeof node === 'string' ||
+-    (node !== null && typeof node === 'object')
++  var type = typeof node;
+```
+
+2 https://github.com/terser/terser/issues/496.
+3 https://github.com/assetgraph/assetgraph-builder/commit/e4140416e7feaa3d088cf3ad0229fd677ff36dbc.
+4 https://github.com/ember-cli/ember-cli-qunit/commit/6fdfe7d. +5 https://github.com/broccolijs/broccoli-plugin/commit/3f9a42b. +Release 1.3.1 of the broccoli-plugin package experienced a manifesting breaking change due to a fix in the provider ember-cli-qunit@1.0.4, which was released almost 3 years prior. This manifesting breaking change occurred because the ember-cli-chartjs’ dependency tree evolved over time due to the range versions, as shown in Figure 2, causing the break. When the package ember-cli-chartjs@2.1.1 was installed in April 2020 (the date of our analysis), the installation failed due to the integration of broccoli-plugin@1.3.1 changes into ember-cli-qunit. Fifteen days later, ember-cli-qunit@1.4.3 fixed the issue when the ember-cli-qunit’s object type was changed again. During the 15-day period when the manifesting breaking change remained unresolved, broccoli-plugin received about 384k downloads from npm. This scenario shows that even popular and mature projects can be affected by breaking changes. Although we recognize that the download count does not necessarily reflect the popularity of a package, we use this metric as an illustrative example of how many client packages might have been impacted by a provider package. +---------------------------------------- +------------------------------- +Section 4: +3 STUDY DESIGN + + +This section describes how we collected our data (Section 3.1) and the motivation and approach for each RQ (Section 3.2). + + +3.1 Data Collection + + +3.1.1 Obtaining Metadata from npm Packages. The first part of Figure 3 shows our approach for sampling the database. We initially gathered all the metadata files (i.e., package.json files) from the published packages in the npm registry between December 20, 2010, and April 01, 2020, accounting for 1,233,944 packages. This range refers to the oldest checkpoint that we could retrieve and the most recent one when we started this study. 
We ignored packages that did not have any providers in the package.json since they cannot be considered client packages and will therefore not suffer breaking changes. After filtering packages without a provider, our dataset comprises 987,595 package.json metadata files. For each release of each package, we recorded the timestamp of the release and the name of the providers with their respective versioning statements. + + +We parsed all the versioning statements and determined the resolved provider version at the time of each client release. Prior works have adopted similar approaches when studying dependency management [7, 29]. For each provider in each client release, we retrieved the most recent provider version that satisfied the range specified by the client in that release, i.e., the resolved version. Using this resolved version, we determined whether a provider changed its version between the two client releases. In other words, we reproduced the adopted versions of all providers by resolving the provider version at the release time of the client. + + +To further refine our sample, we analyzed two criteria in the associated package.json snapshot with the latest version of the client packages in our dataset: + + +6https://github.com/broccolijs/broccoli-merge-trees/issues/65. +7https://github.com/ember-cli/ember-cli-qunit/commit/59ca6ad. +(1) The +package.json + snapshot should have a non-empty entry for the “script test” field, and the entry should differ from the default: +Error: no test specified +. We specified this criterion in order to run the automated tests that were part of our method to detect manifesting breaking changes. In total, 488,805 packages remained after applying this criterion. + + +(2) The +package.json + snapshot should have an entry containing the package’s repository URL, as we wanted to retrieve information from the package codebase. After applying this criterion, 410,433 packages remained in our dataset. 
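The resolution step described above (picking, for each provider, the newest release that satisfies the client's versioning statement) can be sketched as follows. This is an illustrative Python sketch, not the authors' tooling: it supports only the caret, tilde, and steady ranges defined in Section 2.1, and it ignores npm's special-casing of 0.x versions and prerelease tags.

```python
# Minimal sketch of npm-style version-range resolution (illustrative only).

def parse(v):
    # "4.10.6" -> (4, 10, 6), so tuples compare in semver order
    return tuple(int(x) for x in v.split("."))

def satisfies(version, statement):
    v = parse(version)
    if statement.startswith("^"):        # caret: same major, >= base
        base = parse(statement[1:])
        return v[0] == base[0] and v >= base
    if statement.startswith("~"):        # tilde: same major.minor, >= base
        base = parse(statement[1:])
        return v[:2] == base[:2] and v >= base
    return v == parse(statement)         # steady range: exact version

def resolve(statement, published):
    """Pick the newest published version matching the versioning statement."""
    matching = [v for v in published if satisfies(v, statement)]
    return max(matching, key=parse) if matching else None

available = ["4.10.6", "4.10.9", "4.17.1", "5.0.0"]
print(resolve("^4.10.6", available))  # 4.17.1
print(resolve("~4.10.6", available))  # 4.10.9
```

In the paper's setting, `published` would be restricted to releases that existed at the client's release timestamp, which is how a provider's resolved version can differ between two client releases even when the versioning statement is unchanged.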
+---------------------------------------- +------------------------------- +Section 5: +3.1.2 Running Clients’ Tests. + + +Given the size of our dataset (more than 410,000 client packages), we ran tests on a random sample. At a 95% confidence level and ±5% confidence interval, we randomly selected 384 packages. Our sample has a median of 5.5 releases and 9 direct providers per package. We chose to study a random sample since our manual analysis is slow to run over a large dataset (Section 3.1.3); we spent a month executing our method in our sample. We did not ignore packages based on the number of releases or providers or any other metric. We performed a manual check on all selected packages that had fewer than four releases (130 out of 384) by checking their repositories and aiming to remove packages that are not real projects, lack tests, lack code, are example projects, and so forth. When we removed one package, we sampled another one following the two criteria described above. + + +The second part of Figure 3 depicts our approach to running the test scripts for each release of the 384 clients. For each client package, we cloned its repository—all client repositories are hosted on GitHub—and restored the work tree of all releases using their respective release tags (e.g., “v1.0.0”). For releases that are not tagged, we used their provided timestamp in the +package.json + metadata to restore the work tree (i.e., we matched the release timestamp and the closest existing commit in the master branch). We conducted an analysis and verified that tags and timestamps point to the same commit in 94% of releases with tags; thus, checkout based on timestamps is reliable for untagged releases. + + +After restoring the work tree of a client release, we updated all versioning statements in the associated +package.json + entry with the specific resolved provider version (see Section 3.1.1). 
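For untagged releases, the timestamp-matching fallback described above can be sketched in pure Python: given the commit history of the master branch, pick the closest commit at or before the release timestamp from `package.json`. The data here is hypothetical; the real analysis would read shas and dates from the cloned repository's history.

```python
# Sketch of "match the release timestamp to the closest existing commit".
from datetime import datetime

def commit_at_release(commits, release_time):
    """commits: list of (sha, datetime) pairs from the master branch.
    Returns the sha of the latest commit not newer than release_time."""
    candidates = [(ts, sha) for sha, ts in commits if ts <= release_time]
    return max(candidates)[1] if candidates else None

history = [                      # hypothetical commit log
    ("a1b2c3", datetime(2018, 8, 1)),
    ("d4e5f6", datetime(2018, 8, 20)),
    ("0a1b2c", datetime(2018, 9, 3)),
]
print(commit_at_release(history, datetime(2018, 8, 25)))  # d4e5f6
```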
+We then excluded a file called package-lock.json, which locks the providers' and indirect providers' versions. We also executed the associated tests on a release of the client package whenever a provider package changed in that release, as this can potentially introduce a manifesting breaking change. A provider change can be (1) a provider added into the package.json or (2) the resolved version of a provider changed between the previous and current release of the client package.
+
+We sought to reproduce the same build environment that existed when the provider changed. Therefore, before executing the tests of the client packages, we performed a best-effort procedure to identify the Node.js version that was adopted by the client package at the time the provider changed. This was because every 6 months a new major version of Node.js is released.(^8) As we wanted to reproduce the test results with respect to the time when the client package published its release, we changed the Node.js version before executing the client package tests. We selected the Node.js version using two different approaches. Our preferred approach was to select the same Node.js version as the one specified in the engines → node field of the package.json file.(^9) This field allows developers to manually specify the Node.js version that builds and runs the code of a specific release. When this field was not set, we selected the latest Node.js version available(^10) at the time of the client package release. Therefore, we changed the Node.js version, executed the install script, and ran the tests using the npm install and npm test commands, respectively. If the install or test commands failed due to incompatibilities with the selected Node.js version or took more than 10 minutes, we changed to the previous major release of Node.js until the install and test commands succeeded. We used the Node Version Manager (NVM) tool to exchange Node.js versions.
+Additionally, we also changed the npm version according to the Node.js version. npm is the package manager for Node.js packages and executes the install and test scripts. We performed the same procedure to select the npm version to use during the installation and test runs. Finally, we executed the install/test scripts and saved the results (success or error) for each client release.
+
+After executing the install/test scripts of the 384 client packages in our sample, we discarded 33 packages because the errors did not allow the execution of the install/test script in any of their releases: 15 clients did not have one of the required files; 11 had invalid test scripts (e.g., "test": "no test"); 4 listed some required files in the .gitignore file, which specifies untracked files that git should ignore;(^11) 2 required specific database configurations that could not be done; and 1 package required a key to access a server. We randomly replaced these 33 packages following the aforementioned criteria.
+
+Table 1 shows the results of the execution of the install/test scripts of the 384 client packages and their 3,230 releases. Since the providers' versions associated with 2,727 releases did not change, the tests for those releases were not executed. Finally, we consider as possible manifesting breaking change cases all client packages and releases that failed the install/test scripts.
+
+A replication package including our client packages' sample, instruments, scripts, and identified manifesting breaking changes is available for download at https://doi.org/10.5281/zenodo.5558085.
+
+(^8) https://github.com/nodejs/node#release-types.
+(^9) https://docs.npmjs.com/files/package.json#engines.
+(^10) https://nodejs.org/en/download/releases.
+(^11) https://git-scm.com/docs/gitignore.
+3.1.3 Manual Check on Failure Cases: Detecting Manifesting Breaking Changes.
For all failure cases (203 clients and 1,276 releases) on the execution of install/test scripts, we manually analyzed which ones were true cases of manifesting breaking changes. To identify breaking changes that manifest themselves in a client package, we leveraged the output logs (logs generated by npm when executing the install and test scripts) generated as the result of executing the method described in Section 3.1.2 (see the second part of Figure 3). For each failed test result, we obtained the error description and the associated stack trace. We then differentiated failed test results caused by a related issue with the client package (e.g., an introduced bug by the client) from those caused by a change in the provider package (e.g., a change in the return type of a provider’s function). From the obtained stack traces, we determined whether any function of a provider package was called and manually investigated the positive cases. During our manual investigation, we sought to confirm that the test failure was caused by a manifesting breaking change introduced by the provider package. + + +The first author was responsible for running the tests and identifying the manifesting breaking changes and related releases and commits. The first author also manually analyzed each of the manifesting breaking changes and recorded the following information about each of them: the number of affected versions of the client, whether any documentation mentions the manifesting breaking change, the responsible package for addressing the breaking change (provider or client), the client version impacted by the manifesting breaking change, the provider version that introduced the breaking change, and a textual description about the causes for the breaking change manifestation (e.g., “The provider function was renamed by mistake,” “The provider normalizeurl@1.0.0 introduce[d] a new function and the client assetgraph use[d] it. 
But the client forgot to update the provider version in package.json," "The provider inserts an 'in a null body request'"). During this process, several rounds of discussions were performed among the authors to refine the analysis, using continuous comparison [22] and negotiated agreement [13]. In the negotiated agreement process, the researchers discussed the rationale they used to categorize each code until reaching consensus [13]. More specifically, we leveraged the recorded information about each manifesting breaking change to derive a consistent categorization of the introduced breaking changes (RQ2 and RQ3) and to guide new iterations of the manual analysis.

More specifically, the following set of actions was performed during our manual investigation:

• **Analyze the execution flow:** To determine whether the function associated with the test failure belonged to the provider or the client code, we leveraged the stack traces to identify which function was called when the test failed. In particular, we instrumented the code of the provider and the client packages to output any information necessary to analyze the execution flow. We analyzed variable contents by adding calls to the `console.log()` and `console.trace()` functions in each part of the code where the client package calls a function of the provider. For example, suppose the following error appeared: "TypeError: myObject.callback is not a function." To discover the variable content, we used `console.log(myObject)` to check whether the `myObject` variable was changed, was null, or received other values.

• **Analyze the status of the Continuous Integration (CI) pipeline:** We compared the status of the CI pipeline of the originally built release with the status of the CI pipeline at the time of our manual investigation.
Since the source code of the client package remains the same between the original release and the version installed in our analysis, we used the difference in the status of the CI pipeline as additional evidence that the test failure was caused by a provider version change. Not all clients had CI pipelines, but when available, the pipeline status provided useful evidence.

• **Search for client fixing commits:** We manually searched for recovering commits in the history of commits between the installed and previous releases of the client package. Whenever a recovery commit was identified (by reading the commit message), we determined whether the error was due to the client or the provider code. For example, we observed cases in which a client updated a provider in the release with failed tests. We also observed that, in the following commits, the provider was downgraded and the commit message was "downgrade provider" or "fix breaking change." In these cases, we considered the test failure as caused by a manifesting breaking change.

• **Search for related issue reports and pull requests:** We hypothesized that a manifesting breaking change would affect different clients that, in turn, would either file a bug report or perform a fix followed by a pull request to the codebase of the provider package. Therefore, we searched for issue reports and pull requests with the same error message obtained in our stack trace. We then collected detailed information about the error to confirm whether it was due to a manifesting breaking change introduced by the provider package.

• **Previous and subsequent provider versions:** If the test error was caused by a manifesting breaking change, downgrading to the previous provider version or upgrading to a subsequent provider version might fix the error, provided the provider had already fixed it.
*Subsequent provider versions* means all provider versions that fit the versioning statement and are greater than the provider version that introduced the manifesting breaking change (i.e., the provider version adopted when the test failed). In this case, we uninstalled the current version, installed the previous and subsequent versions, and executed the test scripts again. For example, if the client specified a provider `p` as `{"p": "^1.0.2"}` that introduced a breaking change in, for example, version `1.0.4`, we installed `p@1.0.2`, `p@1.0.3`, and `p@1.0.5` to verify whether the error persisted for those versions.

3.2 Research Questions: Motivation, Approach

This section contains the motivation and the approach for each of the research questions.

3.2.1 RQ1. To What Extent Do Manifesting Breaking Changes Manifest in Client Packages?

**Motivation:** By default, npm sets the caret range as the default versioning statement, which automatically accepts minor and patch releases. Hence, manifesting breaking changes that are introduced in minor and patch releases can inadvertently cause downtime in packages that are downloaded hundreds of thousands of times per day, affecting a large body of software developers. Understanding the prevalence of manifesting breaking changes in popular software ecosystems such as npm is important to help developers assess the risks of accepting automatic minor and patch updates. Although prior studies have focused on the frequency of API breaking changes [3], breaking changes can occur for different reasons. Determining the prevalence of a broader range of breaking change types remains an open research problem.
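The caret behavior described above can be made concrete with a small sketch. The helper below is our own simplified illustration of caret-range semantics for versions with a non-zero major component (it is not npm's actual implementation, and it ignores pre-release tags and the special `0.x` rules): a client declaring `^1.2.0` silently receives newer minor and patch versions but not a new major version.

```javascript
// Simplified sketch (our assumption, not npm's semver implementation):
// for major >= 1, ^X.Y.Z accepts any version >=X.Y.Z and <(X+1).0.0.
function parse(v) {
  return v.split('.').map(Number); // [major, minor, patch]
}

function satisfiesCaret(version, range) {
  const base = parse(range.slice(1)); // drop the leading '^'
  const v = parse(version);
  if (v[0] !== base[0]) return false;          // major must match
  if (v[1] !== base[1]) return v[1] > base[1]; // a newer minor is accepted
  return v[2] >= base[2];                      // same minor: newer patch only
}

// A client declaring {"p": "^1.2.0"} is implicitly updated to 1.3.2,
// which is exactly how a breaking change in a minor release reaches it:
console.log(satisfiesCaret('1.3.2', '^1.2.0')); // true
console.log(satisfiesCaret('2.0.0', '^1.2.0')); // false
```

This is why a manifesting breaking change shipped in a minor or patch release propagates without any action by the client, while a major release requires an explicit update.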
**Approach:** For all cases that resulted in an error on the install/test script, we determined the type of error (client, provider, not discovered). We calculated, out of the 384 packages and 3,230 releases, the percentage of cases that we confirmed as manifesting breaking changes. Considering all the providers of the clients' latest releases, we calculated the percentage of providers that introduced manifesting breaking changes. In addition, we calculated how many times (number of releases) each provider introduced at least one manifesting breaking change.

3.2.2 RQ2. What Problems in the Provider Package Cause a Manifesting Breaking Change?

**Motivation:** Prior studies about breaking changes in the npm ecosystem are restricted to API breaking changes [14]. However, other issues that provider packages introduce in minor and patch releases can also manifest a breaking change. To help developers reason about manifesting breaking changes, it is important to understand their root causes.

**Approach:** In this RQ, we analyzed the types of changes introduced by provider packages that bring about a manifesting breaking change. With the name and version of the provider packages, we manually analyzed the provider's repository to find the exact change that caused a break. We used the following approaches to find the specific changes introduced by providers:

• **Using diff tools:** We used diff tools to analyze the introduced change between two releases of a provider. For example, suppose that a manifesting breaking change was introduced in the release `provider@1.2.5`. In this case, we retrieved the source code of previous versions, e.g., `provider@1.2.4`, and performed a diff between these versions to manually inspect the changed code.

• **Analyzing provider's commits:** We used the provider's commits to analyze the changes between releases.
For a manifesting breaking change in the provider `p`, we inspected its repository and manually analyzed the commits ahead of or behind the release tag commit that introduced the manifesting breaking change.

• **Analyzing changelogs:** Changelogs contain information on all relevant changes in the history of a package. We used these changelogs to understand the changes introduced in a release of a provider package and to verify whether any manifesting breaking change fix was described.

We also looked at issue reports and pull requests for explanations of the causes of manifesting breaking changes. After discovering the provider changes that introduced breaking changes, we analyzed, categorized, and grouped common issues. For example, all issues related to changing object types were grouped into a category called *Object type changed*. Furthermore, we analyzed the Semantic Versioning level that introduced and fixed/recovered from the manifesting breaking changes in both the provider and client packages to verify the relationship between manifesting breaking changes and non-major releases.

We analyzed the version numbering of releases that fixed a manifesting breaking change and where manifesting breaking changes were documented (changelogs, issue reports, etc.). Furthermore, we analyzed the depth of the dependency tree of the provider that introduced a manifesting breaking change, since 25% of npm packages had at least 95 transitive dependencies in 2016 [10].

3.2.3 RQ3. How Do Client Packages Recover from a Manifesting Breaking Change?

**Motivation:** A breaking change may impact the client package through an *implicit* or *explicit* update. A client recovery is identified by an update to its code, by waiting for a new provider release, or by performing a downgrade/upgrade of the provider's version.
Breaking changes may be caused by either a *direct* or an *indirect* provider, since client packages depend on a few direct providers and many indirect ones [11]. A breaking change may cascade to transitive dependencies if it remains unfixed. Even if a client package can recover from the breaking change by upgrading to a newer version of the provider package, it may still have to manually resolve incompatibilities that might exist [12]. Understanding how breaking changes manifest in client packages can help developers understand how to recover from them.

**Approach:** We retrieved all information for this RQ from the clients' repositories. We searched for information about the error and how the client packages recovered from the manifesting breaking change. The following information was analyzed:

• **Commits:** We manually checked the subsequent commits of the client packages that were pushed to their repositories after the provider release that introduced the respective manifesting breaking change. In particular, we searched for commits that touched the `package.json` file. In the file history, we checked whether the provider was downgraded, upgraded, replaced, or removed.

• **Changelogs:** We analyzed the client changelogs and release notes looking for mentions of provider updates/downgrades. About 48% of clients maintained a changelog or release notes in their repositories.

• **Pull requests/issue reports:** We searched for pull requests and issue reports in the client repository that contained information about the manifesting breaking changes. For example, we found pull requests and issue reports with "Update provider" and "Fix provider error" in the title.

For each manifesting breaking change case, we recovered the provider's dependency tree.
For example, in our second motivating example (Section 2), we recovered the dependency tree from the client to the package that introduced the manifesting breaking change, which resulted in `broccoli-asset-rev` → `broccoli-filter` → `broccoli-plugin` (Figure 2). We investigated how many breaking change cases were introduced by direct and indirect providers, when the manifesting breaking change was introduced and fixed/recovered from, which package fixed/recovered from it, and how it was fixed/recovered from. We also verified how client packages changed the providers' versions and how the documentation associated with manifesting breaking changes related to the time to fix them.

3.3 Scope and Limitations

As our definition of manifesting breaking changes includes cases that are not covered by prior definitions of breaking changes (see Section 2.1), this article does not intend to provide a direct comparison between these two phenomena. As a result, the stated research questions do not indicate the proportion of manifesting breaking changes that are, in fact, breaking changes as defined by prior literature (e.g., an API change by the provider). In addition, since provider packages are rarely accompanied by any formal specification of their intended behavior, it is impossible at the scale of our study to differentiate errors that manifest in the client package due to breaking changes from those that manifest due to an idiosyncratic usage of the provider by the client package. Therefore, the results of the stated RQs cannot be used to assess whether a client package could fix its build by simply updating to a newer version of the provider.

4 RESULTS

This section presents the findings associated with each RQ.
4.1 RQ1. How Often Do Manifesting Breaking Changes Occur in the Client Package?

**Finding 1:** 11.7% of the client packages (regardless of their releases) and 13.9% of the client releases were impacted by a manifesting breaking change. Of all 384 client packages, 45 (11.7%) suffered a failing test from a manifesting breaking change in at least one release. Of the 3,230 client releases for which the tests were executed, 1,276 failed, and all errors were manually analyzed. In 450 (13.9%) releases, the error was raised by the provider packages, characterizing a manifesting breaking change. In 86 (2.7%) releases, we could not identify which package raised the error.

Table 2. Results of Releases' Analyses

| Results | Releases (#) | (%) |
|-------------------------------|--------------|------|
| Success | 1,954 | 60.5 |
| Fail | | |
| Client's errors | 479 | 14.8 |
| Manifesting breaking changes | 450 | 13.9 |
| Breaking due to external changes | 261 | 8.1 |
| Errors not identified | 86 | 2.7 |
| Total | 3,230 | 100 |

We detected that 261 (8.1%) releases suffered a particular error type that we call *breaking due to external change*. These releases used a provider that relied on data/resources from an external API/service (e.g., Twitter) that were no longer available, impacting all of the clients' releases. The provider cannot fix this error, because it does not own the resource. These cases imply that detecting manifesting breaking changes by running the clients' tests can introduce false positives, which we discarded during our manual analyses. We also considered cases in which a provider package was removed from npm as *breaking due to external change*. Table 2 shows the results of the analyses by release.

**Finding 2:** 92.2% of providers introduced a single manifesting breaking change.
In our sample, 47 of 51 providers (92.2%) introduced a single release with a manifesting breaking change, and 4 providers introduced two releases with manifesting breaking changes. We detected 55 unique manifesting breaking change cases introduced by providers, some of which impacted multiple clients. For example, the breaking change described in the *Incompatible providers' versions* category (Finding 3) impacted six clients. Therefore, 64 manifesting breaking change cases manifested in the client packages. Finally, there were 1,909 providers across all clients' latest versions, and the percentage of providers that introduced a manifesting breaking change was 2.6% (51 of 1,909).

About 11.7% of clients and 13.9% of their releases suffered from manifesting breaking changes.

We detected failing tests caused by only about 2.6% of the providers.

Over 90% of the providers that introduced manifesting breaking changes did so in just a single release.

4.2 RQ2. What Issues in the Provider Package Caused a Breaking Change to Manifest?

**Finding 3:** We found eight categories of issues. We grouped each manifesting breaking change into one of eight categories, depending on its root cause (issue). Table 3 presents each category, the number of occurrences, and the number of impacted client releases.

In the following, we describe each category and present an example that we found during our manual analysis.

**Feature change:** Manifesting breaking changes in this category are related to modifications of provider features (e.g., the default value of variables). An example happens in request@2.17.0—this version was removed from npm, but the introduced change remained in the package—when developers introduced a new decision rule into their code(^{12}) as shown in Listing 5.

(^{12}) https://github.com/request/request/commit/d05b6ba.

Table 3.
The Identified Categories of Manifesting Breaking Changes

| Category | Cases (#) | Cases (%) | Releases (#) | Releases (%) |
|---------------------------------|------|------|------|------|
| Feature change | 25 | 39.1 | 101 | 22.4 |
| Incompatible providers' versions | 15 | 23.4 | 64 | 14.2 |
| Object type changed | 9 | 14.1 | 213 | 47.3 |
| Undefined object | 5 | 7.8 | 28 | 6.2 |
| Semantically wrong code | 5 | 7.8 | 14 | 3.1 |
| Failed provider update | 2 | 3.1 | 24 | 5.3 |
| Renamed function | 2 | 3.1 | 2 | 0.4 |
| File not found | 1 | 1.6 | 4 | 0.9 |
| Total | 64 | | 450 | |

Listing 5. Example of a manifesting breaking change categorized as feature change.

```javascript
debug('emitting complete', self.uri.href)
+ if (response.body == undefined && !self._json) {
+   response.body = "";
+ }
self.emit('complete', response, response.body)
```

In Listing 5, the provider request assigns an empty string to the `response.body` variable instead of preserving `response.body` with its default `undefined` value.

**Incompatible providers' versions:** In this category, the client breaks because of a change in an indirect provider. An example happens in the packages `babel-eslint` and `escope`, where `escope` is an *indirect* provider of `babel-eslint`.

```javascript
- },
- visitClass: {
+ }, {
+   key: 'visitClass',
  value: function visitClass(node) {
```

Listing 6. Incompatible providers' versions example.

The release `escope@3.4` introduced the change presented in Listing 6. This change impacted the package `babel-eslint`, even though `escope` was not a direct provider of `babel-eslint`. This manifesting breaking change remained unresolved for a single day, during which `babel-eslint` received about 80k downloads from npm.

**Object type changed:** We detected nine (14.06%) cases in which the provider changed the type of an object, resulting in a breaking change in the client packages.
```javascript
this.setup();
- this.sockets = [];
+ this.sockets = {};
this.nsps = {};
this.connectBuffer = [];
}
var socket = nsp.add(this, function() {
- self.sockets.push(socket);
+ self.sockets[socket.id] = socket;
self.nsps[nsp.name] = socket;
```

Listing 7. Object type changed example.

(^{13}) https://github.com/babel/babel-eslint/issues/243.

(^{14}) https://github.com/estools/escope/issues/99#issuecomment-178151491.

In Listing 7, the provider socket.io@1.4.0 turned an array into an object.(^{15}) This simple change broke many of socket.io's clients, even the package karma,(^{16}) a browser test runner, which was forced to update its code(^{17}) and publish karma@0.13.19. During the single day in which the manifesting breaking change remained unresolved, karma was downloaded about 146k times from npm.

**Undefined object:** In this category, an undefined object causes a runtime exception that breaks the provider, which throws the exception to the client package.

```javascript
+ app.options = app.options || {};
  app.options.babel = app.options.babel || {};
  app.options.babel.plugins = app.options.babel.plugins || [];
```

Listing 8. Undefined object code example.

This error happened in the provider ember-cli-htmlbars-inline-precompile@0.1.3, which solved it as shown in Listing 8.(^{18})

**Failed provider update:** In this category, provider A updates its provider B, but provider A does not update its code to work with the new provider B. We detected two cases in this category. In addition to an explicit update, one provider A from this category specified its provider B with an accept-all range (`>=`). Over time, its provider B published a major release that introduced a manifesting breaking change. Because provider A specified an accept-all range, it did not account for the implicit update of provider B, and the client suffered an error.
**Semantically wrong code:** Manifesting breaking changes in this category happen when the provider writes semantically wrong code, generating an error in its runtime process(^{19}) and affecting the client. These errors could be caught at compile time in a compiled language, but in JavaScript they happen at runtime. This occurred in the provider front-matter@0.2.0 and in four other cases.

```javascript
const separators = ['---', '= yaml =']
- const pattern = pattern = '^('
+ const pattern = '^('
+   '((= yaml =)|(---))'
```

Listing 9. Semantically wrong code example.

In Listing 9, the provider repeated the variable name (`pattern`) in its declaration, which generated a semantic error. Although this error can be easily detected and fixed, as the provider eventually did(^{20}) in Listing 9, the provider took almost 1 year to fix it (front-matter@0.2.2). Meanwhile, front-matter received about 366 downloads in that period.

**Renamed function:** The manifesting breaking changes in this category occur when functions are renamed. Our analysis revealed two cases in which functions were renamed. The first renaming case is our first motivating example (Section 2); we describe the second one below.

```javascript
- RedisClient.prototype.send_command = function (command, args, callback) {
-   var args_copy, arg, prefix_keys;
+ RedisClient.prototype.internal_send_command = function (command, args, callback) {
+   var arg, prefix_keys;
```

Listing 10. Renamed function code example.

(^{15}) https://github.com/socketio/socket.io/commit/b73d9be.

(^{16}) https://github.com/socketio/socket.io/issues/2368.

(^{17}) https://github.com/karma-runner/karma/commit/3ab78d6.

(^{18}) https://github.com/ember-cli/ember-cli-htmlbars-inline-precompile/pull/5/commits/b3faf95.

(^{19}) https://hacks.mozilla.org/2017/02/a-crash-course-in-just-in-time-jit-compilers/.
(^{20}) https://github.com/jxson/front-matter/commit/f16fc01.

Table 4. Manifesting Breaking Changes in Each Semantic Version Level

| Levels | (#) | (%) |
|------------|-----|------|
| Major | 3 | 4.7 |
| Minor | 28 | 43.75 |
| Patch | 28 | 43.75 |
| Pre-release | 5 | 7.8 |
| Total | 64 | 100 |

The provider `redis@2.6.0-1` renamed a function, as shown in Listing 10.(^{21}) However, this function was used by the client package `fakeredis`,(^{22}) which broke with this change. The client package `fakeredis@1.0.3` recovered from this error by downgrading to `redis@2.6.0-0`.(^{23}) In the 5-day period within which the manifesting breaking change was not fixed, `fakeredis` received about 2.3k downloads from npm.

**File not found:** In the cases in this category, the provider removes a file or adds it to the version control ignore list (`.gitignore`) and the client tries to access it. In the single case of this category in our sample, the provider referenced a file that was added to the ignore list.

**Finding 4:** Manifesting breaking changes are often introduced in patch releases. As shown in Table 4, of the 64 cases of manifesting breaking changes we analyzed, 3 were introduced in major releases, 28 in minor releases, 28 in patch releases, and 5 in pre-releases. Although we only analyzed manifesting breaking changes from minor and patch releases, in three cases the manifesting breaking changes were introduced at the major level in an indirect provider, which transitively affected client packages—as in the jsdom@16 case (see Section 2).

Pre-releases precede a stable release and are considered unstable; anything may change until a stable version is released.(^{24}) In all breaking changes detected in pre-releases, the providers introduced unstable changes in pre-releases and propagated these changes to stable versions.
An example is the pre-release `redis@2.6.0-1` (described above), whose renamed function propagated to the stable version and caused a failure in the client packages.

**Finding 5:** Manifesting breaking change fixes/recoveries are introduced by clients, providers, or both. We sought to identify which package fixed/recovered from the manifesting breaking changes—client or provider—and at which level the fixing/recovering release was published, as depicted in Figure 4.

Figure 4 shows that client packages recover from nearly half of the manifesting breaking changes introduced in minor updates. In turn, 76.9% of the manifesting breaking changes that are introduced by providers in a minor release are fixed in a patch release. Providers fix the majority of the manifesting breaking changes introduced in patch releases (46.4% of the time), typically through a patch release (61.5%).

**Finding 6:** 21.9% of the manifesting breaking changes are not documented. Although clients and providers often document the occurrence or repair of a manifesting breaking change in issue reports, pull requests, or changelogs, more than one-fifth of the manifesting breaking changes are undocumented.

(^{21}) https://github.com/NodeRedis/node-redis/commit/861749f.

(^{22}) https://github.com/NodeRedis/node-redis/issues/1030#issuecomment-205379483.

(^{23}) https://github.com/hdachev/fakeredis/commit/01d1e99.

(^{24}) https://semver.org/#spec-item-9.

Table 5 shows that client and provider packages documented 78.1% of all manifesting breaking changes. Of the cases that have documentation, 70% have more than one type of documentation. For example, the provider received an issue report, fixed the manifesting breaking change, and documented it in a changelog. Documenting manifesting breaking changes and their fixes supports client recovery (Section 3.2.3).
**Finding 7:** 57.8% of the manifesting breaking changes are introduced by an indirect provider. Indirect providers might also introduce manifesting breaking changes, which can then propagate to the client. Table 6 shows the depth level in the dependency tree of each provider that introduced a manifesting breaking change. About 42.2% of manifesting breaking changes are introduced by a direct provider listed in the client's `package.json`. These providers are the ones the client directly installs and calls from its own code; they are at the first depth level of the dependency tree.

Manifesting breaking changes introduced by indirect providers at depth levels greater than 1 represent 57.8% of the cases. Six cases are at the third depth level and a single one is at the fourth depth level. Clients do not install these providers directly; rather, they are installed transitively through a direct provider. In these cases, the manifesting breaking change may be entirely opaque to client packages, since they are typically unaware of such providers (or have no direct control over their installation).

Table 7. Packages Fixing/Recovering from the Error

| Fixed by/Recovered from | (#) | (%) |
|-------------------------|-----|-----|
| Provider | 32 | 50 |
| Client | 13 | 20.3 |
| Transitive provider | 12 | 18.8 |
| Client + Transitive provider | 25 | 39.1 |
| Not fixed/recovered | 7 | 10.9 |
| Total | 64 | 100 |

The most frequent issues with provider packages that introduced manifesting breaking changes were feature changes, incompatible providers' versions, and object type changes.

Provider packages introduced these manifesting breaking changes at similar rates in minor and patch releases.

Most of the manifesting breaking changes fixed by providers were fixed in patch releases.

Manifesting breaking changes are documented in 78.1% of the cases, mainly in issue reports.

Indirect providers introduced manifesting breaking changes in most cases.
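The depth levels discussed in Finding 7 can be read off a dependency tree such as the JSON one printed by `npm ls --json` (a nested `dependencies` object). The sketch below is our own illustration, not the paper's tooling; the tree literal mirrors the `broccoli-asset-rev` → `broccoli-filter` → `broccoli-plugin` chain from Section 2, and the version numbers in it are hypothetical.

```javascript
// Sketch (our assumption, not the study's instrument): locate how deep a
// provider sits in a dependency tree shaped like `npm ls --json` output.
// Returns 1 for a direct provider, >1 for an indirect one, -1 if absent.
function providerDepth(tree, name, depth = 1) {
  const deps = tree.dependencies || {};
  if (name in deps) return depth; // found at this level
  for (const sub of Object.values(deps)) {
    const d = providerDepth(sub, name, depth + 1);
    if (d !== -1) return d; // first match in a depth-first walk
  }
  return -1; // provider not present in this subtree
}

// Hypothetical tree mirroring the broccoli example from Section 2:
const tree = {
  name: 'broccoli-asset-rev',
  dependencies: {
    'broccoli-filter': {
      version: '1.2.3', // hypothetical
      dependencies: { 'broccoli-plugin': { version: '1.1.0' } }, // hypothetical
    },
  },
};
console.log(providerDepth(tree, 'broccoli-filter')); // 1 (direct)
console.log(providerDepth(tree, 'broccoli-plugin')); // 2 (indirect)
```

A depth of 1 corresponds to the direct providers in Finding 7; anything deeper is an indirect provider the client never installed explicitly.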
4.3 RQ3. How Do Client Packages Recover from a Manifesting Breaking Change?

**Finding 8:** Clients and transitive providers recover from breaking changes in 39.1% of cases. In the dependency tree, the transitive provider is located between the provider that introduced the manifesting breaking change and the client where it manifested (see Section 2.1). Table 7 shows which package fixed/recovered from each manifesting breaking change case. The provider packages fixed the majority of the manifesting breaking changes; since they introduced the breaking changes, this is the expected behavior. Client packages recovered from the manifesting breaking change in 20.3% of cases, and transitive providers recovered in 18.8% of cases. When the provider that introduced a manifesting breaking change does not fix it, the transitive provider may fix it and solve the client's issue.

Since transitive providers are also clients of the providers that introduced the manifesting breaking change, clients (clients and transitive providers) recovered from these breaking changes in 39.1% of cases. This observation suggests that client packages occasionally have to work on a patch themselves: in 39.1% of the cases, clients and transitive providers needed to take action to recover from the manifesting breaking change.

**Finding 9:** Transitive providers fix manifesting breaking changes faster than other packages. When a manifesting breaking change is introduced, it should be fixed by either the provider that introduced it or a transitive provider. In a few cases, the client package will also recover from it. Table 8 shows the time that each package takes to fix the breaking change. In general, manifesting breaking changes are fixed within 7 days by provider packages. Even in this relatively short period of time, many direct and indirect clients are affected.
Transitive providers fix manifesting breaking changes faster than clients and even providers. Since the manifesting breaking change only exists when it is raised in the client packages, transitive providers break first and need a quick fix; transitive providers usually took 4 days to fix a break. Meanwhile, providers that introduced the manifesting breaking change took a median of 7 days to introduce a fix. In cases where the provider neglected to introduce a fix or took longer than the client, client packages took a comparably lengthy 134 days (mean 286; SD 429) to recover from a manifesting breaking change. According to Table 7, the direct providers and transitive providers fixed most of the manifesting breaking changes, about 68.8%, because clients can be slow to recover.

However, because transitive providers are also clients, we can analyze the time that clients and transitive providers spend to fix/recover from a manifesting breaking change. Clients and transitive providers recovered from a manifesting breaking change in around 82 days.

**Finding 10:** Upgrading is the most frequent way to recover from a manifesting breaking change. Table 9 describes how clients recovered from breaking changes. In 48 cases, the provider version was changed. In most cases (71.4%), client packages upgraded their provider's version. We analyzed all cases where clients and transitive providers recovered from the manifesting breaking change by changing the provider's version before the provider fixed the error. We observed an upgrade in 12 (52.2%) of 23 cases. Thus, in more than half of the cases where the client and transitive providers fixed/recovered from the manifesting breaking change, the provider package already had newer versions, but the client was not using any follow-up release from the provider package.

The number of downgrades in a transitive provider may explain why transitive providers recover from the manifesting breaking change faster than the client packages.
Since transitive providers are also providers, they should fix the manifesting breaking change as soon as possible to avoid propagating the error it causes. Consequently, downgrading to a stable release of the provider is the most frequent way for transitive providers to recover from a manifesting breaking change. Finally, the provider is replaced or removed only in a small proportion of cases when a breaking change is raised, about 7.2% for both cases combined.

Finding 11: To recover from manifesting breaking changes, clients often change the adopted provider version without changing the range of automatically accepted versions. When a breaking change manifests itself, clients often update the provider's version. Figure 5 shows when the clients and transitive providers updated their providers' versions.

We verified that transitive providers never set a steady (exact) version of their provider: when a breaking change manifests in transitive providers, they are using a range in the provider's version. However, a single transitive provider changed the range from a caret range to a steady one (e.g., ^1.2.1 → 1.2.1) to recover from the manifesting breaking change. In contrast, when the clients used a caret range and a breaking change manifested, in 38.5% of the cases they downgraded the provider to a steady version.

The majority of the manifesting breaking changes were introduced while the clients and transitive providers used the caret range (^). It is the default range statement that npm inserts in the package.json when a provider is added as a dependency of a client package. In more than half of the cases, these clients changed the provider's version to another caret range. The accept-all ranges (>= or *) were less commonly used and less common when updating.

Clients and transitive providers retained the range type and updated it in 60.5% of cases.
The range type (all, caret, tilde, or steady) was kept, but the provider was updated/downgraded. For example, a client package specifies a provider p@^1.2.0 and receives a breaking change in p@1.3.2. When the provider fixes the code, the client package updates it to, for example, p@^1.4.0, but does not change it to another range type, such as all, tilde, or steady.

Client packages (including transitive providers) recovered from manifesting breaking changes in 39.1% of cases.

Providers fixed manifesting breaking changes faster than client packages recovered from them. Clients recovered by changing the provider version, preferring to update rather than downgrade their providers.

The provider's version can be updated or downgraded after a breaking change, but in around 60% of cases the range type was not changed.

5 DISCUSSION

This section discusses the implications of our findings for dependency management practices (Section 5.1) and the best practices that clients and providers can follow to mitigate the impact caused by manifesting breaking changes (Section 5.2). We also discuss the manifestation of breaking changes and aspects of Semantic Versioning in the npm ecosystem (Section 5.3).

5.1 Dependency Management

When managing dependencies, client packages can use dependency bots on GitHub, such as Snyk and Dependabot, to receive automatic pull requests when there is a new provider release [27]. These bots continuously check for new versions and for fixes to providers' bugs/vulnerabilities. They open pull requests in the client's repository updating the package.json, including changelogs and information about the provider's new version. Mirhosseini and Parnin [16] show that packages using such bots update their dependencies 1.6x faster than through manual verification.
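The reach of such automatic updates is governed by the version ranges clients declare: under npm's default caret range (see Finding 11), new minor and patch releases of a provider are accepted without any client action. The sketch below illustrates caret-range matching; it is our own simplified illustration (real resolution is implemented by npm's semver package) and covers only versions with major > 0 and no pre-release tags.

```javascript
// Simplified caret-range check: does `version` satisfy `^base`?
// Per caret semantics, ^1.2.0 accepts 1.2.x and any newer 1.y.z,
// but rejects 2.0.0. Sketch only: major > 0, no pre-release tags.
function parse(v) {
  const [major, minor, patch] = v.split('.').map(Number);
  return { major, minor, patch };
}

function satisfiesCaret(base, version) {
  const b = parse(base);
  const v = parse(version);
  if (v.major !== b.major) return false;             // crossing a major is never accepted
  if (v.minor !== b.minor) return v.minor > b.minor; // any newer minor is accepted
  return v.patch >= b.patch;                         // same minor: patch must not go back
}

// A breaking change shipped in a minor release (e.g., 1.3.2 against
// ^1.2.0) is therefore installed automatically on the client's next build.
```

This is why clients declaring p@^1.2.0 can receive a breaking p@1.3.2 without changing their package.json at all.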
Additionally, tools such as JSFIX [20] can be helpful when upgrading provider releases, especially those that include manifesting breaking changes or major releases. JSFIX was designed to adapt client code to a new provider release, offering a safe way to upgrade providers.

We verified that a small percentage of clients recovered from manifesting breaking changes by removing or replacing the provider (c.f., Finding 10), which may be difficult when the client uses several features or resources from the provider package [2]. Instead, client packages tend to temporarily downgrade to a stable provider version. To ease upgrading/downgrading providers and avoid surprises, clients should search the provider changelogs for significant changes. As we verified in Finding 6, most manifesting breaking changes are documented in changelogs, issue reports, or pull requests. Dependency bots could also analyze the content of changelogs and issue reports to raise red flags, such as notifications, when documentation cites a manifesting breaking change.

Finally, client packages may use a package-lock.json file to better manage dependencies. We observed in Finding 7 that indirect providers, the ones at depths 2 and 3 in the dependency tree, are responsible for 57.8% of the manifesting breaking changes that affect a client package. Using a package-lock.json file, client packages can keep track of all of the provider versions of the latest successful build. When a provider is upgraded because of the version range and the new release manifests a breaking change on the client side, the client can still install the exact provider versions that last built successfully.
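To make the lock-file mechanism concrete, here is a hypothetical excerpt of a package-lock.json (the package name and versions are invented for illustration): while the package.json may declare a caret range such as ^1.2.0, the lock file records the exact version of the last successful build.

```json
{
  "name": "example-client",
  "version": "1.0.0",
  "lockfileVersion": 2,
  "dependencies": {
    "example-provider": {
      "version": "1.2.4",
      "resolved": "https://registry.npmjs.org/example-provider/-/example-provider-1.2.4.tgz"
    }
  }
}
```

Installing from the lock file (e.g., with `npm ci`) reproduces exactly these versions, so the client keeps building even after the provider publishes a 1.3.x release that the caret range would otherwise accept.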
5.2 Best Practices

Several issues found in our manual classification of manifesting breaking changes (Section 3.2.2) could be avoided through the use of static analysis tools. Errors classified as Semantically Wrong Code and Rename function are typically captured by such tools, which both client and provider developers can use. For a dynamic language such as JavaScript, these tools help avoid some issues [26]. Options for JavaScript include jshint, jslint, and standard. Tómasdóttir et al. [26] and Tómasdóttir et al. [25] show that developers use linters mainly to prevent errors, bugs, and mistakes.

Due to the dynamic nature of JavaScript, however, static analysis tools cannot verify inherited objects' properties. They do not capture errors classified as Change one rule, Object type change, and Undefined object, nor Rename Function when the function is a property of an object. Thus, developers should create test cases that exercise their own code together with the providers' functionality, as only then will client developers find breaking changes that affect their own code. Many available frameworks, such as mocha, chai, and ava, support these tasks. These tests should also be executed in integrated environments every time the developer commits and pushes new changes; several tools are available for this, such as Travis, Jenkins, Drone CI, and Codefresh. Using linters and continuous integration systems, developers can catch most of these errors before releasing a new version.

Finally, a good practice for npm packages is to keep a changelog or to document breaking changes and their fixes in issue reports and pull requests. This practice should continue and be more widely adopted, since currently around a fifth of providers do not follow it (c.f., Finding 6).
This would also help the development of automated tools (e.g., bots) for dealing with breaking changes. Providers could create issue report and pull request templates so that clients describe the issues they find in a consistent way.

5.3 Breaking Changes Manifestation and Semantic Versioning

Breaking changes often occur in the npm ecosystem and impact client packages (c.f., Finding 1). Most of the manifesting cases come from indirect providers, that is, providers at the second level or deeper in the dependency tree. Decan et al. [10] show that in 2016 half of the client packages in npm had at least 22 transitive dependencies (indirect providers), and a quarter had at least 95. In this context, clients may face challenges in diagnosing where a manifesting breaking change came from, because when it is introduced by an indirect provider, the client may not even know that provider.

Our results show that provider packages introduce manifesting breaking changes at minor and patch levels, which according to the Semantic Versioning specification should contain only backward-compatible updates. Semantic Versioning is a recommendation that providers can choose to follow or not [4, 8]. If providers do not comply with it, several errors might be introduced; indeed, we observed that all manifesting breaking changes in pre-releases were propagated to stable releases (c.f., Finding 4). One hypothesis is that providers are unaware of the correct use of the Semantic Versioning rules, which may explain why they propagated the unstable changes to stable releases. Finally, npm could provide badges with which provider packages could explicitly show that they are aware of and adhere to Semantic Versioning.
Trockman [24] claims that developers use visible signals (specifically on GitHub), such as badges, to indicate project quality. This way, clients could make a better-informed choice of providers and prefer those that adhere to Semantic Versioning.

6 RELATED WORK

This section describes related work on breaking changes in npm and other ecosystems.

Breaking changes in npm: Bogart et al. [5] present a survey about the stability of dependencies in the npm and CRAN ecosystems. The authors interviewed seven package maintainers about software changes; the interviewees highlighted the importance of adhering to Semantic Versioning to avoid issues with dependency updates. More recently, the same authors investigated policies and practices in 18 software ecosystems, finding that all ecosystems share values such as stability and compatibility but differ on other values [4]. Kraaijeveld [14] studied API breaking changes in three provider packages, parsing the providers' files and those of 3k client packages to detect API breaking changes and their impact on clients. This work identified that 9.8% to 25.8% of client releases are impacted by API breaking changes.

Mezzetti et al. [15] present a technique called type regression testing that verifies the type of an object returned from an API and compares it with the type returned in another provider release. The authors chose the 12 most popular provider packages and their major releases, applying the technique to all patch/minor releases belonging to the first major update. They verified type regression in 9.4% of the minor or patch releases. Our research focused on any kind of manifesting breaking change, and we analyzed both client and provider packages, with 13.9% of releases impacted by manifesting breaking changes.

Mujahid et al.
[19] focus on detecting break-inducing versions of third-party dependencies. The authors analyzed 290k npm packages, flagging each downgrade in a provider version as a possible breaking change. These provider versions were tested using client tests, and the authors identified failures in 4.1% of the updates, which resulted in downgrades. Similar to these authors, we resolved each client's providers for a release, but we ran the tests whenever at least one provider version changed.

Møller et al. [17] present a tool that uses breaking change patterns described by providers and fixes the client code. They analyzed a dataset with 10 of the most used npm packages and searched for breaking changes described in changelogs. We can compare our classification (Finding 3) with theirs. They found 153 cases of breaking changes introduced in major releases and claim that most breaking changes (85%) are related to specific package API points, such as modules, properties, and function changes. In our classification (Finding 3), feature changes, object type changed, undefined object, and renamed function can also be classified as changes in the package API; if so, 64.06% of manifesting breaking changes are package API related.

Breaking changes in other ecosystems: Brito et al. [6] studied 400 providers from the Maven repository over 116 days. The provider packages were chosen by popularity on GitHub, and the authors looked for commits that introduced an API breaking change during that period. Developers were then asked about the reasons for the breaking changes. Our article presents similar results: the authors claim that New Feature is the most frequent way a breaking change is introduced, while we find that Feature Change is the main breaking change type (Finding 3). The authors also similarly observed that breaking changes are frequently documented in changelogs (Finding 6).

Foo et al.
[12] present a study of API breaking changes in the Maven, PyPI, and RubyGems ecosystems. The study detects breaking changes by computing a diff between the code of two releases. They found API-breaking changes in 26% of provider packages, and their approach suggests automatic upgrades for 10% of the packages. Our approach goes beyond API breaking changes; we found that 11.7% of client packages are impacted by manifesting breaking changes.

7 THREATS TO VALIDITY

Internal validity: When a breaking change was detected, we verified the type of change that the provider package introduced and collectively grouped the changes into categories. However, some cases might fall into more than one category. For example, a provider package may change the type of an object in order to change/improve its behavior; this case might fall into both Feature change and Object type changed. We categorized each case in the category that best represents the error; in this example, since the object is changed as part of a feature change, the most appropriate category is Feature change.

The error cases that we categorized as breaking due to external change are those in which the clients or providers use, or depend on, external data/resources from sites and APIs that changed over time (see Finding 1). These cases represent about 8.1% of client releases, and for them we could not search for manifesting breaking changes because we could not execute the release tests: the data/resources needed by the tests were no longer available. So, about 8% of client releases might be impacted by breaking changes that we could not analyze.

Construct validity: In our approach to detecting breaking changes, we only performed an analysis when the client tests failed.
If a client used a provider version that had a breaking change but did not call the function that causes it, or did not have tests exercising that code, we could not detect the breaking change. This is why we call all of our cases manifesting breaking changes.

Therefore, we might not have detected all API breaking changes, as we were able to detect only API name changes and API removals. Parameter changes may not be detected because JavaScript allows calling an API with any number of parameters.^25

We restored the working tree index at the commit tagged by the developer for each release: we listed all tags in the repository and checked out the respective tag. For untagged releases, we performed a checkout at the timestamp referenced in the package.json. We trusted the timestamp after verifying that, for tagged repositories, the tag and the timestamp point to the same commit in 94% of cases.

^25 https://eloquentJavaScript.net/03_functions.html#p_kzCivbonMM.

Lastly, we did not consider the file npm-shrinkwrap.json in our study. This file is intended to work like package-lock.json when controlling transitive dependency updates, but it may be published along with the package, and npm strongly recommends avoiding its use. The existence of npm-shrinkwrap.json files does not play any major role in our study: given our adopted research method, they do not affect our results.

External validity: We randomly selected client packages that varied in release numbers, clients, providers, and size. However, since we only analyzed npm packages hosted in GitHub projects, our findings cannot be directly generalized to other settings. Representativeness can also be limited because npm grows in packages and releases daily.
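The construct-validity point about parameter counts is easy to see in a sketch (the function below is our own illustration, not from any studied package): JavaScript silently ignores surplus arguments and binds missing ones to undefined, so a provider that changes a function's parameter list raises no error at existing call sites.

```javascript
// JavaScript does not enforce arity: missing parameters arrive as
// undefined and surplus arguments are ignored, so a changed signature
// cannot be caught by counting arguments at the call site.
function greet(name, punctuation) {
  return 'hi ' + name + (punctuation === undefined ? '' : punctuation);
}

const short = greet('ana');          // punctuation is undefined; no error
const long = greet('ana', '!', 42);  // the extra 42 is silently dropped
```

Both calls succeed, which is precisely why a static check on argument counts cannot flag such provider changes.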
Future work can replicate our study on other platforms and in other ecosystems. Finally, since the number of projects in our sample is small, we do not have enough statistical power to perform hypothesis tests on results that involve package-level comparisons.

Conclusion validity: Conclusion validity relates to the inability to draw statistically significant conclusions due to the lack of a sufficiently large data sample. However, as our research used a qualitative approach, we mitigated potential conclusion threats by conducting a sanity check on the repositories of all client packages with fewer than four releases, which guarantees that all packages are intended for use in production (Section 3.1.2). Finally, all of the manifesting breaking changes that we report were manually analyzed to ensure they are legitimate breaking changes that impact clients in the real world (Section 3.1.3).

8 CONCLUSIONS

Software reuse is a widely adopted practice, and package ecosystems such as npm support reusing software packages. However, breaking changes are a negative side effect of software reuse. Breaking changes and their impacts have been studied in several software ecosystems [3, 6, 18, 28]. A few papers examine breaking changes in the npm ecosystem from the client packages' perspective, i.e., executing the client tests to verify the impact of breaking changes [5, 15, 19]. In this work, we analyzed manifesting breaking changes in the npm ecosystem from both the client and provider perspectives, providing an empirical analysis of breaking changes at the minor and patch levels.

From the client's perspective, we analyzed the impact of manifesting breaking changes. We found that 11.7% of clients are impacted by such changes, and we offer advice to help clients and automated-tool developers discover, avoid, and recover from manifesting breaking changes.
Clients can use dependency bots to accelerate the process of upgrading their providers, and they can look at changelog files for undesired updates such as breaking changes. From the provider's perspective, we analyzed the most frequent causes of manifesting breaking changes. The most common causes were providers changing rules/behaviors of features that had been stable over recent releases, object type changes, and objects that were unintentionally undefined at runtime. Maintainers should pay attention to these issues during code review. Future research can look into the correlation of package characteristics and metrics with breaking change occurrence.

REFERENCES

[1] 2018. This year in JavaScript: 2018 in review and npm's predictions for 2019. (Dec 2018). https://blog.npmjs.org/post/180868064080/this-year-in-javascript-2018-in-review-and-npms.html.

[2] Hussein Alrubaye and Mohamed Wiem Mkaouer. 2018. Automating the detection of third-party Java library migration at the function level. In Proceedings of the 28th Annual International Conference on Computer Science and Software Engineering (CASCON'18). 60–71.

[3] Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. 2016. How to break an API: Cost negotiation and community values in three software ecosystems. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE'16). 109–120. https://doi.org/10.1145/2950290.2950325

[4] Chris Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. 2021. When and how to make breaking changes: Policies and practices in 18 open source software ecosystems. ACM Trans. Softw. Eng. Methodol. 30, 4, Article 42 (July 2021), 56 pages. https://doi.org/10.1145/3447245

[5] C. Bogart, C. Kästner, and J. Herbsleb. 2015. When it breaks, it breaks: How ecosystem developers reason about the stability of dependencies.
In 2015 30th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW'15). 86–89. https://doi.org/10.1109/ASEW.2015.21

[6] A. Brito, L. Xavier, A. Hora, and M. T. Valente. 2018. Why and how Java developers break APIs. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER'18). Campobasso, Molise, Italy, 255–265.

[7] F. R. Cogo, G. A. Oliva, and A. E. Hassan. 2019. An empirical study of dependency downgrades in the npm ecosystem. IEEE Transactions on Software Engineering (Nov. 2019), 1–13.

[8] A. Decan and T. Mens. 2019. What do package dependencies tell us about semantic versioning? IEEE Transactions on Software Engineering (May 2019), 1226–1240.

[9] Alexandre Decan, Tom Mens, and Maelick Claes. 2016. On the topology of package dependency networks: A comparison of three programming language ecosystems. In Proceedings of the 10th European Conference on Software Architecture Workshops (ECSAW'16). Article 21, 4 pages. https://doi.org/10.1145/2993412.3003382

[10] A. Decan, T. Mens, and M. Claes. 2017. An empirical comparison of dependency issues in OSS packaging ecosystems. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER'17). 2–12.

[11] Alexandre Decan, Tom Mens, and Philippe Grosjean. 2019. An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empirical Software Engineering 24, 1 (Feb. 2019), 381–416. https://doi.org/10.1007/s10664-017-9589-y

[12] Darius Foo, Hendy Chua, Jason Yeo, Ming Yi Ang, and Asankhaya Sharma. 2018. Efficient static checking of library updates. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 791–796. https://doi.org/10.1145/3236024.3275535

[13] D. Garrison, Martha Cleveland-Innes, Marguerite Koole, and James Kappelman. 2006.
Revisiting methodological issues in transcript analysis: Negotiated coding and reliability. Internet and Higher Education 9, 1 (2006), 1–8.

[14] Michel Kraaijeveld. 2017. Detecting Breaking Changes in JavaScript APIs. Master's thesis. Dept. Soft. Tech., Delft University of Technology, Delft, Netherlands. http://resolver.tudelft.nl/uuid:56e646dc-d5c7-482b-8326-90e0de4ea419.

[15] Gianluca Mezzetti, Anders Møller, and Martin Toldam Torp. 2018. Type regression testing to detect breaking changes in Node.js libraries. In Proceedings of the 32nd European Conference on Object-Oriented Programming (ECOOP'18) (Leibniz International Proceedings in Informatics (LIPIcs)). 7:1–7:24.

[16] S. Mirhosseini and C. Parnin. 2017. Can automated pull requests encourage software developers to upgrade out-of-date dependencies? In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE'17). 84–94.

[17] Anders Møller, Benjamin Barslev Nielsen, and Martin Toldam Torp. 2020. Detecting locations in JavaScript programs affected by breaking library changes. Proc. ACM Program. Lang. 4, OOPSLA, Article 187 (Nov. 2020), 25 pages. https://doi.org/10.1145/3428255

[18] Anders Møller and Martin Torp. 2019. Model-based testing of breaking changes in Node.js libraries. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 409–419. https://doi.org/10.1145/3338906.3338940

[19] Suhail Mujahid, Rabe Abdalkareem, Emad Shihab, and Shane McIntosh. 2020. Using others' tests to identify breaking updates. In International Conference on Mining Software Repositories. https://doi.org/10.1145/3379597.3387476

[20] Benjamin Barslev Nielsen, Martin Toldam Torp, and Anders Møller. 2021. Semantic patches for adaptation of JavaScript programs to evolving libraries. In Proc. 43rd International Conference on Software Engineering (ICSE'21).

[21] S. Raemaekers, A.
van Deursen, and J. Visser. 2014. Semantic versioning versus breaking changes: A study of the Maven repository. In 2014 IEEE 14th International Working Conference on Source Code Analysis and Manipulation. 215–224. https://doi.org/10.1109/SCAM.2014.30

[22] Anselm Strauss and Juliet Corbin. 1998. Basics of Qualitative Research Techniques. Thousand Oaks, CA: Sage Publications.

[23] Jacob Stringer, Amjed Tahir, Kelly Blincoe, and Jens Dietrich. 2020. Technical lag of dependencies in major package managers. In Proceedings of the 27th Asia-Pacific Software Engineering Conference (APSEC'20). 228–237. https://doi.org/10.1109/APSEC51365.2020.00031

[24] Asher Trockman. 2018. Adding sparkle to social coding: An empirical study of repository badges in the npm ecosystem. In 2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-Companion'18). 524–526.

[25] K. F. Tómasdóttir, Maurício Aniche, and Arie van Deursen. 2018. The adoption of JavaScript linters in practice: A case study on ESLint. IEEE Transactions on Software Engineering PP (Sept. 2018), 26. https://doi.org/10.1109/TSE.2018.2871058

[26] K. F. Tómasdóttir, M. Aniche, and A. van Deursen. 2017. Why and How JavaScript Developers Use Linters. Master's thesis. Dept. Soft. Tech., Delft University of Technology, Delft, Netherlands.

[27] Mairieli Wessel, Bruno Mendes De Souza, Igor Steinmacher, Igor S. Wiese, Ivanilton Polato, Ana Paula Chaves, and Marco A. Gerosa. 2018. The power of bots: Characterizing and understanding bots in OSS projects. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–19.

[28] Jooyong Yi, Dawei Qi, Shin Hwei Tan, and Abhik Roychoudhury. 2013. Expressing and checking intended changes via software change contracts. In Proceedings of the 2013 International Symposium on Software Testing and Analysis (ISSTA'13). 1–11.
https://doi.org/10.1145/2483760.2483772

[29] Ahmed Zerouali, Eleni Constantinou, Tom Mens, Gregorio Robles, and Jesus Gonzalez-Barahona. 2018. An empirical analysis of technical lag in npm package dependencies. https://doi.org/10.1007/978-3-319-90421-4_6

Received 19 November 2021; revised 27 October 2022; accepted 8 November 2022

Core-Periphery Communication and the Success of Free/Libre Open Source Software Projects

Kevin Crowston¹ and Ivan Shamshurin²

¹ Syracuse University School of Information Studies, 348 Hinds Hall, Syracuse, NY 13244-4100, USA
crowston@syr.edu

² Syracuse University School of Information Studies, 337 Hinds Hall, Syracuse, NY 13244-4100, USA
ishamshu@syr.edu

Abstract. We examine the relationship between communications by core and peripheral members and Free/Libre Open Source Software project success. The study uses data from 74 projects in the Apache Software Foundation Incubator. We conceptualize project success in terms of success building a community, as assessed by graduation from the Incubator. We compare successful and unsuccessful projects on volume of communication by core (committer) and peripheral community members and on use of inclusive pronouns as an indication of efforts to create intimacy among team members. An innovation of the paper is that use of inclusive pronouns is measured using natural language processing techniques. We find that core and peripheral members differ in their volume of contribution and in their use of inclusive pronouns, and that volume of communication is related to project success.
1 Introduction

Community-based Free/Libre Open Source Software (FLOSS) projects are developed and maintained by teams of individuals collaborating in globally distributed environments [8]. The health of the developer community is critical for the performance of projects [7], but it is challenging to sustain a project with voluntary members over the long term [4, 11]. Social-relational issues have been seen as a key component of achieving design effectiveness [3] and of enhancing online group involvement and collaboration [15]. In this paper, we explore how community interactions are related to community health and thus to project success.

Specifically, we examine contributions made by members in different roles. Members have different levels of participation in FLOSS development and so take on different roles [5]. A widely accepted model of roles in community-based FLOSS teams is the core-periphery structure [1, 3, 12]. For example, Crowston and Howison [7] see community-based FLOSS teams as having an onion-like core-periphery structure, in which the core category includes core developers and the periphery includes co-developers and active users. Rullani and Haefliger [17] described the periphery as a "cloud" of members that orbits around the core members of open source software development teams.

Generally speaking, access to core roles is based on technical skills demonstrated through the development tasks that the developer performs [13]. Core developers usually contribute most of the code and oversee the design and evolution of the project, which requires a high level of technical skill [7].
Peripheral members, on the other hand, submit patches such as bug fixes (co-developers), which provides an opportunity to demonstrate skills and interest, or just provide use cases and bug reports or test new releases without contributing code directly (active users), which requires less technical skill [7].

Despite the difference in contributions, both core and peripheral members are important to the success of a project. It is evident that, by making direct contributions to the software developed, core members are vital to project development. On the other hand, even though they contribute only sporadically, peripheral members provide bug reports, suggestions, and critical expertise that are fundamental for innovation [17]. In addition, the periphery is the source of new core members [10, 20], so maintaining a strong periphery is important to the long-term success of a project. Amrit and van Hillegersberg [1] examined core-periphery movement in open source projects and concluded that a steady movement toward the core is beneficial to a project, while a shift away from the core is not. But how communication among core and periphery predicts project success has yet to be investigated systematically, a gap that this paper addresses.

2 Theory and Hypotheses

To develop hypotheses for our study, we discuss in turn the dependent and independent variables.

The dependent variable for our study is project success. Project success for FLOSS projects can be measured in many different ways, ranging from code quality to member satisfaction to market share [6]. For the community-based FLOSS projects we examine, success in building a developer community is a critical issue, so we chose building a developer community as our measure of success.
+ + +To identify independent variables that predict success (i.e., success in building a developer community), we examine communication among community members. A starting hypothesis is that more communication is predictive of project success: + + +H1: Successful projects will have a higher volume of communication than unsuccessful projects. + + +More specifically, we are interested in how members in different roles contribute to projects. As noted above, projects rely on contributions from both core and peripheral members. We can therefore extend H1 to consider roles. Specifically, we hypothesize that: + + +H2a: Successful projects will have a higher volume of communication by core members than unsuccessful projects. + + +H2b: Successful projects will have a higher volume of communication by peripheral members than unsuccessful projects. +Prior research on the core-periphery structure in FLOSS development has found inequality in participation between core and peripheral members. For example, Luthiger Stoll [14] found that core members make a greater time commitment than peripheral members: core participants spend an average of 12 h per week, with project leaders averaging 14 h, and bug-fixers and otherwise active users, around 5 h per week. Similarly, using social network analysis, Toral et al. [19] found that a few core members post the majority of messages and act as middlemen or brokers among other peripheral members. We therefore hypothesize that: + + +H3: Core members will contribute more communication than will peripheral members. + + +Prior research on the distinction between core and periphery has mostly focused on coding-related behaviour, as project roles are defined by the coding activities performed [3]. However, developers do more than just coding [3]. Both core and peripheral members need to engage in social-relational behaviour in addition to task-oriented behaviour such as coding.
Consideration of these non-task activities is important because effective interpersonal communication plays a vital role in the development of online social interaction [16]. + + +Scialdone et al. [18] and Wei et al. [21] analyzed group maintenance behaviours used by members to build and maintain reciprocal trust and cooperation in their everyday interaction messages, e.g., through emotional expressions and politeness strategies. In this paper, we examine one factor they identified, investigating how core and peripheral members use language to create “intimacy among team members”, thus “building solidarity in teams”. Specifically, Scialdone et al. [18] found that core members of two teams used more inclusive pronouns (i.e., pronouns referring to the team) than did peripheral members. They interpreted this finding as meaning that “peripheral members in general do not feel as comfortable expressing a sense of belonging within their groups”. We therefore hypothesize that: + + +H4: Core members will use more inclusive pronouns in their communication than will peripheral members. + + +Scialdone et al. [18] further noted that one team they studied, which had ceased production, exhibited a greater gap between core and periphery in usage of inclusive pronouns. Such a situation could indicate that the peripheral members of the group do not feel ownership of the project, with negative implications for their future as potential core members. Scialdone et al. [18] noted that such use of inclusive pronouns is “consistent with Bagozzi and Dholakia [2]’s argument about the importance of we-intention in Linux user groups, i.e., when individuals think themselves as ‘us’ or ‘we’ and so attempt to act in a joint way”. A similar argument can be made for the importance of core member use of inclusive pronouns. We therefore hypothesize that: + + +H5a: Successful projects will have a higher usage of inclusive pronouns by core members than unsuccessful projects.
+ + +H5b: Successful projects will have a higher usage of inclusive pronouns by peripheral members than unsuccessful projects. +3 Methods + + +3.1 Setting + + +Scialdone et al. [18] and Wei et al. [21] studied only a few projects and noted problems making comparisons across projects, which can be quite diverse. To address this concern, in this paper we studied a larger number of projects (74 in total) that all operated within a common framework at a similar stage of development. Specifically, we studied projects in the Apache Software Foundation (ASF) Incubator. The ASF is an umbrella organization including more than 60 free/libre open source software (FLOSS) development projects. The ASF’s apparent success in managing FLOSS projects has made it a frequently mentioned model for these efforts, though often without a deep understanding of the factors behind that success. + + +The ASF Incubator’s purpose is to mentor new projects to the point where they are able to successfully join the ASF. Projects are invited to join the Incubator based on an application and support from a sponsor (a member of the ASF). Accepted projects (known as Podlings) receive support from one or more mentors, who help guide the Podlings through the steps necessary to become a full-fledged ASF project. + + +The incubation process has several goals, including fulfillment of legal and infrastructural requirements and development of relationships with other ASF projects, but the main goal is to develop effective software development communities, which Podlings must demonstrate in order to graduate from the Incubator. The Apache Incubator specifically promotes diverse participation in development projects to improve the long-term viability of the project community and ensure requisite diversity of intellectual resources.
The time projects spend in incubation varies widely, from as little as two months to nearly five years, indicating significant diversity in the efforts required for Podlings to become viable projects. The primary reason that projects are retired from the Incubator (rather than graduated) is a lack of community development that stalls progress. + + +3.2 Data Collection and Processing + + +In FLOSS settings, collaborative work primarily takes place by means of asynchronous computer-mediated communication such as email lists and discussion fora [5]. ASF community norms strongly support transparency and broad participation, which is accomplished via electronic communications, such that even collocated participants are expected to document conversations in the online record, i.e., the email discussion lists. We therefore drew our data from messages on the developers’ mailing list for each project. + + +A Perl script was used to collect messages in HTML format from the site http://markmail.org. We discarded any messages sent after the Podling either graduated or retired from the ASF Incubator, as many of the projects apparently used the same email list even after graduation. After the dataset was collected, relevant data were extracted from the HTML files representing each message thread and other sources. +3.2.1 Dependent Variable: Success + + +The dependent variable, project success in building a community, was determined by whether the project had graduated (success) or been retired (not success) based on the list of projects maintained by the Apache Incubator and available on the Apache website. The dataset includes email messages for 24 retired and 50 graduated Podlings. The dataset also included messages for some projects still in incubation and some with unknown status; these were not used for further analysis.
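The labeling and filtering steps just described can be sketched as follows. This is our illustration only: the project names, dates, and record layout are hypothetical stand-ins, not the actual Incubator data.

```python
from datetime import datetime

# Hypothetical podling records; in the study, status comes from the
# Apache Incubator project list and end_date is the graduation or
# retirement date.
podlings = [
    {"name": "ExampleA", "status": "graduated", "end_date": datetime(2010, 3, 1)},
    {"name": "ExampleB", "status": "retired", "end_date": datetime(2009, 7, 15)},
    {"name": "ExampleC", "status": "in incubation", "end_date": None},
]

# Keep only projects with a decided outcome; success = graduation.
labeled = {
    p["name"]: (1 if p["status"] == "graduated" else 0)
    for p in podlings
    if p["status"] in ("graduated", "retired")
}

def keep_message(project, sent_at):
    """Discard messages sent after the podling left the Incubator."""
    end = project["end_date"]
    return end is None or sent_at <= end
```

Projects still in incubation or with unknown status simply never enter `labeled` and so drop out of the analysis, mirroring the text.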
+ + +As a check on this measure of successful community development, we examined the number of developers active in the community (a more successful community has more developers). We considered as active members of the projects those who sent an email to the developer mailing list during incubation. + + +3.2.2 Core vs. Periphery + + +Crowston et al. [9] suggested three methods to identify core and peripheral members in FLOSS teams: relying on project-reported formal roles, analysis of the distribution of contributions based on Bradford’s Law of Scatter, and core-and-periphery analysis of the project social network. Their analysis showed that relying on project-reported roles was the most accurate. Therefore, in this study, we identified a message sender as a core member if the sender’s name was on the list of project committers on the project website. If we did not find a match, then the sender was labeled as a non-committer (peripheral member). We developed a matching algorithm to take into account the variety of ways that names appear in email messages. + + +3.2.3 Inclusive Pronouns + + +As noted above, we examined the use of inclusive pronouns as one way that team members build a sense of belonging to the group. Inclusive pronouns were defined as: + + +reference to the team using an inclusive pronoun. If we see “we” or “us” or “our”, and it refers to the group, then it is Inclusive Reference. Not if “we” or “us” or “our” refer to another group that the speaker is a member of. + + +That is, the sentences were judged on two criteria: (1) whether there are language cues for inclusive reference (a pronoun), as specified in the definition above, and (2) whether these cues refer to the current group rather than another group. To judge the second criterion may require reviewing the sentence in the context of the whole conversation. This usage is only one of the many indicators studied by Scialdone et al. [18] and Wei et al. [21], but it is interesting and tractable for analysis.
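The first criterion (surface cues) is straightforward to operationalize; a minimal sketch of such a cue detector (our illustration, not the project's actual code) might look like:

```python
import re

# Criterion (1): surface cues for inclusive reference.
# Criterion (2) -- whether the pronoun refers to the current group --
# requires conversational context and cannot be decided by a pattern
# match alone; it is left to the human coder or trained classifier.
INCLUSIVE_CUE = re.compile(r"\b(we|us|our|ours)\b", re.IGNORECASE)

def has_inclusive_cue(sentence: str) -> bool:
    """True if the sentence contains an inclusive-pronoun surface cue."""
    return bool(INCLUSIVE_CUE.search(sentence))
```

The word-boundary anchors (`\b`) prevent matches inside other words such as "user" or "focus".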
+ + +To handle the large volume of messages drawn from many projects, we applied NLP techniques as suggested (but not implemented) by previous research. Specifically, we used a machine-learning (ML) approach, where an algorithm learns to classify sentences from a corpus of already coded data. Sentences were chosen as the unit of coding instead of the thematic units more typically used in human coding, because sentences can be more easily identified for machine learning. Training data was obtained from the SOCQA (Socio-computational Qualitative Analysis) project at Syracuse University (http://socqa.org/) [22, 23]. The training data consists of 10,841 sentences drawn from two Apache projects, SpamAssassin and Avalon. Trained annotators manually coded each sentence as to whether it included an inclusive pronoun (per the above definition) or not. The distribution of the classes in the training data is shown in Table 1 (“yes” means the sentence has an inclusive pronoun). Note that the sample is unbalanced. + + +Table 1. + Distribution of classes in the training data + + +| | # | % | +|-------|-----|-----| +| “yes” | 1395| 12.9| +| “no” | 9446| 87.1| +| Total | 10841| 100.0| + + +As features for the ML, we used bag of words, experimenting with unigrams, bigrams and trigrams. Multinomial Naïve Bayes (MNB), k-Nearest Neighbors (KNN) and Support Vector Machine (SVM) algorithms (Python LibSVM implementation) were trained and applied to predict the class of the sentences, i.e., whether a sentence has an inclusive pronoun or not. We expected that the NLP would have no problem handling the first part of the definition, but that the second (whether the pronoun refers to the project or some other group) would pose challenges. + + +10-fold cross-validation was used to evaluate the classifier’s performance on the training data. Results are shown in Table 2. The results show that though all three approaches gave reasonable performance, SVM outperformed the other methods. The Linear SVM model was therefore selected for further use.
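The comparison in the paper used a Python LibSVM implementation; an approximation of the same experiment can be sketched with scikit-learn (a substitution on our part), here run on a tiny toy stand-in for the 10,841 coded sentences:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus (invented); the real training data is the
# coded sentences from the SpamAssassin and Avalon mailing lists.
sentences = ["we should fix this bug", "the patch was applied",
             "our release is ready", "see the attached log"] * 10
labels = [1, 0, 1, 0] * 10  # 1 = contains an inclusive pronoun

# Bag-of-words features with unigrams, bigrams and trigrams,
# evaluated by 10-fold cross-validation for each classifier.
for name, clf in [("MNB", MultinomialNB()),
                  ("KNN", KNeighborsClassifier()),
                  ("SVM", LinearSVC())]:
    for n in (1, 2, 3):
        pipe = make_pipeline(CountVectorizer(ngram_range=(1, n)), clf)
        scores = cross_val_score(pipe, sentences, labels, cv=10)
        print(f"{name} (1..{n}-grams): {scores.mean():.2f}")
```

On real, unbalanced data the accuracy figures would of course differ from this separable toy example; the point is only the shape of the feature-extraction, training, and cross-validation loop.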
We experimented with tuning SVM parameters such as minimal term frequency, etc., but did not find settings that affected the accuracy, so we used the default settings. + + +Table 2. + Accuracy of classifiers by feature set (10-fold cross-validation) + + +| | Unigram | Bigram | Trigram | +|-------|---------|--------|---------| +| MNB | 0.86 | 0.81 | 0.75 | +| KNN | 0.89 | 0.89 | 0.88 | +| SVM (LinearSVC) | 0.97 | 0.97 | 0.97 | + + +The random guess baseline for a binary classification task would give an accuracy of 0.5; a majority vote rule baseline (classify all examples to the majority class) provides an accuracy of 0.87. The trained SVM model significantly outperforms both. To further evaluate model performance, it was applied to new data and the results checked by a trained annotator (one of the annotators of the training data set). Specifically, we used the model to code 200 sentences (10 sentences randomly selected from 5 projects each in the “graduated”, “in incubator”, “retired” and “unknown” classes of projects). The annotator coded the same sentences and we compared the results. The Cohen kappa (agreement corrected for chance agreement) for the human vs. machine coding was 88.6 %, which is higher than the frequently applied threshold of 80 % agreement. In other words, the ML model performed at least as well as a second human coder would be expected to do. +Examining the results, somewhat surprisingly, we found no cases where a predicted “inclusive reference” refers to another group, suggesting that the ML had managed to learn the second criterion. Two sentences that the model misclassified are illustrative of limitations of the approach: + + +It looks like it requires work with “our @patterns” in lib/path.pm I looked at the path.pm for www.apache.org and it is a clue. + + +The actual class is “no” but the classifier marked it as “yes” because the inclusive pronoun “our” was included in the sentence, though in quotes. + + +Could also clarify download URLs for third-party dependencies we can’t ship.
+ + +The actual class is “yes” but the model marked the sentence as “no” due to the error in spelling (no space after “we”). The human annotator ignored the error, but there were not enough examples of such errors for the ML to learn to do so. Despite such limitations, the benefit of being able to handle large volumes of email more than makes up for the possible slight loss in reliability of coding, especially considering that human coders are also not perfectly reliable. +---------------------------------------- +------------------------------- +Section 20: +4 Findings + + +In this section we discuss in turn the findings from our study, first validating the measure of success, then examining support for each hypothesis. + + +4.1 Membership + + +As a check on our measure of success (graduation from the Incubator), we compared the number of developers in graduated and retired projects (active developers were those who had participated on the mailing list). The results are shown in Table 3. As the table shows, graduated projects had more than twice as many developers active on the mailing list as did retired projects. The differences are so large that a statistical test of significance seems superfluous (for doubters, a Kruskal-Wallis test, chosen because the data are not normally distributed, shows a statistically significant difference in the number of developers between graduated and retired projects, p = 0.001). This result provides evidence for the validity of graduation as a measure of project community health. + + +Table 3. + Mean number of active developers by project status and developer role + + +| Project status | Core | Peripheral | +|----------------|------------|------------| +| Graduated | 31.6 (19.4)| 82.2 (102.4)| +| Retired | 13.9 (9.3) | 25.4 (18.3) | + + +N = 74. Standard deviations in parentheses. +Hypothesis 1 was that successful projects would have more communication.
As shown in Table 4, this hypothesis is strongly supported, as graduated projects have many times more messages sent than retired projects during the incubation process ($p = 0.0001$). + + +Table 4. + Mean number of project messages by project status and developer role + + +| | Core | Peripheral | +|----------|------------|------------| +| Graduated| 8265 (8878)| 7306 (8908)| +| Retired | 1791 (1805)| 1652 (2058)| + + +$N = 74$. Standard deviations in parentheses. + + +Hypotheses 2a and 2b were that core and peripheral members respectively would communicate more in successful projects than in unsuccessful projects. The differences in Tables 4 and 5 show that these hypotheses are supported ($p = 0.0001$ for core and $p = 0.0001$ for peripheral members for overall message count in graduated vs. retired projects, and $p = 0.0011$ and $p = 0.0399$ for messages per developer). + + +Table 5. + Mean number of messages sent per developer by project status and developer role + + +| | Core | Peripheral | +|----------|------------|------------| +| Graduated| 239 (191) | 109 (119) | +| Retired | 107 (200) | 47 (92) | + + +$N = 74$. Standard deviations in parentheses. + + +Hypothesis 3 was that core members would communicate more than peripheral members. From Table 4, we can see that in fact in total core and peripheral members send about the same volume of messages in both graduated and retired projects. However, there are fewer core members, so each sends many more messages on average, as shown in Table 5 ($p = 0.0001$). + + +Table 6. + Mean number of messages including an inclusive pronoun sent per developer by project status and developer role + + +| | Core | Periphery | +|----------|------------|-----------| +| Graduated| 22 (18) | 6 (5) | +| Retired | 12 (8) | 4 (5) | + + +$N = 74$. Standard deviations in parentheses. +Hypothesis 4 was that core members would use more inclusive pronouns than peripheral members.
Table 6 shows the number of messages sent by developers that included an inclusive pronoun. The table shows that core developers do send more messages with inclusive pronouns in both graduated and retired projects (p = 0.0001). + + +Table 7. Mean percentage of messages that include an inclusive pronoun per developer by project status and developer role + + +| | Core | Periphery | +|----------|----------|-----------| +| Graduated| 7.6 (3.4)| 5.5 (2.2) | +| Retired | 9.3 (5. )| 5.3 (3.2) | + + +N = 74. Standard deviations in parentheses. + + +To control for the fact that core developers send more messages in general, we computed the percentage of messages that include an inclusive pronoun, as shown in Table 7. From this table, we can see that the mean percentage of messages sent by core developers that include an inclusive pronoun is higher than for peripheral members (p = 0.001). + + +Hypotheses 5a and 5b were that there would be more use of inclusive pronouns by core and peripheral members respectively in successful projects. From Table 6, this hypothesis seems supported for core members at least, but note that successful projects have more communication overall. Examining Table 7 suggests that there is in fact slightly more proportional use of inclusive pronouns by core members in unsuccessful projects, but no difference in use by peripheral members. However, neither difference is significant using a Kruskal-Wallis test, meaning that Hypothesis 5 is not supported. + + +Finally, to assess which of the factors we examined are most predictive of project success, we applied a stepwise logistic regression, predicting graduation from the various measures of communication developed (e.g., total number of messages by developer role, mean number, percentage of messages with inclusive pronouns). Our first regression identified only one factor as predictive, the number of core members.
This result can be expected, as we argued above that the number of core members can also be viewed as a measure of community health. A regression without counts of members identified the total number and the mean number of messages sent by core members as predictive, with mean having a negative coefficient. (The $R^2$ for the regression was 33 %.) This combination of factors does not provide much insight as it is essentially a proxy for developer count: greatest when there are a lot of messages but not many messages per developer, i.e., when there are more developers. +---------------------------------------- +------------------------------- +Section 21: +5 Discussion + + +In general, our data suggest that successful projects (i.e., those that successfully built a community and graduated from incubation) have more members and a correspondingly large volume of communication, suggesting an active community. As expected, core +members contribute more, but overall, the message volume seems almost evenly split between core and peripheral members, suggesting that both roles play an important part in projects. These results demonstrate the importance of interaction between and the shared responsibilities of core and peripheral members. + + +As expected, core members do display somewhat greater ownership of the project, as expressed in the use of inclusive pronouns, but counter to our expectations, the use of inclusive pronouns did not distinguish successful and unsuccessful projects. A possible explanation for this result is a limitation in our data processing: we determined developer status (core or periphery) based on committer lists from the project website collected at the time of analysis. This process does not take into account the movement of developers from periphery to core (or less frequently, from core to periphery). 
It could be that in successful projects, active peripheral members (i.e., those using more inclusive pronouns) are invited to join the core, thus suppressing the average for peripheral members. +---------------------------------------- +------------------------------- +Section 22: +6 Conclusions + + +The work presented here can be extended in many ways in future work. First, as noted, developers may change status during the project. The results would be more accurate if they took into account the history of when developers became committers to correctly assign their status over time. Obtaining such historical data is challenging but not impossible. Second, the ML NLP might be improved with a richer feature set [24], though as noted, the performance was already as good as would be expected from an additional human coder. Third, it would be interesting to examine the first few months of a project for early signs that are predictive of its eventual outcome. Fourth, it might similarly be possible to predict which peripheral members will become core members from their individual actions. Fifth, we can consider the effects of additional group maintenance behaviours from Wei et al. [21]. The Syracuse SOCQA project has had some success applying ML NLP techniques to these codes, suggesting that this analysis is feasible. Sixth, it is necessary to consider limits to the hypothesized impacts. For example, we hypothesized that more communication reflects a more developed community, but it could be that too much communication creates information overload and so has a negative impact. Finally, in this paper we have considered only communication behaviours. A more complete model of project success would take into account measures of development activities such as code commits or project topic, data for which are available online. + + +Despite its limitations, our research offers several advances over prior work. First, it examines a much larger sample of projects.
Second, it uses a more objective measure of project success, namely graduation from the ASF Incubator, as a measure of community development. Finally, it shows the viability of the application of NLP and ML techniques to processing large volumes of email messages, incorporating analysis of the content of messages, not just counts or network structure. +Acknowledgements. We thank the SOCQA Project (Nancy McCracken PI) for access to the coded sentences for training and Feifei Zhang for checking the coding results. SOCQA was partially supported by a grant from the US National Science Foundation Socio-computational Systems (SOCS) program, award 11–11107. + + +References + + + + +Amrit, C., van Hillegersberg, J.: Exploring the impact of socio-technical core-periphery structures in open source software development. J. Inf. Technol. 25(2), 216–229 (2010) + + +Bagozzi, R.P., Dholakia, U.M.: Open source software user communities: a study of participation in Linux user groups. Manage. Sci. 52(7), 1099–1115 (2006) + + +Barcellini, F., Détienne, F., Burkhardt, J.-M.: A situated approach of roles and participation in open source software communities. Hum.-Comput. Interact. 29(3), 205–255 (2014) + + +Bonaccorsi, A., Rossi, C.: Why F/OSS can succeed. Res. Policy 32, 1243–1258 (2003) + + +Crowston, K., Wei, K., Howison, J., Wiggins, A.: Free/Libre open source software development: what we know and what we do not know. ACM Comput. Surv. 44(2), Article 7 (2012) + + +Crowston, K., Howison, J., Annabi, H.: Information systems success in free and open source software development: theory and measures. Softw. Process Improv. Pract. 11(2), 123–148 (2006) + + +Crowston, K., Howison, J.: Assessing the health of open source communities. IEEE Comput. 39(5), 89–91 (2006) + + +Crowston, K., Li, Q., Wei, K., Eseryel, U.Y., Howison, J.: Self-organization of teams for Free/Libre open source software development. Inf. Softw. Technol. 
49(6), 564–575 (2007) + + +Crowston, K., Wei, K., Li, Q., Howison, J.: Core and periphery in Free/Libre and open source software team communications. In: Proceedings of the Hawai‘i International Conference on System Sciences (HICSS-39) (2006) + + +Dahlander, L., O’Mahony, S.: Progressing to the center: coordinating project work. Organ. Sci. 22(4), 961–979 (2011) + + +Fang, Y., Neufeld, D.: Understanding sustained participation in open source software projects. J. Manage. Inf. Syst. 25(4), 9–50 (2009) + + +Jensen, C., Scacchi, W.: Role migration and advancement processes in OSSD projects: a comparative case study. In: Proceedings of the 29th International Conference on Software Engineering (ICSE), pp. 364–374 (2007) + + +Jergensen, C., Sarma, A., Wagstrom, P.: The onion patch: migration in open source ecosystems. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, pp. 70–80 (2011) + + +Luthiger Stoll, B.: Fun and software development. In: Proceedings of the First International Conference on Open Source Systems, Genova, Italy, 11–15 July 2005 + + +Park, J.-R.: Interpersonal and affective communication in synchronous online discourse. Libr. Q. 77(2), 133–155 (2007) + + +Park, J.-R.: Linguistic politeness and face-work in computer mediated communication, part 2: an application of the theoretical framework. J. Am. Soc. Inf. Sci. Technol. 59(14), 2199–2209 (2008) + + +Rullani, F., Haefliger, S.: The periphery on stage: the intra-organizational dynamics in online communities of creation. Res. Policy 42(4), 941–953 (2013) + + +Scialdone, M.J., Heckman, R., Crowston, K.: Group maintenance behaviours of core and peripheral members of Free/Libre open source software teams.
In: Proceedings of the IFIP WG 2.13 Working Conference on Open Source Systems, Skövde, Sweden, 3–6 June 2009 + + +Toral, S.L., Martínez-Torres, M.R., Barrero, F.: Analysis of virtual communities supporting OSS projects using social network analysis. Inf. Softw. Technol. 52(3), 296–303 (2010) + + +von Krogh, G., Spaeth, S., Lakhani, K.R.: Community, joining, and specialization in open source software innovation: a case study. Res. Policy 32(7), 1217–1241 (2003) + + +Wei, K., Crowston, K., Li, N.L., Heckman, R.: Understanding group maintenance behaviour in Free/Libre open-source software projects: the case of Fire and Gaim. Inf. Manage. 51(3), 297–309 (2014) + + +Yan, J.L.S., McCracken, N., Crowston, K.: Design of an active learning system with human correction for content analysis. Paper Presented at the Workshop on Interactive Language Learning, Visualization, and Interfaces, 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, June 2014. http://nlp.stanford.edu/events/illvi2014/papers/mccracken-illvi2014.pdf + + +Yan, J.L.S., McCracken, N., Crowston, K.: Semi-automatic content analysis of qualitative data. In: Proceedings of the iConference, Berlin, Germany, 4–7 Mar 2014 + + +Yan, J.L.S., McCracken, N., Zhou, S., Crowston, K.: Optimizing features in active machine learning for complex qualitative content analysis.
Paper Presented at the Workshop on Language Technologies and Computational Social Science, 52nd Annual Meeting of the Association for Computational Linguistics Baltimore, MD, June 2014 +---------------------------------------- +------------------------------- +Section 23: +The impacts of lockdown on open source software contributions during the COVID-19 pandemic + + +Jin Hu\textsuperscript{a, b}, Daning Hu\textsuperscript{b, *}, Xuan Yang\textsuperscript{c}, Michael Chau\textsuperscript{a} + + +\textsuperscript{a} Faculty of Business and Economics, The University of Hong Kong, Hong Kong +\textsuperscript{b} Business School, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China +\textsuperscript{c} Department of Informatics, University of Zurich, 8006 Zurich, Switzerland + + +\textbf{ARTICLE INFO} + + +\textbf{Keywords:} +COVID-19 +Lockdown +Work productivity +Open source software +Face-to-face interactions + + +\textbf{ABSTRACT} + + +The COVID-19 pandemic instigated widespread lockdowns, compelling millions to transition to work-from-home (WFH) arrangements and rely heavily on computer-mediated communications (CMC) for collaboration. This study examines the impacts of lockdown on innovation-driven work productivity, focusing on contributions to open source software (OSS) projects on GitHub, the world's largest OSS platform. By leveraging two lockdowns in China as natural experiments, we discover that developers in the 2021 Xi'an lockdown increased OSS contributions by 9.0\%, while those in the 2020 Wuhan lockdown reduced their contributions by 10.5\%. A subsequent survey study elucidates this divergence, uncovering an adaptation effect wherein Xi'an developers became more accustomed to the new norm of WFH over time, capitalizing on the flexibility and opportunities of remote work. 
Moreover, our findings across both lockdowns reveal that the lack of face-to-face (F2F) interactions significantly impeded OSS contributions, whereas the increased available time at home positively influenced them. This finding is especially noteworthy as it challenges the assumption that CMC can effortlessly substitute for F2F interactions without negatively affecting productivity. We further examine the impacts of stay-at-home orders in the United States (US) on OSS contributions and find no significant effects. Collectively, our research offers valuable insights into the multifaceted impacts of lockdown on productivity, shedding light on how individuals adapt to remote work norms during protracted disruptions like a pandemic. These insights provide various stakeholders, including individuals, organizations, and policymakers, with vital knowledge to prepare for future disruptions, foster sustainable resilience, and adeptly navigate the evolving landscape of remote work in a post-pandemic world. + + +\textbf{1. Introduction} + + +The COVID-19 pandemic has catalyzed a global transition to work-from-home (WFH) arrangements, as nations implemented lockdown measures to limit human mobility and curb the spread of the virus (Fang et al., 2020; Sheridan et al., 2020; Wang, 2022). This unprecedented shift to remote work, facilitated by a myriad of computer-mediated communications (CMC) technologies, has instigated profound and lasting impacts on work productivity, an area that has garnered significant attention in recent scholarly investigations (Barber et al., 2021; Cui et al., 2022). Understanding such impacts on work productivity is crucial for guiding policy and decision-making at multiple levels. It can help reshape individual approaches to work-life balance, redefine organizational strategies on WFH arrangements, and inform governmental policies or legislation aimed at supporting remote work. 
Moreover, the significant disruptions brought by the pandemic highlight the imperative for adaptability and resilience at all these levels. Studying the effects of lockdown on work productivity can provide valuable insights, enabling stakeholders to better navigate future upheavals and cultivate enduring resilience. However, the impact of lockdown on work productivity, especially within innovation-driven domains such as open source software (OSS) development, remains largely unexplored. + + +To address this research gap, our study leverages the lockdowns implemented in two of the world's largest economies – the United States (US) and China – during various stages of the pandemic. These lockdowns serve as natural experiments, enabling us to study their impacts on OSS developers' contributions to GitHub, the world's largest OSS platform (GitHub, 2022b). China's Zero-COVID strategy, marked by its +uniform and strict lockdown measures across various cities at different times, provides an ideal setting to study OSS contributors’ responses to lockdowns. More importantly, it allows us to understand their adaptation to the new normal of WFH throughout various pandemic stages. Meanwhile, the US, with its prominent role in the OSS community and extensive data availability, serves as an optimal environment to extend and validate our findings derived from the Chinese lockdowns, thereby enhancing the generalizability of our insights beyond the specific context of China. Taken together, these natural experiments enable us to delve deeper into how different approaches to managing the pandemic influence OSS contributors’ productivity. + + +Our main difference-in-differences (DID) analysis focused on two lockdowns in China: the initial lockdown in Wuhan in 2020 and another that occurred in Xi’an in 2021.
Interestingly, the results revealed a significant positive impact of the 2021 lockdown on the OSS contributions of Xi'an developers, in contrast to the negative impact observed among Wuhan developers during the 2020 lockdown. Moreover, in both lockdowns, the results indicated that developers who made more online comments to their local peers experienced a more pronounced decline in their contributions. To delve deeper into the underlying mechanisms driving these outcomes, we conducted a targeted survey among the developers affected by these two lockdowns.


The survey findings reveal that Xi'an developers reported significantly fewer interruptions and a marked increase in flexibility in making OSS contributions during the later lockdown in 2021. Factors such as fear related to COVID-19 and increased housework responsibilities, which had significantly reduced Wuhan developers' contributions during the initial 2020 lockdown, became insignificant for developers during the 2021 Xi'an lockdown. These findings point to a notable adaptation effect, as developers became more accustomed to the new norms of WFH imposed by the COVID-19 pandemic over time. The survey also found that, for both Wuhan and Xi'an developers, the increase in available time positively influenced OSS contributions.


Moreover, our survey study unveiled that, for both Wuhan and Xi'an developers, the lack of face-to-face (F2F) interactions was found to significantly reduce their contribution levels. This finding is further corroborated by another survey discovery that identified a strong positive correlation between developers' tendency to comment on GitHub and their propensity for F2F interactions prior to the lockdown.
Coupled with the aforementioned DID analysis, which demonstrated a more pronounced negative impact on contributions from Wuhan and Xi’an developers who engaged in more online commenting activities with their local collaborators, this evidence leads to the inference that developers who frequently engaged in F2F interactions were more adversely affected by the lockdowns in terms of their contributions. This finding underscores the importance of F2F interactions in collaborative work environments and challenges the assumption that CMC can seamlessly replace F2F interactions without any adverse impact on productivity. + + +Furthermore, we use the DID analysis to examine the impact of stay-at-home lockdown orders in the US on developers’ OSS contributions. This empirical approach is guided by three key considerations. First, assessing the generalizability of the findings from Chinese lockdowns to other contexts is vital, as the impacts of strict lockdown measures like those in China may differ from the effects of milder restrictions adopted elsewhere. Second, the prominence of OSS development in the US, coupled with the extensive data available on GitHub, makes it an apt context for our analysis. Third, the heterogeneity in policies regarding lockdowns across different US states offers a unique opportunity for comparative analysis. This allows for a nuanced understanding of how diverse approaches to pandemic management can influence OSS contributions. + + +In addition, by comparing the effects observed in China and the US, we aim to provide valuable insights into the broader implications of lockdown measures on OSS contributions on a global scale. Interestingly, our analysis revealed no significant impact of US lockdowns on developers’ OSS contributions. We posit that this may be attributable to the less strict nature of stay-at-home orders in the US compared to the lockdown measures enforced in China. 
The relatively lenient restrictions in the US, which permitted essential activities and work, may not have led to significant disruptions in potential F2F interactions or provided additional available time for developers. Consequently, these factors may have exerted minimal effects on their OSS contributions. + + +Our contributions are threefold. First, by examining the impact of lockdowns on OSS contributions, our study provides novel insights into the effects of remote work on productivity. The nuanced findings on how individuals adapt to new norms of WFH during prolonged periods of disruption can equip various stakeholders – including individuals, organizations, and governments – with essential knowledge. This knowledge can guide preparations for similar future disruptions and build sustainable resilience. Second, our research reveals the detrimental effects of reduced F2F interactions, challenging the assumption that CMC can effortlessly replace F2F interactions without compromising productivity. This is especially salient in innovation-driven domains like OSS development. This insight enriches the discussion on the comparative impacts of CMC and F2F on the efficacy of virtual teams, a discussion that has become increasingly pertinent in an era where reliance on CMC for remote work is likely to persist even beyond the pandemic (Airbnb, 2022; Warren, 2020). Third, our study stands out through the adoption of systematic causal analysis methods. While previous research on the impact of lockdown has mainly relied on survey methods, our use of DID analysis on empirical data from GitHub enables a more robust examination of the causal effects of lockdowns. This methodological approach, reinforced with various robustness tests, not only strengthens the findings of our study but also offers a valuable framework that can be leveraged in future research. 
This includes exploring the impact of policy interventions or organizational strategies in response to similar disruptions.


\textbf{2. Literature review}


2.1. COVID-19 and work productivity


The COVID-19 pandemic has led to an unprecedented shift to remote work, with millions mandated to work from home due to government-imposed lockdowns. The impact of WFH arrangements, brought about by those lockdowns, on work productivity has been the subject of intensive study, yielding mixed findings. Several studies found that lockdown-induced WFH is associated with declines in productivity, especially in innovation-oriented work such as software development (Ralph et al., 2020) and scholarly research (Barber et al., 2021; Walters et al., 2022). Ralph et al. (2020) surveyed 2225 software developers across 53 countries and found that both their productivity and well-being were diminished due to COVID-19. The primary influencing factors were fear related to the pandemic, disaster preparedness, and home office ergonomics. Barber et al. (2021) surveyed 1008 members of the American Finance Association, with 78.1% of the respondents suggesting that their research productivity was negatively affected by COVID-19. This was due to the lack of traditional F2F communications to disseminate research and obtain feedback, as well as overwhelming health concerns. Another survey study by Walters et al. (2022) investigated the reasons behind the reported decline in research activity among female academics during lockdowns. The primary reason was that while working from home, female academics were burdened with traditional family roles typically assumed by women, as well as increasing teaching and administrative workloads.


On the other hand, some studies found that productivity in lockdown-induced WFH scenarios has actually increased during this pandemic. Asay (2020) reports that OSS developers consistently increased their work volume in 2020, as they never truly left their work.
Cui et al. (2022) found an overall 35% increase in productivity and a 13% increase in the gender gap among social science scholars in the US since the lockdown began. They suggest that while the lockdown could result in substantial time savings on activities such as commuting, female researchers may find themselves allocating more time for home-related tasks such as childcare.


Another line of research suggests that lockdowns in general have little effect on software developers. Forsgren (2020) reports that the activity of GitHub developers in the early days of COVID-19 was similar to or slightly increased compared to the previous year. Neto et al. (2021) surveyed 279 developers of GitHub projects developed using Java and found that WFH during the pandemic did not affect task completion time, code contribution, or quality. Similar studies were conducted to survey developers at major IT companies like Microsoft (Ford et al., 2021) and Baidu (Bao et al., 2022). They found that lockdown generally had little impact on developers' productivity. However, these developers had differing opinions about the effects of lockdown. Some suggested their productivity benefited from WFH because of fewer disturbances, saved commuting time, and improved work-life balance. Others suggested their productivity suffered from WFH due to increased home-related tasks, decreased collaboration with others, and interruptions from family members.


To summarize, existing studies on the impact of pandemic-induced lockdowns on work productivity have yielded mixed findings and are heavily reliant on survey methods. Moreover, these studies have not sufficiently explored how knowledge workers, such as developers, adapt to remote work settings and how this adaptation influences their productivity during prolonged periods of lockdown.
There is a clear need for systematic causal analyses on large empirical datasets to study the impacts and underlying mechanisms of pandemic-induced lockdowns on innovation-related work, taking the effects of adaptation into account.


2.2. Face-to-face communications and computer-mediated communications


Previous research (NicCanna et al., 2021; Smite et al., 2023) has highlighted that one of the direct implications of pandemic-induced lockdowns is the diminished opportunity for traditional F2F interactions and an increased reliance on CMC, both of which have been considered crucial in the realm of OSS development (Crowston et al., 2007; O'Mahony and Ferraro, 2007). Crowston et al. (2007) identify several settings in which OSS developers engage in F2F meetings and the benefits they derive from such interactions. For instance, F2F meetings provide OSS developers with great opportunities to socialize, build teams, and verify each other's identity. They also find that certain OSS development activities are best suited for F2F interactions, such as conveying important news (Boden and Molotch, 1994). Kock (2004) suggests that this is because human beings evolved over many years to excel at F2F interactions. Moreover, O'Mahony and Ferraro (2007) discovered that F2F interactions with OSS community members could increase one's likelihood of ascending to a community leadership role. This is achieved through 1) building more trusting and reciprocal relationships and 2) creating potential coalitions. Butler and Jaffe (2021) also suggested that F2F interactions can significantly influence one's efforts in community building.


These OSS studies are typically conducted in empirical contexts where F2F interactions and CMC co-exist among OSS community members, making it difficult to disentangle their effects.
However, the strict lockdown measures in China have presented a unique opportunity to examine developers’ OSS contributions in a setting where F2F interactions are entirely absent. An important conjecture is that OSS developers, who have been accustomed to working productively using CMC in a remote and asynchronous manner for decades (Columbro, 2020; Wellman et al., 1996), are less likely to be affected by the absence of F2F interactions during the COVID-19 pandemic. Our study puts this conjecture to the test by examining the scenario where F2F interactions are largely absent due to the lockdowns in China. + + +Moving from the specific context of OSS to a more general comparison of F2F interactions and CMC in virtual teams, the findings remain inconclusive. Townsend et al. (1998) find that CMC can facilitate efficient connections between individuals regardless of their geographical locations, thereby significantly improving the performance of virtual teams. Moreover, team members distributed across different time zones can leverage CMC to coordinate more effectively and operate within a more flexible and efficient 24-hour cycle (Lipnack and Stamps, 1999). Therefore, Bergiel et al. (2006) suggest that virtual collaboration via CMC can overcome the constraints of time, distance, and organizational boundaries, leading to improvements in productivity and efficiency among team members. + + +On the other hand, another stream of the literature suggests that compared with F2F interactions, CMC carries fewer physical and emotional cues, thereby limiting the extent and synchronicity of information exchange (Cramton and Webber, 2005; Daft and Lengel, 1986; Dennis et al., 2008). This can negatively affect team members’ capabilities to establish mutual understanding (Kraut et al., 1982; Sproull and Kiesler, 1986; Straus and McGrath, 1994), their sense of belonging, and awareness of group activities (Cramton, 2001). 
Moreover, in the absence of F2F interactions, individuals are more likely to experience heightened conflicts (Wakefield et al., 2008), leading to decreased team productivity and satisfaction (Hambrick et al., 1998; Lau and Murnighan, 1998). Furthermore, despite recent advances in communication technologies, such as videoconferencing, which allow users to convey more non-verbal information cues than before, the lack of F2F interactions can still negatively affect innovation that relies on collaborative idea generation. A recent study (Brucks and Levav, 2022) discovered that, despite technological advancement, the absence of F2F interactions during the COVID-19 pandemic still negatively affected innovation. The authors attribute this finding to the differences between the physical nature of videoconferencing and F2F interactions, as the former focuses individuals on a display with a narrower cognitive focus. + + +To summarize, the existing literature has yet to conclusively establish whether, despite technological advancement, CMC can effectively replace the role of F2F interactions without impacting the productivity of collaborative work. Some studies (Crowston et al., 2007; Ocker et al., 1998) suggest that a mix of both CMC and F2F interactions is most beneficial for teamwork. However, as the preference for remote work and reliance on CMC continue to rise at an unprecedented scale even in the post-pandemic era, our research aims to fill this gap by studying whether CMC can fully replace F2F interactions without negatively affecting teamwork productivity. + + +2.3. Motivations for open source software contributions + + +Another stream of research that is very relevant to our study is the literature on motivations for contributing to OSS development. The prevailing framework in this field typically categorizes OSS developers’ motivations into intrinsic and extrinsic factors. 
Intrinsic motivations often stem from developers' personal needs such as altruism and joy derived from contributing (Davidson et al., 2014; Hertel et al., 2003), whereas extrinsic motivations are usually related to utility-based external rewards, such as opportunities for career advancement (Fang and Neufeld, 2009; Yang et al., 2021). Studies by Hertel et al. (2003) and Shah (2006) have found that intrinsic motivations, such as enjoyment and fun, significantly influence OSS developers' contributions. However, during the COVID-19-induced lockdowns, developers may experience fear and stress related to the health of their family and friends, which could negatively affect these intrinsic motivations, especially in the early stages of the pandemic.


However, there is a dearth of OSS motivation research that focuses on the social effects through which developers' contribution motivations are influenced by their interactions with their peers. For instance, individuals' OSS contributions are encouraged by the attention they received from their peers (Moqri et al., 2018) and collaboration with other team members (Crowston et al., 2007; Daniel and Stewart, 2016; Xu et al., 2009). von Krogh et al. (2012) suggest that aspects of social practice like ethics and virtues are largely overlooked as a context for contribution motivations. These aspects are typically cultivated through social interactions among OSS community members, including both F2F interactions and CMC. Our study aims to enrich the understanding of the research community and policymakers on how major disruptions like lockdowns may limit such social effects, particularly through the reduced F2F interactions, and thereby influence OSS developers' contribution motivations.


\textbf{3. Methods}


In this section, we first adopt a mixed-method approach to study the impacts of two lockdowns in China on OSS developers' contributions.
We treat the lockdowns in Wuhan and Xi'an as natural experiments, and for each GitHub developer in Wuhan or Xi'an, we match her with a developer in comparable regions that did not experience lockdown measures. We then utilize DID and difference-in-difference-in-differences (DDD) analyses, combined with propensity score matching (PSM), to discern the impacts. To delve deeper into the mechanisms that underpin the changes in developers' OSS contributions during the lockdowns, we also administer a survey to GitHub developers in both lockdowns. In Section 4, we report the main results of this analysis and perform a series of robustness tests to validate our findings. + + +Moreover, in Section 5, we extend our empirical approaches, such as the DID analysis, to data collected from a distinct context – the US. This supplementary analysis is designed to investigate whether the patterns observed in our findings on Chinese lockdowns are also present in other regions. By comparing the effects in China and the US, we aim to provide valuable insights into the wider implications of lockdown measures on OSS contributions on a global scale. + + +3.1. Experimental settings + + +COVID-19 has become one of the most severe global pandemics in recent decades (Fang et al., 2020). Our first natural experiment leverages the lockdown imposed in Wuhan, China from January 23 to April 8, 2020, in response to the initial major outbreak of COVID-19. The authorities enforced a citywide lockdown in Wuhan, leading to the closure of all public transport and non-essential businesses. The residents of all the 7148 residential communities in Wuhan were mandated to stay at home, with leaving only permitted in emergencies. The abrupt imposition of the Wuhan lockdown, which was implemented without prior warning, serves as an exogenous shock. This natural experimental setting provides us with an opportunity to examine the impact of the Wuhan lockdown on OSS contributions. 
We designate Wuhan developers as the treatment group and choose developers in Hong Kong, Macau, and Taiwan (HMT) regions as the control group for several reasons. Firstly, most major cities in mainland China swiftly followed Wuhan's lead in implementing strict lockdown or social distancing measures, while the HMT regions did not implement such measures until March 2020. Hong Kong authorities prohibited indoor and outdoor public gatherings of more than four people in March 2020. Meanwhile, although Macau authorities took some ad-hoc measures such as closing casinos and public parks, they did not implement any citywide lockdown measures. Therefore, while developers in Wuhan were strictly required to stay at home in the early stage of this COVID-19 outbreak, those in HMT regions could go out and engage in F2F interactions. Secondly, compared with developers in other parts of the world, HMT developers are much more similar to Wuhan developers, as they belong to the same ethnic group – Han Chinese (Wikipedia, 2022) – and share similar cultural backgrounds.


We have chosen a ten-week period surrounding the day of the Wuhan lockdown (i.e., between December 19, 2019, and February 27, 2020) as the time frame for the DID analysis mainly for two reasons. Firstly, this timeframe is long enough to allow us to observe potential changes in developers' contributions. Secondly, as COVID-19 began to spread to other parts of the world, including the HMT regions, their developers might have started to consciously avoid F2F meetings with others to prevent potential COVID-19 infections, even before any lockdown or social distancing measures were implemented. This would make HMT developers less ideal control subjects in the natural experiment. Therefore, we set the end of the time window as February 27, 2020, as COVID-19 cases in HMT regions only started to increase significantly in March.


We also leverage the lockdown of Xi'an in China as a second natural experiment.
The strictness of a city's lockdown measures often corresponds to the severity of the local outbreak, leading to endogeneity when attempting to causally identify the impacts of the lockdown measures. During the pandemic, China's Zero-COVID policy provides an ideal opportunity to address this endogeneity issue. This policy, which is centered around lockdowns, aims to halt the transmission of COVID-19 as soon as cases are detected through mass testing (Chen et al., 2022a). Even a few COVID-19 cases can trigger a full-scale citywide lockdown in a very short period (Chen et al., 2022a). Such swift lockdowns in response to extremely small numbers of new COVID-19 cases minimize the endogeneity of policy responses.


The Xi'an lockdown, which lasted from December 23, 2021, to January 23, 2022, was as strict as the Wuhan lockdown despite far fewer initial infection cases, thus minimizing the endogeneity of policy responses. During the Xi'an lockdown, all public transport and non-essential businesses were suspended, and all Xi'an residents were strictly required to stay at home except for emergencies. Thus, we use Xi'an developers as the treatment group. To construct the control group, we follow existing studies (Muralidharan and Prakash, 2017; Wang, 2022) by choosing developers in the seven capitals of provinces (or municipalities) neighboring Xi'an that did not implement any lockdown measures during the Xi'an lockdown. This is because developers in these neighboring capitals are more similar to Xi'an developers in many aspects. The timeframe of the DID analysis covers the eight weeks surrounding the day of the Xi'an lockdown (i.e., between November 25, 2021, and January 20, 2022).


3.2. Data collection for the Chinese lockdowns


Our empirical study collects and uses two types of data: GitHub data and COVID-19 case data. We obtain historical GitHub data through its API and GH Archive database.
The latter archives public OSS development activities on GitHub since February 2011 and has been widely used in recent OSS research (Moqri et al., 2018; Negoița et al., 2019). We first use the "search-by-location" function of the GitHub API to extract developers who had at least one public repository and were located in the regions chosen for the natural experiments. For each experiment, we further select developers who joined GitHub before the chosen time window and exclude developers who did not push any commit within that time window. This procedure yields 1695 Wuhan developers and 5282 HMT developers for the Wuhan case. The selected sample of the Xi'an case includes 919 Xi'an developers and 4274 developers in the seven neighboring provincial capitals (or municipalities). Moreover, we obtain data about COVID-19 cases from relevant health authorities such as the National Health Commission of China as well as mainstream media. This comprehensive data collection allows us to conduct a robust analysis of the impact of Chinese lockdowns on OSS contributions.


3.3. Propensity score matching


To address potential endogeneity issues, we employ the DID technique in conjunction with PSM, following the methodology of previous studies (Chen et al., 2019; Foerderer, 2020). PSM selects control subjects by measuring their distance from the treated subjects based on pre-treatment covariates. This method is particularly effective in overcoming the curse of dimensionality (i.e., too many covariates) by transforming covariate vectors into a single propensity score and then selecting control subjects closest to the treated ones (Chen et al., 2022b). It allows us to create a more balanced and comparable control group, thereby enhancing the robustness of our findings.
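As a rough illustration of this procedure (ours, not the authors' code), one-to-one nearest-neighbor matching without replacement on a propensity score can be sketched as follows. The function name `psm_match`, the `treated` column, and the use of scikit-learn's logistic regression as the propensity model are illustrative assumptions:

```python
# Minimal sketch of one-to-one nearest-neighbor propensity score matching
# without replacement. Assumes a pandas DataFrame with a binary `treated`
# column and numeric pre-treatment covariates.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def psm_match(df: pd.DataFrame, covariates: list) -> list:
    """Return (treated_index, control_index) pairs matched on propensity score."""
    # Step 1: estimate the propensity score P(treated | covariates)
    model = LogisticRegression(max_iter=1000)
    model.fit(df[covariates], df["treated"])
    df = df.assign(pscore=model.predict_proba(df[covariates])[:, 1])

    treated = df[df["treated"] == 1]
    controls = df[df["treated"] == 0].copy()
    pairs = []
    for t_idx, t_row in treated.iterrows():
        if controls.empty:
            break
        # Step 2: greedy nearest neighbor on the score, without replacement
        dist = (controls["pscore"] - t_row["pscore"]).abs()
        c_idx = dist.idxmin()
        pairs.append((t_idx, c_idx))
        controls = controls.drop(index=c_idx)  # each control used at most once
    return pairs
```

In practice, one would re-run the balance t-tests on the matched pairs (as the paper does in its Table 2) to verify that the procedure removed the pre-treatment differences.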
More specifically, we apply one-to-one nearest-neighbor matching without replacement to select a control developer for each treated developer based on a set of observable characteristics before the lockdown (Fang and Neufeld, 2009; Foss et al., 2021; Moqri et al., 2018; Zhang and Zhu, 2011). These characteristics include the number of weeks since the developer joined GitHub, whether the developer is a student or an employee based on her profile, whether the developer reports her contact information in her profile, the number of OSS projects that the developer created, the number of commits that the developer contributed on GitHub, the number of stars/issues/comments that the developer received for her repositories, the number of stars/issues/comments that the developer sent out, whether the developer used each of the following core languages on GitHub – C/C++/C#/Go/Java/JavaScript/PHP/Python/Ruby/Scala/TypeScript (GitHub, 2022a) – as her primary programming language, the number of the developer's collaborators who contributed to the same projects as her, the number of the developer's local collaborators who contributed to the same projects and lived in the same region as her, the average age of the developer's OSS projects, and the number of projects with the General Public License (GPL) created by the developer. GPL, being the most restrictive license, could serve as a proxy for the developer's ideological level (Foss et al., 2021). This PSM procedure yields 1608 matched pairs of Wuhan (treatment) and HMT (control) developers for the Wuhan lockdown and 919 matched pairs of Xi'an (treatment) and neighboring-city (control) developers for the Xi'an lockdown case.


Table 1 summarizes the mean values of the pre-treatment characteristics for all developers in the selected regions before matching.
The results of the t-test indicate significant differences across many observable characteristics between the developers in lockdown areas and those in non-lockdown areas for both lockdowns. These differences suggest that a direct comparison between the treatment and control groups in the two natural experiments may not be appropriate. Therefore, we apply the aforementioned matching procedure. Table 2 reports the mean values of the same characteristics for the matched sample. The t-test results in Table 2 show that there are no significant differences across these observable characteristics between the treatment and matched control groups for both lockdowns. This suggests that the matching procedure has effectively balanced the observable characteristics between the treatment and matched control groups.


3.4. Empirical models


3.4.1. Difference-in-differences model


For each natural experiment, we now examine the change in OSS contributions of every developer selected in the matched sample using the following DID regression framework:

$$\text{CONTRIBUTION}_{it} = \alpha + \beta\, \text{AFTER}_{it} \times \text{LOCKDOWN}_{it} + \gamma\, \text{CV}_{it} + \mu_i + \theta_t + \epsilon_{it}$$

where $i$ indexes the developer and $t$ indexes the week. The dependent variable, $\text{CONTRIBUTION}_{it}$, is the weekly OSS contributions of each developer. We add one to the weekly number of commits a developer contributed to GitHub and then take a logarithm to measure her weekly OSS contributions, following previous literature (Hu et al., 2023; Moqri et al., 2018; Zhang and Zhu, 2011). A commit is a change made to an OSS project, such as adding, modifying, or deleting code.
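The log-of-one-plus-commits transformation described above can be sketched on made-up data (the counts here are illustrative, not the study's dataset):

```python
# Sketch of the dependent-variable construction: weekly commit counts per
# developer, transformed as log(1 + commits). Data is hypothetical.
import numpy as np
import pandas as pd

panel = pd.DataFrame({
    "developer": ["a", "a", "b", "b"],
    "week":      [1, 2, 1, 2],
    "commits":   [0, 7, 3, 0],
})
# CONTRIBUTION_it = log(1 + weekly commits); log1p keeps zero-commit weeks at 0
panel["contribution"] = np.log1p(panel["commits"])
```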
$\text{AFTER}_{it}$ is a dummy variable that equals one if the time period is after the day of lockdown and zero otherwise. $\text{LOCKDOWN}_{it}$ is a dummy variable that equals one if the developer is in the treatment group (i.e., in the city where lockdown is implemented) and zero otherwise.

Table 1. Mean values of pre-treatment characteristics for all developers in the selected regions before matching.

| Characteristic | All the Wuhan developers (mean) | All the HMT developers (mean) | Difference (t-test) | All the Xi'an developers (mean) | All the neighboring-city developers (mean) | Difference (t-test) |
|---|---|---|---|---|---|---|
| Weeks | 174.761 | 224.249 | -49.488 *** | 258.309 | 259.105 | -0.796 |
| Student | 0.305 | 0.162 | 0.143 *** | 0.279 | 0.224 | 0.054 *** |
| Employee | 0.232 | 0.290 | -0.058 *** | 0.288 | 0.277 | 0.011 |
| Contact | 0.721 | 0.690 | 0.031 ** | 0.703 | 0.730 | -0.027 * |
| Number of projects | 21.780 | 26.074 | -4.294 *** | 27.349 | 30.320 | -2.970 |
| Commits | 709.337 | 1489.132 | -779.795 ** | 1869.550 | 1725.299 | 144.251 |
| Stars received | 126.292 | 77.595 | 48.697 | 139.702 | 260.150 | -120.448 |
| Issues received | 6.959 | 9.129 | -2.169 | 11.473 | 15.097 | -3.624 |
| Comments received | 12.684 | 24.629 | -11.945 ** | 29.706 | 39.904 | -10.198 |
| Stars sent out | 104.883 | 107.217 | -2.334 | 118.405 | 153.768 | -35.364 *** |
| Issues sent out | 8.740 | 12.421 | -3.681 * | 13.799 | 14.680 | -0.881 |
| Comments sent out | 22.530 | 47.056 | -23.526 ** | 61.226 | 49.162 | 12.064 |
| C | 0.041 | 0.041 | 0.000 | 0.039 | 0.036 | 0.003 |
| C++ | 0.086 | 0.061 | 0.025 *** | 0.073 | 0.064 | 0.009 |
| C# | 0.019 | 0.041 | -0.022 *** | 0.027 | 0.031 | -0.003 |
| Go | 0.026 | 0.024 | 0.002 | 0.065 | 0.070 | -0.005 |
| Java | 0.198 | 0.075 | 0.123 *** | 0.177 | 0.180 | -0.003 |
| JavaScript | 0.202 | 0.225 | -0.023 ** | 0.182 | 0.199 | -0.018 |
| PHP | 0.021 | 0.032 | -0.012 ** | 0.021 | 0.026 | -0.005 |
| Python | 0.170 | 0.196 | -0.026 ** | 0.186 | 0.148 | 0.038 *** |
| Ruby | 0.004 | 0.021 | -0.017 *** | 0.007 | 0.004 | 0.002 |
| Scala | 0.002 | 0.002 | -0.000 | 0.003 | 0.001 | 0.002 |
| TypeScript | 0.007 | 0.010 | -0.003 | 0.016 | 0.025 | -0.008 |
| Collaborators | 367.333 | 899.337 | -532.003 *** | 632.457 | 683.823 | -51.366 |
| Local collaborators | 0.835 | 10.044 | -9.208 *** | 1.342 | 2.193 | -0.851 *** |
| Average age of projects | 64.415 | 88.233 | -23.819 *** | 100.200 | 102.360 | -2.160 |
| Number of projects with GPL | 1.045 | 1.187 | -0.142 | 1.256 | 1.681 | -0.425 ** |

\* p < 0.1. ** p < 0.05. *** p < 0.01.

$\text{CV}_{it}$ contains a set of control variables that might influence a developer's OSS contributions according to previous research (Fang and Neufeld, 2009; Moqri et al., 2018; Zhang and Zhu, 2011): the number of OSS projects created by the developer ($\text{REPO}_{it}$), the number of weeks since the developer joined GitHub ($\text{TENURE}_{it}$), the number of stars the developer received for her repositories ($\text{STARR}_{it}$), the number of stars the developer sent out ($\text{STARS}_{it}$), the number of issues the developer received for her repositories ($\text{ISSUER}_{it}$), the number of issues the developer sent out ($\text{ISSUES}_{it}$), the number of comments the developer received for her repositories ($\text{COMMENTR}_{it}$), the number of comments the developer sent out ($\text{COMMENTS}_{it}$), and the number of new COVID-19 cases in the developer's region ($\text{CASE}_{it}$).


To control for the effects of time-invariant individual characteristics of developer $i$, especially those that are unobservable, we incorporate the individual fixed effect $\mu_i$ in our DID model. Moreover, as opposed to the standard two-period DID model, our DID model spans ten periods for the Wuhan lockdown case and eight periods for the Xi'an lockdown case. Consequently, we need to control for variables that remain constant across subjects but vary over different periods.
Therefore, we include the time fixed effect $\theta_t$, which comprises weekly time dummies that control for time trends. The $\text{LOCKDOWN}_{it}$ and $\text{AFTER}_{it}$ terms in the standard two-period DID model are then absorbed by the individual and time fixed effects, respectively. $\epsilon_{it}$ is the error term. The coefficient $\beta$ indicates the impact of lockdown on developers' OSS contributions. A negative coefficient would suggest that the lockdown reduces developers' OSS contributions, whereas a positive coefficient would indicate otherwise.

3.4.2. Difference-in-difference-in-differences models


We now examine the impact of the absence of F2F interactions caused by the lockdowns. If F2F interactions serve as important motivations for OSS contributions, as previous research has suggested (Crowston et al., 2007; Stam, 2009), we expect that developers who regularly engaged in F2F meetings with their collaborators would be more profoundly affected by the lockdown. To this end, we use a GitHub developer's engagement with online comments (i.e., GitHub-supported CMC) as a proxy for her tendency to meet OSS collaborators F2F before the lockdown. This approach is grounded in previous studies that have observed that people who engage more in CMC are also more likely to meet F2F. Such a correlation is understood to reflect underlying social needs and preferences (Huang et al., 2022; Khalis and Mikami, 2018; Suphan and Mierzejewska, 2016). Furthermore, CMC has been found to cultivate social relationships and facilitate the coordination of F2F meetings (DiMaggio et al., 2001; Howard et al., 2001; Kraut et al., 2002; Suphan et al., 2012). This relationship between online and offline interaction is further supported by Brandtzæg and Nov (2011), who discovered that Facebook users who prioritize CMC with close friends also interact more frequently in F2F settings.
In addition, our survey study in Section 4.3.4 finds that developers in lockdowns who made more online comments to their local GitHub collaborators before the lockdown were also more likely to meet with each other F2F, which is consistent with the findings of previous studies (Huang et al., 2022; Khalis and Mikami, 2018; Suphan and Mierzejewska, 2016).


This intricate relationship between CMC and F2F interactions lays the groundwork for our DDD analysis. To operationalize a GitHub developer's tendency to meet her local collaborators F2F, we compute the number of online comments she made to them on the GitHub platform before the lockdown. This metric serves as a proxy for her social engagement and preference for F2F interactions. Building on the baseline DID specification, we develop a more nuanced DDD specification:


$$\text{CONTRIBUTION}_{it} = \alpha + \beta_1 \text{AFTER}_t \times \text{LOCKDOWN}_i + \beta_2 \text{AFTER}_t \times \text{LOCCOMS}_i + \beta_3 \text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{LOCCOMS}_i + \gamma \text{CV}_{it} + \mu_i + \theta_t + \epsilon_{it}$$


(2)


where $\text{LOCCOMS}_i$ is the number of online comments that developer $i$ made to her GitHub collaborators in the same region before the lockdown. It is important to note that the individual fixed effect $\mu_i$ absorbs the $\text{LOCKDOWN}_i \times \text{LOCCOMS}_i$ term (Foerderer, 2020). We anticipate the coefficient $\beta_3$ to be significant and negative, indicating that developers who engaged more in online interactions with their local collaborators were adversely affected by the lockdown, leading to reduced contributions to OSS projects (Miller et al., 2019). A significant $\beta_3$, however, could also reflect a more general social effect: developers who comment more may simply be more socially engaged overall, and such developers might reduce their contributions after a lockdown regardless of where their collaborators are located.
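For concreteness, a DDD specification of this form can be estimated as a two-way fixed-effects regression. The sketch below uses synthetic panel data and illustrative variable names (`dev`, `week`, `loccoms` are our own labels, not the authors' code); the main effects of the treatment-group and post-period dummies are absorbed by the developer and week fixed effects, as in the text.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_dev, n_week = 40, 10
df = pd.DataFrame(
    [(d, w) for d in range(n_dev) for w in range(n_week)],
    columns=["dev", "week"],
)
df["lockdown"] = (df["dev"] < n_dev // 2).astype(int)   # treatment-group indicator
df["after"] = (df["week"] >= n_week // 2).astype(int)   # post-lockdown weeks
loccoms = rng.poisson(5, n_dev)                         # pre-lockdown local comments per developer
df["loccoms"] = loccoms[df["dev"].to_numpy()]
# Outcome: log(1 + weekly commits), with a simulated negative effect
# concentrated on treated developers with many local comments
df["y"] = (
    0.5
    - 0.02 * df["after"] * df["lockdown"] * df["loccoms"]
    + rng.normal(0, 0.1, len(df))
)

# DDD with developer and week fixed effects; standard errors clustered by developer
res = smf.ols(
    "y ~ after:lockdown + after:loccoms + after:lockdown:loccoms"
    " + C(dev) + C(week)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["dev"]})
print(res.params["after:lockdown:loccoms"])  # estimate of the simulated -0.02 effect
```

With the fixed effects included, only the interaction terms are identified, mirroring the absorption argument above.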
To ensure that our results are robust to this alternative explanation, we consider the following DDD specification:


$$\text{CONTRIBUTION}_{it} = \alpha + \beta_1 \text{AFTER}_t \times \text{LOCKDOWN}_i + \beta_2 \text{AFTER}_t \times \text{COMS}_i + \beta_3 \text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{COMS}_i + \gamma \text{CV}_{it} + \mu_i + \theta_t + \epsilon_{it}$$


(3)


where $\text{COMS}_i$ is the number of online comments that developer $i$ made to all her GitHub collaborators (including non-local ones) before the lockdown. If the alternative explanation were true, the coefficient $\beta_3$ should be significant like the one in Eq. (2), as the general social effects should apply to all the GitHub collaborators, regardless of their location. On the other hand, if the coefficient $\beta_3$ is insignificant in Eq. (3) but significant in Eq. (2), this alternative explanation can be dismissed.


4. Results and robustness checks for Chinese lockdowns


4.1. Results from the difference-in-differences model


Table 3 reports the results of Eqs. (1)–(3). Columns (1) and (4) show the results of Eq. (1) for the Wuhan and Xi'an lockdowns, respectively. The coefficient of $\text{AFTER}_t \times \text{LOCKDOWN}_i$ in Column (1) is negative and statistically significant at the 1% level, suggesting that the Wuhan lockdown led to a reduction in developers' OSS contributions. Specifically, the coefficient of $-0.111$ implies that Wuhan developers' contributions decreased by about 10.5% ($e^{-0.111} - 1 \approx -0.105$) over the five weeks following the lockdown. In contrast, the coefficient of $\text{AFTER}_t \times \text{LOCKDOWN}_i$ in Column (4) is positive and significant at the 5% level, suggesting that the Xi'an lockdown resulted in an increase in developers' OSS contributions. The coefficient of 0.086 implies that Xi'an developers' contributions increased by roughly 9.0% ($e^{0.086} - 1 \approx 0.090$) over the four weeks after the lockdown.
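These percentage interpretations follow mechanically from the log-linear outcome: a coefficient $\beta$ on a dummy interaction implies a $(e^{\beta} - 1) \times 100\%$ change in contributions. A quick check of the two figures quoted above:

```python
import math

def pct_effect(beta: float) -> float:
    """Implied percentage change in the outcome for coefficient `beta`
    in a log-linear DID model: exp(beta) - 1."""
    return math.exp(beta) - 1.0

print(round(pct_effect(-0.111) * 100, 1))  # Wuhan: -10.5 (a 10.5% decrease)
print(round(pct_effect(0.086) * 100, 1))   # Xi'an: 9.0 (a 9.0% increase)
```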
According to the findings of our survey study presented in Section 4.3.4, these contrasting results between the Wuhan and Xi'an lockdowns can be mainly attributed to an adaptation effect. When COVID-19 initially emerged in Wuhan, the unprecedented nature of the virus, coupled with its rapid spread and severity, likely instilled a high level of fear and uncertainty among the population. Therefore, Wuhan developers may have found it challenging to focus on and contribute to OSS projects during this turbulent period (Neto et al., 2021; Ralph et al., 2020). On the other hand, the Xi'an lockdown occurred nearly two years after Wuhan's, following more than a dozen city-level lockdowns. By that time, the residents in Xi'an were much more familiar with the virus and the associated lockdown measures, and they did not experience the same level of fear as those in Wuhan. They had adapted more readily to the new lifestyles induced by lockdown measures, including the new norm of WFH.


Table 3

Regression results for Chinese lockdowns.

| Dependent variable: $\text{CONTRIBUTION}_{it}$ | Wuhan lockdown (2020) | | | Xi'an lockdown (2021) | | |
|----------------------------------------|------|------|------|------|------|------|
| | (1) | (2) | (3) | (4) | (5) | (6) |
| $\text{AFTER}_t \times \text{LOCKDOWN}_i$ | −0.111*** | −0.108*** | −0.110*** | 0.086** | 0.089** | 0.085** |
| | (0.029) | (0.030) | (0.030) | (0.043) | (0.043) | (0.043) |
| $\text{AFTER}_t \times \text{LOCCOMS}_i$ | | 0.003*** | | | 0.001** | |
| | | (0.001) | | | (0.000) | |
| $\text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{LOCCOMS}_i$ | | −0.007*** | | | −0.003*** | |
| | | (0.003) | | | (0.001) | |
| $\text{AFTER}_t \times \text{COMS}_i$ | | | 0.000 | | | 0.000 |
| | | | (0.000) | | | (0.000) |
| $\text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{COMS}_i$ | | | 0.000 | | | 0.000 |
| | | | (0.000) | | | (0.000) |
| $\text{REPO}_{it}$ | 0.024 | 0.024 | 0.024 | 0.372*** | 0.372*** | 0.372*** |
| | (0.022) | (0.022) | (0.022) | (0.028) | (0.028) | (0.028) |
| $\text{TENURE}_{it}$ | −0.030 | −0.030 | −0.029 | 0.044 | 0.047 | 0.044 |
| | (0.039) | (0.039) | (0.039) | (0.051) | (0.051) | (0.051) |
| $\text{STARR}_{it}$ | 0.016** | 0.016** | 0.016** | 0.003 | 0.003 | 0.003 |
| | (0.008) | (0.008) | (0.008) | (0.002) | (0.002) | (0.002) |
| $\text{STARS}_{it}$ | 0.015** | 0.015** | 0.015** | 0.027*** | 0.027*** | 0.027*** |
| | (0.006) | (0.006) | (0.006) | (0.008) | (0.008) | (0.008) |
| $\text{ISSUER}_{it}$ | −0.021 | −0.020 | −0.022 | 0.014 | 0.014 | 0.014 |
| | (0.025) | (0.025) | (0.025) | (0.038) | (0.038) | (0.038) |
| $\text{ISSUES}_{it}$ | 0.076*** | 0.076*** | 0.077*** | 0.058* | 0.058* | 0.057* |
| | (0.028) | (0.028) | (0.028) | (0.033) | (0.033) | (0.033) |
| $\text{COMMENTR}_{it}$ | 0.022 | 0.022 | 0.022 | 0.011 | 0.011 | 0.011 |
| | (0.014) | (0.014) | (0.014) | (0.011) | (0.011) | (0.011) |
| $\text{COMMENTS}_{it}$ | 0.047*** | 0.047*** | 0.047*** | 0.053*** | 0.053*** | 0.053*** |
| | (0.014) | (0.014) | (0.014) | (0.007) | (0.007) | (0.008) |
| $\text{CASE}_{it}$ | 0.000 | 0.000 | 0.000 | −0.000 | −0.000 | −0.000 |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
| Constant | 5.769 | 5.727 | 5.585 | −10.529 | −11.429 | −10.494 |
| | (6.688) | (6.690) | (6.685) | (13.023) | (13.052) | (13.041) |
| Individual FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Time FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 32,160 | 32,160 | 32,160 | 14,704 | 14,704 | 14,704 |
| R-squared | 0.048 | 0.048 | 0.048 | 0.083 | 0.083 | 0.083 |

Robust standard errors in brackets.

* p < 0.1.
** p < 0.05.
*** p < 0.01.
This adaptation, coupled with the opportunities offered by WFH, such as increased available time and flexibility, may have enabled Xi'an developers to increase their OSS contributions (Ford et al., 2021; Neto et al., 2021).


4.2. Results from the difference-in-difference-in-differences models


Columns (2) and (5) of Table 3 present the results of Eq. (2) for the Wuhan and Xi'an lockdowns, respectively. The significant and negative coefficient of $\text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{LOCCOMS}_i$ in Column (2) suggests that Wuhan developers who engaged more in online comments with their local GitHub collaborators were more negatively affected by the Wuhan lockdown. As indicated by our survey results in Section 4.3.4, developers who made more online comments to their local collaborators were more likely to meet with each other F2F before the lockdown. Therefore, the above result indicates that Wuhan developers who were more likely to have F2F interactions before the lockdown experienced a more pronounced reduction in their OSS contributions. On the other hand, the significant and negative coefficient of $\text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{LOCCOMS}_i$ in Column (5) suggests that the positive effect on OSS contributions was weaker for the Xi'an developers who had engaged more in online comments with local collaborators before the lockdown. These Xi'an developers were also more likely to have F2F interactions, reflecting a pattern similar to that observed in the Wuhan lockdown. These findings highlight the importance of F2F interactions for OSS contributions and indicate that the loss of these interactions during lockdowns significantly affects such contributions.


Columns (3) and (6) of Table 3 report the results of Eq. (3) for the Wuhan and Xi'an lockdowns, respectively. The coefficients of $\text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{COMS}_i$ are insignificant at the 10% level in both columns.
As elaborated in Section 3.4.2, these results rule out the alternative explanation that developers who were more socially engaged (i.e., those who made more online comments to all their collaborators) became more concerned about the pandemic's negative impacts and therefore reduced their OSS contributions. Instead, the results of Eqs. (2) and (3) together highlight that it is not the social nature of the developers but specifically the loss of F2F interactions during lockdowns that influences developers' OSS contributions.


In summary, the DID regression results suggest that the Wuhan lockdown led to a significant reduction in developers' OSS contributions, while the Xi'an lockdown resulted in an increase. Further analysis through the DDD regressions highlights the importance of F2F interactions in driving developers' OSS contributions on GitHub. The absence of these F2F interactions, brought about by lockdown measures, appears to negatively influence such contributions.


4.3. Robustness checks


4.3.1. Parallel trends


The key identification assumption for the DID estimation is the parallel trends assumption. This assumption posits that, before the lockdown, the OSS contributions of both the treatment group and the control group would follow the same temporal trend. If this assumption were not satisfied, the estimated effects could be biased, as the results could be driven by systematic differences between the treatment and control groups rather than by the lockdown itself.


To ascertain the validity of our analysis, we conduct two sets of tests to examine whether this assumption holds. First, we plot the weekly average contributions (per developer) made by the treatment group (blue) and the control group (red) during the time window surrounding the Wuhan lockdown in Fig. 1(a) and the Xi'an lockdown in Fig. 1(b).
To measure a developer's weekly contributions, we add one to the weekly number of commits she contributed to GitHub and then take the logarithm, consistent with the measure in our DID and DDD models. The green vertical line in each figure demarcates the day of lockdown. As Fig. 1(a) shows, the treatment and control groups exhibit almost identical contribution trends before the Wuhan lockdown, thereby fulfilling the parallel trends assumption. In the five weeks following the day of the Wuhan lockdown, however, the contributions of the treatment group consistently fall below those of the control group, as evidenced by the substantial and persistent gap between the red and blue lines. Fig. 1(b) shows a similar pattern for the Xi'an lockdown: both the treatment and control groups tend to contribute less over time before the lockdown, thus satisfying the parallel trends assumption. However, after the day of the Xi'an lockdown, the control group continues the decreasing trend, while the treatment group exhibits a tendency to increase contributions.


Second, to further validate our findings, we adopt an event-study approach, a method commonly used in previous literature (Leslie and Wilson, 2020; Tanaka and Okamoto, 2021). This approach involves fitting the following equation:


$$\text{CONTRIBUTION}_{it} = \alpha + \sum_{k=-n,\, k \neq -1,0}^{n} \beta_k \, \text{WEEK}_{kt} \times \text{LOCKDOWN}_i + \gamma \text{CV}_{it} + \mu_i + \theta_t + \epsilon_{it}$$


(4)


where $n$ equals 5 for the Wuhan lockdown and 4 for the Xi'an lockdown. $\text{WEEK}_{kt}$ is a dummy variable that equals one if week $t$ corresponds to $k$, and zero otherwise. We do not construct the week $k = 0$ in our sample but use the day of lockdown to separate the pre-treatment and post-treatment periods. $k = -1$ indicates the week just before the day of lockdown, so it is dropped from the equation as the reference week.
Intuitively, $\beta_k$ captures the difference in contributions between the treatment and control groups in each week relative to $k = -1$. We expect the two groups to make similar contributions before the day of lockdown ($k < 0$) and to diverge after the day of lockdown ($k > 0$).


Fig. 2(a) and Fig. 2(b) show the estimated $\beta_k$ in Eq. (4) for the Wuhan and Xi'an lockdowns, respectively. The green vertical line in each figure represents the day of lockdown, while the gray dotted lines surrounding each coefficient depict 95% confidence intervals. In both figures, the estimated $\beta_k$ ($k < 0$) are all nearly zero, indicating no pre-treatment difference in the contribution trends between the treatment and control groups. Such a pattern confirms that the parallel trends assumption is satisfied in our analysis.


4.3.2. A falsification test


To ensure that our estimated effects are not artifacts of seasonality, we conduct a falsification test to demonstrate that the effects are not replicated in a period without the lockdowns. This involves repeating the DID analysis for the same time window in previous years, before COVID-19 had emerged (Cui et al., 2022; Zhang and Zhu, 2011). For the Wuhan lockdown, we repeat the DID analysis using data from one lunar year earlier, as the time window encompasses the Chinese New Year holiday. For the Xi'an lockdown, we use data from two years earlier, considering that some developers might have experienced lockdowns a year before, during a period when lockdowns had become more common in China. The control variable $\text{CASE}_{it}$ is excluded from this analysis since COVID-19 had not yet broken out during these earlier periods. This falsification test serves as a robustness check: if our original DID analysis were merely capturing seasonal effects, we would expect to find significant effects in these previous years as well.
However, the absence of such effects would strengthen the validity of our main findings, confirming that the observed changes in OSS contributions are indeed attributable to the lockdowns and not to underlying seasonal patterns.


Table 4 reports the results of the falsification test. The placebo treatment effects are found to be insignificant for both the Wuhan and Xi'an lockdowns. This implies that the developers in the treated groups did not significantly change their contributions during the same time window in previous years, thus ruling out seasonal effects as a driving factor behind the observed changes in OSS contributions.


4.3.3. Alternative samples


We also replicate the DID and DDD analyses using two alternative matched samples to ensure that our results are not driven by the specific choice of the caliper in propensity score matching. In the main analysis, we used a caliper of 0.3 for the Wuhan lockdown to ensure no statistical difference in developer characteristics between the treatment and control groups (Chen et al., 2019; Wang, 2022). The caliper defines the range within which the (logit of) propensity scores must fall to be considered a valid match (Cochran and Rubin, 1973). While a narrower caliper can result in the inclusion of fewer subjects, it can also enhance the balance between the treatment and control groups, thereby reducing bias in estimating treatment effects (Wang, 2022; Wang et al., 2013). To further validate our findings, following Wang (2022), we employed alternative calipers of 0.1 for the Wuhan lockdown and 0.001 for the Xi'an lockdown. The new PSM with these calipers yielded a matched sample of 1557 developers for the Wuhan lockdown and 906 developers for the Xi'an lockdown, in both the treatment and control groups. Table 5 presents the t-test results after matching with the new calipers.
Importantly, for both lockdowns, none of the differences between the treatment and control groups were found to be significant at the 10% level, indicating that the two groups remained comparable for the DID analysis after matching, even with the alternative calipers.


Table 6 shows the regression results of Eqs. (1)–(3) based on the alternative matched samples. The coefficients of $\text{AFTER}_t \times \text{LOCKDOWN}_i$, $\text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{LOCCOMS}_i$, and $\text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{COMS}_i$ are found to be consistent with those in the main analyses for both the Wuhan and Xi'an lockdowns, suggesting that our results are not driven by the choice of the caliper in the PSM process.


4.3.4. A survey study


We further complement our empirical analyses with a survey study, conducted to delve into the underlying mechanisms and influencing factors behind the changes in developers' OSS contributions before and after the lockdowns. This survey targeted the treated developers in our matched sample who had provided their email addresses on GitHub, encompassing 879 Wuhan developers and 463 Xi'an developers. To encourage participation, we offered an incentive of 20 Chinese Yuan to each respondent who successfully completed the questionnaire. The questionnaire, detailed in Appendix A, was designed with questions answered on a five-point Likert scale. Eventually, we received 109 responses from the Wuhan developers and 71 responses from the Xi'an developers.


Another objective of our survey study was to justify an important assumption underlying our DDD analysis: developers who engaged in more online comments with their local collaborators on GitHub were also more likely to meet F2F. To examine this relationship, we surveyed the treated developers about their tendencies in both online commenting and F2F interactions with their local collaborators.
We then conduct a correlation test on these tendencies for both the Wuhan and Xi'an developers, the results of which are detailed in Table A1 in Appendix A. The findings reveal significant and positive correlation coefficients between the tendencies for online commenting and F2F interactions. This supports the assumption of our DDD analysis, reinforcing the validity of our empirical approach and the conclusions drawn from it.


Table 7 shows the results of a linear regression analysis that explores various surveyed factors to explain the changes in OSS contributions during the two Chinese lockdowns. The dependent variable represents the change in contributions, calculated as the difference between a respondent's total contributions on GitHub during the post-treatment period and her total contributions during the pre-treatment period. The independent variables consist of the respondents' ratings for each of the surveyed factors, as detailed in Questions 2–6 in the questionnaire provided in Appendix A. These factors were carefully selected for inclusion in the questionnaire based on previous research findings related to work productivity during COVID-19-induced WFH scenarios (Bao et al., 2022; Ford et al., 2021; Miller et al., 2021; Neto et al., 2021; Walters et al., 2022).


Table 4

Falsification test results for Chinese lockdowns.
| Dependent variable: $\text{CONTRIBUTION}_{it}$ | Wuhan lockdown | Xi'an lockdown |
|-------------------------------------------|----------------|---------------|
| | (1) | (2) |
| $\text{AFTER}_t \times \text{LOCKDOWN}_i$ | 0.018 | −0.024 |
| | (0.019) | (0.024) |
| $\text{REPO}_{it}$ | 0.128*** | 0.296*** |
| | (0.028) | (0.031) |
| $\text{TENURE}_{it}$ | 0.010 | 0.030 |
| | (0.026) | (0.036) |
| $\text{STARR}_{it}$ | 0.001 | −0.000 |
| | (0.001) | (0.000) |
| $\text{STARS}_{it}$ | 0.045*** | 0.030*** |
| | (0.007) | (0.005) |
| $\text{ISSUER}_{it}$ | −0.002 | 0.045 |
| | (0.038) | (0.050) |
| $\text{ISSUES}_{it}$ | 0.078** | 0.095** |
| | (0.040) | (0.045) |
| $\text{COMMENTR}_{it}$ | 0.006 | −0.010 |
| | (0.013) | (0.014) |
| $\text{COMMENTS}_{it}$ | 0.079*** | 0.078*** |
| | (0.016) | (0.015) |
| Constant | −0.928 | −4.253 |
| | (3.143) | (5.374) |
| Individual FE | Yes | Yes |
| Time FE | Yes | Yes |
| Observations | 32,160 | 14,704 |
| R-squared | 0.151 | 0.083 |

Robust standard errors in brackets.

* p < 0.1.
** p < 0.05.
*** p < 0.01.


Table 5

T-tests in the alternative matched sample for Chinese lockdowns.
| | Wuhan lockdown | | | Xi'an lockdown | | |
|--------------------------|------|------|------|------|------|------|
| | Treatment group | Control group | Difference | Treatment group | Control group | Difference |
| Weeks | 177.281 | 177.547 | −0.266 | 257.616 | 258.481 | −0.865 |
| Student | 0.277 | 0.290 | −0.013 | 0.277 | 0.281 | −0.004 |
| Employee | 0.244 | 0.247 | −0.003 | 0.286 | 0.278 | 0.008 |
| Contact | 0.705 | 0.713 | −0.008 | 0.702 | 0.721 | −0.019 |
| Number of projects | 21.423 | 21.780 | −0.357 | 27.156 | 25.883 | 1.273 |
| Commits | 663.480 | 499.008 | 164.473 | 1710.940 | 1299.883 | 411.057 |
| Stars received | 64.570 | 90.427 | −25.857 | 140.910 | 126.865 | 14.044 |
| Issues received | 5.550 | 5.620 | −0.071 | 11.545 | 9.189 | 2.357 |
| Comments received | 10.196 | 11.230 | −1.034 | 29.975 | 27.722 | 2.253 |
| Stars sent out | 94.332 | 94.636 | −0.304 | 119.413 | 110.372 | 9.041 |
| Issues sent out | 6.821 | 7.455 | −0.634 | 12.710 | 10.969 | 1.741 |
| Comments sent out | 16.331 | 19.586 | −3.255 | 46.818 | 40.392 | 6.426 |
| C | 0.043 | 0.041 | 0.002 | 0.040 | 0.044 | −0.004 |
| C++ | 0.086 | 0.083 | 0.003 | 0.073 | 0.086 | −0.013 |
| C# | 0.021 | 0.022 | −0.001 | 0.028 | 0.032 | −0.004 |
| Go | 0.027 | 0.028 | −0.001 | 0.064 | 0.070 | −0.006 |
| Java | 0.153 | 0.170 | −0.017 | 0.180 | 0.174 | 0.006 |
| JavaScript | 0.209 | 0.215 | −0.006 | 0.184 | 0.180 | 0.004 |
| PHP | 0.021 | 0.021 | 0.001 | 0.021 | 0.020 | 0.001 |
| Python | 0.182 | 0.168 | 0.014 | 0.183 | 0.179 | 0.004 |
| Ruby | 0.004 | 0.004 | 0.000 | 0.004 | 0.003 | 0.001 |
| Scala | 0.002 | 0.001 | 0.001 | 0.001 | 0.000 | 0.001 |
| TypeScript | 0.008 | 0.006 | 0.002 | 0.017 | 0.020 | −0.003 |
| Collaborators | 184.192 | 167.135 | 17.057 | 561.135 | 527.577 | 33.557 |
| Local collaborators | 0.789 | 0.830 | −0.041 | 1.254 | 1.092 | 0.162 |
| Average age of projects | 65.386 | 65.249 | 0.137 | 100.184 | 98.570 | 1.615 |
| Number of projects with GPL | 1.002 | 1.140 | −0.138 | 1.267 | 1.329 | −0.062 |

* p < 0.1.
** p < 0.05.
*** p < 0.01.


Table 6

Regression results for the alternative sample for Chinese lockdowns.

| | Wuhan lockdown | | | Xi'an lockdown | | |
|----------------------|------|------|------|------|------|------|
| | (1) | (2) | (3) | (4) | (5) | (6) |
| $\text{AFTER}_t \times \text{LOCKDOWN}_i$ | −0.106*** | −0.103*** | −0.101*** | 0.089*** | 0.092** | 0.090** |
| | (0.030) | (0.030) | (0.030) | (0.043) | (0.044) | (0.043) |
| $\text{AFTER}_t \times \text{LOCCOMS}_i$ | | 0.003*** | | | 0.001** | |
| | | (0.001) | | | (0.000) | |
| $\text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{LOCCOMS}_i$ | | −0.007*** | | | −0.003*** | |
| | | (0.003) | | | (0.001) | |
| $\text{AFTER}_t \times \text{COMS}_i$ | | | 0.000 | | | 0.000 |
| | | | (0.000) | | | (0.000) |
| $\text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{COMS}_i$ | | | 0.000 | | | 0.000 |
| | | | (0.000) | | | (0.000) |
| $\text{REPO}_{it}$ | 0.024 | 0.024 | 0.024 | 0.371*** | 0.371*** | 0.371*** |
| | (0.021) | (0.021) | (0.021) | (0.028) | (0.028) | (0.028) |
| $\text{TENURE}_{it}$ | −0.038 | −0.038 | −0.038 | 0.046 | 0.050 | 0.047 |
| | (0.040) | (0.040) | (0.040) | (0.051) | (0.051) | (0.051) |
| $\text{STARR}_{it}$ | 0.014** | 0.014** | 0.014** | 0.003 | 0.003 | 0.003 |
| | (0.007) | (0.007) | (0.007) | (0.002) | (0.002) | (0.002) |
| $\text{STARS}_{it}$ | 0.015** | 0.015** | 0.015** | 0.027*** | 0.027*** | 0.027*** |
| | (0.006) | (0.006) | (0.007) | (0.008) | (0.008) | (0.008) |
| $\text{ISSUER}_{it}$ | −0.036 | −0.036 | −0.037 | 0.010 | 0.010 | 0.012 |
| | (0.027) | (0.027) | (0.027) | (0.039) | (0.039) | (0.039) |
| $\text{ISSUES}_{it}$ | 0.088*** | 0.087*** | 0.089*** | 0.065* | 0.064* | 0.063* |
| | (0.031) | (0.031) | (0.031) | (0.036) | (0.035) | (0.036) |
| $\text{COMMENTR}_{it}$ | 0.023 | 0.023 | 0.023 | 0.012 | 0.012 | 0.012 |
| | (0.015) | (0.015) | (0.015) | (0.011) | (0.011) | (0.011) |
| $\text{COMMENTS}_{it}$ | 0.047*** | 0.047*** | 0.047*** | 0.052*** | 0.052*** | 0.050*** |
| | (0.016) | (0.016) | (0.016) | (0.008) | (0.008) | (0.009) |
| $\text{CASE}_{it}$ | −0.000 | −0.000 | −0.000 | −0.000 | −0.000 | −0.000 |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
| Constant | 7.166 | 7.125 | 7.101 | −11.153 | −12.068 | −11.456 |
| | (6.877) | (6.879) | (6.874) | (13.010) | (13.038) | (13.058) |
| Observations | 31,140 | 31,140 | 31,140 | 14,496 | 14,496 | 14,496 |
| R-squared | 0.048 | 0.048 | 0.048 | 0.082 | 0.082 | 0.084 |

Robust standard errors in brackets.

* p < 0.1.
** p < 0.05.
*** p < 0.01.


Columns (1) and (2) of Table 7 present the regression results based on the responses from the Wuhan and Xi'an developers, respectively. These results reveal that fear related to the COVID-19 pandemic and housework burden, which significantly curtailed OSS contributions during Wuhan's initial lockdown, no longer impacted Xi'an developers in 2021. On the other hand, the availability of uninterrupted time and increased flexibility positively influenced Xi'an developers' OSS contributions, a pattern not observed among their Wuhan counterparts in 2020. These findings, taken together with our DID and DDD regression results, highlight an adaptation effect among Xi'an developers.


More specifically, we posit that the Xi'an lockdown, occurring nearly two years after Wuhan's and following numerous city-level lockdowns, allowed developers to adapt to the new norm of remote work. This adaptation enabled Xi'an developers to leverage the flexibility and opportunities provided by WFH, resulting in increased OSS contributions. In contrast, Wuhan developers, facing the novel threat of COVID-19, were impeded by fear and possibly lacked the capacity to engage in voluntary activities like OSS contributions. Moreover, the results show consistent patterns for both Wuhan and Xi'an developers, where the lack of F2F interactions significantly reduced their OSS contributions, while increased available time at home positively influenced them.
These findings offer valuable insights into how individuals adapt to unprecedented disruptions, providing guidance for stakeholders in preparing for future challenges and fostering resilience.


5. Results for the US lockdowns


In the preceding sections, we have conducted a comprehensive examination of the impacts of lockdowns on OSS contributions within the context of China. To broaden our understanding and assess the applicability of our findings beyond China, this section introduces the results of an additional empirical analysis, focusing on the lockdowns in the US. As explained in Section 1, the rationale for focusing on the US stems from its prominent role in the global OSS development community, as well as its unique circumstances surrounding the implementation of lockdown measures (i.e., stay-at-home orders) during the COVID-19 pandemic. By comparing the observed effects in China with those in the US, we seek to determine whether similar patterns emerge across different regions. This comparative analysis not only enhances the robustness of our findings but also contributes valuable insights into the broader implications of lockdown measures on the OSS development community worldwide.


During the early stages of the virus's spread, between March and April of 2020, a total of 45 states and the District of Columbia in the US implemented either statewide or partial-state stay-at-home orders. These orders restricted residents from leaving their homes except for essential activities, such as obtaining food and performing essential work functions. In contrast, the remaining five states questioned the necessity of such strict lockdown measures and refrained from issuing stay-at-home orders (Wu et al., 2020). One primary rationale behind this resistance was the belief that residents would continue to leave their homes for shopping or work, rendering the stay-at-home orders ineffective (Wang, 2022).


In alignment with the methodology outlined in Wang (2022), our study design constructs a control group consisting of OSS developers in all the states that refrained from implementing any stay-at-home orders. To form a treatment group, we follow the approach employed in earlier studies (Muralidharan and Prakash, 2017; Wang, 2022), selecting developers in states that both implemented statewide stay-at-home orders and are geographically adjacent to the control states. This selection criterion is based on the assumption that neighboring states are more likely to share similarities with the control group in both observable and unobservable characteristics. To refine our selection, we first extract developers who had at least one public repository and were exclusively located in one state within the US. We then further narrow down the treatment group by including only developers in states with fewer than ten thousand GitHub developers, ensuring consistency with the control group, where all states meet this criterion. The resulting control group consists of developers in five states – Arkansas, Iowa, Nebraska, North Dakota, and South Dakota. The treatment group includes developers in six neighboring states – Louisiana, Mississippi, Missouri, Montana, Tennessee, and Wisconsin. Table 8 provides a detailed summary of the start and end dates of the stay-at-home orders in these states, as obtained from the official announcements of each respective state. This process enhances the comparability between the treatment and control groups, thereby strengthening the validity of our analysis.


Following the approach delineated by Wang (2022), we focus on the time window spanning from March 9, 2020, to April 20, 2020. This timeframe ensures that all developers in the treatment group have at least two weeks of data before and after the implementation of the stay-at-home orders.
Consistent with Section 3.2, we include only those developers who joined GitHub before the chosen time window and pushed at least one commit during that period. Through this selection process, we arrive at a final data sample comprising 2583 treated developers and 4487 control developers.
+
+
+As in our analysis of the Chinese lockdowns, we employ DID combined with PSM on the final data sample of US lockdowns. First, we apply one-to-one nearest-neighbor matching without replacement, selecting a control developer for each treated developer. This matching is based on the same set of covariates used in the analysis of Chinese lockdowns, ensuring methodological consistency. Through this procedure, we obtain 2583 matched treatment-control pairs. Table 9 summarizes the mean values of the pre-treatment characteristics for the treatment and control groups before and after matching. The t-test results confirm that there are no significant differences across these characteristics between the treatment and control groups after matching. This successful matching enhances the validity of our subsequent analysis by ensuring that the treatment and control groups are comparable in terms of observable characteristics, thereby minimizing potential biases.
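The matching step just described (one-to-one nearest-neighbor matching on propensity scores without replacement, followed by balance t-tests) can be sketched as follows. This is an illustrative Python sketch on synthetic scores, not the authors' code: the greedy matching order, the Welch t statistic, and all names are placeholder choices for the example.

```python
import numpy as np

def nn_match_without_replacement(ps_treat, ps_control):
    """Greedy one-to-one nearest-neighbor matching on propensity scores.

    Each treated unit is paired with the closest still-unused control
    (without replacement, so every control is matched at most once).
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i in np.argsort(ps_treat):  # a common greedy ordering heuristic
        j = min(available, key=lambda c: abs(ps_control[c] - ps_treat[i]))
        pairs.append((int(i), j))
        available.remove(j)
    return pairs

def welch_t(x, y):
    """Welch two-sample t statistic, used here as a post-matching balance check."""
    vx, vy = np.var(x, ddof=1), np.var(y, ddof=1)
    return (np.mean(x) - np.mean(y)) / np.sqrt(vx / len(x) + vy / len(y))

# Synthetic propensity scores standing in for the estimated ones.
rng = np.random.default_rng(0)
ps_t = rng.uniform(0.3, 0.7, size=50)    # "treated" developers
ps_c = rng.uniform(0.2, 0.8, size=120)   # "control" developers
pairs = nn_match_without_replacement(ps_t, ps_c)
matched_c = np.array([ps_c[j] for _, j in pairs])
print(len(pairs), round(float(welch_t(ps_t, matched_c)), 3))
```

In the paper's setting the same idea applies to the 2583 treated developers, and the balance check is run on each covariate in Table 9 rather than on the score itself.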
+
+
+
+We then estimate the impact of stay-at-home orders on OSS contributions using the matched sample, employing a time-varying DID model:
+
+
+$$CONTRIBUTION_{it} = \alpha + \beta ORDER_{it} + \gamma CV_{it} + \mu_i + \theta_t + \epsilon_{it}$$ (5)
+
+
+We also estimate the moderating effects of comment interactions with local collaborators and all collaborators using two separate models:
+
+
+$$CONTRIBUTION_{it} = \alpha + \beta_1 ORDER_{it} + \beta_2 ORDER_{it} \times LOCCOMS_{it} + \gamma CV_{it} + \mu_i + \theta_t + \epsilon_{it}$$ (6)
+
+
+$$CONTRIBUTION_{it} = \alpha + \beta_1 ORDER_{it} + \beta_2 ORDER_{it} \times COMS_{it} + \gamma CV_{it} + \mu_i + \theta_t + \epsilon_{it}$$ (7)
+
+
+where $i$ indexes the developer and $t$ indexes the date. $ORDER_{it}$ is a binary variable that equals one if the state where developer $i$ is located implemented a stay-at-home order on date $t$ or earlier, and zero otherwise. The definitions of the remaining variables are consistent with those in Eqs. (1)–(3).
+
+
+Table 10 reports the results from the estimation of Eqs. (5)–(7). These results satisfy the parallel trends assumption and remain robust when considering an alternative matched sample (please see the detailed robustness tests described in Appendix B). The coefficient of $ORDER_{it}$ is insignificant across all specifications, indicating that stay-at-home orders in the US did not have a significant impact on developers’ OSS contributions. The insignificance of both moderating effects further corroborates this finding. These findings contrast with the impacts observed during the Wuhan and Xi’an lockdowns, suggesting that the effects identified in the Chinese context may not generalize to the less strict lockdowns implemented in the US.
+
+
+The contrast between the findings in China and the US may be attributed to the underlying differences in the stringency and enforcement of lockdown measures between the two countries.
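To make the estimation of Eq. (5) concrete, a two-way fixed-effects DID estimator can be sketched as below. This is a minimal illustration on a synthetic balanced panel, not the authors' code: it omits the control vector $CV_{it}$ and clustered standard errors, and the double-demeaning shortcut it uses is exact only for balanced panels.

```python
import numpy as np

def twfe_did(y, d, n, t):
    """Estimate beta in y_it = beta * d_it + mu_i + theta_t + e_it
    on a balanced n x t panel via the two-way within transformation."""
    def demean(x):
        m = x.reshape(n, t)
        return (m - m.mean(1, keepdims=True) - m.mean(0, keepdims=True) + m.mean()).ravel()
    yd, dd = demean(y), demean(d)
    return float(dd @ yd / (dd @ dd))

# Synthetic panel: 200 developers x 43 days, echoing the Mar 9 - Apr 20 window.
rng = np.random.default_rng(1)
n, t = 200, 43
treated = np.repeat(rng.integers(0, 2, n).astype(float), t)  # developer in a treated state
post = np.tile((np.arange(t) >= 20).astype(float), n)        # order already in effect
order = treated * post                                        # plays the role of ORDER_it
mu = np.repeat(rng.normal(0, 1, n), t)                        # developer fixed effects
theta = np.tile(rng.normal(0, 1, t), n)                       # date fixed effects
y = 0.0 * order + mu + theta + rng.normal(0, 0.5, n * t)      # true effect set to zero
print(round(twfe_did(y, order, n, t), 2))
```

With a true effect of zero the estimate is statistically indistinguishable from zero; in practice one would use a panel regression package that also reports standard errors clustered at an appropriate level.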
In China, the lockdowns were characterized by strict restrictions that
+----------------------------------------
+-------------------------------
+Section 31:
+Table 7
+
+
+What explains the changes in OSS contributions during Chinese lockdowns?
+
+
+| Dependent variable: change in contributions | (1) Wuhan developers | (2) Xi’an developers |
+|--------------------------------------------|----------------------|----------------------|
+| Available time | 0.202 ** | 0.725 *** |
+| | (0.092) | (0.216) |
+| Interruptions | −0.123 | −0.389 ** |
+| | (0.083) | (0.159) |
+| Flexibility | 0.029 | 0.409 ** |
+| | (0.066) | (0.189) |
+| Work environment | 0.158 | 0.129 |
+| | (0.094) | (0.186) |
+| Fear | −0.987 *** | −0.304 |
+| | (0.072) | (0.264) |
+| Lack of F2F interactions | −0.288 *** | −0.190 ** |
+| | (0.068) | (0.089) |
+| Lack of work-life boundary | −0.148 | −0.172 |
+| | (0.138) | (0.190) |
+| Lack of self-discipline | −0.013 | −0.180 |
+| | (0.065) | (0.198) |
+| Taking care of family | −0.034 | −0.105 |
+| | (0.069) | (0.190) |
+| Housework | −0.144 * | −0.160 |
+| | (0.523) | (0.179) |
+| Constant | 3.921 *** | −0.344 |
+| | (0.523) | (1.751) |
+| Observations | 109 | 71 |
+| R-squared | 0.850 | 0.614 |
+
+
+Robust standard errors in parentheses.
+
+
+* $p < 0.1$.
+
+
+** $p < 0.05$.
+
+
+*** $p < 0.01$.
+----------------------------------------
+-------------------------------
+Section 32:
+Table 8
+
+
+Status of stay-at-home orders by state.
+
+
+
+| State | Acronym | Order start date | Order end date |
+|----------------|---------|------------------|----------------|
+| Control group | | | |
+| Arkansas | AR | No statewide order | |
+| Iowa | IA | No statewide order | |
+| Nebraska | NE | No statewide order | |
+| North Dakota | ND | No statewide order | |
+| South Dakota | SD | No statewide order | |
+| Treatment group| | | |
+| Louisiana | LA | March 23, 2020 | May 15, 2020 |
+| Mississippi | MS | April 3, 2020 | May 11, 2020 |
+| Missouri | MO | April 6, 2020 | May 3, 2020 |
+| Montana | MT | March 28, 2020 | April 26, 2020 |
+| Tennessee | TN | March 31, 2020 | April 30, 2020 |
+| Wisconsin | WI | March 25, 2020 | May 26, 2020 |
+required residents not to leave home except for emergencies. These restrictions were often rigorously enforced and severely limited developers’ opportunities for F2F interactions. On the other hand, the stay-at-home orders in the US were less strict, allowing residents to leave their homes for a broader range of activities such as shopping or work. This relatively lenient approach may have allowed US developers to adapt more easily to the new circumstances, with little disruption to their work and lifestyles. Consequently, this may have mitigated the negative impacts of the lockdown measures on their OSS contributions. Moreover, the less strict nature of the US orders may not have provided more available time at home for OSS contributions, as developers could still engage in many of their usual activities outside the home.
+
+
+
+
+Conclusion
+
+
+
+
+The lockdowns induced by the COVID-19 pandemic have catalyzed a global shift towards WFH, demonstrating its feasibility on an unprecedented scale. While previous research has explored the broader implications of remote work, the nuanced dynamics between F2F and CMC in the context of work productivity remain an intricate and underexplored area.
This complexity is particularly salient within technology-driven domains such as OSS development. Our study first leverages two lockdowns in China – Wuhan 2020 and Xi’an 2021 – as natural experiments to study their causal impacts on developers’ OSS contributions on GitHub. To improve the generalizability and relevance of our findings from the Chinese lockdowns, we further extend our analysis to encompass the impacts of stay-at-home orders implemented across different states of the US during the early stage of the pandemic.
+
+
+Our findings present a nuanced picture of the impact of lockdowns on developers’ OSS contributions. We discovered that the Xi’an lockdown in 2021 corresponded to a 9.0% increase in OSS contributions, while the Wuhan lockdown in 2020 saw a 10.5% reduction. This apparent contradiction is illuminated by our subsequent survey study, which reveals that the differing impacts can be mainly attributed to an adaptation effect related to the COVID-19 pandemic. More specifically, the Xi’an lockdown occurred nearly two years after Wuhan’s, a period during which numerous city-level lockdowns had been implemented in China. This interval allowed developers to adapt to the new norm of WFH, optimizing the flexibility and opportunities provided by WFH to increase their OSS contributions. In stark contrast, the Wuhan lockdown, occurring at the onset of the pandemic when the virus was new, severe, and spreading rapidly, created a climate of fear and uncertainty. This atmosphere, compounded by factors such as increased housework responsibilities, significantly impeded Wuhan developers’ ability to focus on OSS contributions. However, these once influential factors became insignificant during the 2021 Xi’an lockdown, highlighting the adaptability and resilience of individuals in the context of remote work during large-scale disruptions.
Moreover, we found consistent patterns across both Wuhan and Xi’an developers, where the lack of F2F interactions significantly reduced their OSS contributions, while increased available time at home positively influenced them. In addition to our study on China, we employed DID analysis to assess the generalizability of our findings by examining the impact of stay-at-home orders in the US on developers’ OSS contributions. Interestingly, we found no significant impact of US lockdowns on these contributions. We posit that this may be due to the less strict nature of stay-at-home orders in the US, which may not have significantly disrupted developers’ work and lifestyles, thereby exerting minimal effects on their OSS contributions.
+
+
+Our contributions are threefold. First, our findings contribute valuable insights into the effects of remote work on productivity, exploring how individuals adapt to remote work norms during prolonged disruptions such as the pandemic. These insights offer stakeholders, including individuals, organizations, and governments, the knowledge needed to prepare for future disruptions and foster sustainable resilience.
Second, our findings shed light on the negative impact of reduced F2F interactions, thereby challenging the assumption that CMC can seamlessly substitute for F2F interactions without any detrimental effects on productivity. This is especially pertinent in inherently digital domains such as OSS development.
+----------------------------------------
+-------------------------------
+Section 33:
+Table 9
+
+
+Mean values of pre-treatment characteristics before and after matching.
+
+
+| | Before matching | | | After matching | | |
+|----------------------|-----------------|---------------|------------|-----------------|---------------|------------|
+| | Treatment group | Control group | Difference | Treatment group | Control group | Difference |
+| Weeks | 237.195 | 251.904 | -14.710 *** | 237.195 | 232.121 | 5.074 |
+| Student | 0.177 | 0.160 | 0.016 | 0.177 | 0.184 | -0.008 |
+| Employee | 0.391 | 0.407 | -0.016 | 0.391 | 0.389 | 0.001 |
+| Contact | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 |
+| Number of projects | 18.946 | 19.269 | -0.323 | 18.946 | 18.906 | 0.039 |
+| Commits | 2072.852 | 2039.565 | 33.286 | 2072.852 | 1813.961 | 258.891 |
+| Stars received | 48.550 | 71.115 | -22.565 | 48.550 | 39.534 | 9.016 |
+| Issues received | 14.561 | 15.009 | -0.448 | 14.561 | 11.596 | 2.965 |
+| Comments received | 44.084 | 47.101 | -3.017 | 44.084 | 32.748 | 11.336 |
+| Stars sent out | 53.217 | 53.018 | 0.199 | 53.217 | 49.277 | 3.940 |
+| Issues sent out | 25.280 | 26.767 | -1.487 | 25.280 | 23.936 | 1.345 |
+| Comments sent out | 138.069 | 163.793 | -25.723 | 138.069 | 137.564 | 0.506 |
+| C | 0.019 | 0.025 | -0.006 | 0.019 | 0.021 | -0.001 |
+| C++ | 0.033 | 0.043 | -0.011 ** | 0.033 | 0.034 | -0.001 |
+| C# | 0.055 | 0.047 | 0.008 | 0.055 | 0.055 | 0.000 |
+| Go | 0.011 | 0.017 | -0.006 * | 0.011 | 0.009 | 0.002 |
+| Java | 0.101 | 0.115 | -0.014 * | 0.101 | 0.102 | -0.001 |
+| JavaScript | 0.211 | 0.188 | 0.023 ** | 0.211 | 0.216 | -0.006 |
+| PHP | 0.039 | 0.051 | -0.012 ** | 0.039 | 0.037 | 0.003 |
+| Python | 0.123 | 0.133 | -0.011 | 0.123 | 0.123 | 0.000 |
+| Ruby | 0.025 | 0.025 | -0.000 | 0.025 | 0.027 | -0.002 |
+| Scala | 0.002 | 0.004 | -0.002 | 0.002 | 0.002 | 0.000 |
+| TypeScript | 0.018 | 0.020 | -0.001 | 0.018 | 0.015 | 0.003 |
+| Collaborators | 1225.186 | 1203.774 | 21.412 | 1225.186 | 1114.715 | 110.471 |
+| Local collaborators | 1.377 | 2.207 | -0.830 *** | 1.377 | 1.377 | -0.000 |
+| Average age of projects | 89.021 | 95.269 | -6.249 *** | 89.021 | 87.204 | 1.817 |
+| Number of projects with GPL | 1.081 | 1.181 | -0.100 | 1.081 | 1.122 | -0.040 |
+
+
+* p < 0.1.
+
+
+** p < 0.05.
+
+
+*** p < 0.01.
+Table 10
+Regression results for the US lockdowns.
+
+
+| | (1) | (2) | (3) |
+|------------------|-----------|-----------|-----------|
+| ORDER_{it} | 0.000 | -0.000 | 0.000 |
+| | (0.007) | (0.007) | (0.007) |
+| ORDER_{it} × LOCCOMS_{it} | | 0.001 | |
+| | | (0.001) | |
+| ORDER_{it} × COMS_{it} | | | -0.000 |
+| | | | (0.000) |
+| REPO_{it} | 0.221 * | 0.221 * | 0.221 * |
+| | (0.120) | (0.120) | (0.120) |
+| TENURE_{it} | 0.000 * | 0.000 * | 0.000 * |
+| | (0.000) | (0.000) | (0.000) |
+| STARR_{it} | 0.012 * | 0.012 * | 0.012 * |
+| | (0.007) | (0.007) | (0.007) |
+| STARS_{it} | 0.000 | 0.000 | 0.000 |
+| | (0.002) | (0.002) | (0.002) |
+| ISSUER_{it} | 0.012 | 0.012 | 0.012 |
+| | (0.029) | (0.029) | (0.029) |
+| ISSUES_{it} | 0.090 *** | 0.090 *** | 0.090 *** |
+| | (0.028) | (0.028) | (0.028) |
+| COMMENTR_{it} | 0.036 *** | 0.036 *** | 0.036 *** |
+| | (0.009) | (0.009) | (0.009) |
+| COMMENTS_{it} | 0.076 *** | 0.076 *** | 0.076 *** |
+| | (0.017) | (0.017) | (0.017) |
+| CASE_{it} | -0.000 | -0.000 | -0.000 |
+| | (0.000) | (0.000) | (0.000) |
+| Constant | -0.580 | -0.580 | -0.580 |
+| | (0.448) | (0.448) | (0.448) |
+| Individual FE | Yes | Yes | Yes |
+| Time FE | Yes | Yes | Yes |
+| Observations | 222,138 | 222,138 | 222,138 |
+| R-squared | 0.049 | 0.049 | 0.049 |
+
+
+Robust standard errors in parentheses.
+
+
+* p < 0.1.
+
+** p < 0.05.
+
+*** p < 0.01.
Our study adds a nuanced perspective to the broader discourse on the comparative impacts of CMC vs. F2F interactions on virtual team performance. This contribution is particularly important in today's environment, where the reliance on CMC due to the shift towards WFH has not only intensified but continues to shape the way we work and collaborate, even beyond the pandemic era (Airbnb, 2022; Warren, 2020). Third, unlike previous research that mainly relied on survey methods to investigate the impacts of lockdowns, our study embraced systematic causal analysis methods such as DID analysis. This rigorous approach, built on empirical data from GitHub, reinforced with various robustness tests, and complemented by a survey study, established a multifaceted research framework. It opens new avenues for exploring the impact of policy interventions or organizational strategies in response to similar disruptions, thereby extending the applicability and relevance of our findings.
+
+
+Moreover, our findings may help open-innovation platforms and organizations that depend on collaborative contributions formulate WFH-related strategies or policies (Airbnb, 2022; Warren, 2020). First, these stakeholders may need to recognize that individuals' adaptation to WFH can vary significantly over time and across different contexts, and strategies must be tailored accordingly. For instance, many contextual factors analyzed in our survey should be accounted for, such as changes in work time, interruptions, flexibility, remote work technology conditions, and housework duties. Second, the absence of F2F interactions, a vital component of collaboration, requires the exploration of alternative methods to compensate for this drawback. For instance, platforms could invest in advanced collaboration tools designed to replicate or even enhance the interaction experience in a virtual environment, such as facial recognition systems that can identify and emphasize micro-expressions or emotional cues.
Third, the positive impact of increased home time highlights the importance of flexible work policies. These policies should enable individuals to capitalize on the benefits of remote work without sacrificing productivity. Finally, the initial negative impact of fear suggests that emotional support and well-being should be essential components of remote work-related policies or strategies, especially during unprecedented disruptions like the COVID-19 pandemic.
+
+
+Some limitations of our study generate directions and opportunities for future research. First, although it is reassuring that our study leverages two citywide lockdowns in China and statewide stay-at-home orders in the US, the contrasting findings between them highlight the complexity of remote work and suggest a need for further research on the generalizability of our findings across different cultures, industries, and types of work. Second, our study focuses on OSS contributions measured by the number of commits. Future research needs to consider other measures of innovation-related work productivity such as code quality or creativity.
+
+
+CRediT authorship contribution statement
+
+
+Jin Hu: Conceptualization, Methodology, Software, Formal analysis, Investigation, Data curation, Visualization, Writing - original draft, Writing - review & editing. Daning Hu: Conceptualization, Methodology, Supervision, Project administration, Funding acquisition, Writing - original draft, Writing - review & editing. Xuan Yang: Investigation, Funding acquisition, Writing - review & editing. Michael Chau: Supervision, Writing - review & editing.
+
+
+Declaration of competing interest
+
+
+The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
+
+
+Data availability
+
+
+Data will be made available on request.
+
+
+
+Acknowledgement
+
+
+The authors gratefully acknowledge funding from Guangdong Province Focus Research Project (Grant Number: 2019KZDZX1014), Guangdong Province Research Fund (Grant Number: 2019QN01X277), National Natural Science Foundation of China (Grant Numbers: 71971106, 72001099), and Shenzhen Humanities & Social Sciences Key Research Bases.
+
+
+Appendix A. Questionnaires for the survey analysis
+
+
+The questionnaire for the Wuhan developers includes the following six questions.
+1. Please indicate your choice on the following statements based on your experience before January 23, 2020 (i.e., the day of Wuhan lockdown). (1 = Strongly disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly agree).
+1.1 (OnlineFrequency) I often made online comments to GitHub developers in the same city as me (hereinafter referred to as local collaborators) on GitHub. ___
+1.2 (OnlinePreference) I enjoyed making online comments to my local collaborators on GitHub. ___
+1.3 (OnlineNeed) My project tasks required me to make online comments to my local collaborators on GitHub. ___
+1.4 (OfflineFrequency) I often interacted with my local collaborators offline. ___
+1.5 (OfflinePreference) I enjoyed interacting with my local collaborators offline. ___
+1.6 (OfflineNeed) My project tasks required me to interact with my local collaborators offline. ___
+
+
+Please answer Questions 2–5 based on your lockdown experience during the five weeks after January 23, 2020, compared to the five weeks before that day.
+
+
+
+
+2. Did the lockdown give you more time available for making OSS contributions on GitHub?
+
+
+
+
+| Option | Code |
+|---------------------------------|------|
+| Gave me much less time | 1 |
+| Gave me less time | 2 |
+| Neutral: same as before lockdown| 3 |
+| Gave me more time | 4 |
+| Gave me much more time | 5 |
+
+
+
+
+3. Did you have more interruptions when making OSS contributions on GitHub?
+
+
+
+
+| Option | Code |
+|---------------------------------|------|
+| Much fewer interruptions | 1 |
+| Fewer interruptions | 2 |
+| Neutral: same as before lockdown| 3 |
+| More interruptions | 4 |
+| Many more interruptions | 5 |
+
+
+
+
+4. Did you have more flexibility for making OSS contributions on GitHub?
+
+
+
+
+| Option | Code |
+|---------------------------------|------|
+| Much less flexible | 1 |
+| Less flexible | 2 |
+| Neutral: same as before lockdown| 3 |
+| More flexible | 4 |
+| Much more flexible | 5 |
+
+
+
+
+5. How was your work environment (e.g., internet bandwidth and hardware) at home for making OSS contributions on GitHub?
+
+
+
+
+| Option | Code |
+|---------------------------------|------|
+| Much worse work environment | 1 |
+| Worse work environment | 2 |
+| Neutral: same as before lockdown| 3 |
+| Better work environment | 4 |
+| Much better work environment | 5 |
+
+
+
+
+6. How would you rate each of the following factors in their respective impacts on your contributions to GitHub during the five weeks after January 23, 2020, compared to the five weeks before that day? (1 = Very low impact, 2 = Low impact, 3 = Neutral, 4 = High impact, 5 = Very high impact).
+
+
+
+
+| Factor | Code |
+|---------------------------------------------|------|
+| 6.1 Fear related to the COVID-19 pandemic | |
+| 6.2 Lack of face-to-face interactions with my collaborators | |
+| 6.3 Lack of work-life boundary | |
+| 6.4 Lack of self-discipline | |
+| 6.5 Taking care of my family | |
+| 6.6 Doing housework | |
+
+
+The questionnaire for the Xi’an developers is the same as that for the Wuhan developers except for the following changes:
+
+
+
+
+… before December 23, 2021 (i.e., the day of Xi’an lockdown). …
+
+
+
+
+Please answer Questions 2–5 based on your lockdown experience during the four weeks after December 23, 2021, compared to the four weeks before that day.
…
+
+
+
+
+… during the four weeks after December 23, 2021, compared to the four weeks before that day. …
+Table A1
+Correlation test results for the first survey question.
+
+
+
+
+| Correlation between | Wuhan developers (1) | Xi’an developers (2) |
+|---------------------|----------------------|----------------------|
+| OnlineFrequency & OfflineFrequency | 0.305 *** | 0.253 ** |
+| OnlinePreference & OfflinePreference | 0.423 *** | 0.283 ** |
+| OnlineNeed & OfflineNeed | 0.442 *** | 0.540 *** |
+
+
+* p < 0.1.
+
+** p < 0.05.
+
+*** p < 0.01.
+
+
+Appendix B. Robustness checks for the US lockdowns
+
+
+To test the parallel trends assumption for the US lockdowns, we adopt an event-study approach by fitting the following equation:
+
+
+$$CONTRIBUTION_{it} = \alpha + \sum_{k=-n,\, k \neq -1}^{n} \beta_k T_{itk} + \gamma CV_{it} + \mu_i + \theta_t + \epsilon_{it}$$ (B1)
+
+
+where $n$ equals 28. $T_{itk}$ represents a series of dummies that indicate the chronological distance between the observation and the actual date when the state where developer $i$ resides implemented stay-at-home orders. $k = -1$ designates the date immediately preceding the treatment, and thus it is omitted from the equation, serving as the reference date.
+
+
+Fig. B1 shows the estimated coefficients $\beta_k$ from Eq. (B1). The green vertical line represents the day when the stay-at-home order was enacted. The accompanying gray dotted lines delineate the 95% confidence intervals for each coefficient. Notably, the estimated $\beta_k$ values for $k < 0$ are virtually zero, indicating that there is no significant pre-treatment difference in the contribution trends between the treatment and control groups. The parallel trends assumption therefore holds, reinforcing the validity of our DID analysis for the US lockdowns.
+
+
+
+
+We also perform another robustness check by re-estimating Eqs. (5)–(7) using an alternative matched sample.
This is achieved by incorporating a caliper of 0.1 in the PSM procedure, resulting in a matched sample that includes 2568 pairs of developers across both the treatment and control groups. The summary of t-test results, presented in Table B1, reveals no statistically significant differences between the treatment and control groups at the 10% significance level. This outcome substantiates the comparability of the two groups following the matching process. Table B2 summarizes the results of Eqs. (5)–(7) derived from the alternative matched sample. The coefficients of $ORDER_{it}$, $ORDER_{it} \times LOCCOMS_{it}$, and $ORDER_{it} \times COMS_{it}$ are all found to be statistically insignificant. This outcome implies that the implementation of stay-at-home orders in the US does not have a significant influence on developers’ OSS contributions.
+
+
+Table B1
+Pre-treatment characteristics for the alternative matched sample.
+
+
+| | Before matching | | | After matching | | |
+|----------------------|-----------------|---------------|------------|-----------------|---------------|------------|
+| | Treatment group | Control group | Difference | Treatment group | Control group | Difference |
+| Weeks | 237.195 | 251.904 | -14.710 *** | 236.515 | 232.990 | 3.525 |
+| Student | 0.177 | 0.160 | 0.016 | 0.177 | 0.181 | -0.004 |
+| Employee | 0.391 | 0.407 | -0.016 | 0.390 | 0.391 | -0.002 |
+| Contact | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 |
+| Number of projects | 18.946 | 19.269 | -0.323 | 18.651 | 18.921 | -0.270 |
+| Commits | 2072.852 | 2039.565 | 33.286 | 2058.688 | 1822.745 | 235.943 |
+| Stars received | 48.550 | 71.115 | -22.565 | 48.081 | 39.739 | 8.342 |
+| Issues received | 14.561 | 15.009 | -0.448 | 13.848 | 11.662 | 2.186 |
+| Comments received | 44.084 | 47.101 | -3.017 | 42.707 | 32.934 | 9.773 |
+| Stars sent out | 53.217 | 53.018 | 0.199 | 45.862 | 49.546 | -3.684 |
+| Issues sent out | 25.280 | 26.767 | -1.487 | 23.762 | 24.065 | -0.303 |
+| Comments sent out | 138.069 | 163.793 | -25.723 | 133.803 | 138.307 | -4.504 |
+| C | 0.019 | 0.025 | -0.006 | 0.019 | 0.021 | -0.001 |
+| C++ | 0.033 | 0.043 | -0.011 ** | 0.033 | 0.034 | -0.001 |
+| C# | 0.055 | 0.047 | 0.008 | 0.055 | 0.055 | 0.001 |
+| Go | 0.011 | 0.017 | -0.006 * | 0.011 | 0.009 | 0.002 |
+| Java | 0.101 | 0.115 | -0.014 * | 0.102 | 0.103 | -0.001 |
+| JavaScript | 0.211 | 0.188 | 0.023 ** | 0.208 | 0.217 | -0.009 |
+| PHP | 0.039 | 0.051 | -0.012 ** | 0.040 | 0.037 | 0.003 |
+| Python | 0.123 | 0.133 | -0.011 | 0.123 | 0.124 | -0.001 |
+| Ruby | 0.025 | 0.025 | -0.000 | 0.025 | 0.027 | -0.002 |
+| Scala | 0.002 | 0.004 | -0.002 | 0.002 | 0.002 | 0.000 |
+| TypeScript | 0.018 | 0.020 | -0.001 | 0.018 | 0.015 | 0.003 |
+| Collaborators | 1225.186 | 1203.774 | 21.412 | 1074.696 | 1118.728 | -44.032 |
+| Local collaborators | 1.377 | 2.207 | -0.830 *** | 1.354 | 1.384 | -0.030 |
+| Average age of projects | 89.021 | 95.269 | -6.249 *** | 88.861 | 87.490 | 1.370 |
+| Number of projects with GPL | 1.081 | 1.181 | -0.100 | 1.075 | 1.127 | -0.052 |
+
+
+
+* p < 0.1.
+** p < 0.05.
+*** p < 0.01.
+
+
+
+Table B2
+Regression results for the alternative matched sample for the US lockdowns.
+
+
+
+| Dependent variable: CONTRIBUTION_{it} | (1) | (2) | (3) |
+|---------------------------------------|-----------|-----------|-----------|
+| ORDER_{it} | 0.000 | 0.000 | 0.000 |
+| | (0.007) | (0.007) | (0.007) |
+| ORDER_{it} × LOCCOMS_{it} | | 0.001 | |
+| | | (0.001) | |
+| ORDER_{it} × COMS_{it} | | | 0.000 |
+| | | | (0.000) |
+| REPO_{it} | 0.225 * | 0.225 * | 0.225 * |
+| | (0.120) | (0.120) | (0.120) |
+| TENURE_{it} | 0.000 * | 0.000 * | 0.000 * |
+| | (0.000) | (0.000) | (0.000) |
+| STARR_{it} | 0.012 * | 0.012 * | 0.012 * |
+| | (0.007) | (0.007) | (0.007) |
+| STARS_{it} | -0.006 ** | -0.006 ** | -0.006 ** |
+| | (0.003) | (0.003) | (0.003) |
+| ISSUER_{it} | 0.012 | 0.012 | 0.012 |
+| | (0.029) | (0.029) | (0.029) |
+| ISSUES_{it} | 0.089 *** | 0.089 *** | 0.089 *** |
+| | (0.028) | (0.028) | (0.028) |
+| COMMENTR_{it} | 0.036 *** | 0.036 *** | 0.036 *** |
+| | (0.009) | (0.009) | (0.009) |
+| COMMENTS_{it} | 0.075 *** | 0.075 *** | 0.075 *** |
+| | (0.017) | (0.017) | (0.017) |
+| CASE_{it} | -0.000 | -0.000 | -0.000 |
+| | (0.000) | (0.000) | (0.000) |
+| Constant | 0.593 | 0.592 | 0.593 |
+| | (0.451) | (0.451) | (0.451) |
+| Individual FE | Yes | Yes | Yes |
+| Time FE | Yes | Yes | Yes |
+| Observations | 220,848 | 220,848 | 220,848 |
+| R-squared | 0.049 | 0.049 | 0.049 |
+
+
+Robust standard errors in parentheses.
+
+
+
+
+* p < 0.1.
+** p < 0.05.
+*** p < 0.01.
+
+
+
+
+References
+
+
+Airbnb, 2022. Airbnb's Design for Employees to Live and Work Anywhere. https://news.airbnb.com/airbnbs-design-to-live-and-work-anywhere/.
+
+
+Asay, M., 2020. COVID-19 Isn't Slowing Open Source—Watch for Developer Burnout. https://www.techrepublic.com/article/covid-19-isnt-slowing-open-source-watch-for-developer-burnout/.
+
+
+Bao, L., Li, T., Xia, X., Zhu, K., Li, H., Yang, X., 2022. How does working from home affect developer productivity? – a case study of Baidu during COVID-19 pandemic. Sci. China Inf. Sci. 65, 1–15.
+Barber, B.M., Jiang, W., Morse, A., Puri, M., Tookes, H., Werner, I.M., 2021. What explains differences in finance research productivity during the pandemic? J. Finance 76, 1655–1699.
+
+
+Bergiel, B.J., Bergiel, E.B., Balsmeier, P.W., 2006. The reality of virtual teams. Competition Forum 4, 427–432.
+
+
+Boden, D., Molotch, H.L., 1994. The compulsion of proximity. In: Friedland, R., Boden, D. (Eds.), NowHere: Space, Time and Modernity. University of California Press, Berkeley, pp. 257–286.
+
+
+Brandtzæg, P.B., Nov, O., 2011. Facebook use and social capital—a longitudinal study. In: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. Barcelona, Spain, pp. 454–457.
+
+
+Brucks, M.S., Levav, J., 2022. Virtual communication curbs creative idea generation. Nature 605, 108–112.
+
+
+Butler, J., Jaffe, S., 2021. Challenges and gratitude: a diary study of software engineers working from home during COVID-19 pandemic. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice. IEEE, Madrid, Spain, pp. 362–363.
+
+
+Chen, S., Ma, H., Wu, Q., 2019. Bank credit and trade credit: evidence from natural experiments. J. Bank. Financ. 108, 105616.
+
+
+Chen, J., Chen, W., Liu, E., Luo, J., Song, Z.M., 2022a. The Economic Cost of Lockdown in China: Evidence from City-to-city Truck Flows.
+
+
+Chen, X., Guo, M., Shangguan, W., 2022b. Estimating the impact of cloud computing on firm performance: an empirical investigation of listed firms. Inf. Manag. 59, 103603.
+
+
+Cochran, W.G., Rubin, D.B., 1973. Controlling bias in observational studies: a review. Sankhya: Indian J. Stat. Ser. A 35, 417–446.
+
+
+Colombo, G., 2020. Open Source and COVID-19: Open Source Will Come Out Stronger on the Other Side of the Pandemic. https://www.finos.org/blog/open-source-and-covid-19-open-source-will-come-out-stronger-on-the-other-side-of-the-pandemic.
+
+
+Cramton, C.D., 2001.
The mutual knowledge problem and its consequences for dispersed collaboration. Organ. Sci. 12, 346–371.
+
+
+Cramton, C.D., Webber, S.S., 2005. Relationships among geographic dispersion, team processes, and effectiveness in software development work teams. J. Bus. Res. 58, 755–765.
+
+
+Crowston, K., Howison, J., Masango, C., Eseryel, U.Y., 2007. The role of face-to-face meetings in technology-supported self-organizing distributed teams. IEEE Trans. Prof. Commun. 50, 185–203.
+
+
+Cui, R., Ding, H., Zhu, F., 2022. Gender inequality in research productivity during the COVID-19 pandemic. Manuf. Serv. Oper. Manag. 24, 707–726.
+
+
+Daft, R.L., Lengel, R.H., 1986. Organizational information requirements, media richness and structural design. Manag. Sci. 32, 554–571.
+
+
+Daniel, S., Stewart, K., 2016. Open source project success: resource access, flow, and integration. J. Strateg. Inf. Syst. 25, 159–176.
+
+
+Davidson, J., Mannan, U., Naik, R., Dua, J., Jensen, C., 2014. Older adults and free/open source software: a diary study of first-time contributors. In: Proceedings of the International Symposium on Open Collaboration. Association for Computing Machinery, New York, NY, United States, pp. 1–10.
+
+
+Dennis, A.R., Fuller, R.M., Valacich, J.S., 2008. Media, tasks, and communication processes: a theory of media synchronicity. MIS Q. 32, 575–600.
+
+
+DiMaggio, P., Hargittai, E., Neuman, W.R., Robinson, J.P., 2001. Social implications of the internet. Annu. Rev. Sociol. 27, 307–336.
+
+
+Fang, Y., Neufeld, D., 2009. Understanding sustained participation in open source software projects. J. Manag. Inf. Syst. 25, 9–50.
+
+
+Fang, H., Wang, L., Yang, Y., 2020. Human mobility restrictions and the spread of the novel coronavirus (2019-nCoV) in China. J. Public Econ. 191, 104272.
+
+
+Foerderer, J., 2020. Interfirm exchange and innovation in platform ecosystems: evidence from Apple's Worldwide Developers Conference. Manag. Sci. 66, 4772–4778.
+ + +Ford, D., Storey, M.-A., Zimmermann, T., Bird, C., Jaffe, S., Maddila, C., Butler, J.L., Houck, B., Nagappan, N., 2021. A tale of two cities: software developers working from home during the COVID-19 pandemic. ACM Trans. Softw. Eng. Methodol. 31, 1–37. + + +Forsgren, N., 2020. Octoverse Spotlight: An Analysis of Developer Productivity, Work Cadence, and Collaboration in the Early Days of COVID-19. https://github.com/blog/2020-05-06-octoverse-spotlight-an-analysis-of-developer-productivity-work-cadence-a-n-colaboration-in-the-early-days-of-covid-19. + + +Foss, N.J., Jeppesen, L.B., Rullani, F., 2021. How context and attention shape behaviors in online communities: a modified garbage can model. Ind. Corp. Chang. 30, 1–18. + + +GitHub, 2022a. GitHub Language Support. https://docs.github.com/en/get-started/learning-about-github/github-language-support. + + +GitHub, 2022b. How GitHub Builds Software. https://github.com/about. Hambrick, D.C., Davison, S.C., Snell, S.A., Snow, C.C., 1998. When groups consist of multiple nationalities: towards a new understanding of the implications. Organ. Stud. 19, 181–205. + + +Hertel, G., Niederer, S., Hermann, S., 2003. Motivation of software developers in open source projects: an internet-based survey of contributors to the linux kernel. Res. Policy 32, 1159–1177. + + +Howard, P.E., Rainie, L., Jones, S., 2001. Days and nights on the internet: the impact of a major technology on South Africa. Res. Sci. 45, 383–404. + + +Hu, J., Hu, D., Yang, X., Chau, M., 2023. Can firms improve performance through external contributions to their open-source software projects?. In: Proceedings of the 31th European Conference on Information Systems (ECIS), Kristiansand, Norway. + + +Huang, L., Zhong, D., Fan, W., 2022. Do social networking sites promote life satisfaction? The explanation from an online and offline social capital transformation. Inf. Technol. People 35, 703–722. + + +Khaliq, A., Mikami, A.Y., 2018. 
Talking face-to-face: associations between online and offline interactions of online relationships. Comput. Hum. Behav. 89, 88–97. + + +Kock, N., 2004. The psychobiological model: towards a new theory of computer-mediated communication based on Darwinian evolution. Organ. Sci. 15, 327–348. + + +Kraut, R.E., Lewis, S.H., Swezey, L.W., 1982. Listener responsiveness and the coordination of conversation. J. Pers. Soc. Psychol. 43, 718–731. + + +Kraut, R., Kiesler, S., Boneva, B., Cummings, J., Helgeson, V., Crawford, A., 2002. Internet paradox revisited. J. Soc. Issues 58, 49–74. + + +von Krogh, G., Haefliger, S., 2012. Carrots and rainbows: motivation and social practice in open source software development. MIS Q. 36, 649–674. + + +Lau, D.C., Murmigan, J.K., 1998. Demographic diversity and fruitfulness: the compositional dynamics of organizational groups. Acad. Manag. Rev. 23, 325–340. + + +Leslie, E., Wilson, R., 2020. Sheltering in place and domestic violence: evidence from calls for service during COVID-19. J. Public Econ. 189, 104241. + + +Lipnitzki, J., Stamps, J., 1999. Virtual teams: the new way to work. Strateg. Leader. 27, 14–19. + + +Miller, C., Widder, D.G., Kastner, C., Vasilieus, B., 2019. Why do people give up flossing? A study of contributor disengagement in open source. In: IFIP International Conference on Open Source Systems. Springer, pp. 116–129. + + +Miller, C., Rodeghero, P., Storey, M.-A., Ford, D., Zimmermann, T., 2021. “How was your weekend?” Software development teams working from home during COVID-19. In: IEEE/ACM 43rd International Conference on Software Engineering. IEEE, pp. 624–636. + + +Moqi, M., Mei, X., Qiu, L., Bandypadhyay, S., 2018. Effect of “following” on contributions to open source communities. J. Manag. Inf. Syst. 35, 1188–1217. + + +Muralidharan, K., Prakash, N., 2017. Cycling to school: increasing secondary school enrollment for girls in India. Am. Econ. J. Appl. Econ. 9, 321–350. 
+ + +Negoita, B., Vial, G., Shaikh, M., Labbe, A., 2019. Code forking and software development project sustainability. In: Evidence from GitHub, Fortieth International Conference on Information Systems, Munich, Germany. + + +Neto, P.A.d.M.S., Mannan, U.A., de Almeida, E.S., Nagappan, N., Lo, D., Singh, Kochhar, P., Gao, C., Ahmed, I., 2021. A deep dive into the impact of COVID-19 on software development. IEEE Trans. Softw. Eng. 48, 3342–3360. + + +NicCanna, C., Razzak, M.A., Noll, J., Beecham, S., 2021. Globally distributed development during COVID-19. In: 2021 IEEE/ACM 40th International Workshop on Software Engineering Research and Industrial Practice. IEEE, pp. 18–25. Virtual Conference. + + +Ocker, R., Fjermedal, J., Hiltz, S.R., Johnson, K., 1998. Effects of four modes of group communication on the outcomes of software requirements determination. J. Manag. Inf. Syst. 15, 99–118. + + +O’Mahony, S., Ferraro, F., 2007. The emergence of governance in an open source community. Acad. Manag. J. 50, 1079–1106. + + +Peters, P., Baltes, S., Adisaputri, G., Torkar, R., Kovalenko, V., Kalinowski, M., Novielli, N., Yoo, S., Devroye, X., Tan, X., Zhou, M., Turhan, B., Hoda, R., Hata, H., Robles, G., Fard, A.M., Alkadhri, R., 2020. Pandemic programming. Emir. Softw. Eng. 25, 1–35. + + +Shah, S.K., 2006. Motivation, governance, and the viability of hybrid forms in open source software development. Manag. Sci. 52, 1000–1014. + + +Sheridan, A., Andersen, A.L., Hansen, E.T., Johannesen, N., 2020. Social distancing laws cause only small losses of economic activity during the COVID-19 pandemic in Scandinavia. Proc. Natl. Acad. Sci. 117, 20468. + + +Smite, D., Moe, N.B., Klotins, E., Gonzalez-Huerta, J., 2023. From forced working-from-home to voluntary working-from-anywhere: two revolutions in telework. J. Syst. Softw. 195, 111509. + + +Sproll, L., Kiesler, S., 1986. Reducing social context cues: electronic mail in organizational communication. Manag. Sci. 32, 1492–1512. 
+ + +Stam, W., 2009. When does community participation enhance the performance of open source software projects? 1287–1299. + + +Straus, S.G., McGrath, J.E., 1994. Does the medium matter? The interaction of task type and technology on group performance and member reactions. J. Appl. Psychol. 79, 397–405. + + +Suphan, A., Mierzejewska, B.L., 2016. Boundaries between online and offline realms: how social grooming affects students in the USA and Germany. Inf. Commun. Soc. 19, 1287–1305. + + +Suphan, A., Feuls, M., Fieseler, C., 2012. Social media’s potential in improving the mental wellbeing of the unemployed. In: Eriksson-Bakka, K., Looma, A., Krook, E. (Eds.), Exploring the Abyss of Inequalities. Springer, Berlin, pp. 10–28. + + +Tanaka, T., Okamoto, S., 2021. Increase in suicide following an initial decline during the COVID-19 pandemic in Japan. Nat. Hum. Behav. 5, 229–238. + + +Toussaint, A.M., DeMarie, S.M., Hendrickson, A.R., 1998. Virtual teams: the workplace of the future. Acad. Manag. Perspect. 12, 17–29. + + +Wakefield, R.L., Leidner, D.E., Garrison, G., 2008. Research note—a model of conflict, leadership, and performance in virtual teams. Inf. Syst. Res. 19, 434–455. + + +Walters, C., Mehl, G.G., Piraino, P., Jansen, J.D., Kriger, S., 2022. The impact of the pandemic-enforced lockdown on the scholarly productivity of women academics in South Africa. Res. Policy 51, 104403. + + +Wang, G., 2022. Stay at home to save: effectiveness of stay-at-home orders in containing the COVID-19 pandemic. Proc. Oper. Manag. 31, 2289–2305. + + +Wang, Y., Cai, H., Li, C., Jiang, Z., Wang, L., Song, J., Xia, J., 2013. Optimal caliper width for propensity score matching of three treatment groups: a Monte Carlo study. PLoS One 8, e101405. + + +Warren, T., 2020. Microsoft Is Letting More Employees Work From Home Permanently. 
https://www.theverge.com/2020/10/9/21508964/microsoft-remote-work-from-home-microsoft-2019?fbclid=IwAR08H1r0lBjymHbfw4fYApVhHcdRvK5tv5z2qYTaUYe6c8Q6ynMkXzQxQ4. + + +Walters, C., Mehl, G.G., Piraino, P., Jansen, J.D., Kriger, S., 2022. The impact of the pandemic-enforced lockdown on the scholarly productivity of women academics in South Africa. Res. Policy 51, 104403. +Wellman, B., Salaff, J., Dimitrova, D., Garton, L., Gulia, M., Haythornthwaite, C., 1996. Computer networks as social networks: collaborative work, telework, and virtual community. Annu. Rev. Sociol. 22, 213–238. + + +Wikipedia, 2022. Han Chinese. https://en.wikipedia.org/wiki/Han_Chinese. + + +Wu, J., Smith, S., Khurana, M., Siemaszko, C., DeJesus-Banos, B., 2020. Stay-at-home Orders Across the Country. https://www.nbcnews.com/health/health-news/here-are-stay-home-orders-across-country-n1168736. + + +Xu, B., Jones, D.R., Shao, B., 2009. Volunteers’ involvement in online community based software development. Inf. Manag. 46, 151–158. + + +Yang, X., Li, X., Hu, D., Wang, H.J., 2021. Differential impacts of social influence on initial and sustained participation in open source software projects. J. Assoc. Inf. Sci. Technol. 72, 1133–1147. + + +Zhang, X.M., Zhu, F., 2011. Group size and incentives to contribute: a natural experiment at Chinese Wikipedia. Am. Econ. Rev. 101, 1601–1615. +---------------------------------------- +------------------------------- +Section 34: +Automating Dependency Updates in Practice: An Exploratory Study on GitHub Dependabot + + +Runzhi He, Hao He, Yuxia Zhang, Minghui Zhou + + +Abstract—Dependency management bots automatically open pull requests to update software dependencies on behalf of developers. Early research shows that developers are suspicious of updates performed by dependency management bots and feel tired of overwhelming notifications from these bots. Despite this, dependency management bots are becoming increasingly popular. 
Such contrast motivates us to investigate Dependabot, currently the most visible bot on GitHub, to reveal the effectiveness and limitations of state-of-the-art dependency management bots. We use exploratory data analysis and a developer survey to evaluate the effectiveness of Dependabot in keeping dependencies up-to-date, interacting with developers, reducing update suspicion, and reducing notification fatigue. We obtain mixed findings. On the positive side, projects do reduce technical lag after Dependabot adoption and developers are highly receptive to its pull requests. On the negative side, its compatibility scores are too scarce to be effective in reducing update suspicion; developers tend to configure Dependabot toward reducing the number of notifications; and 11.3% of projects have deprecated Dependabot in favor of other alternatives. The survey confirms our findings and provides insights into the key missing features of Dependabot. Based on our findings, we derive and summarize the key characteristics of an ideal dependency management bot, grouped into four dimensions: configurability, autonomy, transparency, and self-adaptability. + + +Index Terms—Dependency Management, Software Engineering Bot, Dependabot, Mining Software Repositories +---------------------------------------- +------------------------------- +Section 35: +1 INTRODUCTION + + +To update or not to update, that is the question that has haunted software engineers for decades. The software engineering “gurus” would argue that keeping software dependencies up-to-date minimizes technical debt, increases supply chain security, and ensures software project sustainability in the long term [1]. Nonetheless, updating requires not only substantial effort but also extra responsibility from developers. Consequently, many developers adhere to the practice of “if it ain’t broke, don’t fix it,” and the majority of existing software systems use outdated dependencies [2].
+ + +One promising solution to this dilemma is to use bots to automate dependency updates. Dependency management bots have therefore been invented to automatically open pull requests (PRs) that update dependencies on a collaborative coding platform (e.g., GitHub), in the hope of saving developer effort. Recently, dependency management bots have become increasingly visible and gained momentum among practitioners. Exemplars of these bots, including Dependabot [3], Renovate Bot [4], PyUp [5], and Snyk Bot [6], have opened millions of PRs on GitHub [7] and are adopted by a variety of industry teams (according to their websites). + + +However, simply introducing a bot does not solve the problem. The early work of Mirhosseini and Parnin [8] on Greenkeeper [9] reveals that only 32% of Greenkeeper PRs are merged, because developers are suspicious of whether a bot PR will break their code (i.e., update suspicion) and feel annoyed by the large number of bot PRs (i.e., notification fatigue). Since then, similar bots have emerged, evolved, and gained high popularity; the most visible of them on GitHub is Dependabot [7], which ships many improvements (Section 2.3). However, it remains unknown to what extent these bots can overcome the two limitations of Greenkeeper identified by Mirhosseini and Parnin [8] in 2017. + + +To shed light on improving dependency management bots and software engineering bots in general, we present an exploratory study on Dependabot. Our study answers the following four research questions (RQs) to empirically evaluate the effectiveness of Dependabot version update in different dimensions (detailed motivations in Section 3): + + + + +RQ1: To what extent does Dependabot reduce the technical lag of a project after its adoption? + + +RQ2: How actively do developers respond to and merge pull requests opened by Dependabot? + + +RQ3: How effective is Dependabot’s compatibility score in allaying developers’ update suspicion?
+ + +RQ4: How do projects configure Dependabot for automating dependency updates? + + + + +As we find that many projects have deprecated Dependabot in favor of other alternatives, we ask an additional RQ: + + + + +RQ5: How do projects deprecate Dependabot and what are the developers’ desired features for Dependabot? +To answer the RQs, we sample 1,823 popular and actively maintained GitHub projects as the study subjects. We conduct exploratory data analysis on 502,752 Dependabot PRs from these projects and use a survey of 131 developers to triangulate our findings. Our findings provide empirical characterizations of Dependabot’s effectiveness in various dimensions. More importantly, we discover important limitations of Dependabot (a state-of-the-art bot) in overcoming update suspicion and notification fatigue, along with the missing features for overcoming these limitations. Based on the findings, we summarize four key properties of an ideal dependency management bot (i.e., configurability, autonomy, transparency, and self-adaptability) as a roadmap for software engineering researchers and bot designers. +---------------------------------------- +------------------------------- +Section 36: +2 BACKGROUND AND RELATED WORK + + +2.1 Dependency Update + + +In modern software development, updating dependencies is not only important but also non-trivial. A typical software project may have tens to thousands of dependencies, and each outdated one induces risks [10]. However, each update may contain breaking changes which can be hard to discover and fix [11]. This situation inspires research into understanding update practices, designing metrics, and inventing approaches to support dependency updates. + + +Bavota et al. [12] find that updates in the Apache ecosystem are triggered by major changes or a large number of bug fixes, but may be prevented by API removals. Kula et al.
[2] discover that 81.5% of the 4,600 studied Java/Maven projects on GitHub still keep outdated dependencies due to a lack of awareness and the extra workload. Pashchenko et al. [13] find through semi-structured interviews that developers face trade-offs when updating dependencies (e.g., vulnerabilities, breaking changes, policies). + + +Researchers have proposed measurements to quantify the “freshness” or “outdatedness” of software dependencies and applied them to various software ecosystems. Cox et al. [14] propose several metrics to quantify “dependency freshness” and evaluate them on a dataset of industrial Java systems. A series of studies [15], [16], [17], [18], [19], [20] introduce the notion of technical lag, a metric measuring the extent to which project dependencies lag behind their latest releases, and investigate the evolution of technical lag in Debian [15], npm [16], [17], [18], the Libraries.io dataset [19], and Docker images [20]. They find that technical lag tends to increase over time, induces security risks, and can be mitigated using semantic versioning. + + +There has been a long line of research in software engineering for supporting the automated update of software. Since API breaking changes form the majority of the update cost, most studies propose automated approaches to match and adapt evolving APIs (e.g., [21], [22], [23], [24], [25]). However, Cossette and Walker [26] reveal through manual analysis that real API adaptation tasks are complex and beyond the capability of previous automated approaches. Recently, research interest in automated API adaptation is surging again, with works on Java [27], JavaScript [28], Python [29], Android [30], etc.
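As these studies note, semantic versioning lets downstream projects absorb compatible releases automatically, which curbs technical lag. A minimal sketch of an npm-style caret-range check (simplified to the MAJOR > 0 case; the helper names are our own illustration, not part of any cited tool):

```python
def parse(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def caret_allows(declared: str, candidate: str) -> bool:
    """Simplified npm caret semantics (assuming MAJOR > 0): '^1.2.3'
    accepts any 1.y.z >= 1.2.3, so compatible releases are picked up
    without a manual update, while a new major version is not."""
    d, c = parse(declared), parse(candidate)
    return c[0] == d[0] and c >= d

print(caret_allows("1.2.3", "1.4.0"))  # True: minor update, no lag accumulates
print(caret_allows("1.2.3", "2.0.0"))  # False: major update, lag grows until a manual bump
```

Under such ranges, technical lag accumulates only across major-version boundaries, which is why the studies above find semantic versioning mitigates (but does not eliminate) lag.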
+ + +On the other hand, practitioners often take conservative update approaches: upstream developers typically use semantic versioning [31] to signal version compatibility, while downstream developers perform most updates manually and detect incompatibilities through release notes, compilation failures, and regression testing. Unfortunately, studies [32], [33], [34], [35] reveal that none of these mechanisms works well in guaranteeing update compatibility. Generally, providing such guarantees is still a challenging open problem [36]. + + +2.2 Dependency Management Bots + + +Perhaps the most noticeable automation effort among practitioners is the dependency management bot. These bots automatically create pull requests (PRs) to update dependencies either immediately after a new release is available or when a security vulnerability is discovered in the currently used version. In other words, dependency management bots solve the lack-of-awareness problem [2] by automatically pushing update notifications to developers. + + +Mirhosseini and Parnin [8] conduct a pioneering study on Greenkeeper and find that developers update dependencies 1.6x more frequently with Greenkeeper, but only 32% of Greenkeeper PRs are merged due to two major limitations: + + + + +Update Suspicion: If an automated update PR breaks their code, developers immediately become suspicious of subsequent PRs and are reluctant to merge them. + + +Notification Fatigue: If too many automated update PRs are generated, developers may feel annoyed by the notifications and simply ignore all the update PRs. + + + + +Rombaut et al. [37] find that Greenkeeper issues for in-range breaking updates induce a large maintenance overhead, and many of them are false alarms caused by project CI issues. + + +The limitations of Greenkeeper align well with the challenges revealed in the software engineering (SE) bot literature. Wessel et al.
[38] find that SE bots on GitHub have interaction problems and provide poor decision-making support. Erlenhov et al. [39] identify two major challenges in the design of “Alex” bots (i.e., SE bots that autonomously perform simple tasks): establishing trust and reducing interruption/noise. Wyrich et al. [7] find that bot PRs have a lower merge rate and need more time to be interacted with and merged. Two subsequent studies by Wessel et al. [40], [41] qualitatively show that noise is the central challenge in SE bot design but that it can be mitigated by certain design strategies and the use of a “meta-bot.” Shihab et al. [42] outline the technical and socio-economic challenges of SE bots. Santhanam et al. [43] provide a systematic mapping of the SE bot literature. + + +Since Mirhosseini and Parnin [8], many other bots have emerged for automating dependency updates, such as Dependabot [3] (preview release in May 2017) and Renovate Bot [4] (first release in January 2017). Greenkeeper itself reached end-of-life in June 2020, and its team merged with Snyk Bot [6]. All these bots are widely used: according to Wyrich et al. [7], they opened the vast majority of bot PRs on GitHub (six out of the top seven). The top two spots are occupied by Dependabot [3] and Dependabot Preview [44] with ∼3 million PRs and ∼1.2 million PRs, respectively. Erlenhov et al. [45] find that under a strict SE bot definition, almost all bots in an existing bot commit dataset [46] are dependency management bots and that they are frequently adopted, discarded, switched, and even simultaneously used by GitHub projects, indicating fierce competition among them. +2.3 Dependabot + + +Among dependency management bots, Dependabot [3] is the most visible one in GitHub projects [7]. Dependabot Preview was launched in 2017 [47] and acquired by GitHub in 2019 [48].
In August 2021, it was shut down in favor of the new GitHub-native Dependabot [49], operating since June 2020, which offers two main services: + + + + + + +Dependabot version update [50]: If a configuration file named dependabot.yml is added to a GitHub repository, Dependabot will begin to open PRs that update project dependencies to the latest version. Developers can specify the exact Dependabot behavior in dependabot.yml (e.g., update interval and the maximum number of PRs). + + + + + + +Dependabot security update [51]: Dependabot scans the entirety of GitHub to find repositories with vulnerable dependencies. Even if no dependabot.yml is supplied, Dependabot still alerts repository owners, who can then tell Dependabot to open PRs that update vulnerable dependencies to their patched versions. + + + + + + +Figure 1 shows an example Dependabot PR [52]. Among its details, one especially interesting Dependabot feature is the compatibility score badge. According to GitHub documentation [53]: “An update’s compatibility score is the percentage of CI runs that passed when updating between specific versions of the dependency.” In other words, the score uses the large-scale regression testing data available in GitHub CI test results to estimate the risk of breaking changes in a dependency update. This looks like a promising direction for solving the update suspicion problem, as previous studies have shown that project test suites are often unreliable in detecting update incompatibilities [34] and that the resulting false alarms introduce significant maintenance overhead [37]. However, the score’s effectiveness in practice remains unknown. + + +For the notification fatigue problem, Wessel et al. [40] suggest that SE bots offer flexible configurations and send only relevant notifications.
Both solutions have, at least in principle, been implemented by Dependabot, but it is still unclear whether the specific configuration options and notification strategies adopted by Dependabot are really effective in practice. Alfadel et al. [54] find that developers receive Dependabot security PRs well: 65.42% of PRs are merged and most are merged within a day. However, security PRs constitute only a small portion of Dependabot PRs (6.9% in our dataset), and developers perceive security updates as highly relevant [13]. The effectiveness of Dependabot version update in general seems to be more problematic. Soto-Valero et al. [55] find that Dependabot opens many PRs on bloated dependencies. Cogo and Hassan [56] provide evidence on how the configuration of Dependabot causes issues for developers. As stated by two developers in a GitHub issue [57]: 1) “I think we’d rather manage dependency upgrades ourselves, on our own time. We’ve been frequently bitten by dependency upgrades causing breakages. We tend to only upgrade dependencies when we’re close to being ready to cut a release.” 2) “Also Dependabot tends to be pretty spammy, which is rather annoying.” + + +To the best of our knowledge, a comprehensive empirical investigation into the adoption of the Dependabot version update service is still lacking. Such knowledge can help formulate general design guidelines for dependency management bots and unveil important open challenges in fulfilling these guidelines. +---------------------------------------- +------------------------------- +Section 37: +3 RESEARCH QUESTIONS + + +Our study goal is to evaluate the practical effectiveness of the Dependabot version update service. In this Section, we elaborate on the motivation for each RQ toward this goal. + + +The Dependabot version update service is designed to make developers aware of new versions and help them keep project dependencies up-to-date.
To quantitatively evaluate the extent to which Dependabot fulfills its main design purpose (i.e., keeping dependencies up-to-date), we reuse metrics from the technical lag literature [16], [18] and ask: + + +RQ1: To what extent does Dependabot reduce the technical lag of a project after its adoption? + + +To help developers keep dependencies up-to-date, Dependabot intervenes by automatically creating update PRs when new versions become available, after which developers can interact with (e.g., comment on, merge) these PRs. We evaluate the effectiveness of this interaction process by measuring the extent to which developers interact smoothly with Dependabot PRs, forming the next RQ: + + +RQ2: How actively do developers respond to and merge pull requests opened by Dependabot? + + +One major limitation of Greenkeeper is that developers tend to be suspicious of whether a dependency update will introduce breaking changes [8] (i.e., update suspicion). Dependabot, on the other hand, helps developers establish confidence in update PRs through the compatibility score feature (Section 2.3). To quantitatively evaluate the effectiveness of this feature against update suspicion, we ask: + + +RQ3: How effective is Dependabot’s compatibility score in allaying developers’ update suspicion? + + +The other major limitation of Greenkeeper is that developers tend to be overwhelmed by a large number of update PRs [8] (i.e., notification fatigue). Dependabot, on the other hand, provides flexible configuration options for controlling the amount of notifications (Section 2.3). To explore how developers configure (and re-configure) the number of notifications generated by Dependabot, we study real-world Dependabot configurations and ask: + + +RQ4: How do projects configure Dependabot for automating dependency updates? + + +During our analysis, we discover that a non-negligible portion of projects in our studied corpus have deprecated Dependabot and migrated to other alternatives.
As an in-depth retrospective analysis of the reasons behind these deprecations can help reveal important Dependabot limitations and future improvement directions, we ask: + + +RQ5: How do projects deprecate Dependabot and what are the developers’ desired features for Dependabot? +4 STUDY DESIGN + + +An overview of our study is shown in Figure 2. The study follows a mixed-methods design in which we obtain results from repository data analysis and triangulate them with a developer survey. In this Section, we introduce the data collection and survey methods. The specific analysis methods will be presented along with their results in Section 5. + + +4.1 Data Collection + + +Project Selection. As the first step, we need to collect a sample of engineered and maintained GitHub projects that use or once used Dependabot version update in their workflows. We focus on the GitHub-native Dependabot (released on June 1, 2020) and do not include Dependabot Preview in our study because the former provides much richer features and allows us to obtain the latest, state-of-the-art results. + + +We begin with the latest dump of GHTorrent [58] (released on March 6, 2021), a large-scale dataset of GitHub projects widely used in software engineering research (e.g., [7], [34]). We find a noticeable gap in the GHTorrent dataset from July 2019 to early January 2020 (also observed by Wyrich et al. [7]). Focusing solely on the GitHub-native Dependabot allows us to circumvent threats caused by this gap because all of its PRs were created after January 2020. + + +We select projects with at least 10 merged Dependabot PRs to keep only projects that have used Dependabot to some degree. To filter out irrelevant, low-quality, or unpopular projects, we retain only non-fork projects with at least 10 stars, as inspired by previous works [55], [59], [60].
Since projects without sustained activity may not perform dependency updates on a regular basis and would induce noise in the technical lag analysis (RQ1), we query the GitHub API [61] and retain projects with a median of at least one commit per week over the past year. To exclude projects that have never utilized Dependabot version update, we clone the repositories and retain only projects whose git history contains changes to dependabot.yml. After all the filtering steps, we end up with 1,823 projects. + + +PR Collection. We use the GitHub REST API [61] and a web scraper to find all Dependabot PRs (before February 14, 2022) in the projects and collect PR statistics, CI test results, and timeline events. By leveraging a distributed pool of Cloudflare Workers [62], this web scraper enables us to bypass the limitations of the GitHub API (which is inconvenient for collecting CI test results for PRs) and retrieve PR events and CI test results at scale. The PR body tells us which dependency a PR is updating, its current version, and its updated version. By the end of this stage, we obtain 540,665 Dependabot PRs (71.1% with a CI test result), updating 15,590 dependencies between 167,841 version pairs. + + +Our next task is to identify security updates among all of the PRs created by Dependabot. However, Dependabot no longer labels security updates, for security reasons; instead, it shows a banner on the PR web page that is visible only to repository administrators by default [51]. Therefore, we choose to construct a mirror of the GitHub security advisory database [63] and identify security PRs ourselves by checking whether the PR updates a version with a vulnerability entry at the time of PR creation.
More specifically, we identify a PR as a security update PR if: 1) the dependency and its current version match a vulnerability in the GitHub security advisory database; 2) the updated version is no older than the version that fixes this vulnerability (i.e., no vulnerability after the update); 3) the PR was created after the vulnerability disclosure in CVE. Eventually, we identify 37,313 security update PRs (6.9%) among the 540,665 Dependabot PRs in total. + + +Dataset Overview. As illustrated in Table 1, projects in our dataset are mostly engineered, popular GitHub projects with a large code base, active maintenance, rich development history, and frequent Dependabot usage. We notice a long-tail distribution in the metrics concerning project size, i.e., number of contributors, lines of code, and commit frequency, which is expected and common in most mining software repository (MSR) datasets [35], [64], [65]. + + +The plurality of projects in our dataset (44.1%) use the npm package ecosystem, followed by Maven (12.3%), PyPI (11.7%), and Go modules (7.8%). Among the Dependabot PRs, those that update npm packages constitute an even higher portion (64.9%), followed by PyPI (8.9%), Go modules (4.3%), Bundler (3.9%), and Maven (3.9%), as packages in the npm ecosystem generally evolve faster [66]. + + +Dependabot has opened hundreds of PRs for most of the projects (mean = 304, median = 204), and even thousands for some of them. This likely indicates a high workload for project maintainers.
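The three criteria above can be sketched as a small classifier. This is only an illustration under our own assumptions: the `Advisory` record and its field names are hypothetical (not the actual schema of the GitHub security advisory database), and we read criterion 2 as "at or above the first patched version":

```python
from dataclasses import dataclass
from datetime import datetime

def ver(v: str) -> tuple:
    """Parse a dotted version string into a comparable tuple of ints."""
    return tuple(int(x) for x in v.split("."))

@dataclass
class Advisory:
    dependency: str        # affected package name (hypothetical schema)
    vulnerable: set        # set of vulnerable version strings
    patched: str           # first version that fixes the flaw
    disclosed_at: datetime # CVE disclosure time

def is_security_update(dep, current, updated, created_at, advisories):
    """Label a Dependabot PR as a security update per the three criteria."""
    return any(
        adv.dependency == dep
        and current in adv.vulnerable         # 1) current version is vulnerable
        and ver(updated) >= ver(adv.patched)  # 2) updated version carries the fix
        and created_at > adv.disclosed_at     # 3) PR opened after disclosure
        for adv in advisories
    )
```

For instance, a PR bumping a vulnerable 4.17.20 to 4.17.21 after disclosure would be labeled a security update, while the same bump opened before disclosure would not.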
In terms of the most updated dependencies, it is not surprising that all + + +TABLE 1: Statistics of the Studied Projects (Top) and Survey Respondents (Bottom) + + +| Statistics | Mean | Median |
|-----------------------------|---------|---------|
| # of Stars | 1423.92 | 66.00 |
| # of Commits | 2837.11 | 1040.50 |
| # of Contributors | 26.50 | 12.00 |
| Lines of Code (thousands) | 98.18 | 19.89 |
| # of Commits per Week | 10.07 | 4.00 |
| Age at Adoption (days) | 1018.18 | 714.00 |
| # of Dependabot PRs | 304.56 | 204.00 |
| # of Dependabot Interactions | 644.54 | 410.00 |
| # of Commits | 477.00 | 331.50 |
| # of Followers | 168.00 | 53.50 |
| Years of Experience (GitHub) | 10.37 | 10.68 |

(The per-row distribution plots of the original table are omitted.)

+TABLE 2: Survey Questions and Their Results (131 Responses in Total) + + +| 5-Point Likert-Scale Questions | Avg. |
|--------------------------------|------|
| (RQ1) Dependabot helps my project keep all dependencies up-to-date. | 4.44 |
| (RQ2) Dependabot PRs do not require much work to review and merge. | 3.94 |
| (RQ2) I respond to a Dependabot PR fast if it can be safely merged. | 4.42 |
| (RQ2) I ignore the Dependabot PR or respond slower if it cannot be safely merged. | 3.78 |
| (RQ2) I handle a Dependabot PR with higher priority if it updates a vulnerable dependency. | 4.19 |
| (RQ2) It requires more work to review and merge a Dependabot PR if it updates a vulnerable dependency. | 2.49 |
| (RQ2) Dependabot often opens more PRs than I can handle. | 2.73 |
| (RQ3) Compatibility scores are often available in Dependabot PRs. | 2.95 |
| (RQ3) If a compatibility score is available, it is effective in indicating whether the update will break my code. | 2.95 |
| (RQ4) Dependabot can be configured to fit the needs of my project. | 3.54 |
| (RQ4) I configure Dependabot to make it less noisy (i.e., only update certain dependencies, scan less frequently, etc.) | 3.27 |

(The per-question response distribution plots of the original table are omitted.)

Multiple Choice Questions

| (RQ5) Are your GitHub repositories still using Dependabot for automating version updates? | 0.89 |
| (RQ5) If not, why? | (Results in § 5.5) |

Open-Ended Questions∗

| (RQ5) Regardless of current availability, what are the features you want most for a bot that updates dependencies? Do you have any further opinions or suggestions? | (Results in § 5.5) |

∗ Where appropriate, we also use evidence from open-ended question responses to support the results in RQ1 - RQ4. + + +The survey has been approved by the Ethics Committee of Key Laboratory of High Confidence Software Technology, Ministry of Education (Peking University) under Grant No. CS20220011. + + +We send each candidate a personalized email (including information about how they used Dependabot) to avoid being perceived as spam. We try our best to follow common survey ethics [71], e.g., clearly introducing the purpose of the survey and being transparent about what we will do with the responses. To increase the chance of getting a response and to contribute back to the open-source community, we offer to donate $5 to an open-source project of the respondents’ choice if they opt in. Therefore, we believe we have done minimal harm to the open-source developers we contacted, and the insights we gain about Dependabot far outweigh that harm. In fact, we receive several highly welcoming responses from the survey participants, such as: 1) “keep up the good work!” 2) “If you would like to consult more, just ping me on ... Cheers!” + + +The bottom half of Table 1 summarizes the demographics of the 131 survey respondents, showing that they are highly experienced with both Dependabot (a median of 410 interactions) and open-source development (five to 15 years of experience, hundreds of commits, and many followers).
----------------------------------------
-------------------------------
Section 38:
5 METHODS AND RESULTS


5.1 RQ1: Technical Lag


5.1.1 Repository Analysis Methods


We evaluate the effectiveness of Dependabot version updates by comparing project technical lag at two time points: the day of Dependabot adoption ($T_0$) and 90 days after adoption (i.e., $T_0 + 90$). We choose 90 days as the interval to avoid the influence of deprecations, as more than 85% of them happen more than 90 days after adoption. Since technical lag naturally increases over time [16], [18], we include an additional time point for comparison: 90 days before adoption (i.e., $T_0 - 90$).


For a project $p$ at time $t \in \{T_0 - 90, T_0, T_0 + 90\}$, we denote all its direct dependencies as $\text{deps}(p, t)$ and define the technical lag of project $p$ at time $t$ as:


$$\text{techlag}(p, t) = \frac{\sum_{d \in \text{deps}(p, t)} \max(0, t_{\text{latest}}(d) - t_{\text{adopted}}(d))}{|\text{deps}(p, t)|}$$


Here $t_{\text{latest}}(d)$ denotes the release time of $d$'s latest version at time $t$ and $t_{\text{adopted}}(d)$ denotes the release time of $d$'s adopted version. We use $\max$ to guard against the occasional case of $t_{\text{latest}}(d) < t_{\text{adopted}}(d)$ (e.g., developers may continue to release 0.9.x versions after the release of 1.0.0).


This technical lag definition is inspired by Zerouali et al. [18] but with several adjustments. First, we use only their time-based variant instead of their version-based variant because cross-project comparisons would not be intuitive using the latter. Second, we use the mean over all dependencies instead of the maximum or median as the overall technical lag, because we intend to measure the overall effectiveness of Dependabot both for keeping most dependencies up-to-date and for eliminating the most outdated ones.
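A minimal sketch of this computation, assuming each dependency's latest and adopted release times are given as day offsets (a hypothetical input shape, not the paper's tooling):

```python
def techlag(deps):
    """Mean technical lag (in days) over a project's direct dependencies.

    `deps` maps dependency name -> (t_latest, t_adopted), both as day
    offsets.  max(0, ...) guards against the occasional case where the
    adopted version was released after the "latest" one.
    """
    if not deps:
        return 0.0
    lags = [max(0, t_latest - t_adopted) for t_latest, t_adopted in deps.values()]
    return sum(lags) / len(deps)
```

For example, a project with one dependency lagging 60 days and one whose adopted version is newer than the latest (guarded to 0) has a technical lag of 30 days.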
We exclude projects with an age of fewer than 90 days at Dependabot adoption and projects that deprecate Dependabot within 90 days. We also exclude projects that migrated from Dependabot Preview since they may bias the results. Since computing technical lag from dependency specification files and version numbers requires non-trivial implementation work for each package ecosystem, we limit our analysis to JavaScript/npm, the most popular ecosystem in our dataset. We further exclude projects with no eligible npm dependencies configured for Dependabot at $T_0 - 90$, $T_0$, or $T_0 + 90$. After all the filtering, we retain 613 projects for answering RQ1.


We adopt the Regression Discontinuity Design (RDD) framework to estimate the impact of adopting Dependabot on project technical lag. RDD uses the level of discontinuity before/after an intervention to measure its effect size while taking the influence of the overall background trend into consideration. Given that technical lag tends to increase naturally over time [16], [17], [18], RDD is a more appropriate statistical modeling approach for our case than hypothesis testing approaches (e.g., one-sided Wilcoxon rank-sum tests). Following previous SE works that utilized RDD [72], [73], we use sharp RDD, i.e., segmented regression analysis of interrupted time series data.
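The segmented regression just described can be fitted with plain ordinary least squares; a minimal numpy sketch under the 15-day sampling grid used in this section (variable names follow the model; this is not the paper's implementation):

```python
import numpy as np

def fit_rdd(days, y, t0=90):
    """Fit y = alpha + beta*time + gamma*intervention
         + theta*time_after_intervention by ordinary least squares.

    `days` counts days from T0-90, so the intervention occurs at day t0.
    Returns the coefficient vector [alpha, beta, gamma, theta].
    """
    days = np.asarray(days, dtype=float)
    intervention = (days >= t0).astype(float)          # 0 before adoption, 1 after
    time_after = np.where(days >= t0, days - t0, 0.0)  # days since adoption
    X = np.column_stack([np.ones_like(days), days, intervention, time_after])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef
```

A large negative third coefficient (gamma) corresponds to a downward discontinuity at adoption, which is what Table 4 reports.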
We treat project-level technical lag as a time series, compute the technical lag for each project every 15 days from $T_0 - 90$ to $T_0 + 90$, use ordinary least squares regression to fit the RDD model, and watch for the presence of a discontinuity at Dependabot adoption, formalized as the following model:


$$y_i = \alpha + \beta \cdot \text{time}_i + \gamma \cdot \text{intervention}_i + \theta \cdot \text{time}_{\text{after intervention},i} + \sigma_i$$


Here $y_i$ denotes the output variable (i.e., technical lag for each project in our case); $\text{time}$ stands for the number of days from $T_0 - 90$; $\text{intervention}$ binarizes the presence of Dependabot (0 before adopting Dependabot, 1 after adoption); $\text{time}_{\text{after intervention}}$ counts the number of days from $T_0$ (0 when $T_0 - 90 \leq \text{time} < T_0$).
----------------------------------------
-------------------------------
Section 39:
5.1.2 Repository Analysis Results


We present technical lags and their deltas between time points in Table 3. We plot diagrams in Figure 3 to reflect how different projects increase/decrease their technical lag from $T_0 - 90$ to $T_0 + 90$. The first surprising fact we notice is that the technical lag of approximately one-third (216/613) of projects is already decreasing between $T_0 - 90$ and $T_0$, even though technical lag tends to increase over time [16], [18]. This indicates that these projects were already taking a proactive dependency update strategy before adopting Dependabot. On the other hand, for about half (303/613) of the projects, the technical lag increases prior to Dependabot adoption, and 94 projects keep their technical lag unchanged.
For all projects, the mean and median technical lag at $T_0 - 90$ are 73.68 and 16.27 days, respectively; they decrease at $T_0$ to 48.99 and 13.96 days, respectively; already at $T_0$, 159 (25.9%) of the 613 projects have achieved zero technical lag.


Between $T_0$ and $T_0 + 90$, projects lower their technical lag even further, from a mean of 48.99 days and a median of 13.96 days to a mean of 25.38 days and a median of 3.62 days. Among the 303 projects with an increasing technical lag between $T_0 - 90$ and $T_0$, about two-thirds (220) of them see a decrease after adopting Dependabot; among the 216 projects with decreasing technical lag, nearly half (94) see a further decrease. More than one-third (219, 35.7%) of projects achieve completely zero technical lag 90 days after Dependabot adoption. Although there are still some increases, their magnitude is much smaller (e.g., a 75% quantile of only +1.75 days between $T_0$ and $T_0 + 90$ compared with a 75% quantile of +14.37 days between $T_0 - 90$ and $T_0$).


Table 4 shows that the regression variable $\text{intervention}$ has a statistically significant negative coefficient ($\text{coef.} = -31.2137$, $p < 0.001$), indicating that the adoption of Dependabot might have reduced technical lag and kept dependencies up-to-date in the sampled 613 projects. A more straightforward look at this trend can be observed in Figure 4: at $T_0$, project-level technical lag shows a noticeable decrease, and there is a discontinuity between the linearly fitted technical lag before/after adoption.
$\text{time}$ and $\text{time}_{\text{after intervention}}$ have negative coefficients, echoing our earlier findings:
----------------------------------------
-------------------------------
Section 40:
Table 3: Technical Lag (days) for 613 npm Projects


| Metric | Mean | Median | Distribution |
|--------|------|--------|--------------|
| $\text{techlag}(p, T_0 - 90)$ | 73.68 | 16.27 | |
| $\Delta$ in Between | -24.96 | 0.00 | |
| $\text{techlag}(p, T_0)$ | 48.99 | 13.96 | |
| $\Delta$ in Between | -23.61 | -0.61 | |
| $\text{techlag}(p, T_0 + 90)$ | 25.38 | 3.62 | |
----------------------------------------
-------------------------------
Section 41:
Table 4: The Estimated Coefficients and Significance Levels for the RDD Model We Fit (Section 5.1.1).


| Feature | Coef. | Std. Err. | $t$ | $p$ |
|---------|-------|-----------|------|------|
| Intercept* | 66.5209 | 4.595 | 14.477 | 0.000 |
| $\text{intervention}$ | -31.2137 | 5.694 | -5.306 | 0.000 |
| $\text{time}$ | -0.0743 | 0.079 | -0.945 | 0.345 |
| $\text{time}_{\text{after intervention}}$ | -0.1011 | 0.100 | -1.008 | 0.314 |


* $p < 0.001$


The technical lag of the sampled projects was already decreasing before Dependabot adoption, and the introduction of Dependabot adds to this decreasing trend. However, both of these coefficients are much smaller than that of $\text{intervention}$ and are not statistically significant ($p > 0.3$).
----------------------------------------
-------------------------------
Section 42:
5.1.3 Triangulation from Survey


Most developers agree that Dependabot is helpful in keeping their project dependencies up-to-date: 55.8% responded with "Strongly Agree" and 35.7% with "Agree" (Table 2). As noted by one developer: "Dependabot does a great job of keeping my repositories current."
This is because Dependabot serves well as an automated notification mechanism that tells them about the presence of new versions and pushes them to update their dependencies. As mentioned by two developers: 1) "Dependabot is a wonderful way for me to learn about major/minor updates to libraries." 2) "Dependabot can be a bit noisy, but it makes me aware of my dependencies."


However, some developers do not favor using Dependabot for automating dependency updates and only use it as a means of notification. For example: 1) "I just use it for notifications about updates, but do them manually and check if anything broke in the process." 2) "I am just using Dependabot to tell me if there is something to update and then update all in a single shot with plain package managers."


This indicates that they do not trust the reliability of Dependabot for automating updates and do not think the current design of Dependabot can reduce the manual workload of updates. As an example, one developer states: "Dependency management is currently much easier just utilizing yarn/npm. We use Dependabot merely because it has been recommended, but updating dependencies was faster when I solely used the command line."


One developer suggests that using Dependabot only for update notifications has become such a common use case that they would prefer a dedicated, less noisy tool solely designed for this purpose: "It (Dependabot) becomes more like an update notification, i.e. I'm leveraging only half of its capability. Could there be something designed solely for this purpose? Less invasive, more informative, and instead of creating a PR for every package's update, I would like to see a panel-style hub to collect all the information for me to get a better overview in one place."


Findings for RQ1:
----------------------------------------
-------------------------------
Section 43:
90 days after adopting Dependabot, projects decrease their technical lag from an average of 48.99 days to an average of 25.38 days. 35.7% of projects achieve zero technical lag 90 days after adoption. The adoption of Dependabot is a statistically significant intervention as indicated by RDD. Developers agree on its effectiveness in notifying them of updates, but question its effectiveness in automating updates.
----------------------------------------
-------------------------------
Section 44:
5.2 RQ2: Developers' Response to Pull Requests
----------------------------------------
-------------------------------
Section 45:
5.2.1 Repository Analysis Methods


Inspired by prior works [7], [54], we use the following metrics to measure the receptiveness (i.e., how actively developers merge) and responsiveness (i.e., how actively developers respond) of Dependabot PRs:


Merge Rate: The proportion of merged PRs.


Merge Lag: The time it takes for a PR to be merged.


Close Lag: The time it takes for a PR to be closed (i.e., not merged into the project code base).


Response Lag: The time it takes for a PR to receive human interaction, including any observable action in the PR's timeline, e.g., adding a label or assigning a reviewer.


The merge rate is intended to measure receptiveness and the latter three are intended to measure responsiveness.


We assume that results may differ for PRs in different groups. We expect that 1) developers are both more receptive and more responsive to security updates due to the higher priority of eliminating security vulnerabilities; and 2) projects that use Dependabot version updates (i.e., contain dependabot.yml) are more responsive to Dependabot PRs. To verify our expectations, we divide PRs into three groups:


regular: Dependabot PRs that update a package to its latest version when the old version does not contain any known security vulnerabilities.


sec/conf: Security PRs that update a package with vulnerabilities to its patched version and are opened when the project has a dependabot.yml file in its repository (i.e., using Dependabot version updates).


sec/nconf: Security PRs opened when the project does not have a dependabot.yml file in its repository. These PRs are opened either before the adoption or after the deprecation of Dependabot version updates.


We examine the significance of inter-group metric differences with unpaired Mann-Whitney tests and Cliff's delta ($\delta$). Following Romano et al. [74], we consider the effect size negligible for $|\delta| \in [0, 0.147)$, small for $|\delta| \in [0.147, 0.33)$, medium for $|\delta| \in [0.33, 0.474)$, and large otherwise.
----------------------------------------
-------------------------------
Section 46:
5.2.2 Repository Analysis Results


Table 5 shows the PR statistics we obtain for each group. The high merge rates (>70%) indicate the projects are highly receptive to Dependabot PRs regardless of whether they are security-related.


TABLE 5: PR Statistics in Different Groups. All lags are measured in days. $\bar{x}$ represents the mean and $\mu$ the median over all PRs in the group.


| Statistics | regular | sec/conf | sec/nconf |
|----------------|---------|----------|-----------|
| # of PRs | 502,752 | 13,406 | 23,907 |
| Merge Rate | 70.13% | 73.71% | 76.01% |
| Merge Lag | $\bar{x}=1.76$, $\mu=0.18$ | $\bar{x}=3.45$, $\mu=0.18$ | $\bar{x}=8.15$, $\mu=0.76$ |
| Close Lag | $\bar{x}=8.63$, $\mu=3.00$ | $\bar{x}=14.42$, $\mu=5.00$ | $\bar{x}=26.83$, $\mu=5.71$ |
| Resp. Lag | $\bar{x}=2.27$, $\mu=0.17$ | $\bar{x}=3.74$, $\mu=0.17$ | $\bar{x}=8.59$, $\mu=0.51$ |
They are more receptive to security PRs: their merge rate is 74.53%, even higher than the 65.42% reported for Dependabot Preview security updates [54]. This may be because projects welcome security updates even more, or it may simply be an artifact of our project selection.


Alfadel et al. [54] find that Dependabot security PRs take longer to close than to merge. Our data tell a similar story: regular Dependabot PRs take a median of 0.18 days (≈ four hours) to merge and a median of 3.00 days to close. The difference is statistically significant with a large effect size ($p < 0.001$, $\delta = 0.91$).


The response lag, however, does not differ much from the merge lag in any group, which confirms the timeliness of developers' responses to Dependabot PRs. We observe human activities in 360,126 (72.2%) Dependabot PRs, among which 280,276 (77.8%) are responded to in less than one day. However, this also indicates an inconsistency between fast responses and slow closes. To glimpse what causes this inconsistency, we sample ten closed PRs with developer activities before closing and inspect their event histories. We find that 9 out of 10 PRs are closed by Dependabot itself because the PR became obsolete after the release of a newer version or a manual upgrade (similar to the observation by Alfadel et al. [54]). Activities are development-related (e.g., starting a discussion, assigning reviewers) in 5 PRs, while the rest are interactions with Dependabot (e.g., @dependabot rebase).


Surprisingly, security PRs require a longer time to merge ($p < 0.001$, $\delta = 0.87$), close ($p < 0.001$, $\delta = 0.72$), and respond ($p < 0.001$, $\delta = 0.87$), with large effect sizes, regardless of whether the project is using Dependabot version updates. Though Dependabot version update users do process security updates more quickly (at least merge lag and response lag are noticeably shorter), the difference is not significant, with negligible or small effect sizes ($\delta \leq 0.23$).
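The effect-size measure used in these comparisons can be sketched as follows; a naive O(n·m) Cliff's delta with the Romano et al. thresholds (in practice one would pair it with a Mann-Whitney test from a statistics library):

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (naive O(n*m))."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def effect_size(delta):
    """Interpretation thresholds following Romano et al."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```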
----------------------------------------
-------------------------------
Section 47:
5.2.3 Triangulation from Survey


In general, developers agree that Dependabot PRs do not require much work to review and merge (34.1% Strongly Agree, 40.3% Agree, 14.0% Neutral).


We find that they follow two different patterns of using Dependabot. One pattern is to rapidly merge the PR if the tests pass and perform the update by hand otherwise (65.2% Strongly Agree, 19.7% Agree, 9.1% Neutral). In the latter case, they respond to the Dependabot PR more slowly, or let Dependabot automatically close the PR after the manual update (36.4% Strongly Agree, 26.5% Agree, 20.5% Neutral). For example: "I almost never have to look at Dependabot PRs because I have tests, and 99.99% of PRs are merged automatically. Rarely (when dependency changes API for example) I have to manually add some fixes/updates..." As mentioned in Section 5.1.3, another pattern is to use Dependabot PRs solely as a means of notification and always perform manual updates. Both patterns contribute to the much larger close lag we observe in Dependabot PRs.


In terms of security updates, most developers do handle security PRs with a higher priority (56.7% Strongly Agree, 16.3% Agree, 14.0% Neutral), but they do not think security PRs require more work to review and merge (19.4% Totally Disagree, 36.4% Disagree, 26.4% Neutral). One possible explanation for the slower response, merge, and close of security PRs is that developers consider some security vulnerabilities irrelevant to them: "I want it (Dependabot) to ignore security vulnerabilities in development dependencies that don't actually get used in production."


Developers have mixed opinions on whether Dependabot opens more PRs than they can handle (15.9% Strongly Agree, 15.2% Agree, 22.0% Neutral, 20.5% Disagree, 26.5% Totally Disagree).
Whether the PR workload introduced by Dependabot is acceptable may depend on other factors (e.g., the number of dependencies and how fast packages evolve), as indicated by two respondents: 1) "The performance of Dependabot or other similar bots could depend on the number of dependencies a project has. For smaller projects, with a handful of dependencies, Dependabot will be less noisy and usually safe as compared to large projects with a lot of dependencies." 2) "The utility of something like Dependabot depends heavily on the stack and number of dependencies you have. JS is much more noisy than Ruby, for example, because Ruby moves more slowly."
----------------------------------------
-------------------------------
Section 48:
Findings for RQ2:




70% of Dependabot PRs are merged, with a median merge lag of four hours. Compared with regular PRs, developers are less responsive (taking more time to respond, close, or merge) but more receptive (higher merge rate) to security PRs. Developers tend to rapidly merge PRs they consider "safe" and perform manual updates for the remaining PRs.
----------------------------------------
-------------------------------
Section 49:
5.3 RQ3: Compatibility Score
----------------------------------------
-------------------------------
Section 50:
5.3.1 Repository Analysis Methods


We explore the effectiveness of compatibility scores in two aspects: availability and correlation with merge rate.


1) Availability: We begin our analysis by examining the availability of compatibility scores, for they cannot take effect if they are absent from most PRs. For this purpose, we obtain compatibility scores from badges in PR bodies, which point to URLs defined per dependency version pair. That is, Dependabot computes one compatibility score for each dependency version pair $\langle d, v_1, v_2 \rangle$ and shows the score in all PRs that update dependency $d$ from $v_1$ to $v_2$.
In case this computation fails, Dependabot generates an unknown compatibility score for $\langle d, v_1, v_2 \rangle$.


Since compatibility scores are computed in a data-driven manner, we wonder whether the popularity of the updated dependencies affects their availability. As a quick evaluation, we sample 20 npm dependencies with more than one million downloads per week as representatives of popular dependencies. Next, we retrieve the release history of these dependencies by querying the npm registry API, retaining only releases that became available after January 1, 2020 (recall that all Dependabot PRs in our dataset are created after January 2020, Section 4). For the releases of each dependency, we get all possible dependency version pairs from a Cartesian product (1,629 in total) and query their compatibility scores from the corresponding Dependabot URLs.


2) Correlation with Merge Rate: In theory, if developers perceive compatibility scores as reliable, PRs with higher compatibility scores will be more likely to get merged. To quantitatively evaluate this, we compare merge rates for PRs with different compatibility scores. Since PRs that update the same version pair share the same score, we further utilize Spearman's $\rho$ to measure the correlation between a) the compatibility score for a dependency version pair $\langle d, v_1, v_2 \rangle$, and b) the merge rate over all PRs that update $d$ from $v_1$ to $v_2$.


Fig. 5: Distribution of compatibility scores and available CI test results over the version pairs of axios. (a) Compatibility Score. (b) # of CI Test Results.


TABLE 6: Compatibility Score and PR Merge Rate


| Compatibility Score | # of PRs | Merge Rate |
|---------------------|----------|------------|
| unknown | 485,501 | 69.96% |
| < 80% | 1,321 | 30.20% |
| >= 80%, < 90% | 1,605 | 67.48% |
| >= 90%, < 95% | 1,794 | 73.19% |
| >= 95%, < 100% | 2,228 | 84.43% |
| == 100% | 10,303 | 80.30% |
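The per-pair aggregation and rank correlation just described can be sketched as follows; `prs` uses a hypothetical (pair, merged) tuple shape, and `spearman_rho` computes Spearman's ρ as the Pearson correlation of average ranks:

```python
import numpy as np

def merge_rate_by_pair(prs):
    """Group PRs by (dependency, v1, v2) and compute per-pair merge rates."""
    totals, merged = {}, {}
    for pair, was_merged in prs:
        totals[pair] = totals.get(pair, 0) + 1
        merged[pair] = merged.get(pair, 0) + int(was_merged)
    return {p: merged[p] / totals[p] for p in totals}

def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of average ranks."""
    def ranks(v):
        v = np.asarray(v, dtype=float)
        r = np.empty(len(v))
        r[v.argsort()] = np.arange(1, len(v) + 1)
        for val in np.unique(v):        # average ranks over ties
            mask = v == val
            r[mask] = r[mask].mean()
        return r
    rx, ry = ranks(xs), ranks(ys)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))
```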
As we will show in Section 5.3.2, compatibility scores are abnormally scarce. Although we reached out to the Dependabot maintainers for explanations, they consider such information confidential and declined to share any details. We therefore compute the number of CI test results for each dependency version pair and analyze their overall distribution to provide possible explanations for this scarcity.


5.3.2 Repository Analysis Results


1) Availability: Compatibility scores are extremely scarce: only 3.4% of the PRs and 0.5% of the dependency version pairs have a compatibility score other than unknown. Merely 0.18% of the dependency version pairs have a value other than 100%. The scarcity is no better even among the most popular npm dependencies: 1,604 (98.5%) of the 1,629 dependency version pairs we sample have only a compatibility score of unknown, 10 (0.6%) have a compatibility score of 100%, and 15 (0.9%) have a compatibility score of less than 100%. As an example, we plot the compatibility score matrix for axios, which has the most (15) version pairs with compatibility scores, in Figure 5a.


2) Correlation with Merge Rate: We summarize the merge rates for PRs with different compatibility scores in Table 6. We observe that for PRs with a compatibility score, a high score indeed increases their chance of being merged: if the score is higher than 90%, developers are more likely to merge the PR. By contrast, if the score is lower than 80%, developers become very unlikely (30.20%) to merge it. The Spearman's $\rho$ between compatibility score and merge rate is 0.37 ($p < 0.001$), indicating a weak correlation according to Prion and Haerling's interpretation [75].


Figure 6 shows the number of dependency version pairs with more than $x$ CI test results.
We observe an extreme Pareto-like distribution: of the 167,053 dependency version pairs in our dataset, fewer than 1,000 have more than 50 CI test results and fewer than 100 have more than 150 CI test results. In the case of axios (Figure 5b), compatibility scores are indeed only available for version pairs with available CI test results. It is hard to explain why the scores are missing even for some version pairs with many CI test results (e.g., the update from 0.19.2 to 0.20.0), as we do not know the underlying implementation details.


5.3.3 Triangulation from Survey


Developers have diverging opinions on whether compatibility scores are available (7% Strongly Agree, 24.8% Agree, 38.8% Neutral, 17.8% Disagree, 11.6% Totally Disagree) and whether compatibility scores are effective when available (4.7% Strongly Agree, 21.7% Agree, 45.7% Neutral, 19.4% Disagree, 8.5% Totally Disagree). The answer distributions and the high number of Neutral responses likely indicate that many developers do not know how to rate the two statements [76], because compatibility scores are too scarce and most developers have not been exposed to this feature. As one developer replied: "Compatibility scores and vulnerable dependencies detection are great, I use Dependabot a lot but was not aware they exist... (They) should be more visible to the user." Another developer does express the concern that compatibility scores are not effective, saying that "Dependabot's compatibility score has never worked for me."


Further, several developers (6 responses in our survey) hold the belief that Dependabot only works well in projects with a high-quality test suite. For example:


1) "Dependabot works best with a high test coverage and if it fails people it's likely because they have too little test coverage."
2) "Dependabot without a good test suite is indeed likely too noisy, but with good tests and an understanding of the code base it is trivial to know whether an update is safe or not."


Findings for RQ3:


Compatibility scores are too scarce to be effective: only 3.4% of PRs have a known compatibility score. For PRs that have one, the scores show a weak correlation ($\rho = 0.37$) with the PR merge rate. The scarcity may arise because most dependency version pairs do not have sufficient CI test results (i.e., a Pareto-like distribution) for inferring update compatibility. As a result, developers think Dependabot only works well in projects with high-quality test suites.


5.4 RQ4: Configuration


5.4.1 Repository Analysis Methods


Dependabot offers a wide range of configuration options for integration with project workflows, such as whom to assign as reviewers, how to write commit messages, and how to label PRs. In this research question, we focus only on the options related to notifications because we expect them to be possible countermeasures against noise and notification fatigue. More specifically, we investigate the following options provided by Dependabot:


1) schedule.interval: This option is mandatory and specifies how often Dependabot scans project dependencies, checks for new versions, and opens update PRs. Possible values include "daily", "weekly", and "monthly".


2) open-pull-requests-limit: It specifies the maximum number of simultaneously open Dependabot PRs allowed in a project. The default value is five.


3) allow: It tells Dependabot to only update a subset of dependencies. By default, all dependencies are updated.


4) ignore: It tells Dependabot to ignore a subset of dependencies. By default, no dependency is ignored.
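Putting the four options together, a minimal `dependabot.yml` could look like the following sketch (the keys follow GitHub's documented schema; the ecosystem, limit value, and dependency name are illustrative, not recommendations):

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"            # "daily", "weekly", or "monthly"
    open-pull-requests-limit: 10    # default is 5
    allow:
      - dependency-type: "production"   # allowlist-style strategy
    ignore:
      - dependency-name: "express"      # ignorelist-style strategy (placeholder name)
        update-types: ["version-update:semver-patch"]
```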
The latter two options are very flexible and may contain constraints exclusive to some package ecosystems, e.g., allowing updates only in production manifests or ignoring patch updates according to the semantic versioning convention [31].


To understand developers' current practices of configuring Dependabot, we parse 3,921 Dependabot configurations from 1,588 projects with a dependabot.yml in their current working tree. For schedule.interval and open-pull-requests-limit, we count the frequency of each value. For allow and ignore, we parse the different options and group them into three distinct strategies:


1) default: allowing Dependabot to update all dependencies, which is its default behavior;


2) ignorelist: configuring Dependabot to ignore a subset of dependencies;


3) allowlist: configuring Dependabot to only update a subset of dependencies.


We further explore the modification history of Dependabot configurations to observe how developers use configuration as a countermeasure against noise in the wild. For this purpose, we find all commits in the 1,823 projects that modified dependabot.yml and extract eight types of configuration changes from the file diffs:


1) +interval: Developers increase schedule.interval.


2) -interval: Developers decrease schedule.interval.


3) +limit: Developers increase open-pull-requests-limit.


4) -limit: Developers decrease open-pull-requests-limit.


5) +allow: Developers allow more dependencies to be automatically updated by Dependabot.


6) -allow: Developers no longer allow some dependencies to be automatically updated by Dependabot.


7) +ignore: Developers configure Dependabot to ignore some dependencies for automated updates.


8) -ignore: Developers configure Dependabot to no longer ignore some dependencies for automated updates.




Note that 235 of the 1,823 projects do not have a dependabot.yml in their current working tree, which we will investigate in RQ5.
One project may depend on more than one package ecosystem (e.g., both npm and PyPI) and have separate configurations for each of them.




Finally, we analyze configuration modifications by time since Dependabot adoption. We mainly focus on bursts of modification patterns, because bursts illustrate the lag from developers' perception of noise to their countermeasures to mitigate it.


5.4.2 Repository Analysis Results


The current configurations of Dependabot show that most projects configure Dependabot toward a proactive update strategy: 2,203 (56.2%) of the schedule.interval values are "daily" while merely 276 (7.04%) are a conservative "monthly". 1,404 (35.8%) of the open-pull-requests-limit configurations are higher than the default value while only a negligible proportion (2.3%) are lower. For the allow and ignore options, most configurations (3,396, 86.7%) adopt the default strategy, fewer (380, 9.7%) use ignorelist, and a small proportion (50, 1.3%) use allowlist.


The modifications tell another story. 776 (42.57%) of the 1,823 projects in our dataset have modified the Dependabot configuration options we study (e.g., update interval), with 2.18 modification commits on average (median = 1.00). Figure 7 illustrates the proportion of each modification type, showing that projects increase schedule.interval and lower open-pull-requests-limit more frequently than doing the opposite. As demonstrated in Figure 8, projects may increase schedule.interval any time after Dependabot adoption but are more likely to reduce open-pull-requests-limit only after several months of Dependabot usage. schedule.interval largely determines how often Dependabot bothers developers, and we see developers of 336 projects increasing it in 868 configurations. We further confirm this behavior as a countermeasure against noise with a real-life example where developers reduce the frequency to monthly to reduce noise [77].
open-pull-requests-limit quantifies the developers' workload in each interaction, which is also noise-related, as indicated by a developer's complaint: Dependabot PRs quickly get out of hand [78]. If we focus on modifications that happen 90 days after Dependabot adoption, we find that nearly two-thirds (62.5%) of open-pull-requests-limit changes belong to -limit. Our observations indicate the following phenomenon. At the beginning of adoption, developers configure Dependabot to interact frequently and update proactively. However, they later get overwhelmed and suffer from notification fatigue, which causes them to reduce interaction with Dependabot or even deprecate Dependabot (RQ5). As an extreme case, one developer forces Dependabot to open only 1 PR at a time to reduce noise [79].

Ignoring certain dependencies seems to be another noise countermeasure, for developers tend to add an ignored dependency more often than remove one (Figure 7). For example, a commit says update ignored packages...so they are never automatically updated to stop noise [80]. However, we also observe cases where developers add ignored dependencies with other intentions, such as handling breaking changes [81] and preserving backward compatibility [82]. For +allow and -allow, we observe an interesting burst of -allow (Figure 8c) early on but more +allow dependencies later, and we do not find any evidence explaining this trend.

5.4.3 Triangulation from Survey

Although more than half of the respondents think Dependabot can be configured to fit their needs (25.6% Strongly Agree and 30.2% Agree), some do not (7.8% Totally Disagree and 14% Disagree). As a peek into this controversy, one developer says, I think people that complain about how noisy it is (I've seen a lot of this) just don't configure things correctly.
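The eight configuration-change types defined in Section 5.4.1 could be extracted from pairs of parsed configurations along the following lines. This is an illustrative sketch, not the paper's tooling; the function name is ours, and we assume Dependabot's documented default PR limit of 5 when the option is absent.

```python
# Hypothetical sketch: classify a dependabot.yml change (one package-ecosystem
# entry, already parsed into a dict) into the eight modification types
# studied: +/-interval, +/-limit, +/-allow, +/-ignore.

# Longer intervals rank higher; "+interval" means updates become less frequent.
INTERVAL_ORDER = {"daily": 0, "weekly": 1, "monthly": 2}

def classify_modifications(old: dict, new: dict) -> set[str]:
    changes = set()
    # schedule.interval changes
    old_iv = INTERVAL_ORDER.get(old.get("schedule", {}).get("interval", "daily"), 0)
    new_iv = INTERVAL_ORDER.get(new.get("schedule", {}).get("interval", "daily"), 0)
    if new_iv > old_iv:
        changes.add("+interval")
    elif new_iv < old_iv:
        changes.add("-interval")
    # open-pull-requests-limit changes (Dependabot's documented default is 5)
    old_limit = old.get("open-pull-requests-limit", 5)
    new_limit = new.get("open-pull-requests-limit", 5)
    if new_limit > old_limit:
        changes.add("+limit")
    elif new_limit < old_limit:
        changes.add("-limit")
    # allow / ignore list changes, compared by dependency name
    for key in ("allow", "ignore"):
        old_deps = {d.get("dependency-name") for d in old.get(key, [])}
        new_deps = {d.get("dependency-name") for d in new.get(key, [])}
        if new_deps - old_deps:
            changes.add(f"+{key}")
        if old_deps - new_deps:
            changes.add(f"-{key}")
    return changes
```

For example, a commit that moves from daily to weekly updates, lowers the PR limit from 10 to 5, and starts ignoring a package would be classified as {"+interval", "-limit", "+ignore"}, matching the noise-avoidance patterns discussed above.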
More than half (50.4%) of the respondents have configured Dependabot to make it less noisy, but roughly one-third (32.6%) have not (21.2% Strongly Agree, 29.5% Agree, 16.7% Neutral, 20.5% Disagree, 12.1% Totally Disagree). It is possible that the default configurations of Dependabot only work for projects with a limited number of dependencies that are not fast-evolving (see Section 5.2.3); for other projects, developers need to tweak the configurations multiple times to find a sweet spot. However, many respondents eventually find that Dependabot does not offer the options they want for noise reduction, such as update grouping and auto-merge. We will investigate this in depth in RQ5.

Findings for RQ4: The majority of Dependabot configurations imply a proactive update strategy, but we observe multiple patterns of noise avoidance in configuration modifications, such as increasing schedule intervals, lowering the maximum number of open PRs, and ignoring certain dependencies.

5.5 RQ5: Deprecations & Desired Features

5.5.1 Repository Analysis Methods

To locate projects that may have deprecated Dependabot, we find projects with no dependabot.yml in their current working trees, resulting in 235 projects. For each of them, we identify the last commit that removed dependabot.yml, inspect its commit message, and identify any referenced issues/PRs following the GitHub convention. If the dependabot.yml removal turns out to be due to a project restructuring or a cessation of maintenance, we consider it a false positive and exclude it from further analysis.

For the remaining 206 projects, we analyze reasons for deprecation from commit messages and issue/PR text (i.e., titles, bodies, and comments). Since a large proportion of the text in commit messages, issues, and PRs is irrelevant to Dependabot deprecation reasons, two authors read and re-read all text in the corpus, retaining only the relevant text.
They encode reasons from the text and discuss them until reaching a consensus. They do not conduct independent coding or measure inter-rater agreement because the corpus is very small (only 27 deprecations contain documented reasons).

For each of the confirmed deprecations, we check bot configuration files and commit/PR history to find possible migrations. We consider a project as having migrated to another dependency management bot (or another automation approach) if it meets any of the following criteria:

1) developers have specified the migration target in the commit message or issue/PR text;
2) dependabot.yml is deleted by another dependency management bot (e.g., Renovate Bot automatically deletes dependabot.yml in its setup PR);
3) the project adopts another dependency management bot within 30 days before or after the Dependabot deprecation.

To obtain developers' desired features for a dependency management bot, we ask two optional open-ended questions at the end of the survey (Table 2). The two questions are answered by 97 and 46 developers, respectively. To identify recurring patterns in the answers, two authors of this paper (both with >6 years of software development experience and familiar with using Dependabot) conduct open coding [83] on the responses to generate an initial set of codes. They read and re-read all answers to familiarize themselves with the answers and gain an initial understanding of them. Then, one author assigns text in the answers to initial codes that reflect common features in dependency management bots and discusses them with the other author to iteratively refine the codes until a consensus is reached. They further conduct independent coding on the answers using the refined codes and exclude answers that do not reflect anything related to this RQ.
As each response may contain multiple codes, we use MASI distance [84] to measure the distance between the two raters' codes and Krippendorff's alpha [85] to measure inter-rater reliability. The Krippendorff's alpha we obtain is 0.865, which satisfies the recommended threshold of 0.8 and indicates high reliability [85].

5.5.2 Repository Analysis Results

We confirm 206 of the 235 candidates to be real-life Dependabot deprecations, which is substantial considering that our dataset only contains 1,823 projects. From Figure 9, we can observe that Dependabot deprecations are, in general, evenly distributed over time with a few fluctuations, mostly coming from organization-wide deprecations. For instance, the maximum value in December 2020 is caused by 26 Dependabot deprecations in octokit, the official GitHub API client implementation. We encode nine categories of reasons from the 27 deprecations that explicitly mention their reasons:

1) Notification Fatigue (9 Deprecations): Developers do recognize Dependabot's overwhelming notifications and PRs as the central issue in their experience with Dependabot. As noted by one developer: "I've been going mad with dependabot alerts which are annoying and pointless. I'd rather do manual upgrades than use this" [86].

2) Lack of Grouped Update Support (7 Deprecations): By Dependabot convention, each PR updates one dependency and one dependency only, which is inconvenient in two scenarios: a) related packages tend to follow similar release schedules, which triggers Dependabot to raise a PR storm on their updates [87]; b) in some cases, dependencies must be updated together to avoid breakages [88]. The excessive notifications and additional manual work quickly frustrate developers. For example: a) My hope was that we can better group dependency upgrades.
With the default configuration, there is some grouping happening, but most dependencies would be upgraded individually [89]; b) Also, a lot of packages have to be updated together. Separate PRs for everything isn't very fun [90].

3) Package Manager Incompatibility (7 Deprecations): Developers may experience compatibility issues after the introduction of a new package manager or a newer version of a package manager. In the seven cases we have found, five concern yarn v2, one concerns npm v7 (specifically lockfile v3), and one concerns pnpm. To make matters worse, Dependabot may even exhibit undesirable behaviors, e.g., messing around with yarn lockfiles [91], when encountering such incompatibilities. This contributes to developers' update suspicion, as merging pull requests leads to possible breakages in dependency specification files. At the time of writing, Dependabot still has no clear timeline for supporting pnpm [92] or yarn v2 [93]. For the affected Dependabot users, this means reverting [94], patching Dependabot PRs manually or automatically [95], or migrating to an alternative, e.g., Renovate Bot [96].

4) Lack of Configurability (5 Deprecations): Dependabot is also deprecated due to developers' struggle to tailor a suitable configuration. For example: a) it appears that we're not able to configure Dependabot to only give us major/minor upgrades [97]; b) Dependabot would require too much configuration long-term – too easy to forget to add a new package directory [98]. Developers mention that other dependency management bots provide more fine-grained configuration options such as update scope and schedule: (Renovate Bot) has a load more options we could tweak too compared to Dependabot if we want to reduce the frequency further [99].

5) Absence of Auto-Merge (3 Deprecations): Alfadel et al. [54] illustrate that auto-merge features are tightly associated with rapid PR merges.
However, GitHub refused to offer this feature in Dependabot [100], claiming that auto-merge allows malicious dependencies to propagate beyond the supervision of project maintainers. This may render Dependabot impractical, as claimed by a developer: (the absence of auto-merge) creates clutter and possibly high maintenance load.

We notice that a non-negligible proportion (8.17%) of pull requests are merged by third-party auto-merge implementations (e.g., a CI workflow or a GitHub App). Unfortunately, these may become dysfunctional on public repositories after GitHub enforced a change on workflows triggered by Dependabot PRs [101]. This turns out to be the last straw for several Dependabot deprecations. As a developer states, they dropped Dependabot because latest changes enforced by GitHub prevent using the action in Dependabot's PR's context.

6) High CI Usage (3 Deprecations): Maintainers from 3 projects complain that Dependabot's numerous, auto-rebasing PRs have devoured their CI credits. In their words, Dependabot's CI usage is what killed us with Dependabot, and a waste of money and carbon.

Other reasons for Dependabot deprecation include: 7) Dependabot Bugs (2 Deprecations), 8) Unsatisfactory Branch Support (1 Deprecation), and 9) Inability to Modify Custom Files (1 Deprecation).

The deprecation of Dependabot does not necessarily mean that developers have lost faith in automating dependency updates. In fact, over two-thirds (68.4%, 141/206) of the projects turn to another bot or set up custom CI workflows to support their dependency updates. Among them, Renovate Bot (122) is the most popular migration target, followed by projen (15), npm-check-updates (2), and depfu (1).

5.5.3 Triangulation from Survey

Among the 131 surveyed developers, 14 (10.7%) tell us they have deprecated Dependabot in their projects.
Most of the reasons they provide fall within our analysis, and the frequency distribution is highly similar. There are two exceptions: one developer deprecates Dependabot because it frequently breaks code, and another because their entire project has stalled. Developers also respond in our survey that they consider automated dependency management important and beneficial for their projects, but the limitations of Dependabot cause them to deprecate it. For example: Dependabot could be great, it just needs a few fixes here and there. It's unclear why Dependabot hasn't been polished. They also reply that Renovate Bot does provide some features that they need (e.g., grouped update PRs).

We identify nine major categories of developers' desired features (each corresponding to one code) from the answers provided by 84 respondents. The remaining categories are discarded as they are supported by only one answer each (and thus may be incidental and not generalizable). We explain each category in order of popularity.

1) Group Update PRs (29 Respondents): This category refers to the feature of automatically grouping several dependency updates into one PR instead of opening one PR for each update. It is the most frequently mentioned, and developers consider this feature an important measure for making the handling of bot PRs less tedious, repetitive, and time-consuming. They want the bot to automatically identify dependencies that should be updated together and merge them into one PR because many libraries (e.g., symfony, @typescript-eslint, babel) version all packages under a single version. They also want the bot to automatically find and merge "safe" updates into one PR while leaving "unsafe" updates as single PRs for more careful reviewing.
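Renovate Bot, the most common migration target, already exposes grouping of this kind through its packageRules option. A minimal renovate.json sketch (the package pattern and group name are chosen purely for illustration) that groups the @typescript-eslint packages mentioned above into one PR might look like:

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "packageRules": [
    {
      "matchPackagePatterns": ["^@typescript-eslint/"],
      "groupName": "typescript-eslint packages"
    }
  ]
}
```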
2) Package Manager Support (20 Respondents): This category refers to supporting more package managers (and their corresponding ecosystems) or adding features that align the bot with the conventions of a package manager/ecosystem. Developers have expressed their desire for the bot to support Gradle, Flutter, Poetry, Anaconda, C++, yarn v2, Clojure, Cargo, CocoaPods, Swift Package Manager (in iOS), etc., indicating that dependency management bots, if well designed and implemented, can indeed benefit a wide range of developers and software development domains. Dependabot does claim support for many of the package managers mentioned above, but it still needs to be tailored and improved in, e.g., performance and update behavior: a) When I have 3 open Poetry updates I can merge one and then have to wait 15 minutes for the conflicts to be resolved. b) Perhaps for node.js projects the ability to update package.json in addition to package.lock, so the dependency update is made explicit.

3) Auto-Merge (19 Respondents): This category refers to the feature of automatically merging some update PRs into the repository if certain conditions are satisfied. As mentioned in Section 5.3.3, some developers believe that as long as their projects have high-quality test suites, reviewing update PRs is trivial, and they would prefer the PRs to be merged automatically if the tests pass.

Despite the significant demand, this feature also seems to be especially controversial because enabling it means offloading trust and granting the bot autonomy. Although GitHub considers it unacceptable due to security risks [100], our survey clearly indicates that many developers still want it even though they are well aware of the risks. They also think the responsibility of risk control, e.g., vetting new releases, should be given to some capable central authority, not to them.
Here are three example responses: a) While this might be somewhat dangerous, and should be configurable somehow, [auto-merge] is something that basically already happens when I merge such PRs. b) If I am merging with Dependabot like 60 deps a day - I don't know if some of the versions are not published by hackers who took over the repository account, so it would be great if there was some authority where humans actually check the changes and mark them secure. c) For me it'd be good if I could mute all notifications about Dependabot PRs except for when tests failed, indicating that I need to manually resolve some issues. Otherwise I'd be happy not to hear about it updating my deps.

4) Display Release Notes (8 Respondents): This category refers to the feature of always showing some sort of release notes or changelogs in update PRs to inform developers of the changes in an update. Although Dependabot can sometimes provide release notes in PRs (Figure 1), it fails to do so for 24.8% of the PRs in our dataset. One possible reason is that release notes are often missing or inaccessible in open source projects [35], which is also confirmed by one of our survey respondents: Most npm package updates feel unnecessary and the maintainers very often don't bother to write meaningful release notes...At the same time, I shouldn't expect maintainers to go through all of their dependencies' changelogs either, so perhaps the tool should find those release notes for me.

5) Avoid Unnecessary Updates (7 Respondents): This category refers to the feature of providing default behaviors and configuration options that avoid updates most developers in an ecosystem perceive as unnecessary. The most frequently mentioned feature is the ability to define separate update behaviors for development and production (or runtime) dependencies.
Many developers would avoid the automatic update of development dependencies because they perceive such updates as mostly noise: there is very little gain in keeping development dependencies up-to-date. Other mentioned features include the ability to detect and avoid updates of bloated dependencies and to only provide updates for dependencies with real security vulnerabilities.

6) Custom Update Action (5 Respondents): This category refers to the ability to define custom update behaviors (using, e.g., regular expressions) to update dependencies in unconventional dependency files.

7) Configurability (5 Respondents): This category refers to cases where developers express that dependency management bots should be highly configurable (e.g., offer more configuration options) but do not specify which configuration options they want.

8) git Support (4 Respondents): This category concerns the integration of dependency management bots with the version control system (in our case, git). The specific features mentioned include automatic rebasing, merge conflict resolution, squashing, etc., all of which help ensure that bot PRs do not impose additional work on developers (e.g., manipulating git branches and resolving conflicts).

9) Breaking Change Impact Analysis (3 Respondents): This category refers to the ability to perform program analysis to identify breaking changes and their impact on client code, e.g., something like a list of parts of my codebase that might be impacted by the update would be useful. This could be based on a combination of changes listed in the release notes and an analysis of where the package is used in my code.

The developers' desired features align well with the reasons for Dependabot deprecation, indicating that feature availability can be an important driver for the migrations and competition between dependency management bots.
Findings for RQ5: 11.3% of the studied projects have deprecated Dependabot due to notification fatigue, lack of grouped update support, package manager incompatibility, lack of configurability, absence of auto-merge, etc. 68.4% of them migrate to other forms of automation, among which the most common migration target is Renovate Bot (86.5%). We identify nine categories of developers' desired features that align well with the Dependabot deprecation reasons.

6 DISCUSSION

6.1 The State of Dependency Management Bots

In a nutshell, our results indicate that Dependabot could be an effective solution for keeping dependencies up-to-date (RQ1, RQ2), but often with significant noise and workload (RQ1, RQ4, RQ5), much of which could not be mitigated by the features and configuration options offered by Dependabot (RQ5). Apart from that, Dependabot's compatibility score solution is hardly a success in indicating the compatibility of a bot update PR (RQ3). As of March 2023, Dependabot is still under active development by GitHub, with the majority of its effort devoted to supporting more ecosystems (e.g., Docker, GitHub Actions) and adding features to reduce noise (e.g., automatically terminating Dependabot in inactive repositories), according to the GitHub change log [102]. Still, there is plenty of room for improvement in tackling the update suspicion and notification fatigue problems [8].

Among other dependency management bots, Renovate Bot is an actively developed and popular alternative to Dependabot for version updates (RQ5), while Greenkeeper [9] has been deprecated, PyUp [5] seems to be no longer under active development, and Snyk Bot [6] mainly offers security-focused solutions.
As of March 2023, Renovate Bot provides more features and configuration options than Dependabot for fine-tuning notifications, including update grouping and auto-merge [103]; it also provides merge confidence badges with more information than Dependabot [104]. However, it is still unclear whether the features and strategies adopted by Renovate Bot are actually effective in practice, and we believe Renovate Bot could be an important subject for future dependency management bot studies.

6.2 What Should be the Key Characteristics of a Dependency Management Bot?

In this section, we summarize the key characteristics of an ideal dependency management bot based on the results of our analysis and previous work. We believe they can serve as general design guidelines for practitioners to design, implement, or improve dependency management bots (or other similar automation solutions).

Configurability. Wessel et al. [40] argue that noise is the central challenge in SE bot design and that re-configuration should be the main countermeasure against noise. In the case of Dependabot, we find that it also causes noise by opening more PRs than developers can handle (RQ4) and that developers may re-configure it multiple times to reduce this noise (RQ4). However, re-configuration is not always successful due to the lack of certain features in Dependabot, causing deprecations and migrations (RQ5). As with many other software development activities, a "silver bullet" is unlikely to exist, as noted by one of our survey respondents: ...there is no best practice in dependency management which is easy, fast and safe.

Therefore, we argue that configurability, i.e., offering the highest possible configuration flexibility for controlling update behavior, should be one of the key characteristics of dependency management bots.
This helps the bot minimize unnecessary update notifications and attempts so that developers are less interrupted. Apart from the options already provided by Dependabot, our study indicates that the following configuration options should be present in dependency management bots:

1) Grouped Updates: Dependency management bots should provide options to group multiple updates into one PR. Possible options include grouping all "safe" updates (e.g., those not breaking the CI checks) and updates of closely related dependencies (e.g., different components from the same framework).

2) Update Strategies: Dependency management bots should allow developers to specify which dependencies to update based on more conditions, such as whether the dependency is used in production, the severity of security vulnerabilities, whether the dependency is bloated, etc.

3) Version Control System Integration: Dependency management bots should allow developers to define how the bot interacts with the version control system, including which branch to monitor, how to manipulate branches and handle merge conflicts, etc.

Autonomy. According to the SE bot definition by Erlenhov et al. [39], the key characteristics of an "Alex" type of SE bot are its ability to autonomously handle (often simple) development tasks, and its central design challenges include minimizing interruption and establishing trust with developers. However, without an auto-merge feature, Dependabot is hardly autonomous, and this lack of autonomy is disliked by developers (RQ5); in extreme cases, developers use Dependabot entirely as a notification tool rather than as a bot (Section 5.1.3). This lack of autonomy also causes a high level of interruption and workload for developers using Dependabot in their projects (RQ5).
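Pending native support, the third-party auto-merge workarounds observed in RQ5 typically take the shape of a GitHub Actions workflow. The sketch below is adapted from the pattern documented by GitHub (combining the dependabot/fetch-metadata action with gh pr merge --auto); restricting auto-merge to semver-patch updates is one possible policy, not a recommendation from our data:

```yaml
# Illustrative workflow: auto-merge Dependabot patch updates once CI passes.
name: Dependabot auto-merge
on: pull_request

permissions:
  contents: write
  pull-requests: write

jobs:
  dependabot:
    runs-on: ubuntu-latest
    if: github.actor == 'dependabot[bot]'
    steps:
      - name: Fetch Dependabot metadata
        id: metadata
        uses: dependabot/fetch-metadata@v2
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
      - name: Enable auto-merge for patch updates
        if: steps.metadata.outputs.update-type == 'version-update:semver-patch'
        run: gh pr merge --auto --merge "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Note that gh pr merge --auto only queues the merge; the PR still merges after required status checks pass, which is precisely the CI-gated autonomy argued for below.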
We argue that autonomy, i.e., the ability to perform dependency updates autonomously without human intervention under certain conditions, should be one of the key characteristics of dependency management bots. This characteristic is only possible when the risks and consequences of dependency updates are highly transparent and developers know when to trust these updates. Within the context of GitHub, we believe current dependency management bots should offer a configuration option to merge update PRs when the CI pipeline passes. This option can be turned on for projects that have a well-configured CI pipeline with thorough static analysis, building, and testing stages, when the developers believe that their pipeline can effectively detect incompatibilities in dependency updates (Section 5.3.3).

With respect to the security concern of auto-merge being used to quickly propagate a malicious package across the ecosystem [100], we argue that the responsibility of verifying the security of new releases should not fall on independent developers, as they usually lack the required time and expertise (RQ5). Instead, package hosting platforms (e.g., npm, Maven, PyPI) should vet new package releases and quickly take down malicious releases to minimize their impact. These practices are also advocated in the literature on software supply chain attacks [105].

Transparency. Multiple previous studies, both on SE bots and on other kinds of software bots, point to the importance of transparency in bot design. For example, Erlenhov et al. [39] show that developers need to establish trust that the bot can perform development tasks correctly. Similarly, Godulla et al. [106] argue that transparency is vital for bots used in corporate communications. In the context of code review bots, Peng and Ma [107] find that contributors expect the bot to be transparent about why a certain code reviewer is recommended.
To reduce update suspicion [8] around dependency management bots, developers also need to know when to trust the bot to perform dependency updates.

We argue that transparency, i.e., the ability to transparently demonstrate the risks and consequences of a dependency update, should be one of the key characteristics of dependency management bots. However, the Dependabot compatibility score feature is hardly a success in this direction, and developers only trust their own test suites. Beyond compatibility scores and project test suites, the following research directions may be helpful for enabling transparency in dependency management bots and establishing trust among bot users:

1) Program Analysis: One direction is to leverage program analysis techniques. There has been significant research and practitioner effort on breaking change analysis [36], and two approaches have demonstrated the potential of using static analysis to assess bot PR compatibility [34], [108]. Still, given the extremely large scale of bot PRs [7], more research and engineering effort is needed to implement lightweight and scalable approaches that support each popular ecosystem.

2) CI Log Analysis: Another direction is to extend the idea of compatibility scores with sophisticated techniques that learn more knowledge from CI checks. Since CI checks are scarce for many version pairs (RQ3), it will be interesting to explore techniques that transfer knowledge from other version pairs so that the matrix in Figure 5a becomes less sparse. The massive number of CI checks available from Dependabot PRs would be a promising starting point.

3) Release Note Generation: Dependabot sometimes fails to locate and provide a release note for the updated dependency, and even if there is one, the maintainers very often don't bother to write meaningful release notes, as noted by one respondent.
This situation can be mitigated by applying approaches for software change summarization (e.g., [109]) and release note generation (e.g., [110]).

Self-Adaptability. The ability to adapt to a specific environment and its dynamics is considered one of the key characteristics of a "rational agent" in artificial intelligence [111], [112]. Dependency management bots can also be considered autonomous agents working in the artificial environment of social coding platforms (e.g., GitHub). However, our findings reveal that Dependabot often cannot operate in the ways expected by developers (RQ5) and that reconfigurations are common (RQ4). Such failures (e.g., in update actions, package manager compatibility, and git branching) lead to interruptions and extra work for developers.

We argue that self-adaptability, i.e., the ability to automatically identify and self-adapt to a sensible default configuration for a project's environment, should be one of the key characteristics of dependency management bots. For GitHub projects, this environment can include the project's major programming languages, package managers & ecosystems, the workflows used, the active timezone, developer preferences, recent activities, etc. A dependency management bot should be able to automatically generate a configuration file based on such information and recommend configuration changes when the environment changes (e.g., when developer responses to bot PRs become slower than usual). This can be implemented as a semi-automatic recommender system that suggests an initial configuration to developers and opens bot PRs that modify the bot's own configuration after adoption.

6.3 Comparison with Previous Work

Several previous studies have also made similar recommendations based on results from Greenkeeper or Dependabot [8], [37], [54], [56].
Studies on Greenkeeper [8], [37] show that dependency management bots cause noise to developers and that CI test results are unreliable, but they do not investigate the effectiveness of bot configuration as a countermeasure against noise. Studies on Dependabot [54], [56] either focus on a different aspect (i.e., security updates [54]) or provide specific recommendations on Dependabot features [56]. Compared with previous studies, the contributions of our study are: 1) a systematic investigation of the Dependabot version update service, and 2) a comprehensive four-dimension framework for dependency management bot design.

The implications of our study are also related to the larger literature on SE bots and dependency management. With respect to these two fields, the contribution of our study is a unique lens of observation, i.e., Dependabot, that results in a set of tailored recommendations for dependency management bot design. In Section 6.2, we have carefully discussed how the implications of our study confirm, extend, or echo those of the existing literature.

6.4 Threats to Validity

6.4.1 Internal Validity

In RQ1, we have provided a holistic analysis of the impact of Dependabot adoption without incorporating possible confounding factors (e.g., the types of dependencies and the characteristics of projects). Consequently, it is difficult for our study to establish a firm answer on the effectiveness of adopting Dependabot, and future work is needed to better quantify this impact among possible confounding factors.

Several approximations are used throughout our analysis. In RQ2, we resort to identifying security PRs ourselves, which may introduce hard-to-confirm errors (only repository owners know whether their PRs are security-related).
The merge rate may not accurately reflect the extent to which developers accept Dependabot updates, as some projects may accept contributions in other ways. To mitigate this threat, we focus on projects that have merged at least 10 Dependabot PRs, with the intuition that projects which have already merged many Dependabot PRs are unlikely to accept them in other ways. In RQ3, Dependabot’s compatibility scores may change over time, and it is impossible to know the score at the time of PR creation. In RQ4, Dependabot supports ecosystem-specific matchers in dependency specifications, e.g., @angular/*, which we do not consider when parsing configuration files. However, we believe the noise introduced above should be minor and will not invalidate our findings or hinder the reproducibility of our data analysis. Like other studies involving manual coding, our analysis of developer discussions and survey responses is vulnerable to author bias. To mitigate this, two authors double-check all results and, for RQ5, validate findings against project commit/PR histories; they further conduct an inter-rater reliability analysis for RQ5, where the dataset is larger. Finally, our own interpretation of the data (RQ1 - RQ5) may also be biased by our own judgment. To mitigate this, we triangulate our key findings using a developer survey and derive implications from both our analysis and developers’ feedback. +6.4.2 External Validity + + +As with all case studies, caution is needed when generalizing our specific findings in each RQ to other dependency management bots, and even to other projects that use Dependabot. Our dataset only contains popular and actively maintained GitHub projects, many of which already take proactive updating strategies. Therefore, our findings may not generalize to projects that are smaller in scale or more reluctant to update dependencies.
The survey responses are collected through convenience sampling, which may introduce unknown biases in terms of experience, age, gender, development role, etc., so our survey results should be generalized to a broad developer audience with caution. The outcomes of Dependabot usage may also not generalize to other dependency management bots due to differences in their functionality and user bases. In RQ1, we base our analysis only on JavaScript/npm projects, which may not generalize to other ecosystems with different norms, policies, and practices [11]; comparing dependency management bot usage across ecosystems could be an important avenue for future work. Despite these limitations, we believe the implications we obtain for dependency management bot design should be general. Our proposed framework in Section 6.2 forms a roadmap for dependency management bot designers, and our methodology could be applied in future studies to compare the effectiveness of different bots. +---------------------------------------- +------------------------------- +Section 56: +7 Conclusion + + +We present an exploratory study of the Dependabot version update service using repository mining and a survey, and we identify important limitations in the design of Dependabot. From our findings, we derive a four-dimension framework in the hope that it can inform dependency management bot design and inspire more research in related fields. + + +Several directions of future work arise from our study. For example, investigating and comparing other dependency management bots, especially Renovate Bot, can help verify the generalizability of our proposed framework. An empirical foundation on the factors affecting the effectiveness of bot adoption is also needed.
It will be interesting to investigate the recommendation of bot configurations to developers, or to study how different approaches (e.g., program analysis, machine learning, release note generation) can help developers assess the compatibility of bot PRs. +---------------------------------------- +------------------------------- +Section 57: +8 Data Availability + + +We provide a replication package at Figshare: + + +https://figshare.com/s/78a92332e4843d64b984 + + +The package can be used to replicate the results from repository mining. To preserve the privacy of survey respondents, we choose not to disclose any raw data from the survey. + + +Acknowledgments + + +This work is supported by the National Key R&D Program of China Grant 2018YFB1004201 and the National Natural Science Foundation of China Grant 61825201. We sincerely thank the developers who participated in our survey. + + +References + + +[1] T. Winters, T. Manshreck, and H. Wright, Software Engineering at Google: Lessons Learned from Programming over Time. O’Reilly Media, 2020. +[2] R. G. Kula, D. M. Germán, A. Ouni, T. Ishio, and K. Inoue, “Do developers update their library dependencies? - an empirical study on the impact of security advisories on library migration,” Empir. Softw. Eng., vol. 23, no. 1, pp. 384–417, 2018. +[3] https://github.com/dependabot. +[4] https://github.com/renovatebot. +[5] https://pyup.io/. +[6] https://github.com/snyk-bot. +[7] M. Wyrich, R. Ghit, T. Haller, and C. Müller, “Bots don’t mind waiting, do they? comparing the interaction with automatically and manually created pull requests,” in 3rd IEEE/ACM International Workshop on Bots in Software Engineering, BotSE@ICSE 2021, Madrid, Spain, June 4, 2021. IEEE, 2021, pp. 6–10. +[8] S. Mirhosseini and C. 
Parnin, “Can automated pull requests encourage software developers to upgrade out-of-date dependencies?” in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017. IEEE Computer Society, 2017, pp. 84–94. +[9] https://greenkeeper.io/. +[10] https://www.sonatype.com/resources/state-of-the-software-supply-chain-2021. +[11] C. Bogart, C. Kästner, J. D. Herbsleb, and F. Thung, “When and how to make breaking changes: Policies and practices in 18 open source software ecosystems,” ACM Trans. Softw. Eng. Methodol., vol. 30, no. 4, pp. 42:1–42:56, 2021. +[12] G. Bavota, G. Canfora, M. D. Penta, R. Oliveto, and S. Panichella, “How the apache community upgrades dependencies: an evolutionary study,” Empir. Softw. Eng., vol. 20, no. 5, pp. 1275–1317, 2015. +[13] I. Pashchenko, D. L. Vu, and F. Massacci, “A qualitative study of dependency management and its security implications,” in CCS ’20: 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, November 9-13, 2020. ACM, 2020, pp. 1513–1531. +[14] J. Cox, E. Bouwers, M. C. J. D. van Eekelen, and J. Visser, “Measuring dependency freshness in software systems,” in 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 2. IEEE Computer Society, 2015, pp. 109–118. +[15] J. M. González-Barahona, P. Sherwood, G. Robles, and D. Izquierdo-Cortazar, “Technical lag in software compilations: Measuring how outdated a software deployment is,” in Open Source Systems: Towards Robust Practices - 13th IFIP WG 2.13 International Conference, OSS 2017, Buenos Aires, Argentina, May 22-23, 2017, Proceedings, ser. IFIP Advances in Information and Communication Technology, vol. 496, 2017, pp. 182–192. +[16] A. Zerouali, T. Mens, and E. 
Constantinou, “On the evolution of technical lag in the npm package dependency network,” in 2018 IEEE International Conference on Software Maintenance and Evolution, ICSME 2018, Madrid, Spain, September 23-29, 2018. IEEE Computer Society, 2018, pp. 404–414. +[17] A. Zerouali, E. Constantinou, T. Mens, G. Robles, and J. M. González-Barahona, “An empirical analysis of technical lag in npm package dependencies,” in New Opportunities for Software Reuse - 17th International Conference, ICSR 2018, Madrid, Spain, May 21-23, 2018, Proceedings, ser. Lecture Notes in Computer Science, vol. 10826. Springer, 2018, pp. 95–110. +[18] A. Zerouali, T. Mens, J. M. González-Barahona, A. Decan, E. Constantinou, and G. Robles, “A formal framework for measuring technical lag in component repositories - and its application to npm,” J. Softw. Evol. Process., vol. 31, no. 8, 2019. +[19] J. Stringer, A. Tahir, K. Blincoe, and J. Dietrich, “Technical lag of dependencies in major package managers,” in 27th Asia-Pacific Software Engineering Conference, APSEC 2020, Singapore, December 1-4, 2020. IEEE, 2020, pp. 228–237. +[20] A. Zerouali, T. Mens, A. Decan, J. M. González-Barahona, and G. Robles, “A multi-dimensional analysis of technical lag in Debian-based docker images,” Empir. Softw. Eng., vol. 26, no. 2, p. 19, 2021. +[21] K. Chow and D. Notkin, “Semi-automatic update of applications in response to library changes,” in 1996 International Conference +on Software Maintenance (ICSM ’96), 4-8 November 1996, Monterey, CA, USA, Proceedings. IEEE Computer Society, 1996, p. 359. + + +[22] J. Henkel and A. Diwan, “CatchUp: capturing and replaying refactorings to support API evolution,” in 27th International Conference on Software Engineering (ICSE 2005), 15-21 May 2005, St. Louis, Missouri, USA. ACM, 2005, pp. 274–283. + + +[23] Z. Xing and E. Stroulia, “API-evolution support with Diff-CatchUp,” IEEE Trans. Software Eng., vol. 33, no. 12, pp. 818–836, 2007. + + +[24] H. A. Nguyen, T. T. Nguyen, G. 
Wilson Jr., A. T. Nguyen, M. Kim, and T. N. Nguyen, “A graph-based approach to API usage adaptation,” in Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2010, October 17-21, 2010, Reno/Tahoe, Nevada, USA. ACM, 2010, pp. 302–321. + + +[25] B. Dagenais and M. P. Robillard, “Recommending adaptive changes for framework evolution,” ACM Trans. Softw. Eng. Methodol., vol. 20, no. 4, pp. 19:1–19:35, 2011. + + +[26] B. Cossette and R. J. Walker, “Seeking the ground truth: a retrospective study on the evolution and migration of software libraries,” in 20th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-20), SIGSOFT/FSE’12, Cary, NC, USA - November 11 - 16, 2012. ACM, 2012, p. 55. + + +[27] K. Huang, B. Chen, L. Pan, S. Wu, and X. Peng, “REPFINDER: finding replacements for missing APIs in library update,” in 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15-19, 2021. IEEE, 2021, pp. 266–278. + + +[28] B. B. Nielsen, M. T. Torp, and A. Møller, “Semantic patches for adaptation of JavaScript programs to evolving libraries,” in 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021. IEEE, 2021, pp. 74–85. + + +[29] S. A. Haryono, F. Thung, D. Lo, J. Lawall, and L. Jiang, “ML-CatchUp: Automated update of deprecated machine-learning APIs in Python,” in IEEE International Conference on Software Maintenance and Evolution, ICSME 2021, Luxembourg, September 27 - October 1, 2021. IEEE, 2021, pp. 584–588. + + +[30] S. A. Haryono, F. Thung, D. Lo, L. Jiang, J. Lawall, H. J. Kang, L. Serrano, and G. Muller, “AndroEvolve: Automated Android API update with data flow analysis and variable denormalization,” Empir. Softw. Eng., vol. 27, no. 3, p. 73, 2022. + + +[31] https://semver.org/. + + +[32] S. Mostafa, R. Rodriguez, and X.
Wang, “Experience paper: a study on behavioral backward incompatibilities of Java software libraries,” in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, Santa Barbara, CA, USA, July 10 - 14, 2017. ACM, 2017, pp. 215–225. + + +[33] S. Raemaekers, A. van Deursen, and J. Visser, “Semantic versioning and impact of breaking changes in the Maven repository,” J. Syst. Softw., vol. 129, pp. 140–158, 2017. + + +[34] J. Hejderup and G. Gousios, “Can we trust tests to automate dependency updates? A case study of Java projects,” J. Syst. Softw., vol. 183, p. 111097, 2022. + + +[35] J. Wu, H. He, W. Xiao, K. Gao, and M. Zhou, “Demystifying software release note issues on GitHub,” in Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, ICPC 2022, Pittsburgh, USA, May 16-17, 2022. ACM, 2022. + + +[36] P. Lam, J. Dietrich, and D. J. Pearce, “Putting the semantics into semantic versioning,” in Proceedings of the 2020 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Onward! 2020, Virtual, November, 2020. ACM, 2020, pp. 157–179. + + +[37] B. Rombaut, F. R. Cogo, B. Adams, and A. E. Hassan, “There’s no such thing as a free lunch: Lessons learned from exploring the overhead introduced by the Greenkeeper dependency bot in npm,” ACM Transactions on Software Engineering and Methodology, 2022. + + +[38] M. S. Wessel, B. M. de Souza, I. Steinmacher, I. S. Wiese, I. Polato, A. P. Chaves, and M. A. Gerosa, “The power of bots: Characterizing and understanding bots in OSS projects,” Proc. ACM Hum. Comput. Interact., vol. 2, no. CSCW, pp. 182:1–182:19, 2018. + + +[39] L. Erlenhov, F. G. de Oliveira Neto, and P. 
Leitner, “An empirical study of bots in software development: characteristics and challenges from a practitioner’s perspective,” in ESEC/FSE ’20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020. ACM, 2020, pp. 445–455. + + +[40] M. S. Wessel, I. Wiese, I. Steinmacher, and M. A. Gerosa, “Don’t disturb me: Challenges of interacting with software bots on open source software projects,” Proc. ACM Hum. Comput. Interact., vol. 5, no. CSCW2, pp. 1–21, 2021. + + +[41] M. S. Wessel, A. Abdellatif, I. Wiese, T. Conte, E. Shihab, M. A. Gerosa, and I. Steinmacher, “Bots for pull requests: The good, the bad, and the promising,” in 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. IEEE, 2022, pp. 274–286. + + +[42] E. Shihab, S. Wagner, M. A. Gerosa, M. Wessel, and J. Cabot, “The present and future of bots in software engineering,” IEEE Software, 2022. + + +[43] S. Santhanam, T. Hecking, A. Schreiber, and S. Wagner, “Bots in software engineering: a systematic mapping study,” PeerJ Comput. Sci., vol. 8, p. e866, 2022. + + +[44] https://github.com/apps/dependabot-preview. + + +[45] L. Erlenhov, F. G. de Oliveira Neto, and P. Leitner, “Dependency management bots in open-source systems - prevalence and adoption,” PeerJ Comput. Sci., vol. 8, p. e849, 2022. + + +[46] T. Dey, S. Mousavi, E. Ponce, T. Fry, B. Vasilescu, A. Filippova, and A. Mockus, “Detecting and characterizing bots that commit code,” in MSR ’20: 17th International Conference on Mining Software Repositories, Seoul, Republic of Korea, 29-30 June, 2020. ACM, 2020, pp. 209–219. + + +[47] https://www.indiehackers.com/interview/living-off-our-savings-and-growing-our-saas-to-740-mo. + + +[48] https://www.indiehackers.com/product/dependabot-acquired-by-github-1g7T7DN1rGEZM204shF. + + +[49] https://github.com/baker/dependabot-preview/. 
+ + +[50] https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated/configuration-options-for-dependency-updates. + + +[51] https://docs.github.com/en/code-security/supply-chain-security/managing-vulnerabilities-in-your-projects-dependencies/about-alerts-for-vulnerable-dependencies#access-to-dependabot-alerts. + + +[52] Pull Request #1127 of datadesk/baker. + + +[53] https://docs.github.com/en/code-security/supply-chain-security/managing-vulnerabilities-in-your-projects-dependencies/about-dependabot-security-updates. + + +[54] M. Alfadel, D. E. Costa, E. Shihab, and M. Mkhallalati, “On the use of Dependabot security pull requests,” in 18th IEEE/ACM International Conference on Mining Software Repositories, MSR 2021, Madrid, Spain, May 17-19, 2021. IEEE, 2021, pp. 254–265. + + +[55] C. Soto-Valero, T. Durieux, and B. Baudry, “A longitudinal analysis of bloated Java dependencies,” in ESEC/FSE ’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 2021. ACM, 2021, pp. 1021–1031. + + +[56] F. R. Cogo and A. E. Hassan, “Understanding the customization of dependency bots: The case of dependabot,” IEEE Software, 2022. + + +[57] Pull Request #4317 of caddyserver/caddy. + + +[58] G. Gousios, “The GHTorrent dataset and tool suite,” in Proceedings of the 10th Working Conference on Mining Software Repositories, ser. MSR ’13. Piscataway, NJ, USA: IEEE Press, 2013, pp. 233–236. + + +[59] N. Munaiah, S. Kroh, C. Cabrey, and M. Nagappan, “Curating GitHub for engineered software projects,” Empir. Softw. Eng., vol. 22, no. 6, pp. 3219–3253, 2017. + + +[60] H. He, R. He, H. Gu, and M. Zhou, “A large-scale empirical study on Java library migrations: prevalence, trends, and rationales,” in ESEC/FSE ’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 2021.
ACM, 2021, pp. 478–490. + + +[61] https://docs.github.com/en/rest. + + +[62] https://workers.cloudflare.com/. + + +[63] https://github.com/advisories. + + +[64] M. Goeminne and T. Mens, “Evidence for the Pareto principle in open source software activity,” in Joint Proceedings of the 1st International Workshop on Model Driven Software Maintenance and 5th International Workshop on Software Quality and Maintainability. Citeseer, 2011, pp. 74–82. + + +[65] Y. Zhang, M. Zhou, A. Mockus, and Z. Jin, “Companies’ participation in OSS development—an empirical study of OpenStack,” IEEE Trans. Software Eng., vol. 47, no. 10, pp. 2242–2259, 2021. + + +[66] A. Decan, T. Mens, and P. Grosjean, “An empirical comparison of dependency network evolution in seven software packaging ecosystems,” Empir. Softw. Eng., vol. 24, no. 1, pp. 381–416, 2019. + + +[67] https://github.com/dependabot/dependabot-core/issues/4146. + + +[68] R. Likert, “A technique for the measurement of attitudes.” Archives of Psychology, 1932. + + +[69] https://tools4dev.org/resources/how-to-choose-a-sample-size/. + + +[70] X. Tan, M. Zhou, and Z. Sun, “A first look at good first issues on GitHub,” in ESEC/FSE ’20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020. ACM, 2020, pp. 398–409. + + +[71] https://www.qualtrics.com/blog/ethical-issues-for-online-surveys/. + + +[72] Y. Zhao, A. Serebrenik, Y. Zhou, V. Filkov, and B. Vasilescu, “The impact of continuous integration on other software development practices: a large-scale empirical study,” in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017. IEEE Computer Society, 2017, pp. 60–71. + + +[73] N. Cassee, B. Vasilescu, and A.
Serebrenik, “The silent helper: The impact of continuous integration on code reviews,” in 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2020, London, ON, Canada, February 18-21, 2020. IEEE, 2020, pp. 423–434. + + +[74] J. Romano, J. D. Kromrey, J. Coraggio, and J. Skowronek, “Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen’s d for evaluating group differences on the NSSE and other surveys,” in Annual Meeting of the Florida Association of Institutional Research, vol. 177, 2006, p. 34. + + +[75] S. Prion and K. Haerling, “Making sense of methods and measurement: Spearman-rho ranked-order correlation coefficient,” Clinical Simulation in Nursing, vol. 10, pp. 535–536, 10 2014. + + +[76] P. Sturgis, C. Roberts, and P. Smith, “Middle alternatives revisited: How the neither/nor response acts as a way of saying “I don’t know”?” Sociological Methods & Research, vol. 43, no. 1, pp. 15–38, 2014. + + +[77] Pull Request #259 of dropbox/stone. + + +[78] Pull Request #3155 of tuist/tuist. + + +[79] Pull Request #663 of ros-tooling/action-ros-ci. + + +[80] Commit #b337b5f of justeat/httpclient-interception. + + +[81] Pull Request #1260 of asynkron/protoactor-dotnet. + + +[82] Commit #a06b04e of Azure/bicep. + + +[83] S. H. Khandkar, “Open coding,” University of Calgary, vol. 23, p. 2009, 2009. + + +[84] R. J. Passonneau, “Measuring agreement on set-valued items (MASI) for semantic and pragmatic annotation,” in Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy, May 22-28, 2006. European Language Resources Association (ELRA), 2006, pp. 831–836. + + +[85] K. Krippendorff, Content Analysis: An Introduction to its Methodology. Sage Publications, 2018. + + +[86] Pull Request #134 of skytable/skytable. + + +[87] Comment from Issue #1190 of dependabot/dependabot-core. + + +[88] Issue #1296 of dependabot/dependabot-core.
+ + +[89] Pull Request #2635 of giantswarm/happa. + + +[90] Commit #8cecf22 of Fate-Grand-Automata/FGA. + + +[91] Pull Request #1976 of stoplightio/spectral. + + +[92] Issue #1736 of dependabot/dependabot-core. + + +[93] Issue #1297 of dependabot/dependabot-core. + + +[94] Issue #202 of nitzano/gatsby-source-hashnode. + + +[95] Issue #26 of replygirl/tc. + + +[96] Pull Request #1987 of stoplightio/spectral. + + +[97] Pull Request #2916 of codalab/codalab-worksheets. + + +[98] Pull Request #126 of lyft/clutch. + + +[99] Pull Request #3622 of video-dev/hls.js. + + +[100] Comment from Issue #1973 of dependabot/dependabot-core. + + +[101] Issue #60 of ahmadnassri/action-dependabot-auto-merge. + + +[102] https://github.blog/changelog/label/dependabot/. + + +[103] https://docs.renovatebot.com/. + + +[104] https://docs.renovatebot.com/merge-confidence/. + + +[105] M. Zimmermann, C. Staicu, C. Tenny, and M. Pradel, “Small world with high risks: A study of security threats in the npm ecosystem,” in 28th USENIX Security Symposium, USENIX Security 2019, Santa Clara, CA, USA, August 14-16, 2019. USENIX Association, 2019, pp. 995–1010. + + +[106] A. Godulla, M. Bauer, J. Dietlmeier, A. Lück, M. Matzen, and F. Vaaßen, “Good bot vs. bad bot: Opportunities and consequences of using automated software in corporate communications,” 2021. + + +[107] Z. Peng and X. Ma, “Exploring how software developers work with mention bot in github,” CCF Trans. Pervasive Comput. Interact., vol. 1, no. 3, pp. 190–203, 2019. + + +[108] D. Foo, H. Chua, J. Yeo, M. Y. Ang, and A. Sharma, “Efficient static checking of library updates,” in Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018. ACM, 2018, pp. 791–796. + + +[109] L. F. Cortes-Coy, M. L. Vásquez, J. Aponte, and D. 
Poshyvanyk, “On automatically generating commit messages via summarization of source code changes,” in 14th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2014, Victoria, BC, Canada, September 28-29, 2014. IEEE Computer Society, 2014, pp. 275–284. + + +[110] L. Moreno, G. Bavota, M. D. Penta, R. Oliveto, A. Marcus, and G. Canfora, “ARENA: an approach for the automated generation of release notes,” IEEE Trans. Software Eng., vol. 43, no. 2, pp. 106–127, 2017. + + +[111] D. Poole, A. Mackworth, and R. Goebel, Computational Intelligence: A Modern Approach. Pearson Education, Inc., 2010. + + +Runzhi He is currently an undergraduate student at the School of Electronics Engineering and Computer Science (EECS), Peking University. His research mainly focuses on open source sustainability and software supply chain. He can be contacted via rzhe@pku.edu.cn + + +Hao He is currently a Ph.D. student at the School of Computer Science, Peking University. Before that, he received his B.S. degree in Computer Science from Peking University in 2020. His research addresses socio-technical sustainability problems in open source software communities, ecosystems, and supply chains. More information can be found on his personal website https://hehao98.github.io/ and he can be reached at hehao98@pku.edu.cn. + + +Yuxia Zhang is currently an assistant professor at the School of Computer Science and Technology, Beijing Institute of Technology (BIT). She received her Ph.D. in 2020 from the School of Electronics Engineering and Computer Science (EECS), Peking University. Her research interests include mining software repositories and open-source software ecosystems, mainly focusing on commercial participation in open-source. She can be contacted at yuxiazh@bit.edu.cn. +Minghui Zhou received the BS, MS, and Ph.D. degrees in computer science from the National University of Defense Technology in 1995, 1999, and 2002, respectively. 
She is a professor in the School of Computer Science at Peking University. She is interested in software digital sociology, i.e., understanding the relationships among people, project culture, and software products through mining the repositories of software projects. She is a member of the ACM and IEEE. She can be reached at zhmh@pku.edu.cn. +---------------------------------------- +------------------------------- +Section 58: +Open Source Software Sustainability: Combining Institutional Analysis and Socio-Technical Networks + + +LIKANG YIN, University of California, Davis, USA +MAHASWETA CHAKRABORTI, University of California, Davis, USA +YIBO YAN, University of California, Davis, USA +CHARLES SCHWEIK, University of Massachusetts Amherst, USA +SETH FREY, University of California, Davis, USA +VLADIMIR FILKOV, University of California, Davis, USA + + +CCS Concepts: • Human-centered computing → Empirical studies in collaborative and social computing. + + +Additional Key Words and Phrases: Institutional Design; Socio-technical Systems; OSS Sustainability + + +ACM Reference Format: +Likang Yin, Mahasweta Chakraborti, Yibo Yan, Charles Schweik, Seth Frey, and Vladimir Filkov. 2022. Open Source Software Sustainability: Combining Institutional Analysis and Socio-Technical Networks. Proc. ACM Hum.-Comput. Interact. 6, CSCW2, Article 404 (November 2022), 23 pages. https://doi.org/10.1145/3555129 + + +ABSTRACT +Open Source Software (OSS) projects form much of the fabric of our digital society, especially successful and sustainable ones. But many OSS projects do not become sustainable, resulting in abandonment and even risks for the world’s digital infrastructure. Prior work has looked at the reasons for this mainly from two very different perspectives. In software engineering, the focus has been on understanding success and sustainability from the socio-technical perspective: the OSS programmers’ day-to-day activities and the artifacts they create.
In institutional analysis, on the other hand, emphasis has been on institutional designs (e.g., policies, rules, and norms) that structure project governance. Even though each is necessary for a comprehensive understanding of OSS projects, the connection and interaction between the two approaches have barely been explored. + + +In this paper, we make the first effort toward understanding OSS project sustainability using a dual-view analysis, by combining institutional analysis with socio-technical systems analysis. In particular, we (i) use linguistic approaches to extract institutional rules and norms from OSS contributors’ communications to represent the evolution of their governance systems, and (ii) construct socio-technical networks based on longitudinal collaboration records to represent each project’s organizational structure. We combined the two methods and applied them to a dataset of developer digital traces from 253 nascent OSS projects within the Apache Software Foundation (ASF) incubator. We find that the socio-technical and institutional features relate to each other and provide complementary views into the progress of the ASF’s OSS projects. Refining these combined analyses can help provide a more precise understanding of the synchronization between the evolution of institutional governance and organizational structure. + + +Authors’ addresses: Likang Yin, lkyin@ucdavis.edu, University of California, Davis, CA, USA; Mahasweta Chakraborti, mchakraborti@ucdavis.edu, University of California, Davis, CA, USA; Yibo Yan, ybyan@ucdavis.edu, University of California, Davis, CA, USA; Charles Schweik, cschweik@umass.edu, University of Massachusetts Amherst, MA, USA; Seth Frey, sethfrey@ucdavis.edu, University of California, Davis, CA, USA; Vladimir Filkov, vfilkov@ucdavis.edu, University of California, Davis, CA, USA. + + +Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. + + +© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM. +2573-0142/2022/11-ART404 $15.00 +https://doi.org/10.1145/3555129 + + +Proc. ACM Hum.-Comput. Interact., Vol. 6, No. CSCW2, Article 404. Publication date: November 2022. +---------------------------------------- +------------------------------- +Section 59: +1 INTRODUCTION + + +Open Source Software (OSS) is a multi-billion dollar industry. A majority of modern businesses, including all major tech companies, rely on OSS without even knowing it. OSS contributions are an important manifestation of computer-supported collaborative work, given the high degree of technical literacy typical of OSS contributors. Even though this popularity attracts many software developers to open source, more than 80% of OSS projects are abandoned [37]. + + +The failure of collaborative work in OSS has received attention from two perspectives. In software engineering, the focus has been on understanding success and sustainability from the socio-technical perspective: the OSS developers’ day-to-day activities and the artifacts they create. In the management domain, on the other hand, emphasis has been on institutional designs (e.g., policies, rules, and norms) that structure governance and OSS project administration. In particular, systems that generate public goods address these and other endemic social challenges by creating governance institutions for attracting, maintaining, incentivizing, and coordinating contributions.
Ostrom [32] defines institutions as “… prescriptions that humans use to organize all forms of repetitive and structured interactions…”. Institutions guide interactions between participants in an OSS project, and can be informal, such as established norms of behavior, or more formalized as written or codified rules. These norms and formalized rules, along with the mechanisms for rule creation, maintenance, monitoring, and enforcement, are the means through which collective action in OSS development occurs [37], and they can be tiered or nested, as in the context of OSS projects embedded within an overarching OSS nonprofit organization. + + +Both perspectives have separately been shown to usefully describe the state of a process; however, combining the two has barely been explored. In this paper, we undertake a convergent approach, considering OSS projects’ socio-technical structure on one side and aspects of their institutional design on the other. Our goal is to use these two perspectives synergistically, to identify when they strengthen and complement each other, and to refine our understanding of OSS sustainability through the two methodological approaches. Central to our approach is the idea that the trajectories of individual OSS projects can be understood in the convergent framework through the context provided by similar projects that have already been sustained or abandoned. + + +We leverage a previously published dataset [47] of traces representing OSS developers’ day-to-day activities as part of the Apache Software Foundation Incubator (ASFI) project. These developers belong to projects that have decided to undergo incubation, toward becoming part of the ASF and benefiting from the services it provides to member projects. The dataset includes historical traces and a sustainability label (graduation or retirement) for each project.
Graduation is an indication of successful incubation and the readiness of a nascent project to join ASF proper; otherwise the project is retired. Importantly, in this paper, we use the ASFI project outcomes of graduation or retirement as a measure of the sustainability of the project. We assume that graduated projects are sustained longer than retired ones, although that might not always be the case(^1). Key hurdles that OSS projects must clear to graduate are that they can (1) produce new releases, and (2) show the ability to attract new developers. Both of these factors are arguably key to the sustainability of OSS projects.

We utilize this dataset to study the extent to which graduated and retired projects differ from each other, from the point of view of both socio-technical structure and institutional governance. On the socio-technical side, we construct the monthly longitudinal social and technical networks for each project, and calculate several measures describing the features of the networks. On the institutional governance side, we implement a classifier trained on manual annotations of institutional statements in the publicly accessible email communications among ASF participants. We then compare the findings of our socio-technical and institutional metrics for project-level and individual-level activities. Next, we perform exploratory data analyses and deep-dive case studies, and eventually we look at how socio-technical measures associate with the prevalence of institutional statements, and at evolutionary trajectories during OSS project incubation toward sustainability. In summary, we find that:

We can effectively extract governance content from email discussions in the form of institutional statements, and they fall into 12 distinguishable topics.
Projects with different graduation (i.e., sustainability) outcomes differ in how much governance discussion occurs within their communities, and also in their socio-technical structure.

Self-sustained (i.e., graduated) projects have a more socially active community, achieved within their first 3 months of incubation, and they demonstrate more active contributions to documentation and more active communication of policy guidance via institutional statements.

A project’s socio-technical structure is temporally associated with the institutional communications that occur, depending on the role of the agent (mentor, committer, contributor) communicating institutional statements.

To provide the most relevant context: recently, Yin et al. [46] showed that socio-technical networks can be used to effectively predict whether a project will graduate or retire from the ASF incubator. That work did not include any institutional or governance analysis. Here, we focus on closing that gap by studying the relationship between the organizational structure (i.e., the socio-technical system) and institutional governance in peer-contributed OSS projects. Our study is the first attempt to provide a common framework for simultaneous socio-technical structural and institutional analysis of OSS projects, in order to describe and understand a process affected by both: a project gaining a self-sustaining and self-governing community and eventually graduating from the ASF incubator. We are hopeful that refining this convergent approach of structural and institutional analyses will open new ways to consider and study emergent properties like project sustainability.
----------------------------------------
-------------------------------
Section 60:
2 THEORETICAL FRAMEWORK

Here we introduce the theories behind the two different viewpoints, Institutional Analysis and Development (IAD) and Socio-Technical Systems (STS), as well as Contingency Theory, which serves as the glue between institutional governance and the organizational structure of OSS projects.

(^1)For example, it could be that some ASFI retired projects simply could not adapt to the policies and requirements set in the ASFI program, yet continue on ‘in the wild’ or perhaps aligned with a different OSS foundation.

2.1 Institutional Theory and Commons Governance

OSS projects are a form of digital commons, or more precisely, Commons-Based Peer Production (CBPP) [37]. Legal scholar Yochai Benkler [2] introduced the phrase CBPP to describe situations where people work collectively over the Internet, and where organizational structure is less hierarchical. While CBPP situations are found in a variety of settings (e.g., collaborative writing, open source hardware), Benkler argues that OSS is the ‘quintessential instance’ of CBPP.

There is a relatively long history of the study of governance in commons settings, arguably led by Nobel laureate Elinor Ostrom and her groundbreaking book Governing the Commons [31]. Ostrom’s Institutional Analysis and Development (IAD) framework was developed to study the governance institutions that communities develop to self-manage natural resources. Much of this research focuses on the governance and sustainability of natural resource settings, e.g., water [6], marine [19], and forest [16] settings.
A key challenge in natural resource commons settings is that individuals who cannot easily be excluded from extracting resources from the pool of available natural resources often have little incentive to contribute toward the production or maintenance of that resource; such individuals are commonly referred to as ‘free-riders’ [29]. In forest, fishery, and water settings, the free-rider problem in open access settings can lead to what Hardin termed the ‘Tragedy of the Commons’ [20]. Ostrom famously pushed back against Hardin’s analysis and, over the course of a lifetime of work, highlighted that communities can avoid tragedy through hard work in developing self-governing institutions.

OSS commons are fundamentally different from natural resources in that digital resources can be readily replicated and are not subject to degradation due to over-harvesting. Therefore, if over-appropriation is not a problem, is there a potential tragedy of the commons in an OSS context? Arguably, the answer is yes, and it lies at the heart of the idea of OSS sustainability. The tragedy occurs when there are free-riders and insufficient human resources available to continue to develop and maintain the software; as a result, the software project fails to achieve the functionality and use that was perhaps envisioned when it began, and becomes abandoned [36]. Ostrom and Hess [22] aptly describe this tragedy as ‘collective inaction.’

Ostrom’s Nobel Prize-winning body of work examined how humans collectively act and craft self-governing institutional arrangements to effectively avoid the tragedy in natural resource settings. Central to this effort was the introduction and evolution of the Institutional Analysis and Development (IAD) framework [32].
Later, IAD was applied to the study of digital or knowledge commons [17, 22] and explicitly to the study of self-governance in OSS, where Schweik and English undertook the first study of technical, community, and institutional designs of a large number of OSS projects [37].

Indeed, prior work has found that self-governing OSS projects develop highly organized social and technical structures [5]. Those having foundation support, like the ASF, may additionally be in the process of organizing the developers’ structured interactions under a second tier of governance prescriptions as required by the ASF Incubator. We refer to an individual institutional prescription as an Institutional Statement (IS), which can include rules and norms, and which we define as a shared linguistic constraint or opportunity that prescribes, permits, or advises actions or outcomes for actors (both individual and corporate) [10, 39]. Institutions, understood operationally as collections of institutional statements, create situations of structured interaction for collective action. In other words, configurations of ISs affect the way collective action is organized. In the context of ASF and OSS projects, incubator ISs can affect OSS project social and technical structure.

With IS and other approaches to institutional analysis, it becomes possible to articulate the relationships between governance, organizational, and technical variables. For example, previous studies on OSS often report code modularity as a key technical design attribute [28, 30]. Hissam et al.
[23] write: ‘A well-modularized system … allows contributors to carve off chunks on which they can work.’ Open and transparent verbal discussion between OSS team members and other ASF officials (e.g., mentors) about OSS project or ASF institutional design, captured in the form of institutional statements, could then predict effort by project contributors to restructure their project’s technical infrastructure to be more modular and inviting to new contributors. Using the approaches of institutional analysis, we extract institutional content from open access email exchanges between OSS project contributors to understand the role of communicating governance information in OSS project sustainability.

2.2 Socio-Technical System Theory

A Socio-Technical System (STS) comprises two entities [42]: the social system, where members continuously create and share knowledge via various types of individual interactions, and the technical system, where the members utilize the technical hardware to accomplish certain collective tasks. STS theory can be considered to combine the views of both engineers and social scientists, acting as an intermediary of sorts that transfers institutional influence to individuals [35]. The theory of STS is often referenced when studying how a technical system is able to provide efficient and reliable individual interactions [21], and how the social subsystem becomes contingent on the interactions and further affects the performance of the technical subsystem [15]. Moreover, socio-technical system theory plays an important role in analyzing collective behavior in OSS projects [3]. OSS projects have also been studied from a network point of view [12, 24]. González-Barahona et al. [18] proposed using technical networks, where nodes are the modules in the CVS repository and edges indicate that two modules share common committers, to study the organization of ASF projects.
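The module-level technical network summarized above can be illustrated with a short sketch; the module and committer names below are purely hypothetical toy data, not drawn from any ASF project:

```python
from itertools import combinations

# Toy module -> committers map; in the cited style of network, nodes are
# repository modules and an edge joins two modules sharing a committer.
module_committers = {
    "parser":  {"alice", "bob"},
    "runtime": {"bob", "carol"},
    "docs":    {"dana"},
}

# Draw an edge wherever two modules have a non-empty committer intersection.
edges = {
    (a, b)
    for a, b in combinations(sorted(module_committers), 2)
    if module_committers[a] & module_committers[b]
}
```

Here only `parser` and `runtime` share a committer (`bob`), so the resulting network has a single edge.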
In socio-technical systems, organizations can intervene through long-term or short-term means. Smith et al. [40] propose two conceptual approaches, ‘outside’ and ‘inside’: ‘outside’ approaches view the socio-technical system from without and are managerial in character, while ‘inside’ approaches are more reflexive about the role of management in co-constituting the socio-technical.

From that perspective, the Apache Software Foundation (ASF) community is a unique system that has both outside influence, via regulations from the ASF board and members, and inside governance, managed or self-governed by individual Project Management Committees (PMCs).

2.3 Contingency Theory, or There Are No Panaceas in Self-Governance

Contingency theory is the notion that there is no one best way to govern an organization. Instead, each decision in an organization must depend on its internal structure, contingent upon the external context (e.g., stakeholders [43], risk [9], schedule [45], etc.). Joslin et al. [25] find that project success is associated with the methodologies (e.g., processes, tools, methods, etc.) adopted by the project. Here, in particular, we treat institutional statements as an abstraction of the methodologies in OSS development. As the organizational context changes over time, to maintain consistency, the project must adapt to its context accordingly. Otherwise, conflicts and inefficiency occur [1]; i.e., no single organizational structure is equally effective in all cases. Similar arguments have been made in the field of institutional analysis: there are no panaceas or standard blueprints for guiding the institutional design of a collective action problem [33].

To address the conflicts caused by incompatibilities with the project’s context, previous work suggests thinking holistically. Lehtonen et al. [26] consider the project environment as all measurable spatio-temporal factors when a project is initiated, processed, adjusted, and finally terminated.
They suggest that the same factor can have opposite influences on projects under different contexts. Joslin et al. [25] consider project governance to be part of the project context, concluding that project governance can impact the use and effectiveness of project methodologies.

As per contingency theory, during ASFI projects’ incubation, developers and mentors must make timely decisions on their organizational structure, contingent on what is happening in the institutional rules and governance, and vice versa.
----------------------------------------
-------------------------------
Section 61:
3 RESEARCH QUESTIONS

Reflecting on the previous discussion, the primary goal of this paper is to demonstrate that the evolution of a project from a nascent state to a sustainable state can be studied effectively by combining the two different methodologies of socio-technical network analysis and institutional analysis.

We reported in prior sections that a variety of scholars have utilized a socio-technical systems approach to analyze collective behavior in OSS projects. We also described how institutional analysis is useful in understanding collective action in OSS settings. To enable this dual view of sustainability, we first describe and evaluate our automated approach to identifying institutional statements in project emails.

RQ1: Are there institutional statements contained in ASF Incubator project email discussions? Can we effectively identify them?

With the next two research questions, we assess the utility of our convergent approach to the Institutional Analysis (IAD) and STS frameworks. In the case of the ASF incubation program, there are two eventual outcomes: either a project graduates from the ASF incubator and becomes a full-fledged ASF-associated project, or it retires without achieving that goal.
In this context, we operationalize a sustainable state as one where an OSS project graduates from the ASF incubator program, rather than retires. We ask:

RQ2: Is OSS project evolution toward sustainability readily observable through the dual lenses of institutional and socio-technical analysis? And how do such temporal patterns differ?

Per institutional analysis theory, strategies, norms, and rules can affect the social and technical organizations of projects. Governance and organization, per social theories, must work hand-in-hand to make viable socio-technical systems. Ill-designed institutional arrangements would introduce inefficiencies into the system, and such inefficiencies may amplify deviant behaviors and irregular structures in the system. Such influential links from institutional design to organizational structure can, in fact, be bi-directional. Conversely, in a sustainable system, an ill-formed organizational structure may instigate new rules to adjust and improve that structure, further improving efficiencies in the system.

Thus, we hypothesize that the feedback, if any, between project governance and project organization should be observable, specifically in that intensified governance discussion should precede and/or follow changes to the project organizational structure. As a reminder, we consider institutional statements as indicators of intensified discussions of OSS project self-governance or new incubator requirements on that self-governance. We also consider socio-technical network parameters as indicators of organizational structure. Thus, we ask:

RQ3: Are periods of increased Institutional Statement frequency followed by changes in the project organizational structure, and vice-versa?

In the following section, we introduce the methodologies for approaching the above three research questions.
4 DATA AND METHODS

To study the difference between projects that graduate from ASFI (i.e., become sustainable) and those that do not, in this paper we use a collection of large-scale data sets comprising Institutional Statements and socio-technical variables extracted from all graduated and retired projects from the Apache Software Foundation Incubator, ASFI. In ASFI, graduation is an indication that a nascent project is sufficiently sustainable to join ASF proper(^2); otherwise the project is retired. Our combing through the Apache lists, inspecting the data, and speaking to project and community members have shown that almost all failures to graduate are sustainability failures. On rare occasions, some projects have retired for reasons other than sustainability, e.g., some are not a good fit for the Apache model(^3), despite evidence that projects are generally sufficiently aware of the ASF model before entering incubation, according to their project proposals(^4).

For the socio-technical networks, we collected historical trace data of commits, emails, and incubation outcomes for 253 ASFI projects, which have available archives of both commits and emails from 03/29/2003 to 02/01/2021(^5). Among those, 204 projects have already graduated, and 49 have retired. ASF incubator projects that are still in incubation are not studied in this paper.

We collected the ASF incubator project data from the ASF mailing list archives(^6), which are open access and can be retrieved through the archive web page lists, http://mail-archives.apache.org/mod_mbox/. They contain all emails and commits from the project’s ASF incubator entry date, and are current. The project URLs follow the pattern proj_name-list_name/YYYYMM.mbox. For example, the full URL for the dev mailing list of the Apache Accumulo project, in Dec 2014, is http://mail-archives.apache.org/mod_mbox/accumulo-dev/201412.mbox.
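The URL pattern above can be captured in a small helper. This is an illustrative sketch, not code from the paper’s pipeline:

```python
def mbox_url(project: str, list_name: str, year: int, month: int) -> str:
    """Build the archive URL for one month of an ASF mailing list,
    following the proj_name-list_name/YYYYMM.mbox pattern described above."""
    return (f"http://mail-archives.apache.org/mod_mbox/"
            f"{project}-{list_name}/{year:04d}{month:02d}.mbox")

# The Apache Accumulo dev list for December 2014:
url = mbox_url("accumulo", "dev", 2014, 12)
```

The helper reproduces the example URL given in the text for `accumulo-dev` in December 2014.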
Each such .mbox file contains a month of mailing list messages from the project, for the date specified in the URL. Here dev stands for ‘emails among developers’. Notably, there are some lists that do not follow the pattern, e.g., ‘ASF-wide lists’ are not project-owned mailing lists, and the list ‘incubator.apache.org’ contains data from more than one project.

To extract Institutional Statements, we combined our email data set with a prior data set on ASF policy documents. In a given organization, institutional statements are characterized by a finite set of semantic roles (e.g., ASF Board, Mentors, contributors, etc. in ASF) and their interactions (e.g., management committees requesting reports from projects, developers voting to induct committers in ASF), in specific contexts. To account for their representation in our training corpus, we included institutional statements from not only ASF project-level email exchanges among participants, but also ASF policy documents. The supplementary set of Institutional Statements included 328 policies, which were compiled from ASF policy documents (e.g., Apache Cookbook, PPMC Guide, Incubator Policy, etc.) in an economic analysis of the ASF Incubator’s policies [38].

4.1 Pre-processing

We collected all 1,330,003 emails across the ASF Incubator projects, from 03/29/2003 to 02/01/2021 (under mailing lists such as ‘commit’, ‘dev’, ‘user’, etc.). We find that 128,257 (about 9.6%) emails are automatically generated and broadcast by continuous integration tools (i.e., bots). Because such emails are numerous but carry little meaningful social or institutional information, and list members rarely reply to them, we use regular-expression rules to identify and eliminate them from the corpus, leaving us with 1,201,746 emails.
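The paper does not list its actual regular-expression rules, so the following is a hypothetical, minimal version of such bot filtering; the patterns and addresses are illustrative only:

```python
import re

# Illustrative sender patterns for CI/bot mail; the paper's real rules
# are not reproduced here.
BOT_SENDER_PATTERNS = [
    re.compile(r"jenkins|buildbot|hudson", re.IGNORECASE),
    re.compile(r"no-?reply@", re.IGNORECASE),
]

def is_bot_email(sender: str) -> bool:
    """Return True if the From: address matches a known bot pattern."""
    return any(p.search(sender) for p in BOT_SENDER_PATTERNS)

emails = [
    {"from": "jenkins@builds.apache.org", "subject": "Build failed"},
    {"from": "dev1@example.org", "subject": "Release vote"},
]
kept = [e for e in emails if not is_bot_email(e["from"])]
```

In this toy corpus, only the human-authored message survives the filter.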
(^2)ASF’s guide to project graduation: https://incubator.apache.org/guides/graduation.html

(^3)ASF’s reasons behind projects’ retirement: https://incubator.apache.org/projects/#retired

(^4)ASF incubator projects’ proposals: https://cwiki.apache.org/confluence/display/INCUBATOR/Proposals

(^5)Our code and data are available at Zenodo: https://doi.org/10.5281/zenodo.5908030

(^6)During the submission of this study, ASF moved their email archives to the Pony Mail system.

On the technical contribution side, many projects, especially those over ten years old that used SVN, utilized a bot for extensive mailings, forming outliers in the dataset. We therefore eliminate commit messages from automated bots (e.g., ‘buildbot’), 253,758 out of 3,654,196 (about 14.4%) commit messages, as well as email messages from issue/bug tracking bots (e.g., ‘GitBox’). Moreover, we find that some developers contributed commits by directly changing/uploading massive non-source-code files (e.g., data, configuration, and image files). Since commits of non-code files can form outliers in the data set, we apply GitHub Linguist(^7) to identify 731 programming-language and markup file extensions, and exclude all other non-coding commits (e.g., creating/deleting folders, uploading images, etc.).
----------------------------------------
-------------------------------
Section 62:
4.2 Constructing Socio-technical Networks

Network science approaches have been prominent in studying complex systems, e.g., OSS projects [4, 41]. Since networks can contain rich information on both the elements (i.e., nodes) and their interactions (i.e., edges), in this study we use socio-technical networks to anchor the abstraction of socio-technical systems. We define the projects’ socio-technical structure using social (email-based) and technical (code-based) networks, extracted from their emails to the mailing lists and commits to source files.
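Building one month’s social and technical networks as defined in this section can be sketched with networkx; the toy reply and commit records below stand in for data parsed from the .mbox archives:

```python
import networkx as nx

# Toy one-month records: (sender, replier) pairs from email threads and
# (developer, file) pairs from commits; real input comes from the archives.
replies = [("alice", "bob"), ("alice", "bob"), ("bob", "carol")]
commits = [("alice", "core.py"), ("bob", "core.py"), ("bob", "util.py")]

# Social network: weighted directed graph, weight = communication frequency.
social = nx.DiGraph()
for src, dst in replies:
    w = social.get_edge_data(src, dst, {"weight": 0})["weight"]
    social.add_edge(src, dst, weight=w + 1)

# Technical network: weighted undirected bipartite developer-file graph,
# weight = commit frequency.
technical = nx.Graph()
for dev, fname in commits:
    technical.add_node(dev, bipartite=0)
    technical.add_node(fname, bipartite=1)
    w = technical.get_edge_data(dev, fname, {"weight": 0})["weight"]
    technical.add_edge(dev, fname, weight=w + 1)
```

The repeated alice-to-bob reply yields an edge of weight 2 in the social graph, while bob’s two file touches give him degree 2 in the bipartite graph.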
Similar to the approach by Bird et al. [3], we form a social network (weighted directed graph) for each project in each incubation month from the communications between developers: a directed edge from developer (A) to (B) forms if (B) has replied to (A)’s post in a thread or if (A) has emailed (B) directly. The weight of the edge represents the communication frequency between the pair of developers. The technical networks (weighted bipartite graphs) are formed in a similar way. For each project in each month, we include an undirected edge between a developer (A) and a source file (F) if developer (A) has committed to the source file (F) that month (excluding the SVN branch names). The weight of the edge represents the commit frequency between the developer and the source file. In summary, social networks are weighted directed graphs: we form an edge between two developer nodes if one developer replied to or referenced the other’s email. Technical networks are weighted undirected bipartite graphs, with developers forming one set of nodes, coding files forming the other, and a link being drawn when a developer contributed to a coding file. We use the networkx package from Python for the network-related implementation.
----------------------------------------
-------------------------------
Section 63:
4.3 Extracting Institutional Statements

We combined the email exchange data set with the ASF policy document data to fine-tune a BERT-based [8] classifier for automatic detection of ISs (see Sect. 2.1 for the definition of IS).

To start, we hand-annotated a small subset of our data for ISs as follows. After selecting a random subset of 313 email threads from incubator project lists, two hand-coders labeled the sentences in them as ‘IS’ or ‘Not IS’, on the basis of whether they fit the definition of Institutional Statements.
They resolved disagreements through discussion and recorded these conclusions, achieving a peak out-of-sample agreement between 0.75 and 0.80. A sentence was coded as an IS only if it was a complete sentence; fragments such as parenthetical mentions of rules or resources were not annotated as positive. This resulted in 6,805 labeled sentences (‘IS’ or ‘Not IS’), of which 273 were labeled as IS.

We treated all 328 policies from the ASF documents as institutional statements, since policy documents provide arguably more formal institutional text than is the norm in email discussions. Thus, we had 601 Institutional Statements in total across these two coded datasets.

Institutional statements refer to prescriptions and shared constraints in the form of norms, rules, and strategies that are meant to mobilize and organize actors toward collective action. Table 1 provides selected examples of such statements drawn from developer exchanges.

(^7)GitHub Linguist: https://github.com/github/linguist

Table 1. Selected Examples of Institutional Statements Found in ASFI Project Email Discussions.

| Project | Date | Institutional Statements |
|---------|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
| Airflow | 21 Dec 2016| … running in our Lab there is virtually no restriction what we could do, however I will hand select people who have access to this environment. I will also hold ultimate power to remove access from anyone … |
| ODF | 07 Dec 2011| Please vote on releasing this package as < Package >. The vote is open for the next 72 hours and passes if a majority of at least three +1 ODF Toolkit PMC votes are cast … |
| Airflow | 24 Feb 2017| … Next steps: 1) will start the voting process at the IPMC mailinglist. … So, we might end up with changes to stable.
… 2) Only after the positive voting on the IPMC and finalisation I will rebrand the RC to Release. |

These exchanges encompass norms and strategies with institutional implications. The first example, from the Airflow project and dated 12/21/2016, involves a situation where certain developers find the computational infrastructure provided by ASF insufficient for testing and development requirements, and discuss setting up alternate arrangements to address the bottleneck. Faced with resource limitations, one developer offers an externally hosted cloud environment through his private resources. The selected excerpt is a quote from the individual establishing the terms for using the alternate resources he may offer to the project members, including access permission and usage restrictions. ASF projects conduct voting from time to time to gather community consensus on matters of significance. The second example, from ASFI project ODF and dated 12/07/2011, describes the stepwise process expected to be followed by members project-wide to conduct a vote that decides on the approval of the release of the current candidate under development. The final example, from Airflow and dated 02/24/2017, also pertains to a similar process: a developer discusses the voting process and its implications, especially in terms of the subsequent steps that need to be fulfilled to ensure product release.

BERT-based Sequential Classifier. In natural speech, such as emails, ISs can appear as whole sentences, parts of sentences, or spans of multiple sentences. They are also relatively sparse, with their institutional quality dependent on their inherent interpretation as well as their context. Framing IS extraction as a sequential sentence classification task over self-contained email segments, instead of labeling individual sentences in isolation, helps take contextual cues into account.

We used the sequential sentence classifier developed by Cohan et al.
[8], which leverages the Bidirectional Encoder Representations from Transformers (BERT) sequence classifier [11] to classify sentences in documents. BERT can be employed to generate the representation for a sentence through joint encoding over its neighboring sentences, and then leveraging the corresponding sentence-separator [SEP] token’s tuned embedding for downstream applications, such as sentence labeling, extractive summarization, etc. Thus, our classifier comprises BERT for attention-based joint encoding across sentences, followed by a feedforward classifier that predicts sentence labels based on these [SEP] separator vectors.

To test the performance of the classifier on email IS extraction, we held out 40 email threads (12.5%, randomly split) out of our 313 hand-annotated email threads. The training was performed on the combined set of the remaining 273 coded email threads and the ASF policy documents. The coded training and testing email data contained 231 and 42 institutional statements, respectively. For both training and testing, email threads were processed to generate classifier inputs as follows. To include neighboring context while meeting the length limits of the BERT-based text classifier, for each email document, sentences were first chunked into segments using a sliding window of up to 256 BERT sub-word (wordpiece) tokens. This resulted in segments containing 6 contiguous sentences each, on average, comprising as many full sentences as could be accommodated within the specified subword limit. The rolling window had a step of 1 full sentence. We generated 3,322 and 384 email segments for training and testing, respectively. For the policy documents, each policy with its sentences was treated as a segment, leading to 328 additional segments in the training data. There are several reasons to support the inclusion of ASF policies to augment positive training examples.
(1) In terms of semantic information, they are about institutional themes and actions. This was expected to help the language model learn what sets institutional themes apart from regular development activities and artifacts. (2) ASF policies are critical in common pool resource management and institutional operations, as they describe roles and responsibilities, regulate actions, and are often invoked in email discussions(^8). (3) The institutional statements of the formal policies are the source texts that in-email references to ISs draw from when participants discuss ASF’s rules in email. From this perspective, they are a vital source text for detecting these statements as they occur in email settings. Thus, while sourced from formal bylaws beyond emails, ASF policies are indeed institutional statements relevant to and recurring in developer conversations, and are hence included in the training data.

We fine-tuned our classifier end-to-end against the corresponding labels for sentences in the segment. Training was conducted with a batch size of 16 and a learning rate of 2 · 10^-5, for 6 epochs. All other hyperparameters were left at their defaults. To account for the class imbalance, we randomly oversampled training data segments that had at least one IS sentence to match the number of segments that had no IS sentences (1:1). In both the training and prediction phases, we did not incorporate any temporal information other than the sequentiality captured by the segments. That is, when extracting the institutional statements, the model does not require the exact time of the discussion.

During testing or prediction, due to the variable length of context preceding or following each sentence in any particular segment, we treat a sentence in an email as a ‘positive’ classification if it has been detected as an IS in at least one segment.
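The segmentation and aggregation scheme described above (a sliding window of up to 256 sub-word tokens with a one-sentence step, and a sentence counting as positive if any covering segment flags it) can be sketched as follows; plain integer token counts stand in for BERT wordpiece counts:

```python
def make_segments(sentences, token_counts, max_tokens=256):
    """Sliding window over sentences, stepping one sentence at a time.

    Each segment packs as many contiguous full sentences as fit within
    max_tokens (a stand-in for the BERT sub-word token budget).
    Returns a list of sentence-index lists, one segment per start index.
    """
    segments = []
    for start in range(len(sentences)):
        seg, total = [], 0
        for i in range(start, len(sentences)):
            if seg and total + token_counts[i] > max_tokens:
                break
            seg.append(i)
            total += token_counts[i]
        segments.append(seg)
    return segments


def aggregate(segment_preds, n_sentences):
    """A sentence is 'positive' if ANY segment predicted it as an IS."""
    positive = [False] * n_sentences
    for indices, preds in segment_preds:  # preds aligned with segment indices
        for i, p in zip(indices, preds):
            positive[i] = positive[i] or p
    return positive
```

With four 100-token sentences and a 256-token budget, each window holds at most two sentences, and a sentence flagged in any window stays flagged after aggregation.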
The performance of the model is reported in Sect. 5.1, in terms of the F1-score, precision, and recall with respect to the positive (‘IS’) label detected for sentences in the test email set.
----------------------------------------
-------------------------------
Section 64:
4.4 Topics Identification in Institutional Statements

The purpose of text modeling is to describe the text of a specific corpus and provide numerically measurable relationships among texts, e.g., topic identification, similarity measures, etc. We use a Latent Dirichlet Allocation (LDA) model to obtain semantically meaningful topics, to better understand the extracted institutional statements. LDA is an unsupervised clustering approach [48] which, when given a set of documents, iteratively discovers relevant topics present in them, based on the word distributions and relative prevalence in each document. We used LDA to identify prominent topic clusters occurring among all institutional statements extracted from our email archives by our trained classifier (see Sect. 4.3). No prior training from our coded email set against pre-identified topic labels was used to train the LDA model. We use the coherence score provided by the gensim package [44] to optimize the performance of the LDA model with respect to the number of topics; a higher coherence score represents better clustering performance. We select the LDA model with the highest coherence score from which to draw the clusters. However, since the LDA model does not automatically generate a label for each cluster, we assign a label intuitively, based on our domain knowledge of the ASF incubation process. Naming each topic cluster certainly carries some risk of misinterpretation; however, we believe that providing all top keywords for each cluster reduces that risk.

(^8)https://lists.apache.org/thread/zykybdvnk9cwx03pnrfl2br9nkcb7q3f

Table 2.
Summary statistics for the monthly socio-technical variables and the counts of institutional statements from project mentors, committers, and contributors, after removal of the top 2% of outliers. The numbers in parentheses denote the values after the removal of inactive months (i.e., months with no emails/commits). The prefix s_ denotes features of the social network, while t_ denotes features of the technical network.

| Statistic | Mean | St. Dev. | 25% | 75% |
|----------------------------|------------|-------------|------|------|
| s_num_nodes | 13.04 (16.96) | 14.56 (15.04) | 4 (7) | 17 (22) |
| s_graph_density | 0.30 (0.30) | 0.27 (0.22) | 0.12 (0.14) | 0.40 (0.40) |
| s_avg_clustering_coef | 0.22 (0.29) | 0.23 (0.21) | 0 (0.11) | 0.39 (0.43) |
| s_weighted_mean_degree | 11.83 (15.56) | 12.03 (12.81) | 4 (7.43) | 16 (19.71) |
| t_graph_density | 0.37 (0.68) | 0.41 (0.32) | 0 (0.36) | 1 (1) |
| t_num_dev_nodes | 1.18 (2.21) | 1.59 (1.60) | 0 (1) | 2 (3) |
| t_num_file_nodes | 60.99 (114.83) | 153.94 (197.25) | 0 (6) | 38 (126) |
| t_num_file_per_dev | 28.79 (53.57) | 80.46 (104.23) | 0 (4) | 20 (54.5) |
| num_IS_mentor | 15.46 (15.99) | 24.46 (25.01) | 0 (1) | 20 (20) |
| num_IS_committer | 9.34 (12.89) | 19.36 (22.36) | 0 (0) | 10 (16) |
| num_IS_contributor | 13.18 (16.36) | 21.72 (24.42) | 0 (2) | 18 (21) |

4.5 Variables of Interest

We draw institutional and socio-technical project features and variables on the basis of each framework’s predictions for our research questions. Our socio-technical variables are taken from a recent study on forecasting the sustainability of OSS projects [46], which showed high predictive power for socio-technical variables. All metrics are aggregated over monthly intervals, for each project, from the start to the end of its incubation.
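As an illustration, the monthly network measures named in Table 2 can be computed with networkx on toy data. The graphs below are made up; this is a sketch of the metric definitions, not the paper’s pipeline:

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy monthly social (email) network; edge weights = email counts (made up).
G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 3), ("b", "c", 2), ("a", "c", 1), ("c", "d", 4)])

s_num_nodes = G.number_of_nodes()
s_graph_density = nx.density(G)
s_avg_clustering_coef = nx.average_clustering(G)
# Sum of all nodes' weighted degrees divided by the number of nodes.
s_weighted_mean_degree = sum(d for _, d in G.degree(weight="weight")) / s_num_nodes

# Toy monthly technical network: bipartite developer-file graph (made up).
B = nx.Graph()
devs, files = ["dev1", "dev2"], ["f1", "f2", "f3"]
B.add_edges_from([("dev1", "f1"), ("dev1", "f2"), ("dev2", "f3")])

t_num_dev_nodes = len(devs)
t_num_file_nodes = len(files)
t_num_file_per_dev = t_num_file_nodes / t_num_dev_nodes
t_graph_density = bipartite.density(B, devs)
```

Note that bipartite density is normalized by the product of the two node-set sizes, rather than by all node pairs as in the one-mode social network.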
Longitudinal Socio-Technical Metrics: For each project, for each month, we constructed the social and technical networks, and from them calculated various organizational-structure measures. In our tables and results, the prefix t_ in a variable’s name indicates it belongs to the technical (code) network, while the prefix s_ indicates it belongs to the social (email) network. For the monthly social networks, we calculate the weighted mean degree s_weighted_mean_degree (the sum of all nodes’ weighted degrees divided by the number of nodes), the average clustering coefficient s_avg_clustering_coef (the average ratio of closed triangles over open triangles), and the graph density s_graph_density. In the technical bipartite networks, for each month, we calculate the number of unique developer nodes t_num_dev_nodes, the number of unique file nodes t_num_file_nodes, the number of files per developer t_num_file_per_dev, and the graph density t_graph_density.

Institutional Statements Frequency Metrics: For each project, for each month, we added up the ISs in all emails of that month sent by each of the following three separate and identifiable groups of people: ASF mentors (num_IS_mentor), registered ASF committers (num_IS_committer), and contributors (num_IS_contributor). We summarize their statistics in Table 2. As noted earlier, there is a final group of emails, sent by bots, that is not accounted for here. Similar to calendar entries, they may be useful, but are not the object of our study here.

4.6 Granger Causality

Time series data allow for the identification of relationships between temporal variables that go beyond association. One approach, Granger causality, is a statistical test for identifying quasi-causality between pairs of temporal variables [13].
Given two such variables, $X_t$ and $Y_t$, the Granger causality test calculates the p-value of $Y_t$ being generated by a statistical model including only $Y$’s prior values, $Y_{t-1}, Y_{t-2}$, etc., versus it being generated by a model that, in addition to $Y$’s prior values, also includes $X$’s prior values $X_{t-1}, X_{t-2}$, etc. Thus, Granger causality simply compares a base model involving only $Y$ to a more complex model involving both $Y$ and $X$, and calculates whether the latter is a better fit to the data. In the context of Granger causality, prior values are called lagged values, with $X_{t-1}$ having a lag of 1, $X_{t-2}$ a lag of 2, etc. If the Granger causality test returns a small enough p-value (e.g., $< 0.01$), it is interpreted as a rejection of the null hypothesis, thus establishing that $X$ Granger-causes $Y$.

The Granger causality test assumes that the time series on which it is applied are stationary, meaning they have no trend or seasonal effects. It is therefore necessary to test for stationarity before running the Granger causality test. We use the augmented Dickey-Fuller test [7], as implemented in adf.test from the R package tseries [27], to test stationarity. Both the institutional and the socio-technical variables were found to be stationary. We note that a distinction is typically made between scientific causality, based on controlled experiments, and Granger causality, with the latter satisfying only one (the precursor property) of multiple different properties of causality. Because of that, when Granger causality is used, the word ‘causality’ is always preceded by ‘Granger’. We also note that this test does not identify the sign, if any (i.e., positive or negative), of the Granger causality; it simply says whether one exists. We use the pgrangertest function from the R package plm to test Granger causality.
5 RESULTS

In this section, we answer the proposed research questions by adopting a dual view, from the institutional analysis and socio-technical network perspectives. We first establish the utility of our IS identification methodology.

5.1 RQ$_1$: Are there institutional statements contained in ASF Incubator project discussions? If any, can we effectively identify the content of ISs?

Detecting Institutional Statements. First, we focus on the ability of our BERT-based classifier to identify institutional statements in the emails. When tested on the 857 held-out sentences from the 40 email threads in our test set (see Sect. 4.3), our classifier achieved a precision of 0.667, a recall of 0.681, and an F1 score of 0.674 on classifying institutional statements, demonstrating that it is able to extract ISs from developer email exchanges in spite of there being only 5.1% ISs.

To validate the model against overfitting, we performed stratified cross-validation (CV) on our training data. We note that our data was not ideal for a CV study: we had (1) a limited data size, (2) an uneven distribution of ISs across the email threads, and (3) class imbalance between IS and non-IS sentences. E.g., due to the limited data size, emails with high IS density could find their way into the train but not the test split, and dramatically increase the variance of cross-validation results. To ameliorate that, for more uniform stratification we chunked each of the 273 threads in our training data into 442 sub-emails of 20 contiguous sentences each (the email threads had a mean length of 22 sentences). We fine-tuned our classifier end-to-end against the corresponding labels for sentences in the sub-emails. The subsequent input segment generation and training of the pipeline were otherwise kept unchanged.
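The chunking of threads into fixed-size sub-emails can be sketched as follows (an illustrative helper with names of our choosing, not the paper’s code):

```python
def chunk_thread(sentences, labels, size=20):
    """Split one email thread into sub-emails of `size` contiguous sentences.

    Illustrative sketch: the paper chunks 273 threads into 442 sub-emails of
    20 contiguous sentences each; a final chunk may be shorter than `size`.
    """
    chunks = []
    for start in range(0, len(sentences), size):
        chunks.append((sentences[start:start + size],
                       labels[start:start + size]))
    return chunks
```

A 45-sentence thread, for instance, yields sub-emails of 20, 20, and 5 sentences.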
We obtained a mean F1 score of 0.603 for positive labeling of IS sentences, with some high between-fold variability in IS density still persisting.

We consider these performance results satisfactory, given that we had a small and highly imbalanced data set (273 ISs out of 6,805 sentences). There are strong indications that increasing the number of positive examples in the training data will further improve our classifier’s performance. Of course, it is challenging to ascertain whether classifier performance varies across projects, due to limited data.

(^9) When we fine-tuned the classifier with only the 273 training email threads (i.e., without institutional statements from the ASF policy documents), the F1 for the positive label was found to be about 20% lower.

Fig. 1. Comparing graduated (in blue) vs retired (in red) projects along the number of Institutional Statements (ISs) (color online). The Mann-Whitney U test p-val is sufficiently small (in brackets), suggesting significant differences in means between groups.

We ran our classifier on the full corpus of 1,201,746 emails (after bot email removal) across all ASF incubator projects. It identified 313,140 ISs in the emails, for an average of 0.261 sentence-level ISs per email. Table 2 shows descriptive statistics for both the socio-technical variables and the number of institutional statements from project mentors, committers, and contributors, calculated in monthly intervals, per project.

We find that the classifier’s errors are also informative. In one set of false positives, participants described plans for an event occurring outside of Apache and the relevant incubator project, not the kind of process or behavioral constraint typical of ISs. It was probably detected as an IS due to its semantic similarity to the rules and guidelines that make up other positive examples. Conversely, the sentence ‘Send it to + and see what the reaction is’ was missed as an IS, despite appearing in the context of contributor agreements.
This miss is likely due to the fact that many such recommendations made in the emails would not be considered institutional, because they address a particular individual as an individual, rather than in their institutional role.

Institutional Statements Over Roles and Sustainability Status. We turn to some exploratory analysis to demonstrate the utility of our chosen features when reasoning about differences between graduated and retired projects. Comparing graduated and retired projects, we find a significant difference in the number of ISs. For example, in Figure 1(a), the number of ISs sent by mentors in graduated projects is statistically higher than in retired projects (the Mann-Whitney U test is used for testing the difference in means). This, along with the fact that graduated projects tend to be more active socially overall than retired projects (i.e., more email exchanges), suggests that the mentors of retired projects are concerned about their projects’ community progressing, and thus most of their email content is about rules and guidance. On the other hand, it is also plausible that mentors engage more socially and less institutionally with graduated projects, which may benefit those projects more. The numbers of ISs sent by committers and contributors show similar patterns. We investigate them longitudinally in the next section.

Topics Identification in Institutional Statements. We use the Latent Dirichlet Allocation (LDA) model to study the token-level topics in institutional statements. By optimizing the LDA coherence score, we find the optimal number of topics to be 12. The result further enables us to study which words are important to each topic. We present the clusters of top words for each topic in Table 3.

As Table 3 reveals, the words extracted from the institutional statements cluster well, and the topics are distinguishable from each other.
For example, in the first topic (i.e., ‘Progress Report’), there is a cluster of words, ‘review’, ‘board’ (which relates to the ASF board), ‘submit’, and ‘report’, all of which are

Table 3. Topics Identified in Institutional Statements.

| ID | Heuristic Topic | Top Sample Words |
|-----|-----------------------|-------------------------------------------------------|
| 1 | Progress Report | review, require, meeting, board, submit, report |
| 2 | Collective Decision | vote, start, proposal, thread, close, day, bind |
| 3 | Project Release | release, issue, think, fix, branch, policy |
| 4 | Community | project, email, send, community, behalf, incubation, talk |
| 5 | Report Review | board, report, time, meeting, prepare, reminder, review |
| 6 | Mailing List Issues | list, mailing, discussion, question, issue, comment, request |
| 7 | Documentation | update, wiki, page, website, documentation, link, doc |
| 8 | Software Testing | release, source, build, test, note, artifact, check |
| 9 | Licensing Policy | license, file, software, version, copyright, compliance |
| 10 | Routine Work | project, committer, help, work, way, code |
| 11 | Mentorship | podling, report, form, mentor, know, sign, month, wish |
| 12 | Software Distribution | work, repository, information, file, distribute, commit |

associated with the important incubator rule that requires projects to submit regular progress reports. In topic 7, words like ‘update’, ‘wiki’, ‘page’, ‘website’, and ‘documentation’ emerge, all related to the website and documentation requirements that projects need to address. These results extend institutional theory into the software engineering domain, arguably indicating that ISs are associated with OSS sustainability, and suggest diving deeper into the connections between the socio-technical system and institutional analysis.
RQ1 Summary: We demonstrated that institutional analysis methodologies can capture differences between graduated and retired projects. We also showed that we can effectively identify meaningful institutional statements, and common topics, from ASF incubator projects’ emails.

5.2 RQ2: Is OSS project evolution toward sustainability observable through the dual lenses of institutional and socio-technical analysis? And how do such temporal patterns differ?

In this section, our goal is to contrast graduated and retired projects over time, in both the IS space and the socio-technical space. Projects exit the ASF incubator at different times, so there is larger variance toward the end of incubation. Therefore, we restrict ourselves to the first 24 months for all projects (more than 60% of projects stayed in the incubator no longer than 24 months).

Topic Evolution Over Time. After identifying the words that contribute to the various identified topics, by aggregating over all projects we obtain the volume of each topic in each month, measured as the number of tokens contributing to that topic. Moreover, since there are trends in the number of ISs, we subtract the mean volume for each month, separately for the graduated and retired projects. We present the results in Figure 2, where the x-axis is the number of months after incubation start, and the y-axis indicates the relative volume compared to the mean.

The results of the Mann-Whitney U test show that 10 out of 12 topics are significantly different in their means between graduated and retired projects (p-val < 0.01). Not significant were topic 9 (licensing policy) and topic 12 (software distribution). Additionally, the augmented Dickey-Fuller test suggests that over time, 9 out of 12 topics are not stationary (i.e., temporal trends exist, with p-val < .01), the exceptions being topic 2 (collective decision), topic 6 (mailing lists), and topic 12 (software distribution). These testing results prompt us to analyze the difference in project-level dynamics between graduated and retired projects.

Fig. 2. Topics Evolution for graduated projects (in blue) compared to retired projects (in red). The x-axis indicates the i-th month from their incubation start and the y-axis represents the relative volume of the topics. The Mann-Whitney U test found 10 out of 12 topics significantly different in their means between graduated and retired projects (p-val < 0.01). Not significant were topic 9 (licensing policy) and topic 12 (software distribution).

We observe an increasing trend in Topic 1, ‘Progress Report’, with a small seasonal effect, suggesting that the projects are learning the ‘Apache Way’ and more actively discussing their regular project reporting over time. This seasonal effect is more pronounced in Topic 5 (‘Report Review’). Project releases, documentation, and software testing are all connected to the number of people participating regularly. Retired projects are on average smaller than graduated ones, which is the likely explanation for the differences; e.g., in Figure 3(f), we show that graduated projects, on average, have more source files than retired projects. Moreover, we find that Topic 9, ‘license policy’, has an increasing trend in the earlier stages of incubation (e.g., months 1-7), which makes sense in that the shift from one OSS license to the license required by ASF is an important discussion that projects would want to address early on.

In contrast, IS language related to software testing is relatively rare at the beginning of project incubation, suggesting that in the earlier stages of incubation, developers are more focused on the transition to the incubator and perhaps less on new code development and testing.
On the other hand, such transitions were implemented quickly, with testing discussions increasing rapidly in incubation months 3, 4, and 5.

Comparing graduated and retired projects, we find Topic 10, ‘Routine Work’, to be the dominant topic for both types of projects through almost all of their incubation (i.e., it remains high-volume compared to other topics). We also find that graduated projects tend to be more active on Topic 7, ‘Documentation’, and Topic 3, ‘Project Release’. Interestingly, on the other hand, mentorship-related ISs (Topic 11) are more active in retired projects than in graduated projects. One possible reason is that retired projects sought help from their mentors when they were experiencing downturns, prompting further institution-related statements.

Fig. 3. The averaged monthly IS and ST variables between graduated projects and retired projects. On the top are the IS measures; on the bottom are the ST measures. Shades indicate one st. error away from the mean. Month index 0 indicates the incubation starting month (color online).

Metric Evolution. We continue by exploring the evolution of our metrics over time. Looking at the mentors’ ISs, shown in Figure 3(a), we can see that even at the beginning of incubation, mentors email a greater number of ISs to projects that eventually graduate than to ones that eventually retire.

Next, we see that the number of ISs in mentor emails declines for both graduated and retired projects before month 5, suggesting that ASFI mentor activity may decrease after incubating projects work through the first steps of the incubation process.

Then, we visually identify an increasing trend of ISs from mentors around month 6 for graduated and month 5 for retired projects. One possible reason is that mentors start helping projects when they are experiencing difficulties or downturns.
This is consistent with ASF mentorship: during the early stage of incubation, developers are required to make institution-related decisions, e.g., voting on reports, discussing the ASF-required licensing, and handling community-related issues, and it is in these areas that mentors come to help.

On the socio-technical network side, shown in Figure 3(d), for the first 6 months we can see that graduated projects have a clear increasing trend in the number of nodes in their social networks, while the number appears constant in retired projects. We can see a slight decrease around months 10 to 12 for both types of projects, suggesting that 10 months might be a good time for mentors to intervene and motivate their projects, if those are experiencing difficulties.

RQ2 Summary: We identify socio-technical and institutional signatures of OSS project evolution, evidence that it differs between graduated and retired projects, and that these patterns can even be distinguished by institutional heuristic topics. On the institutional side, both graduated and retired projects have more stable institutional topics during their first 3 months. On the socio-technical network side, graduated projects keep attracting community members over their first 6 months, while retired projects are unstable during their first 3 months.

5.3 Case Study: Association Between Institutional Governance and Organizational Structure

To communicate concretely how the institutional and socio-technical dimensions interact within the ASFI ecosystem, we showcase four diverse instances of their mutual interrelationship.

Case A. In July 2011, the HCatalog project announced a vote for its first Release Candidate (RC), the first officially distributed version of its code. Because a project’s RCs reflect on the whole ASF, they require approval from the foundation after project contributors have given their approval.
In preparation for the first vote, developers double-checked the installation process and reported missing files and features. This drove contributions to the code and documentation; e.g., release notes were added after being reported missing. The contributors then cast their votes. With four people’s votes, the product was approved and a proposal was forwarded to Apache Incubator leadership for approval.

Case B. In December 2010, an independent developer emailed the Jena project community to share their idea for a new feature, asking how to proceed toward contributing it. Their query included policy questions, such as whether they must obtain an Individual Contributor License Agreement (ICLA). A developer responded that the policy does not require an ICLA for the type of smaller contribution that the volunteer was proposing. The developer then guided the volunteer through established project processes for contributing to the code, including which mailing lists to use and how to submit their feature as a patch.

Case C. In December 2016, a developer in the Airflow project community raised concerns over the integration testing infrastructure offered by Apache, citing unnecessary obstacles it imposed on volunteer contributors. The developer offered their own resources as an alternative, with the caveats that they would administer it and control access. This triggered a discussion on the technical merits of the developer’s concerns, and a policy discussion as to whether ASF permits the use of unofficial alternative infrastructure. Several developers concluded that a transition was technically advisable and institutionally sound, and the community transitioned to the alternative integration testing framework.

Case D. In September 2015, the Kalumet project received a proposal that it be retired from ASFI after its code had been languishing for several months. Contributors agreed upon retirement almost unanimously.
One contributor, identifying features of the project that could be of use to other ASF and ASFI projects, suggested distributing key parts of its functionality to other active projects. The retirement vote was ultimately followed by developer effort distributing Kalumet’s assets.

These cases illustrate how institution-side policy discussions and sociotechnical-side project contributions interact, with developments on the artifact motivating policy discussions, and policy constraints steering developer effort. With longitudinal data on both institutional and socio-technical variables, we now transition to a quantitative investigation of these relationships.

5.4 RQ3: Are periods of increased Institutional Statement frequency followed by changes in the project organizational structure, and vice-versa?

In the previous RQs, we conducted exploratory and qualitative studies of the IS extraction technology, and of IS and socio-technical variable changes over time. In this section, we investigate the temporal relationship between our measures of institutional governance and organizational structure as OSS projects progress on their incubation trajectories. As predicted by contingency theory, our hypothesis is that during project evolution, developers and mentors must make time for decisions related to their organizational structure, contingent on ASF-required institutional arrangements and governance.

Fig. 4. The Granger Causality between Institutional Statements and Socio-Technical networks. The blue/purple directed links indicate Granger causality from ST/IS measures, respectively. A green bi-directional link indicates a two-way significant temporal relationship (p-val < .001). Graduated projects seem to have fewer links from ST variables to IS variables, suggesting a more unidirectional flow from institutional to sociotechnical changes in successful projects (color online).
That is, incubating projects change their organizational structure based on the institutional norms and rules being discussed, as required of them as potential new members of the ASF community. And vice versa: organizational changes can incite follow-up discussions about institutional processes. To test RQ3, we use the pairwise Granger causality test with lag order 2. We run the test for all pairs of institutional statement and socio-technical variables, resulting in 36 separate tests for the graduated projects and 36 for the retired ones. We adjust our p-values for multiple hypothesis testing to control the false discovery rate, using the Benjamini-Hochberg procedure [14]. We only consider results significant at p-val < 0.001.

The results are summarized in Figure 4, where a directed edge from node $X$ to node $Y$ indicates that $X$ Granger-causes $Y$, i.e., change in $X$ is a precursor to change in $Y$. Also, as discussed in Section 4.6, the Granger approach we used is not a complete test of causality, but it does yield an effect and its directionality, although without effect size or sign.

We observe a large number, 31 (out of 72 total), of Granger-causal relationships between the measures of institutional governance and the organizational structure. Of those 31 Granger-causal relationships, 15 are from the graduated set and 16 from the retired set, and 8 of the relationships are shared between the sets. We conclude that there is significant Granger-causality between changes in institutional governance discussions and the organizational structure of the projects. We note 8 bidirectional relationships(^{13}); the remaining 15 are unidirectional.

(^{13}) Bidirectional causality indicates feedback of some sort. E.g., supply causes demand, and demand, in turn, causes supply.

We look at graduated projects first.
Interestingly, Figure 4 (top) shows that the number of ISs from mentors, committers, and contributors has effects on the technical network, and vice-versa for the latter two roles. Namely, ISs from all roles (mentors, committers, and contributors) Granger-cause changes in the technical network, i.e., in the developer file productivity (t_num_file_per_dev) and total number of code files changed (t_num_file_nodes) variables. Mentor ISs, additionally, Granger-cause changes in the number of developers (t_num_dev_nodes). This is consistent with ASFI expectations that a mentor’s emails provide advice and engage people, and conversely, that a drop in engagement may elicit mentors’ engagement. Mentors usually do not code, which is presumably why they Granger-cause, but do not appear in feedback relationships with, any of the technical network variables.

Notably absent, however, are links from mentor and contributor ISs into social network variables. Only committer ISs (bidirectionally) Granger-cause changes in the social network density, which, perhaps, simply indicates that ISs from committers induce substantial traffic in the social network, which in turn gets committers to discuss policy and rules issues. We have observed situations where mentors are likely to intervene in projects when the projects become less active (either socially or technically)(^{14}). On the other hand, it could also be that a mentor is reacting to some particular broader discussion among developers, e.g., one on a monthly report.

Together, the above tells a story of the importance to the technical networks of changes in any IS variable. Surprisingly, mentor IS changes are not as consequential to the social network, seemingly at odds with the ASF community-first goals. Thus, there may be room to enhance community engagement with mentors, and vice-versa.
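The multiple-hypothesis adjustment used for these pairwise tests is the Benjamini-Hochberg procedure; a minimal sketch with statsmodels follows. The p-values below are illustrative toys standing in for the 36 pairwise Granger tests per project set, and the choice of statsmodels is ours (the paper does not name an implementation):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Toy raw p-values from a batch of pairwise Granger tests (made up).
pvals = np.array([1e-6, 2e-5, 0.0004, 0.004, 0.03, 0.2, 0.5, 0.9])

# Benjamini-Hochberg FDR control, then the paper's strict 0.001 cutoff.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.001, method="fdr_bh")
significant = p_adj < 0.001
```

Note that the 0.0004 raw p-value survives neither the correction nor the cutoff: after BH adjustment it exceeds 0.001, illustrating why correction matters when running 36 tests per group.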
RQ3 Summary: In both graduated and retired projects, there are no inputs from the ISs into the social network variables, even though there are IS inputs into all technical network variables. Retired projects exhibit less bidirectionality between ST and IS variables. Finally, and interestingly, among retired projects there are causal inputs into contributor ISs from both the social and technical variables. This is not the case for the graduated projects.

6 DISCUSSION

In this study, we use individual institutional prescriptions, Institutional Statements (ISs), and Socio-Technical (ST) network features to reason about OSS project sustainability. OSS projects are a form of digital public goods which, like other public goods (e.g., water, forests, marine resources), can be subject to degradation through over-harvesting, e.g., in the form of free-riders who take advantage of OSS but do not contribute the resources required for the development and maintenance of the software. Ostrom’s work illuminated the fact that many communities avoid the dreaded ‘Tragedy of the Commons’, and other collective action problems, through the hard work of designing and implementing self-governing institutions. In that context, the ASF is a nonprofit foundation that, through its incubation program, encourages nascent OSS projects to follow some ASF-guided operational-level rules and policies around their self-governance. The OSS projects that join the ASF incubator trade some of the freedom of unlimited institutional choice in exchange for incubator resources that increase their chances of enduring the collective action problems that characterize OSS development [36], and of becoming sustainable in the long run.
We found that in the ASF Incubator, the amount of institutional statements and the levels of socio-technical variables are associated with projects’ graduation outcomes, suggesting that measures of institutional governance and organizational structure can signal information on sustainability.

(^{14}) An example of a mentor intervening in project Warble: https://lists.apache.org/thread/x6h8pzhmfwtyy354ml1xm9sylq4y5r7l

In particular, in RQ1, the Mann-Whitney U test shows that graduated projects have significantly more ISs from all three types of participants (committers, contributors, and project mentors) than retired projects. This, presumably, is indicative of more active or intentional self-governance. In theoretical and empirical work on commons governance, it is well documented that getting self-governing institutions ‘right’ is hard work and takes time and effort [32]. This is consistent with a narrative in which participants in graduated projects debate and work harder on their project’s operational-level institutional design.

Recent work has shown that ASFI graduated and retired projects have sufficiently different socio-technical structures [46] that graduation can be predicted early on in development at 85+% accuracy. The results in RQ2 show that, for the first 3 months of incubation, developer nodes in the social networks of graduated projects increase at a higher rate (means increase from 10.1 to 17.1 vs. from 7.3 to 9.1, for graduated and retired projects, respectively), suggesting graduated projects were able to keep developers contributing more actively or to recruit more new members. On the other hand, for the first 3 months, we also found that the amount of Institutional Statements by mentors increases in graduated projects and decreases in retired projects (from 19.7 to 22.7 vs. from 22.6 to 14.6, for graduated and retired projects, respectively), suggesting that the initial help from a project’s mentors is of importance.
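The group comparisons above rely on the Mann-Whitney U test; a minimal sketch with scipy on made-up monthly mentor-IS counts (illustrative numbers, not the paper’s data):

```python
from scipy.stats import mannwhitneyu

# Toy monthly mentor-IS counts for two groups (illustrative numbers only).
graduated = [19, 23, 25, 18, 30, 22, 27, 24, 21, 26]
retired = [8, 5, 11, 7, 9, 4, 10, 6, 7, 5]

# Two-sided test for a difference in distributions between the groups.
stat, p = mannwhitneyu(graduated, retired, alternative="two-sided")
```

Because the test is rank-based, it makes no normality assumption, which suits the heavy-tailed monthly counts summarized in Table 2.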
+ + +To further study the effects of ISs, we performed a deep dive into IS topics. We found that the topics of institutional relevance in graduated projects differ from those in retired projects: specifically, the topic of documentation (topic 7) is more prevalent in graduated projects than in retired projects. On the other hand, we found that the topic of mentorship (topic 11) is significantly more prevalent in retired projects than in graduated projects, signaling that the retired projects might be struggling during incubation. Combined with the fact that there are more developer nodes in both the social and technical networks, these findings suggest that graduated projects have more capacity and energy to attend to non-coding issues, like documentation, than retired projects do. However, even among graduated projects there is still diversity in the institutional statements. Thus, as predicted by contingency theory, as well as Ostrom’s theory of institutional diversity [33], a one-size-fits-all solution to a successful trajectory toward sustainability is not likely. Instead, future work should focus on gathering larger corpora of data, to be able to resolve individual or small-group differences in sustainable projects.

+ + +Our framework allowed us to combine the IS and STS structures and study them together over time. With it, in RQ3, we found two-way, causal correlations between socio-technical variables and ISs over time, arguably indicating that OSS projects’ socio-technical structure and their governance structure evolve together, as a coupled system. In addition, our methods point to a way to study possible interventions in underperforming projects.
Specifically, the finding that in retired projects there are bi-directional links from committers’ ISs to all three features of the technical networks (i.e., t_num_dev_nodes, t_num_file_per_dev, and t_num_file_nodes) suggests that increases in committers’ ISs are interleaved with changes in the features of the socio-technical networks.

+ + +As for design implications, in addition to the current categories of mailing lists in the ASF Incubator (e.g., ‘commit’, ‘dev’, ‘user’, etc.), there may be a benefit to creating a separate mailing list for institutionally-related discussions, to help committers (as well as mentors and contributors) participate in those discussions in a timely manner. This could be made more useful with technology for self-monitoring, with which project participants could monitor a project’s digital traces and discussions in order to react more quickly to episodic events. Some such tools have already been created for socio-technical networks in ASFI projects [34], and could be extended to include ISs as well. Such tools can help identify entry points and targets for interventions, whereby underperforming projects could be leaned on, internally or externally, via rules or advice to adjust their trajectories.
+Contributions to Institutional Analysis and Socio-technical System Theory. Coming full circle, our findings also point to ways in which the theories we started from can be refined or extended. We find, in Sect. 5.4, evidence that the features of OSS projects’ socio-technical systems co-change together with the amount of Institutional Statements in them, and that the co-change relationships are sparse. This evidence of co-change implies that the OSS projects’ structure and their governance form a (loosely) coupled system. From a controllability point of view, a dynamically coupled system refines Smith et al.’s mechanistic binary notion of ‘inside’ and ‘outside’ interventions [40].
+ + +Our findings also suggest that for OSS projects, adopting additional rules and norms (e.g., by joining ASFI) can be worth the loss of some freedoms, as the Institutional Statements (Sect. 5.2, 5.3, 5.4) seem to serve to organize the project’s actions and discussions, as predicted by Siddiki et al. [39] and Crawford and Ostrom [10]. Thus, our findings tie in with, and potentially extend, the Institutional Analysis and Development (IAD) view, suggesting that the feedback between the socio-technical system structure and institutional governance is sufficiently direct and significant that the two should be considered as a unit in further studies.

+ + +More practically, our institutional statement predictor, although still a work in progress, can effectively predict atomic elements of self-governance. As such, it can be used as a tool to provide quantitative data for applying Institutional Analysis and Development (IAD) more generally, e.g., to OSS projects outside of the ASF, or to self-governed systems with public documents and discussion forums.
----------------------------------------
-------------------------------
Section 67:
7 THREATS TO VALIDITY

+ + +First, our data comes from only a few hundred ASF Incubator projects. Thus, generalizing the implications beyond the ASF, or even beyond the ASF Incubator projects, carries potential risks; for example, OSS projects in other incubator programs may not have mentors. Expanding the dataset beyond the ASF Incubator, e.g., with additional projects from other OSS incubator programs, could lower this risk. Second, we do not consider communication channels other than the ASF mailing lists, e.g., in-person meetings, website documentation, private emails, etc. However, ASF mandates the use of the public mailing lists for most project discussions, a policy that ensures a particularly low risk of missing institutional or socio-technical information.
Annotations of the Institutional Statements (IS) can be biased by individual annotators; however, we gave the annotators sufficient training and reference documentation, which lowers this risk. We expect the performance of the classifier to improve as we increase the size of the training set and better incorporate contextual information, and we plan to distinguish between types of ISs in future work. In OSS projects, developers may use different emails or aliases, which in turn complicates the identification of distinct developers; assigning, and insisting on the use of, a unique apache.org domain email address reduces such risks.

+ + +Finally, as noted in Sect. 4, there are likely cases where OSS projects that have retired from the ASF Incubator program still go on to become sustained over time. In these instances, some OSS projects entering the ASFI may simply not be a good fit for the ASF culture and institutional requirements or policies, and ultimately retire as a result. In this paper, we explicitly use graduation as a measure of sustainability, given that this is an ultimate goal of the ASFI: to create projects that can indeed be sustainable. But we want to recognize that a few retired projects could still become sustainable by following a different path than association with the ASF.

15 The Apache Way: http://theapacheway.com/on-list/
16 ASF committer emails: https://infra.apache.org/committer-email.html
8 CONCLUSION

+ + +Understanding why OSS projects cannot meet the expectations of nonprofit foundations may help others improve their individual practice, organizational management, and institutional structure. More importantly, understanding the relationship between institutional design and socio-technical aspects in OSS can bring insights into the potential sustainability of such projects.
Here we showed that quantitative network science features can capture the organizational structure of how developers collaborate and communicate through the artifacts they create. Combining the two perspectives, socio-technical measures and institutional analysis, we leverage the unique affordances of the Apache Software Foundation’s OSS Incubator to extend the modeling of OSS project sustainability, using a novel longitudinal dataset, a vast text and log corpus, and extrinsic labels for the success or failure of project sustainability.

ACKNOWLEDGEMENTS

+ + +The authors greatly thank the reviewers for their constructive comments. This material is based upon work supported by the National Science Foundation under GCR grants no. 2020751 and no. 2020900.

REFERENCES

[1] Barclay, D. W. Interdepartmental conflict in organizational buying: The impact of the organizational context. Journal of Marketing Research 28, 2 (1991), 145–159.
[2] Benkler, Y. The wealth of networks. Yale University Press, 2008.
[3] Bird, C., Gourley, A., Devanbu, P., Gertz, M., and Swaminathan, A. Mining email social networks. In Proceedings of the 2006 International Workshop on Mining Software Repositories (2006), pp. 137–143.
[4] Bird, C., Nagappan, N., Gall, H., Murphy, B., and Devanbu, P. Putting it all together: Using socio-technical networks to predict failures. In 2009 20th International Symposium on Software Reliability Engineering (2009), IEEE, pp. 109–119.
[5] Bird, C., Pattison, D., D’Souza, R., Filkov, V., and Devanbu, P. Latent social structure in open source projects. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (2008), pp. 24–35.
[6] Blomquist, W., et al. Dividing the waters: Governing groundwater in Southern California. ICS Press, Institute for Contemporary Studies, 1992.
[7] Cheung, Y.-W., and Lai, K. S.
Lag order and critical values of the augmented Dickey–Fuller test. Journal of Business & Economic Statistics 13, 3 (1995), 277–280.
[8] Cohan, A., Beltagy, I., King, D., Dalvi, B., and Weld, D. S. Pretrained language models for sequential sentence classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (Hong Kong, China, 2019), Association for Computational Linguistics, pp. 3693–3699.
[9] Cooke-Davies, T. The “real” success factors on projects. International Journal of Project Management 20, 3 (2002), 185–190.
[10] Crawford, S., and Ostrom, E. A grammar of institutions. American Political Science Review 89, 3 (1995), 582–600.
[11] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[12] Ducheneaut, N. Socialization in an open source software community: A socio-technical analysis. Computer Supported Cooperative Work (CSCW) 14, 4 (2005), 323–368.
[13] Dumitrescu, E.-I., and Hurlin, C. Testing for Granger non-causality in heterogeneous panels. Economic Modelling 29, 4 (2012), 1450–1460.
[14] Ferreira, J., and Zwinderman, A. On the Benjamini–Hochberg method. The Annals of Statistics 34, 4 (2006), 1827–1849.
[15] Fischer, G., and Herrmann, T. Socio-technical systems: A meta-design perspective. International Journal of Sociotechnology and Knowledge Development (IJSKD) 3, 1 (2011), 1–33.
[16] Fleischman, F., Loken, B., Garcia-Lopez, G., and Villamayor-Tomas, S. Evaluating the utility of common-pool resource theory for understanding forest governance and outcomes in Indonesia between 1965 and 2012. International Journal of the Commons 8, 2 (2014).
[17] Frischmann, B., Madison, M., and Strandburg, K. Governing Knowledge Commons.
Oxford University Press, 2014.
[18] González-Barahona, J. M., Lopez, L., and Robles, G. Community structure of modules in the Apache project. In Proceedings of the 4th International Workshop on Open Source Software Engineering (2004), IET, pp. 44–48.
[19] Gruby, R. L., and Basurto, X. Multi-level governance for large marine commons: Politics and polycentricity in Palau’s protected area network. Environmental Science & Policy 33 (2013), 260–272.
[20] Hardin, G. The tragedy of the commons: The population problem has no technical solution; it requires a fundamental extension in morality. Science 162, 3859 (1968), 1243–1248.
[21] Herrmann, T., Hoffmann, M., Kunau, G., and Loser, K.-U. A modelling method for the development of groupware applications as socio-technical systems. Behaviour & Information Technology 23, 2 (2004), 119–135.
[22] Hess, C., and Ostrom, E. Understanding knowledge as a commons: From theory to practice. JSTOR, 2007.
[23] Hissam, S., Weinstock, C. B., Plakosh, D., and Asundi, J. Perspectives on open source software. Tech. rep., Carnegie Mellon University, Software Engineering Institute, Pittsburgh, PA, 2001.
[24] Joblin, M., and Apel, S. How do successful and failed projects differ? A socio-technical analysis. ACM Trans. Softw. Eng. Methodol. (Dec. 2021).
[25] Joslin, R., and Müller, R. The impact of project methodologies on project success in different project environments. International Journal of Managing Projects in Business (2016).
[26] Lehtonen, P., and Martinsuo, M. Three ways to fail in project management and the role of project management methodology. Project Perspectives 28, 1 (2006), 6–11.
[27] Lopez, J. H. The power of the ADF test. Economics Letters 57, 1 (1997), 5–10.
[28] Narduzzo, A., and Rossi, A. The role of modularity in free/open source software development. In Free/Open Source Software Development. IGI Global, 2005, pp. 84–102.
[29] Olson, M. The logic of collective action [1965].
Contemporary Sociological Theory 124 (2012).
[30] O’Reilly, T. Lessons from open-source software development. Communications of the ACM 42, 4 (1999), 32–37.
[31] Ostrom, E. Governing the commons: The evolution of institutions for collective action. Cambridge University Press, 1990.
[32] Ostrom, E. Understanding institutional diversity. Princeton University Press, 2009.
[33] Ostrom, E., Janssen, M., and Anderies, J. Going beyond panaceas. Proceedings of the National Academy of Sciences 104, 39 (2007), 15176–15178.
[34] Ramchandran, A., Yin, L., and Filkov, V. Exploring Apache Incubator project trajectories with APEX. In 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR) (2022), IEEE, to appear.
[35] Ropohl, G. Philosophy of socio-technical systems. Techné: Research in Philosophy and Technology 4, 3 (1999), 186–194.
[36] Schweik, C. M., and English, R. Tragedy of the FOSS commons? Investigating the institutional designs of free/libre and open source software projects. First Monday (2007).
[37] Schweik, C. M., and English, R. C. Internet success: A study of open-source software commons. MIT Press, 2012.
[38] Sen, A., Atkisson, C., and Schweik, C. M. Cui bono: Do open source software incubator policies and procedures benefit the projects or the incubator? Available at SSRN (2021).
[39] Siddiki, S., Heikkila, T., Weible, C. M., Pacheco-Vega, R., Carter, D., Curley, C., Deslatte, A., and Bennett, A. Institutional analysis with the institutional grammar. Policy Studies Journal (2019).
[40] Smith, A., and Stirling, A. Moving outside or inside? Objectification and reflexivity in the governance of socio-technical systems. Journal of Environmental Policy & Planning 9, 3-4 (2007), 351–373.
[41] Surian, D., Tian, Y., Lo, D., Cheng, H., and Lim, E.-P. Predicting project outcome leveraging socio-technical network patterns.
In 2013 17th European Conference on Software Maintenance and Reengineering (2013), IEEE, pp. 47–56.
[42] Trist, E. The evolution of socio-technical systems: A conceptual framework and an action research program. Ontario Ministry of Labour, 1981.
[43] Turner, J. R., and Müller, R. Communication and co-operation on projects between the project owner as principal and the project manager as agent. European Management Journal 22, 3 (2004), 327–336.
[44] Řehůřek, R., Sojka, P., et al. Gensim—statistical semantics in Python. Retrieved from gensim.org (2011).
[45] Wearn, S., and Stanbury, A. A study of the reality of project management: W. G. Morris and G. H. Hough, John Wiley, UK (1987), £29.95, ISBN 0471 95513, 295 pp. International Journal of Project Management 7, 1 (1989), 58.
[46] Yin, L., Chen, Z., Xuan, Q., and Filkov, V. Sustainability forecasting for Apache Incubator projects. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (New York, NY, USA, 2021), Association for Computing Machinery, pp. 1056–1067.
[47] Yin, L., Zhang, Z., Xuan, Q., and Filkov, V. Apache Software Foundation Incubator project sustainability dataset. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) (2021), IEEE, pp. 595–599.
[48] Yu, H., and Yang, J. A direct LDA algorithm for high-dimensional data—with application to face recognition. Pattern Recognition 34, 10 (2001), 2067–2070.

Received July 2021; revised November 2021; accepted April 2022
----------------------------------------
-------------------------------
Section 68:
The Labor of Maintaining and Scaling Free and Open-Source Software Projects

R.
STUART GEIGER∗, University of California, San Diego; Department of Communication and Halicioglu Data Science Institute, USA +DOROTHY HOWARD, University of California, San Diego; Department of Communication and Feminist Labor Lab, USA +LILLY IRANI, University of California, San Diego; Department of Communication, The Design Lab, and Feminist Labor Lab, USA + + +Free and/or open-source software (or F/OSS) projects now play a major and dominant role in society, constituting critical digital infrastructure relied upon by companies, academics, non-profits, activists, and more. As F/OSS has become larger and more established, we investigate the labor of maintaining and sustaining those projects at various scales. We report findings from an interview-based study with contributors and maintainers working in a wide range of F/OSS projects. Maintainers of F/OSS projects do not just maintain software code in a more traditional software engineering understanding of the term: fixing bugs, patching security vulnerabilities, and updating dependencies. F/OSS maintainers also perform complex and often-invisible interpersonal and organizational work to keep their projects operating as active communities of users and contributors. We particularly focus on how this labor of maintaining and sustaining changes as projects and their software grow and scale across many dimensions. In understanding F/OSS to be as much about maintaining a communal project as it is maintaining software code, we discuss broadly applicable considerations for peer production communities and other socio-technical systems more broadly. + + +CCS Concepts: • Social and professional topics → Computer supported cooperative work; Socio-technical systems; Computing profession; Project and people management; • Software and its engineering → Open source model. + + +Additional Key Words and Phrases: open source, free software, maintenance, infrastructure, labor + + +ACM Reference Format: +R. 
Stuart Geiger, Dorothy Howard, and Lilly Irani. 2021. The Labor of Maintaining and Scaling Free and Open-Source Software Projects. Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 175 (April 2021), 28 pages. https://doi.org/10.1145/3449249 +---------------------------------------- +------------------------------- +Section 69: +1 INTRODUCTION + + +Free and/or open-source software (or F/OSS) refers to a broad set of working processes, social movements, and organizations that have formed around the production and distribution of software, + + +∗The majority of the work on this project was conducted when Geiger was affiliated with the Berkeley Institute for Data Science at the University of California, Berkeley + + +Authors’ addresses: R. Stuart Geiger, University of California, San Diego; Department of Communication and Halicioglu Data Science Institute, 9500 Gilman Dr, La Jolla, California, USA, 92093; Dorothy Howard, University of California, San Diego; Department of Communication and Feminist Labor Lab, 9500 Gilman Dr, La Jolla, California, USA, 92093; Lilly Irani, University of California, San Diego; Department of Communication, The Design Lab, and Feminist Labor Lab, 9500 Gilman Dr, La Jolla, California, USA, 92093. + + +Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). + + +© 2021 Copyright held by the owner/author(s). +2573-0142/2021/4-ART175. https://doi.org/10.1145/3449249 + + +Proc. ACM Hum.-Comput. Interact., Vol. 5, No. CSCW1, Article 175. Publication date: April 2021. + + +This work is licensed under a Creative Commons Attribution International 4.0 License. +© 2021 Copyright held by the owner/author(s). 
with a complex and contested history going back decades. These movements have been extensively studied from many disciplinary perspectives, and have been the subject of substantial commentary from their members, across their many factions [e.g. 49, 102, 125]. These software projects publicly release their source code, in contrast to various commercial models of software in which firms require payment to use the software and/or restrict users’ ability to modify it. Practitioners often describe F/OSS as being ‘free’ in two ways: free in being available at no cost (called “free as in beer”), and free in having source code available and licensed such that users can modify it (called “free as in speech”) [78]. However, it is important to ask how the work of maintaining these projects fits into these paradigms of free-ness, when F/OSS and other similar peer production projects require labor and material resources [41, 84, 124].

+ + +In prior decades, many early F/OSS projects began as hobbyist efforts to build alternatives to commercial proprietary software from the tech industry. Many early contributors volunteered spare time or negotiated with their employers to let them spend work time on F/OSS [8, 27, 29, 64, 76, 94]. Many F/OSS projects have become more commercial and part of the tech sector over the past two decades [10, 47]. Today, F/OSS has grown such that many projects have become the dominant product in their sector and are extensively relied upon by commercial software firms (e.g. Linux, Apache, Python). Many of the most successful F/OSS projects are not user-facing applications, but software infrastructure that is relied upon by companies inside and outside of the software industry, such as operating systems, programming languages, software libraries, servers, and web components.
A 2020 survey of 950 enterprise-sized companies across sectors reported that 95% said open source software was important to their infrastructure strategy and 77% would adopt more open source software the next year [103]. F/OSS is also relied upon by government entities, non-profits, and activist movements, where the free cost and the ability to modify it can be crucial.

+ + +As F/OSS projects have become more critically embedded in organizations and economies, there has been a major shift toward questions of “sustainability” within many projects, especially those that began as volunteers’ side projects. This term is used to call attention to whether projects will keep developing and maintaining what others rely on, as all software must be maintained to continue to be useful to its users. Nadia Eghbal’s influential report on the topic opens with the Heartbleed bug in OpenSSL, a F/OSS software library used by two-thirds of websites to handle encryption, leading to the worst security vulnerability in the web’s history [41]. Despite the critical centrality of OpenSSL, the project’s maintainers had long struggled to find the time and money to work on it. Eghbal quotes the lead maintainer’s public post: “The mystery is not that a few overworked volunteers missed this bug; the mystery is why it hasn’t happened more often” [88]. Eghbal’s recent work suggests that small numbers of individual developers often do the bulk of the work in many F/OSS projects and can have somewhat transactional relationships with contributors and users, in contrast to how predominant narratives present F/OSS as composed of large collaboration-driven communities [42].

+ + +Our research question asks how the work of maintaining these projects changes as F/OSS projects become key dependencies for others, including well-resourced organizations in and out of the tech sector. We conducted 37 qualitative interviews with current or former F/OSS contributors and maintainers.
Our focus was projects that began as purely-volunteer efforts and have since become widely relied upon as infrastructure for other organizations beyond the project. We find that as projects scale across all kinds of dimensions — number of users, contributors, or maintainers; kinds of users, contributors, or maintainers; size, complexity, and features of the codebase; interdependence in a software ecosystem; and more — the work of maintaining the project and the meaning of being a maintainer can dramatically change. Scale brings new tasks and changes the nature of existing tasks. For example, for projects with few or no users, providing technical support to users can be an exciting opportunity to grow the community around the project. Yet for a large-scale project with millions of users, this can become an overwhelming flood of demands that requires establishing specific rules, roles, and norms, such as developing processes for triaging user support requests.

+ + +In particular, we find that the ostensibly-technical work of software engineering takes on more organizational, communicative, and even competitive aspects at larger scales. This is a well-established theme about the socio-technical nature of computer-supported cooperative work, of software engineering, and of work in general. However, our study details how the activities and experiences of this maintenance work change as projects grow, develop, and become embedded within broader networks of people, code, money, and institutions, including corporations, governments, academia, non-profits, and other F/OSS projects. We conclude by discussing the “scalar labor” of managing a project as it scales, such as how the deferral of this labor of scaling can create consequences for projects down the line – a problem we term “scalar debt.” Maintenance tasks pile up, requiring a massive amount of often less-visible work to build more organizational capacity to keep up with an onslaught of demands.
Finally, we discuss how F/OSS maintainers of popular projects also sometimes face additional work as they become hypervisible and even microcelebrities, which is contrary to how infrastructural maintenance is typically described as “behind the scenes” or “invisible” work.
----------------------------------------
-------------------------------
Section 70:
2 BACKGROUND AND LITERATURE

+ + +2.1 Trajectories of F/OSS research

+ + +Our study took place in 2019-2020, during an era of F/OSS that is different from prior decades, when foundational works about F/OSS proliferated. These classic accounts (e.g. [24]) suggested that F/OSS is composed of ideologically-driven collaborative and voluntary communities, producing public goods intended to supplant proprietary software alternatives. Past work documented F/OSS’s connections to early internet engineers and makers [16], universities and academic research, and an opposition to corporate-friendly copyright and patent law [25, 76]. Academics and practitioners have discussed F/OSS as a social movement, which has splintered, with “open source” rising as a competing movement to free software, one that transformed the original anti-commercial values of free software [77].

+ + +Researchers have focused on how F/OSS contributors collaborate and organize. F/OSS projects that have an “open development model” [53] are studied with other “peer production” [6] communities like Wikipedia or citizen science. Unlike in firms where employees are directed by managers, these projects often rely on self-directed contributions from individuals, including individuals working between private industry and voluntary F/OSS contributor communities. Popular accounts often marvel at the relatively high quality of products produced from this ostensibly ‘anarchistic’ approach (e.g. [123], see also critiques from [114, 124, 127]).
However, past work has repeatedly shown how this is made possible through less-visible coordination, articulation, and conflict resolution work, done to review, assemble, and align others’ contributions [18, 31, 46, 69, 83, 124]. On leadership and F/OSS, more work involves predicting who becomes a leader [48, 59] or leaders’ motivations [82], although past work has discussed the roles of leaders, who often resolve conflicts, mentor newcomers, set rules, and organize tasks [4, 30, 74].

+ + +In a literature review on F/OSS within and beyond Computer-Supported Cooperative Work (CSCW), Germonprez et al. [57] note that most studies in CSCW and Human-Computer Interaction (HCI) are about either “input” topics like developer motivations or “process” topics about collaboration and governance. They also detail the massive transformation in F/OSS from past decades. Early work often found that contributors were purely volunteers working in loosely-formalized quasi-organizations, operating more by ad-hoc rules. They cite more recent findings showing the rise of paid roles [109], corporate involvement [10, 47, 56, 100], and more formal organizational structures [20, 45] — from non-profit foundations based on fundraising to revenue-generating business models — which are increasingly the norm, especially in popular and longstanding projects. Finally, Germonprez et al. note that F/OSS projects are often studied as single-project case studies — which [32] also find in their 2012 review. However, they discuss how contemporary F/OSS projects exist in “complex supply chains” [57, p. 9], with multiple cascading interdependencies with other F/OSS projects in meso-level ecosystems.

+ + +These supply chains are not only to other F/OSS projects. F/OSS projects have complex relationships with the tech industry, governments, academia, and non-profits.
Ekbia and Nardi [43] discuss the wide dependency of industrial profit on volunteer or under-compensated labor, as part of a set of practices they call “heteromation.” They argue the global economy relies extensively on digitally-managed forms of un- or under-compensated labor, from user-generated content to microtasking platforms to F/OSS. What has been variously called crowdsourcing [15], cognitive surplus [115], or peer production and the “wealth of networks” [6], Ekbia and Nardi argue can all be understood as heteromation. Computational industries that benefit from this labor are being subsidized by other institutions that support these people working cheaply or for free, ranging from welfare states and universities to family or charity. While issues of money, financial sustainability, and corporate relationships have long been studied in F/OSS, what is less studied are the lived experiences of F/OSS maintainers as their projects become enmeshed within institutions that can have significant access to financial, social, or cultural capital.

+ + +2.2 Infrastructural maintenance labor

+ + +Scholars have long drawn attention to how the work of maintaining technologies is often ignored and neglected. This scholarship notes the importance of maintenance in shaping the forms and functions of technologies beyond moments of invention [39, 111]. Scholars in Computer-Supported Cooperative Work have long emphasized how less visible, ostensibly ‘non-technical’ labor is crucial to the functioning of computational infrastructures, especially the “human infrastructure” in scientific cyberinfrastructure [80]. This work is often underbudgeted because it is unrecognized or undervalued, but it is necessary to make those systems ‘seamless’ [107] to their users. Eghbal draws more on metaphors of public infrastructure, such as roads and other public works, to suggest that F/OSS is infrastructure and needs to be considered through that lens [41].

Following Jackson’s call to take maintenance and repair as the essence of technology [73], empirical studies of such practices have become more common in many areas. One theme is that tasks and roles construed as “maintenance” or “repair” often involve responsibilities beyond the technological, particularly in maintaining and repairing social and institutional relationships [37, 66, 71, 72, 110]. Infrastructure and maintenance work is also often discussed alongside other work sometimes invisibilized because of gendered or classed assumptions about its nature and importance [44, 51, 70, 117].

For example, Orr’s ethnography of photocopy repair technicians showed how they can serve as the customer’s primary point of contact with the photocopy company, managing that relationship more than designated account representatives [99]. Suchman’s analyses of expert systems emerging from Xerox PARC also found that computer engineers underestimated the complexity of what secretaries do [121]. However, few in this literature have examined cases in which maintenance work becomes highly visible and even constitutive of leadership, as it can be in large-scale F/OSS projects.

While there is often an impulse to make such work visible, other literature shows how making this work more visible can come with regimes of surveillance, micromanaging, or self-censorship [14, 97, 120].

2.3 The many meanings of “scale”

We are interested in how the work of maintaining F/OSS projects changes as projects scale, but what exactly is scale? In and out of F/OSS, it is common to refer to organizations, communities, and platforms as being “small scale” or “large scale,” which often compresses many aspects into a single term. In CSCW and HCI, scale is often a synonym for the number of users, where classic work designed systems intended to operate with a pre-defined range of simultaneous users [e.g. 61].
In anthropology, Carr and Lempert [122] argue that when people use terms like “large scale” or “at scale,” they are often intuiting a kind of synthetic construct that combines multiple related but distinct measures; in our case, measures like the number of users, number of user organizations, kinds of users, interdependence in an ecosystem, number of contributors, and so on.

Recent work in CSCW has similarly identified more multivalent understandings of scale. These include Lee & Paine’s “model of coordinated action” [81], in which they identify seven different dimensions along which software-mediated organizations can range: number of participants, number of communities of practice, physical/geographical distribution, nascence of routines, planned permanence, rate of turnover, and level of a/synchronicity in interactions. Our findings relate both to how these specific dimensions emerged as relevant for F/OSS maintainers and to their insight that shifts in each of these dimensions can occur independently, but shifts in one dimension can also impact or depend on all the others. Studies of scientific cyberinfrastructures demonstrate this theme more broadly, where scaling also includes integrating with interdependent projects and standards [3] — often called “embeddedness” [9, 40, 117] — and supporting a wider range of use cases over longer periods of time [75]. These studies have shown the various “tensions across the scales” [105] that emerge as projects grow.

3 METHODS AND METHODOLOGY

3.1 Research methods

This qualitative research is primarily based on semi-structured interviews with 37 maintainers of F/OSS projects in 2019-2020.
Interviews lasted a median of 55 minutes and covered a range of open-ended topics, including top-level questions on: the interviewee’s personal history in F/OSS, the kinds of work they do, how their roles and participation in the project have changed over time, governance and decision-making, funding and financial sustainability, motivation/demotivation and burnout, careers, how technologies and platforms impact participation, and work/life balance.

As is common with non-random sampling in qualitative research, we sought to strategically sample for diversity across many dimensions [95], rather than seek the kind of random uniform representative sample common in survey research. We specifically chose to recruit and interview a broad set of maintainers who varied across geography, national origin, age, employment status and sector, and gender. We made these efforts to sample for demographic diversity, while reflecting on how structural problems such as the gender gap among F/OSS contributors [38] present challenges to recruiting a diverse sample. We did not originally ask interviewees their demographics, but sent a post-interview survey, which 85% completed. For gender, 19% identified as women/female, 81% as men/male, and 0% as non-binary or other. For race/ethnicity (which allowed multiple selections), 72% identified as white/Caucasian (66% exclusively so), 16% as Hispanic/Latinx, 13% as Indian/South Asian, 6% as East or Southeast Asian, 3% as Black/African, and 3% as other. Interviewees were born in 14 different countries on 5 continents; the US was the most common with 47%. Interviewees currently reside in 12 different countries on 5 continents; the US was the most common with 56%. Ages spanned 25 to 64 years old, with 53% aged 30-39.

We also sampled for diversity to recruit maintainers of different kinds of F/OSS projects.
The projects whose maintainers we interviewed range from having a single developer-maintainer to having hundreds of contributors, with similar variance in the number of users. Some have existed for decades and have complex governance structures, roles, and norms, while others are relatively new. These projects represent a range of topical areas and locations within the technical stack, including operating systems, programming languages, software libraries, development environments, web frameworks, servers, databases, packaging, data analytics and research computing, devops, electronic arts and media, and more. Our focus on ensuring that our interview pool included maintainers from projects across these dimensions of scale follows from existing work on the importance of scale as a component of ethnographic work on CSCW infrastructures [101, 104] and broader globally-distributed phenomena [86].

Our recruitment methods involved drawing on our existing personal networks, attending F/OSS conferences and events, a call for participation shared on Twitter, and cold e-mailing F/OSS maintainers. We also conducted snowball sampling, asking our interviewees to suggest potential interviewees to us. To help our sampling for diversity, we utilized techniques similar to “trace ethnography” [55] to identify potential maintainers to recruit, based on available user data on social coding platforms including GitHub (see [34, 36, 126]). We identified core contributors through GitHub timelines, recent commits, and release notes. Our interviewees generally self-identified as current or former maintainers on their own terms. We interviewed maintainers who hold various roles within a wide range of F/OSS projects. We particularly focused on projects that have become relied upon as infrastructure by others and either are or began as largely based on volunteer labor.
However, as we asked maintainers about all the projects they had worked on, we encountered projects beyond this.

3.2 Methodology and interpretive approaches

Our interpretive approach is grounded in symbolic interactionism, which focuses on how actors organize their interactions with the world, with one another, and with themselves through categories that emerge in their social worlds [12] and in wider discourses [22]. We transcribed and inductively analyzed the interviews for themes using a grounded theory approach [23, 119], which involves a multi-stage process of coding statements with iteratively-generated themes. These themes identify social processes in common across our organizational site, generalizing across specific, local experiences while remaining bound to the particularities of the cultures and work processes under examination.

As we conducted interviews, many participants reflected not just on their own practices, but on the practices of others, including their broader theories about the political economy and history of F/OSS [67]. As we moved from research to findings, we reflected upon how participants may have related information to researchers based on what they understood the study was about and who the research might affect, and upon reflexivity as a “recursive” [76] pattern in F/OSS communities [68]. Through member-checking in the form of sharing transcripts and our findings, we gave participants opportunities to give feedback and engage with our interpretations.

Our relationship to these communities is neither purely that of outsiders nor of insiders. Our project was funded by non-profit foundations that also directly fund F/OSS projects. We are all former or current F/OSS contributors or maintainers and have regularly attended various F/OSS-related meetups and events.
All the authors are ethnographers embedded in larger ongoing research projects in this area — either in F/OSS projects themselves or in organizations that rely on and/or contribute to F/OSS projects. The data we present in this paper is centered on our set of 37 interviews, although our broader ethnographic experiences have informed both the kinds of questions we asked and how we interpreted interviewees’ responses.

4 FINDINGS

4.1 What is a F/OSS maintainer and how does it change as projects scale?

“Maintainer” has a meaning within F/OSS that it rarely has in other technical domains. F/OSS maintainers do perform upkeep and repair work, but the term also usually connotes a leadership role, as [42] also discusses. This leadership role is often enacted through access permissions to the project’s code repositories: becoming a maintainer typically involves being given the technical capacity to make changes to the project’s code. This includes the capacity to approve or reject proposed changes (called “pull requests” in GitHub-style platforms) from non-maintainer contributors, as well as the capacity to moderate the issue trackers where anyone can report a bug or request a feature. Beyond this use of access permissions to formalize maintainer status, the role of a maintainer (and the use of this term) varied widely, which [42] also finds. As with firms, organizations, and social movements, F/OSS projects range widely in size, scale, complexity, popularity, and interdependence. This makes it difficult and unwise to make overarching generalizations. We instead illustrate how maintainership differs across different kinds of projects, particularly focusing on how the labor of maintaining a F/OSS project changes as projects develop, grow, and scale.

In the projects we encountered with fewer contributors, there was usually one individual in a more singular leadership role, who led by doing a majority of the work.
The most common term interviewees used to refer to such an individual was “the maintainer,” although interviewees in corporate or academic institutions also noted that they sometimes code switch and use titles like “project lead” or “project manager” depending on the environment. In projects with a larger number of contributors, multiple maintainers often shared these responsibilities, sometimes with formal divisions of labor. In such projects, the leadership aspect of being what is sometimes called a “core maintainer” is analogous to being on a steering committee where decisions are made by consensus or voting [112]. + + +However, even in projects with many maintainers, there was often a primary leadership role, typically held by the original creator or someone who took responsibility after the creator departed. One common term for this is the “benevolent dictator for life” or “BDFL,” although one maintainer with such a role we interviewed described this more as “the person that tends to feel the most ownership for things that go wrong.” Finally, while most projects we encountered used “maintainer” to describe these roles, some instead used “core developer” to signify this dual upkeep-leadership role, as [42] also discusses. + + +In the following sections, we identify kinds of tasks that are involved in F/OSS maintainer positions that change as the project grows in interdependencies, complexity, and users. Some of these only become apparent and necessary at certain scales, such as organizing events or coordinating with other F/OSS projects. Other tasks occur at all scales, but can become quite different at larger scales, such as providing support to users, fixing bugs, and developing new features. We do not intend this to be a comprehensive survey of maintenance work in F/OSS, and so focus on the relationship between scale and labor. 

4.2 Maintaining Users: User support

When we asked our interviewees about the work of maintaining their projects, user support was a major topic. For the maintainer of a new project with few users, the first user who asks for help or raises an issue is a sign of validation and success. F/OSS projects are meant to be used, and a common attitude among maintainers of smaller projects was that whether the user’s issue was due to their misunderstanding or a bug in the software, the maintainer can learn something from it. Maintainers told us how some users who ask for help become contributors and co-maintainers, or alternatively donors and patrons. Yet our interviewees who maintained large, well-known projects with many users identified user support as an overwhelming and never-ending chore, particularly for projects that use GitHub-style collaboration platforms. One interviewee stated that “user support is something I do very regularly during the evenings during the week, or during the weekends. That actually takes a large chunk of my free time.” For these maintainers, user support is an around-the-clock reality of their position.

Requests for user support come through many channels, including messages sent to their private e-mails and social media accounts. While users often seek software help on Q&A sites like StackOverflow, a project’s maintainers are not generally obligated to be present in those spaces, although some do participate [134]. GitHub allows any user with an account to open an issue for a F/OSS project hosted on that site. The number of open issues, potentially in the thousands, is prominently displayed on the project’s landing page, creating reputational pressure for the project. Managing and triaging the issue queue was often identified by our interviewees as a major task for maintainers of large-scale projects, although there was variation in the level of obligation.
Some stated that maintainers may not have an obligation to actually fulfill the requests in the issue, but did have an obligation to respond and acknowledge the issue in a timely manner — sometimes described as within 24 or 48 hours, though others said that a week or more was acceptable.

Eghbal suggests maintainers engage in the “curation” of user and contributor interactions under intense time pressure [42]. As projects grew, maintainers often implemented rules, recommendations, and even templates for raising issues. For example, it is common for larger projects to actively discourage using issues to request assistance in using properly-functioning parts of the software. However, a common tension arises when users report what they experience as bugs, but what the project’s contributors and maintainers see as the software operating properly. Maintainers of large or central projects we interviewed told us about users who were “disrespectful,” “entitled,” or “demanding” of their time and attention. This is also a growing topic of public discussion within F/OSS, with several talks and articles about how to be respectful to maintainers [19, 62, 79, 113].

As the work of investigating, triaging, and resolving issues intensified, maintainers reported feelings of demotivation, exhaustion, and burnout. This was especially common for maintainers of larger projects, who often mentioned user support as one of the more emotionally-intensive aspects of being a maintainer. As one interviewee discussed: “I think burnout can come from a lot of different things. It can come from constant bombardment of issues and notifications and you’re constantly reminded of the things that you’re not doing.” Several interviewees noted that the way users interact with contributors and maintainers was crucial, with a few kind words or less-demanding phrasing going a long way for maintainers, which [131] also finds.
However, this can be complicated in the global landscape of F/OSS, with several interviewees discussing cross-cultural or language barriers.

As projects scaled, interviewees described how reciprocity affected how they felt about F/OSS work. One interviewee stated that a top priority for them would be when the user requesting support is another F/OSS contributor in a related project seeking to fix a genuine bug that affects the ability for the two projects to be used together. Interviewees also expressed enthusiasm for supporting educational institutions, such as when an educator was having issues while teaching the software in their class. In contrast, maintainers we interviewed expressed frustration at the user support work generated when a large tech company integrated the F/OSS project into software they were building and selling — particularly if this company had not “given back” through financial donations or in-kind donations of labor (e.g. their developers regularly contributing to F/OSS projects). One interviewee noted that demanding free-riders are not limited to for-profit corporations, as some academic researchers had behaved with similar attitudes. Whether maintainers framed their experiences as collective work or as exploited labor related to the social and communicative relationships maintainers had with those with whom they collaborated, coordinated, or contributed.

4.3 Maintaining “Mainline” Code, Scaling Trust

F/OSS maintenance is not only about repairing and fixing. It is crucially about updating and changing to stay relevant. As a project grows, users expect a canonical version of the software, even as the number and diversity of contributors and user needs expand. As contributors scale, maintainers must devise ways to scale trust. Version control practices are central to managing changes, especially with many contributors.
The open development model of contemporary F/OSS projects typically involves a contributor making their own copy of the entire codebase, making whatever changes they see fit, and then submitting their modified version for review, approval, and merging. Traditionally, a maintainer decides what patches to accept and keeps a canonical version of the source code, regularly making public releases to keep the rest of the project up to date.

Smaller projects typically begin with a single maintainer, but as they begin to receive more and more proposed contributions, some solo maintainers grant a regular contributor commit rights and maintainer status, letting them manage specific releases. However, the founding maintainer must trust this new maintainer, because by default, they have the full technical privileges to accept any proposed changes. In one case mentioned in our interviews, a solo maintainer had been unable to spend as much time maintaining a project. When someone they did not know asked to be a co-maintainer, they happily accepted. However, this new maintainer added code to the software that silently used users’ computers to mine cryptocurrency and deposit the profits in their own account.

As projects scale the number of maintainers, code review processes are a common way of producing trustworthy code. Code review is the process by which one or more designated individuals who did not author the change have to approve a pull request before a maintainer can accept and merge it. The process is somewhat similar to academic peer review, especially in that there can be many cycles of review and revision between the code reviewer(s) and the original author. Code reviewers typically read through each line of code for specific issues, with contemporary social coding platforms supporting fine-grained line-level comments. Code reviewers typically look for bugs and inefficiencies, plus conformity with the project’s code style, naming conventions, and approach to modularity.
In some projects, only maintainers can do code review, but others allow a wider set of trusted non-maintainers to participate.

In smaller projects, code reviews might be informal and implicit, but formally specifying such rules can be a crucial aspect of scaling a project’s number of contributors, code reviewers, maintainers, and codebase. As projects grow, maintainers have to devise ways of distributing the work of review so they do not have to recheck submissions to the mainline or canonical version. For example, the Linux kernel still uses this mode of development, in which only its creator and lead maintainer Linus Torvalds can accept patches to the official “mainline” codebase. This was much easier when the project was much smaller, but it has since grown to many thousands of contributors. The project has developed a cascading “chain of trust” [28] in which a top tier of subsystem maintainers are responsible for various sections of the codebase, making the decisions about what patches to accept. Some subsystem maintainers delegate responsibility further or have their own processes for making decisions about their part of the codebase. As the Linux kernel’s documentation describes:

“...top-level maintainers will ask Linus to ‘pull’ the patches they have selected for merging from their repositories. If Linus agrees, the stream of patches will flow up into his repository, becoming part of the mainline kernel. The amount of attention that Linus pays to specific patches received in a pull operation varies. It is clear that, sometimes, he looks quite closely. But, as a general rule, Linus trusts the subsystem maintainers to not send bad patches upstream.” [28]

Version control and code reviews may seem purely technical, but they express the direction of the project. Authority to merge changes often means authority to set and enforce a specific vision of the project.
Because of the necessity of keeping a single canonical code repository in this more traditional approach, the model of having a single lead maintainer who is ultimately the final decision-maker became widely prevalent in F/OSS, known as the “benevolent dictator” or “benevolent dictator for life” (BDFL). In one interview, a maintainer described a tense environment caused by the transition from a BDFL model to a more democratized system of decision-making. In essence, interpersonal relationships were strained when maintainers sought to democratize leadership roles within their project: contributors felt their speed and progress were limited by the BDFL’s power to veto group decisions, and the BDFL resisted the change.

4.4 The Labor of Managing and Maintaining Donations of Labor

Those who contribute code to F/OSS projects are donating the products of their labor, but those donations also generate new work for maintainers. When discussing what changes to merge, maintainers of projects with more contributors told us about perceived mismatches in expectations from the non-maintainers who made the proposed changes (pull requests). In these cases, a non-maintainer added or expanded the code in a way they found useful. They contributed this code through a pull request, feeling they had generously donated time, effort, and intellectual property. However, from the maintainer’s perspective, the new pull requests were a heavy obligation, both in the time to review them and in the long-term costs of maintaining this code indefinitely. Wiggins [132] discusses a similar trend in citizen science as “free as in puppies,” where such donations commit the recipient to years of care, even if the value of the contribution is uncertain.

As such, our interviewees who maintained F/OSS projects with more contributors mentioned the importance of rules requiring that new code follow certain standards that make the code easier to review and maintain.
Maintainers we interviewed from projects with more rules and procedures around merging changes told us about cases in which a contributor became increasingly frustrated with what the maintainer was asking them to do in order to approve the proposed changes. In some cases, the contributor abandoned their contribution and the project altogether. In addition, sometimes a pull request perfectly conforms with all the rules, but would take the project in a direction that the maintainers have decided is out of scope, and is thus rejected:

“As a maintainer, you don’t just merge things. You also try to be a thought leader for what’s happening and why you’re going to go a certain direction or not — and being able to politely say ‘We don’t want to go that way’ for things you know you don’t want to. That’s the hardest thing, I think, because you can have situations where someone is earnestly trying to add something and you don’t want to shut them down, but sometimes it’s something that was declared you weren’t going to do in this project...”

Because of the open contribution model in which anyone on the web can propose new changes, the work of being a maintainer of a F/OSS project typically involves a substantial amount of labor in managing the labor and even the emotions of others. For our interviewees, this emotional labor was one of the more difficult and draining aspects of their position.

4.5 Scaling through automation: continuous integration, build systems, and testing

As projects grew in contributors, “continuous integration” (CI), build systems, and code testing became ways of automating parts of code review and testing. CI involves automated processes that build a project’s codebase across multiple platforms, then run pre-written, scripted tests to check if it is functioning correctly. Automated “linting” checks even enforce proper formatting and style conventions within projects.
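The scripted tests that CI runs can be sketched as follows: a minimal illustration using Python's standard unittest module, checking a square root function's inputs and outputs. The cases here are our own and are not drawn from any particular project's actual test suite.

```python
# Illustrative unit tests of the kind a CI service runs on every
# proposed change: scripted checks on one function's inputs and outputs.
# (A sketch; real projects accumulate thousands of such cases.)
import math
import unittest

class SqrtTests(unittest.TestCase):
    def test_exact_squares(self):
        # Perfect squares must round-trip exactly.
        self.assertEqual(math.sqrt(4.0), 2.0)
        self.assertEqual(math.sqrt(144.0), 12.0)

    def test_irrational_results(self):
        # Floating-point outputs are compared to a tolerance, not exactly.
        self.assertAlmostEqual(math.sqrt(2.0), 1.4142135623730951)

    def test_negative_input_rejected(self):
        # Invalid input must raise an error, not silently return garbage.
        with self.assertRaises(ValueError):
            math.sqrt(-1.0)

# A CI job would discover and run these automatically; here we run them directly.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SqrtTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A CI service runs a suite like this on every pull request and reports pass/fail status back to the review interface, so reviewers need not re-verify basic correctness by hand.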
CI services are directly integrated into GitHub-style platforms, which is one of the many ways bots and software automation can govern and transform virtual organizations [54, 107]. A 2016 survey found that 40% of the 34,000 most popular F/OSS projects on GitHub use CI, rising to 70% when examining the 500 most popular projects [63].

The number of CI tests that are run can be staggeringly large. In the Python programming language’s standard math library, there are currently 134 different “unit tests” that check the inputs and outputs of the square root (sqrt) function alone. Major software libraries for programming languages can have tens of thousands of tests, and programming languages themselves can have hundreds of thousands of tests. These can be computationally intensive, a point we will return to.

For many maintainers of projects at all scales, continuous integration, build systems, and testing were major strategies they relied on to automate the labor-intensive and interpersonally-intensive tasks of code review. This was particularly the case in rejecting proposed changes:

“The other thing we’ve found is that if the computer tells them it’s wrong, they take that better than if a human does. So, the more automated things you can do that will catch low-hanging fruit, the less offense it causes people. So [the] computer says, ‘you’ve got a tab instead of spaces here,’ they don’t mind that. But if someone tells them that, they get grumpy about it.”

Maintainers also described CI and testing as strategies they relied upon to help them manage their workloads. In several interviews, when we asked about avoiding burnout, work/life balance, or advice to give to new maintainers, these strategies were the first responses.
One interviewee who helps manage a large ecosystem of projects discussed how they require these kinds of measures for all projects in that ecosystem, such that “there are packages that I maintain that I have not updated in two plus years, because things just work. When something breaks, I will get an email.”

Like all automation, these strategies redistribute and generate new forms of labor. Tests must continually be written and updated, especially when new features are added. Some projects require that new functions or features cannot be added without also adding appropriate levels of testing. The burden of gifts that create labor could thus be mitigated when those gifts came with tests. As projects become increasingly complex, however, new kinds of integration tests must be written to check that the various subsets of the codebase work together. Testing also grows as projects become more integrated within an interdependent ecosystem of projects, all relying on each other. It is increasingly common for projects to test that proposed changes do not break anything in other projects in the ecosystem. Although developing and maintaining these CI processes is itself labor-intensive, the work can be distributed to contributors who create automated tests for the code they write, rather than falling to code reviewers responsible for catching bugs. In this way, it can help distribute maintainer labor more widely as the project scales in terms of its codebase, featureset, contributor base, and interdependence within a software ecosystem.

Yet automation is computationally expensive, which can challenge F/OSS values that privilege relying on free and open platforms. A 2017 blog post about testing in the Rust programming language [2] reported that over 126,000 total tests are run for each proposed change/pull request across 20 different software configurations, with each configuration taking 2 hours of computing time.
The post notes that the project only has the resources to run one or possibly a few test jobs at the same time. Each additional pull request adds to the queue, meaning that contributors may have to wait for days to see if their proposed change breaks the testing suite. The author describes how longer queues can lead to conflict between contributors, who all need the CI system to approve their changes before they move on to the next stage of code review and approval.

(^2)https://github.com/python/cpython/blob/master/Lib/test/cmath_testcases.txt

The only way for CI in some of these projects to scale up is to use either commercial cloud computing or a self-hosted server cluster. Contributors are supposed to run the full test suite on their own computer before submitting pull requests, but it is important for the project to test it in a wide range of configurations, as well as to have a common public infrastructure that verifies the tests actually passed. GNU GCC, part of the free software movement, maintains its own distributed “compile farm,” with donated servers hosted by those in the movement.(^3) Commercial CI platforms have been growing, including Microsoft’s Azure Pipelines (which runs on Microsoft’s cloud computing infrastructure) and services from the venture-capital-funded companies CircleCI and AppVeyor (which run on cloud computing infrastructure from Google, Amazon, or Microsoft). These commercial CI platforms often give public F/OSS projects a free single CPU to run a single test at a time (Microsoft Azure currently gives 10 free simultaneous tests to F/OSS projects), but charge for more simultaneous tests — a necessity as projects grow in complexity. These commercial CI infrastructures challenge open source cultures that privilege creating autonomous, freely available infrastructures using freely available infrastructures, what anthropologist Chris Kelty has called a “recursive public” [76].
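The computational expense at stake can be made concrete with back-of-the-envelope arithmetic. The two constants below mirror the figures reported for Rust (20 configurations, roughly 2 hours each); the helper functions are our own illustration, not any project's real tooling.

```python
# Back-of-the-envelope sketch of why CI queues grow.
# Constants mirror the figures reported in the Rust post discussed above.
CONFIGURATIONS = 20     # distinct platform/feature builds per proposed change
HOURS_PER_CONFIG = 2    # compute hours each configuration takes

def compute_hours_per_change():
    """Total machine time one pull request consumes across all builds."""
    return CONFIGURATIONS * HOURS_PER_CONFIG

def queue_wait_hours(queued_changes, parallel_jobs=1):
    """Worst-case wait for the last change in the queue, given how many
    test jobs the project can afford to run simultaneously."""
    return queued_changes * compute_hours_per_change() / parallel_jobs

# With a single free job, ten queued pull requests imply roughly 400
# machine-hours of waiting; ten simultaneous jobs cut that tenfold.
print(queue_wait_hours(10))                    # 400.0
print(queue_wait_hours(10, parallel_jobs=10))  # 40.0
```

The arithmetic shows why paying for simultaneous test jobs quickly becomes attractive: wait time falls linearly with the number of parallel runners, which only money (or donated hardware) can buy.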
The decision to go beyond the free tier of CI services can be the first time a F/OSS project takes on a recurring financial expense, driving organizational changes and work. For smaller projects, free tiers are often sufficient. As projects grow their codebase and contributor base, they have to decide whether to fundraise to pay for more simultaneous tests or to deal with the strains of not being able to easily verify that proposed changes do not break the project. For those who fundraise for more CI resources, this can require projects to add accounting and financial roles (although events are more often the first occasion for this, which we discuss in a later section). Even the non-commercial, self-hosted alternative that projects like GNU GCC follow requires dedicated maintenance roles and the soliciting of donations to fund the self-hosted compile farm.

4.6 Ecosystem work: from interdependence to competition

One major dimension in which F/OSS projects scale is their interdependence with other F/OSS projects, which can take a variety of forms. First, they can become relied upon as critical infrastructure by other F/OSS projects. This is especially the case for software libraries, programming languages, and operating systems. Here, the number of users typically means the number of other developers building software using the F/OSS project. The chain of cascading dependencies can grow quite complex: a program may only rely on a few explicitly imported dependencies, but those projects and their dependencies can make the chain hundreds of projects long.

Maintainers must manage relationships with projects that are both “upstream” (that they depend on) and “downstream” (that depend on them). If a project modifies its features, it can break downstream projects that expect this feature to work consistently.
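The cascading growth of dependency chains described above can be sketched as a graph traversal. The following is a hypothetical illustration (the package names and graph are invented), assuming each package declares only its direct dependencies:

```python
from collections import deque

# Hypothetical dependency graph: each package maps to its direct dependencies.
deps = {
    "myapp": ["libA", "libB"],
    "libA": ["libC"],
    "libB": ["libC", "libD"],
    "libC": ["libE"],
    "libD": [],
    "libE": [],
}


def transitive_deps(package: str) -> set[str]:
    """Breadth-first traversal collecting every package that `package`
    depends on, directly or indirectly."""
    seen: set[str] = set()
    queue = deque(deps.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(deps.get(dep, []))
    return seen


# Two direct imports pull in a chain of five packages; in real
# ecosystems the transitive closure can reach hundreds.
print(sorted(transitive_deps("myapp")))  # -> ['libA', 'libB', 'libC', 'libD', 'libE']
```

A breaking change anywhere in such a chain propagates downstream, which is one reason projects continuously test against changes in the projects they depend on.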
Guarding against such breakage is one common use for continuous integration, in which a project will regularly test its own functionality against beta or release versions of the upstream projects it relies on. Many issues arise in interdependent software ecosystems beyond these fine-grained issues of compatibility between new versions of software. One maintainer shared the difficulties that arose when a project they depended on began to face internal conflict, which forced them to either pick a side or do additional work to maintain compatibility with two projects.

Complex and interdependent F/OSS software ecosystems often find themselves needing to coordinate on high-level tasks and decisions. Conferences and conventions can be a major site for this work. Some interviewees even described these ecosystem-level conferences in terms similar to political delegations, such as referring to a perceived need to send representatives. Some have large blocks of time dedicated to open discussions on topics relevant to projects across the ecosystem. Such ecosystem-level topics include software-specific issues that would require consensus to implement new features across an ecosystem, such as issues in packaging and release managers, data types, hardware support, or user telemetry.

(^3)https://cfarm.tetaneutral.net/

Another major, perennial, and often controversial ecosystem-level topic is the proposed consolidation of related or competing projects within the same ecosystem. It is common in F/OSS ecosystems for many projects to be created that solve a similar problem. There may be good reasons for multiple related competing projects to circulate in the same ecosystem, but navigating a crowded ecosystem can be confusing and frustrating for both users and developers. In such ecosystems, it can be common for someone to suggest that there are too many competing packages and that there needs to be a consolidation.
One interviewee shared a case where, during an open discussion session at an ecosystem-level conference, the maintainer of one project declared that some of the other competing projects in the ecosystem “needed to die,” seeking to gain support for consolidation then and there. No consensus was reached, but the episode illustrates how high-intensity communicative work and representation at such meetings become crucial for those vying to keep projects alive within an ecosystem and to secure funds. A project’s maintainers can certainly decide to keep their project operating in an ecosystem that has decided to consolidate around another competing project, but they will find themselves with fewer and fewer users.

4.7 Growing a Community by Evangelizing

At a developer conference we attended for a particular subset of a F/OSS ecosystem, we observed a dedicated plenary session for 2-4 minute lightning talks, which were almost all pitches for a F/OSS project the speaker had developed. Many projects pitched were newer and less established, and had not secured spots at the competitive conference, which some speakers noted. Maintainers asked others to use their projects, with one explicitly imploring the audience to “rely upon” their project: to integrate it into their workflows and F/OSS projects. We asked about the role of this ritual: Why is it important to have a space for maintainers to convince others to use their often-fledgling F/OSS projects? And why did so many of them take the form of a “rely upon me” pitch, particularly when the project was not quite yet fully developed?

Our interviewees called these “rely upon us” pitches “evangelizing,” and they brought the topic up frequently, in response to questions about scaling or even about general strategies for maintaining a project. In computing in general and F/OSS specifically, evangelizing is widely used to describe efforts to sustain and maintain projects by bringing in more people [1, 85].
Maintainers repeatedly told us how important it was to have fellow contributors and maintainers to distribute the work and make their project sustainable. A key rationale was that users who rely on projects presumably become invested in those same projects’ success — particularly when those users are also F/OSS developers or well-resourced organizations. When a project is used and relied upon by programmers and software firms, it gains access to a potential pool of skilled labor and resources. More contributors and users also make projects appear more successful [5], and thus worthy of funding by entities who fund F/OSS. Like many startups, projects often signal their credibility by showing on their websites the logos of well-known companies and universities that rely on their projects.

The tasks which constitute this vast domain of evangelizing can include: developing social media accounts for F/OSS projects, maintaining educational resources and documentation, building and updating websites with F/OSS project information, moderating and building Q&A sites and forums, and giving talks at meetups, conferences, companies, and schools. The promotional, communicational, educational, and evangelical activities done by F/OSS maintainers lay the conditions for further expansion and infrastructural change — and more maintenance work. Some interviewees said they deeply enjoyed evangelizing work, while others described it as an exhausting task outside of their expertise. We also heard that highly-visible evangelizer-maintainers can receive personal credit for all the group effort in a project. This Matthew effect [93] of accumulated status can generate tensions in the project, even if these maintainers do not intend for this to happen and actively work to elevate other contributors and maintainers.
4.8 Building and Maintaining Relationships: Meetups and Events

It is common for F/OSS projects to hold in-person meetups and events, often to bring new and/or existing contributors together to accomplish work and build relationships. These events vary widely, from conferences to hackathons to happy hours. Our findings align with existing work that has found these events play a critical role in developing trust and maintaining positive, lasting social relations [90, 91, 104]. Maintainers described in-person events as essential for building good relationships, which in turn help them better understand each other in virtual environments [26]. Several longtime maintainers of major projects told us stories of their first F/OSS event, which they claimed inspired them to get even more involved in F/OSS in a way that years of online conversations had not. Past work has described the gendered labor women take on to organize events, labor that often goes unacknowledged when technical contributions are valued above social and organizational work [92]; our participants also reflected on this dynamic.

Projects with many contributors, resources, and institutional connections often run their own major conferences, while smaller projects often meet up at more ad-hoc events. Smaller projects also rely on ecosystem-level conferences, in which those from various related projects (e.g. those written in the same programming language or that serve similar purposes) organize collective events. These ecosystem-level conferences often have dedicated periods for projects of various sizes to hold their own events. As projects grow in contributors, many move from holding satellite events before or after major F/OSS conferences to holding their own conferences.
In both our interviews and our observation of these events, we found that maintainers were often key organizers at events for smaller projects, although many larger projects with more capacity to fundraise hire dedicated event organizers. Projects that make connections to companies or universities often get in-kind donations of space and event-organizing labor. Major projects with many users, contributors, and maintainers hold events that are more like trade conventions, with thousands of attendees and high competition for speaking slots. Some companies even specialize in hosting F/OSS conferences on behalf of projects. These companies then take a portion of the registration fees, which for larger projects can be over $1000 USD.

Interviewees told us it was often easier to get companies to fund events than anything else – testing infrastructure, time for labor, or the other costs that can accrue for a project. Many ecosystem-level F/OSS events are directly sponsored by companies that either rely extensively on F/OSS projects or are business arms of F/OSS projects. Maintainers discussed the labor associated with events as projects scale, as holding events for more and more people becomes increasingly difficult and costly. We also heard about the struggles maintainers face when a project has users, contributors, and maintainers across the world, which is another form of scaling that is often seen as a key metric of success. In-person events for a global community require more work, skills, and resources, including tasks like arranging visas and fundraising for travel grants.

Some maintainers we interviewed shared how they spent significant amounts of their personal money on events they organized, particularly when funding promises fell through or when costs exceeded budgets. While expenses were sometimes reimbursed through donations, the actual labor of organizing events was less likely to be compensated or recognized.
Some maintainers took issue with restrictions from donors who would not fund stipends or even travel funds for those in their projects who organized an event, even if excess funds were available for expenses such as attendee travel, venues, and catering.

4.9 Funding, finances, and donations

Funding, finances, and donations were a major topic in our interviews, and have become widely discussed in broader public conversations about F/OSS. Our interviewees described funding as a way to compensate for labor already performed in maintaining F/OSS projects, as well as to pay contributors to perform tasks that are not done voluntarily. However, we found that fundraising and fund management can itself involve substantial amounts of unanticipated and specialized labor, including seeking funding, writing proposals and budgets, accounting, reimbursements, and managing relationships with funders. There is a strong parallel here to other non-profit sectors, including academic research, charities, and political organizations, where the work around funding can become a substantial fraction of the work performed in maintaining an organization.

In academic research, however, scientists learn through their training that running a lab means maintaining a funding pipeline through networking, applications, and rejections [96]. Maintainers not trained in this system can be caught unprepared by the mismatch between their vision for the project and the reality of getting funding. Even for those who do not receive funding, the mere availability of potential funding can shift maintainers’ and projects’ conceptions of what F/OSS is and how their life and work fits into it. Funding can also push projects to develop more formal organizational structures, which in some cases can be at odds with their existing governance style and ethos.

4.9.1 Fundraising for maintenance: patronage and business models.
We found two general approaches maintainers took to funding: patronage models and business models. In patronage, maintainers solicit donations or grants, while business models involve a range of strategies to sell services on top of F/OSS projects. While it may seem more obvious that building up a business involves a substantial amount of work and start-up costs, patronage models can also involve substantial work in recruiting patrons and maintaining good relationships with them. Companies, foundations, governments, and individuals all donate, but can have idiosyncratic processes around those donations. Managing patron relationships can be a complex task as projects gain more patrons, particularly if patrons have contradictory expectations.

In our interviews, maintainers of projects with funding to regularly hire multiple full-time employees expressed a common sentiment: they found themselves doing less and less work on the software project itself, and more and more work on seeking funding and managing the project. While we especially saw this with maintainers of large-scale projects in academic settings, we also encountered it in non-academic projects. Interviewees who actively sought grants and patronage told us that both grant agencies and other patrons often only fund novelty and new features, not the necessary upkeep, repair, security, and compatibility work. Maintainers struggle with producing the visions of novel innovation needed to get funding, which is a common theme in scientific cyberinfrastructure [9] and public works [39, 111].

This work of soliciting funding is not just about the raw amount of time or energy that maintainers spend writing proposals or finding interested donors. There can be a heavy personal burden when maintainers become responsible for the livelihoods and careers of real people they have hired.
One of our interviewees — an academic researcher whose grants fund employees to work on F/OSS — explained how getting funding can create an obligation to continue to get funding, to keep supporting the people they hired. We particularly heard this in more academic-aligned F/OSS projects, where hiring graduate students or postdocs to work on F/OSS projects is common.

4.9.2 Money changes everything: the labor of spending.

Once funding has been obtained and money is in some kind of account, the question of distribution and governance arises. Smaller projects grapple with learning how to navigate non-profit and for-profit laws around hiring, accounting, and taxes, requiring the project to bring in more kinds of expertise. As projects fundraise, maintainers can find themselves obligated to expand their decision-making to include funders or those chosen by funders. This can be informal for smaller projects, but becomes more explicit as projects scale their fundraising. For example, the Linux Foundation has a Platinum membership level that costs $500,000 USD annually, and its corporate charter holds that about 80% of its Board of Directors are chosen by the Platinum members [50].

With projects that receive grants from more traditional foundations (whether private or public), the grant proposal already specifies how the funds will be spent. However, many projects receive more ad-hoc funding from donors who do not require extensive budgeted proposals, especially those that solicit funding through Patreon-style platforms like OpenCollective or GitHub Sponsors. As one maintainer explained, getting the funds was the easiest part:

“...we created an OpenCollective, and a bunch of companies have contributed to it, but we didn’t really address the issue of how to disburse the funds. [...] No real system for figuring out how to spend it. [...] Do we pay them [contributors] money for that one pull request that they did? [...]
The problem doesn’t solve itself just by the existence of money that’s available for the project. There still has to be a mechanism and a policy for, like, distributing it among the people on the project.”

The introduction of money into a project can bring the social relations of collaboration into conflict with labor and trade agreements. F/OSS projects often try to hire their long-time volunteer contributors, no matter where they live. This means navigating labor laws, varied immigration statuses, banking networks, or sanctions, which can be far more restrictive for some kinds of contributors than they are for others. Funding, then, transforms the structure of the organization, the possible formations of open source community, and what kinds of collaboration can sustain the maintenance of the project.

5 DISCUSSION: LABOR AND SCALE IN MAINTAINERSHIP

Our findings speak to two distinct but linked issues in F/OSS: labor and scale. Before we discuss more specific implications of our findings, we reflect on what we mean by the multi-faceted term “scale.” Through our interviews about labor, it became apparent that scale was clearly important: interviewees introduced it in response to a wide range of questions about many different aspects of their work and positions.
Based on our interviews, “scale” can refer to: the number of people who use the software; the use of the software within large and/or prestigious organizations; the number of contributors or maintainers in the project; the number of bug reports, issues, and/or proposed changes made; the geographic distribution of users, contributors, and/or maintainers; the number of rules and governance procedures; the number of communication channels used; the amount and/or rate of internal and external communication; the size, complexity, or features of the software code; and the interdependence of the code within a broader software ecosystem. Scale was also invoked as a more holistic feeling, particularly by those who described how they felt their project had grown too much too fast, making scale closer to a signifier of affect, as [122] also describe. Our findings advance work that interprets scale as a multidimensional quality beyond the number of users/participants [81], and methodologies that use participants’ multiple understandings of scale as an analytic resource [86, 104].

Table 1. Summary of forms of work with examples of how they change as F/OSS projects scale.

| At smaller scales | At larger scales |
|-------------------|------------------|
| **The maintainer(s)** | |
| Solo or lead maintainer who makes all or most decisions, often by doing most of the work on their own. | Many maintainers with various divisions of labor, hierarchies, and organizational structures. |
| **User support** | |
| An opportunity to retain and recruit new contributors. Work is ad-hoc. | An overwhelming flood. Work has established rules, teams, and triaging. |
| **Managing software development** | |
| Governance is often implicit and led by a lead/solo maintainer, who accepts or rejects all proposed changes. | Governance is often explicitly discussed, with a variety of formal rules and structures for decision-making. |
| **Code review and testing** | |
| Either no automated tests or lightweight tests managed by the lead/solo maintainer. | Widespread use of tests to review proposed changes and enforce rules. Managing testing is a dedicated role. |
| **Ecosystem-level work** | |
| Projects may rely on other more-established projects and have to adapt to changes made “upstream.” | Projects are embedded in an interdependent ecosystem, which must coordinate to ensure compatibility. |
| **Evangelizing** | |
| A crucial task to get new users and contributors. Lead/solo maintainer must work to get speaking spots. | Maintainers are routinely invited to speak at conferences and prestigious organizations; some are celebrities. |
| **Meetings and events** | |
| Smaller events focused on growing the user and contributor base, often organized by the lead/solo maintainer with little financial support. | Larger events that let contributors and maintainers coordinate and build relationships. Dedicated organizing roles with financial support. |
| **Funding and finances** | |
| Small to non-existent. All work is uncompensated, but projects may receive donations for small expenses. | Routine and successful enough to hire contributors, maintainers, and accountants. Debates over how to raise and spend funds. |

5.1 As projects scale, work not only increases, but fundamentally changes

As we summarize in Table 1, various kinds of work and positions of labor in F/OSS can become quite different at smaller and larger scales. Our findings extend prior work that investigates the different modes of scaling in technologically-mediated organizations and scientific cyberinfrastructure [3, 9, 75].
These similarities are potentially due to the fact that many F/OSS projects are also relied upon by decentralized communities (including science), where well-resourced user organizations and grant agencies contribute to their development and maintenance in a more ad-hoc fashion.

Carr and Lempert discuss how scale is not merely a matter of existing activities being amplified, but of work being fundamentally transformed by scale, because scale is deeply linked to power relations [122]. As F/OSS projects grow across many different understandings of scale, we showed how the kind of work involved in maintaining them also changes. For instance, in the findings we referred to an interviewee who shared how the growth of contributors to their F/OSS project necessitated the formalization and democratization of leadership positions when the “benevolent dictator” model was no longer sufficient, because it required all decisions to be approved by one person.

It is not simply that as projects grow, there is more work to be done, although this is also the case. New kinds of work are often needed, and existing kinds of work become transformed. For example, for a project with few users, providing user support to someone who raises an issue can be an exciting opportunity to grow the userbase. The sole maintainer likely does this work themselves, and has the capacity to attend to individual concerns. As F/OSS projects gain thousands or even millions of users, maintainers often must implement distributed approaches, like directing questions to Q&A sites or forming teams who solely triage the issue queue. This is also the case with continuous integration (CI): as projects grow their codebase, interdependence, and contributor base, more tests must be run, which can exceed the free allowances of commercial CI offerings. This may initially seem to be a purely “technical” challenge, but it raises questions about fundraising and organizational roles.
The CI example also illustrates the deeply socio-technical nature of work, a long-established concept in CSCW and organization studies [7, 98, 108, 133]. Our findings extend the literature on how maintenance and repair practices are bound up in social relationships [37, 66, 71, 72, 99, 110, 120]. One way of understanding this implication is that there is no “purely technical” work in F/OSS that only requires software engineering expertise, as all forms of work have interpersonal and organizational dimensions, even if those are often implicit.

5.2 Scalar labor: What is needed to grow in many directions?

Many F/OSS projects have a small user base and do not grow beyond a single maintainer-contributor [42], but as the prior section discussed, those that do become widely relied upon as infrastructure must be maintained with ever more work, and with different kinds of work. When this occurs, the maintainer(s) must constantly ensure there are enough people available and willing to work on what the project needs. Those people must also have the skills, resources, institutional knowledge, and organizational forms necessary to do that work well. We introduce the term “scalar labor” to describe these kinds of work that seek to ensure the project has the capacity to meet its many growing needs, across the many dimensions in which the project may scale. We use “scalar” primarily as the adjectival form of scale, but its mathematical meaning as magnitude without a specified direction can be an apt metaphor in F/OSS, particularly for projects that achieve what one interviewee called “catastrophic success.”

The concept of scalar labor overlaps with Ribes’s focus on “scalar devices” [104] in his studies of scientific cyberinfrastructure: the tools and practices that people in organizations use to understand and manage the size, scope, and spread of the organization.
These include surveys, all-hands meetings, analyses of logs and digital traces, and other “little technologies of community.” Like other works on scaling, Ribes discusses the many heterogeneous dimensions that people tend to compress into the single term. While Ribes’s article is more methodologically focused on how ethnographers can study such organizations through scalar devices, we arrived at many of the same empirical findings in our ethnography of a similar kind of process, in a different set of social worlds.

In both F/OSS and scientific cyberinfrastructure, there can be a prevalent assumption that scaling up is an inherent good, which is rewarded by funders who support projects that demonstrate successful scaling. Ribes also discusses how it can be far more difficult to manage projects as they seek to scale up, particularly when scaling into becoming infrastructure for an entire academic discipline. Ribes’s “scalar device” is what sociologists call a sensitizing concept [11] that draws our attention more to knowing what scale an organization is at now, what it could be at in the future, and how those within it know the organization. Scalar labor, by contrast, draws our attention more to how this work transforms with a changing organizational, economic, and institutional context. Labor also calls attention to who does this work, who is recognized for doing it, what it costs them, and how they are compensated or made whole for doing it.

Scalar labor is also related to Strauss’s concept of “articulation work” [118], which Gerson summarizes as “making sure all the various resources needed to accomplish something are in place and functioning where and when they’re needed” [58].
Bietz et al.’s study of scientific cyberinfrastructure [9] introduces a related extension of articulation work in “synergizing”: the work of creating and maintaining a common field where quite different kinds of people, organizations, and systems can do articulation work. Synergizing calls our attention to how this work is impacted by the heterogeneity of interdependent people, organizations, and systems that must be coordinated, which was certainly also a theme in our findings. By emphasizing the labor dimension of F/OSS, scalar labor covers a similar range of activities as synergizing, but draws our attention to how this work is specifically impacted by the heterogeneity of different dimensions of growth, of which interdependence is but one mode of scaling.

Like articulation work and synergizing, scalar labor is complex and interdependent. A prime example is raising funds to host an event to evangelize to new users, who would then be mentored into contributors, who would then respond to bug reports and fix identified issues, and may even be mentored into maintainers themselves. By contrast, a traditional software engineering firm that made money from selling licenses or services would simply hire someone directly to respond to bug reports and fix such issues. Some F/OSS projects with significant fundraising capacity do exactly this, which can relieve major burdens. However, most projects we encountered could only recruit volunteers, because they cannot charge for the free software and also struggle to achieve the status that would help them fundraise. In either case, the concept of scalar labor draws our attention to how growth is not sought for its own sake, but rather as a strategy to build capacity so that important maintenance work can be done. Yet because this growth itself can bring more and different kinds of work, even more growth may be needed to do that work.
Also like articulation work and synergizing, scalar labor is a useful concept for studying F/OSS (and other organizations that produce and maintain infrastructure) because it includes interpersonal, organizational, and financial skills that are often far outside the scope of a traditional engineer’s duties — even though it is done to improve the project’s capacity to do traditional engineering work. Yet this work also often requires project-specific knowledge and the trust of contributors, which makes it difficult to delegate. This work can become what maintainers call “governance” – a longstanding topic in F/OSS research (e.g., [87]). Yet despite such a focus on governance, governance work is rarely analyzed as a form of labor, perhaps because it is seen to be more about organizational forms or decision-making. Kelty [76] is a notable exception, as he details how free software projects often make and enact governance decisions through software engineering, and recognize those engineering decisions as constituting social values. The concept of scalar labor calls our attention to governance as a form of labor, which can be as much an exhausting, uncompensated, and invisible burden as it is an exercise of power.

5.3 Scalar debt and the consequences of ‘catastrophic success’

The concept of scalar labor leads to a related issue: many projects that become widely relied upon accumulate what we call “scalar debt,” a term we introduce based on the concept of “technical debt” [33]. Technical debt refers to engineering decisions that initially help a project advance or expand quickly, but at a cost that must be ‘paid back’ later with even more work than was initially saved. Projects that grow rapidly and achieve what one interviewee called “catastrophic success” struggle in that they have not done enough scalar labor; that is, their growth in users has not led to a growth in the project’s capacity to maintain itself at this new scale.
Paying down this scalar debt then comes at an immense cost to the maintainers, which in contemporary F/OSS parlance is often framed as finding a working “sustainability model” — a way to recruit new volunteers or raise funds to hire contributors and maintainers to do work that is currently backlogged. This focus on the present is different from how “sustainability” is discussed in scientific cyberinfrastructure, where it usually refers to questions about whether the infrastructure will persist in the long term, often future decades [105, 106].

Scalar debt is also related to how F/OSS projects tend to develop management and governance structures on an ad-hoc basis over time, rather than planning them out preemptively before they are needed. This ad-hoc or “spontaneous” [35] governance model is common in F/OSS, as well as on other peer production platforms [17, 124]. Researchers and practitioners in F/OSS have identified this as a key strength [21, 130] — with some comparisons to the now-dominant Agile software development methodology [52, 129]. This ad-hocness in F/OSS has also been described as a product of shared ideological and cultural commitments among members who may object to formalization and instead value autonomy and more distributed governance models [27, 35].

However, this ad-hocness is also related to the resources and labor available to projects under the mostly-voluntary peer-production model with which the F/OSS projects we studied began. In our interviews, maintainers told us of their concerns about how such formalization and bureaucratization can take substantial time, energy, and social capital, and carry risk, meaning that there are very good reasons why a project may not want to create such structures unless they are clearly necessary.
One key case of scalar debt we encountered in multiple interviews was around Codes of Conduct and other moderation mechanisms, which have become particularly central to discussions of diversity and inclusion in F/OSS [38].

It can be difficult to know when a project is taking on scalar debt, although once it becomes apparent, maintainers described living in a constant state of incipient crisis, overwork, or burnout just to keep their projects from falling too far behind. Ironically, when projects are in this state of “putting out fires,” more work must be done to acquire and manage the resources necessary to help put out those fires, whether this involves recruiting and mentoring new volunteer contributors or raising funds to hire employees. The success of these sustainability approaches is not guaranteed: volunteers can leave, patrons can withdraw support, grants can be rejected, and business models can fail to be profitable. For maintainers in the overwhelming ‘putting out fires’ state, it can be a difficult choice whether to spend their scarce time and energy on an unproven strategy to leverage more resources for the project, versus spending that time putting out the fires currently flaring up. As [122] and [104] also discuss in non-F/OSS contexts, projects have to continuously recalibrate what scale they are currently at, which can itself be an uncertain and labor-intensive task.

5.4 Scaling into becoming critical infrastructure for well-resourced organizations

The issue of scalar debt leads to a related challenge that becomes apparent when a project scales into being relied upon as critical infrastructure by well-resourced organizations, from for-profit companies to universities to governments. As we previously reviewed, entire sectors of the global economy are now reliant on F/OSS projects.
We use the term “labor” intentionally in this paper, as it calls attention to how this work is part of the economy, contributing to the production and distribution of goods and services. In contrast, earlier work on F/OSS often emphasized the voluntary, altruistic, and alternative nature of this work, framing these projects more as communities. While the rising commercialization of F/OSS as a movement is a long and well-studied historical trend over the past two decades [10, 47, 77], we find that contemporary projects which begin as more voluntary, peer-production efforts can similarly transform as they scale. Early on, many smaller and more voluntary F/OSS projects seek to be relied upon, especially by large, well-known tech companies and universities, who can directly or indirectly help support the project. This can bring not only users (who may become contributors), but also prestige, connections, and a pittance of donations — that is, cultural, social, and financial capital [13].

Yet several maintainers we interviewed described it as a “blessing and a curse” to be relied upon by organizations that build their products on these projects. The companies benefited from maintainers’ labor, but often did not offer resources or labor in return. We often heard how being relied upon in such a manner adds more work that is also more difficult: software engineers at elite user organizations can be especially demanding and entitled. Maintainers must respond to these elite users in ways addressed in studies of emotional labor, that is, by managing the emotions of others [65]. Many such user organizations are “free riders” who do not contribute back. Even when some corporations do contribute back to F/OSS, they can do so in ways that place additional demands on maintainers, such as requiring additional code review or asking F/OSS contributors to manage relationships with patrons.
These are all ways in which becoming critical infrastructure for well-resourced organizations can increase and transform the work of maintainers, even as it brings users, resources, and prestige.

Another issue is that becoming relied upon as infrastructure by others can make maintainers feel morally responsible for whatever products are built using their projects. Some described becoming disillusioned or burned out specifically because they began their project imagining it would be used by those who could not afford commercial alternatives, but found it was more often used by companies to lower their own costs. Some even expressed that they had to wrestle with how their projects were used as part of products the maintainers believed were unethical or harmful. These frustrations are compounded when the wealth derived by companies relying upon their technologies is not shared with maintainers and contributors, evidencing an inequitable form of extraction. The activities of mentally working through such issues can also be seen as a form of both invisible work and scalar labor.

5.5 Scaling and the dynamics of hypervisibility

Scale impacts maintainers’ personal identities and relationships to broader publics. While much prior literature on maintenance and infrastructure in other sectors (e.g. power plants, transportation, commercial software) discusses maintenance as relatively less-visible and less-recognized work [39, 111, 117, 120], the centrality of maintainers as leaders of F/OSS projects leads to a different set of issues. Much F/OSS work is done in public view because of the open nature of F/OSS and especially the dominance of all-inclusive public code collaboration platforms like GitHub. As discussed in the sections on user support and proposed changes, maintainers can receive a deluge of requests from users and contributors, all of which are visible on the public web.
Under public scrutiny, maintainers engage in the communicative labor of tracking management, the emotional labor of responding to users and contributors, and, in sum, the production of the optics of a successful project.

Maintainers of projects that have achieved massive success and scale — that are widely relied upon and/or have a large contributor base — can achieve a kind of “microcelebrity” [89] status, a term originally from studies of social media. Eghbal compares F/OSS maintainers to content creators on YouTube or Instagram, particularly those who earn a quasi-independent living through patronage [42]. We found that some maintainers grow into microcelebrities, fueled by the dynamics of social media and technology standardization. Such maintainers have hundreds of thousands of followers on social media sites like Twitter, write widely-read blog posts on the state of F/OSS, and are flown out to speak at major conferences and companies. They can play a major role in conflict resolution and governance on public platforms such as mailing lists, particularly when the governance model is more ad-hoc. These influential leaders can become substitutes for the reorganization of project decision-making and conflict resolution – a case of scalar debt.
Evangelizing can reinforce this trend towards hypervisible microcelebrity maintainers, such as when an already-famous maintainer is invited to give talks at F/OSS conferences to thousands of people, or flown out to speak at companies and universities, which makes them even more famous. Funding, patronage, and business models benefit from more famous figureheads, and often require that a single individual be the designated Principal Investigator on a grant or CEO of the business arm. Some microcelebrity maintainers told us they have to actively work against these dynamics, such as by sending others in their place when asked to speak at conferences.

Yet there are still forms of invisible work for the hypervisible. Such maintainers routinely receive torrents of unsolicited e-mails and private messages, from lavish praise to harassment. Much work also takes place outside of public code platforms, such as grant writing or conflict resolution. Precisely because of their microcelebrity, these maintainers can be called in to adjudicate disputes behind the scenes. These findings suggest that maintenance labor is not always invisible; it can be hypervisible and highly valued. Given the dominant framing of maintenance and infrastructure as invisible work [60, 116, 128], we urge future research into this intersection of issues.

6 CONCLUSION

The focus of this paper has been on the intersection of labor and scale in the context of maintaining F/OSS projects, but our findings contribute to understanding the challenges faced by people engaging in many types of collaborative work to build common information resources while simultaneously developing organizations and governance structures. In our interviews, maintainers described being burned out as what was expected of them fundamentally changed as projects scaled. These interviews were rich with insights into the deep and varied commitments of F/OSS maintainers, but also the emotional toll doing F/OSS work can take. Our findings have wide import for discussions of governance, leadership, and sustainability in socio-technical systems, including crowdsourcing, citizen science, scientific cyberinfrastructure, and crisis informatics. In particular, our focus on labor, and people’s reactions to changes in their labor, can help build awareness of how infrastructure sustainability is tied to the long-term well-being of maintainers as individuals and in their communities.

6.1 Limitations

Although we attempted to recruit a diverse group of participants for our interviews — with particular attention to the type and size of F/OSS project they worked on, employment, and geography — our findings are limited by the number of interviews we conducted and our recruitment methods. We have mostly studied projects that have been relied upon by others as infrastructure and began as volunteer projects, so our findings do not speak to the overwhelming majority of F/OSS projects that are developed and used by a single person but released publicly, nor to entirely corporate-developed F/OSS projects. We have also sought to capture a kind of longitudinal view by focusing on maintainers, some of whom have long histories of involvement. A more traditional longitudinal study would capture these issues of scale with even more depth. As in all interview-based studies, participants’ memories may be imperfect, so this study could be complemented with more detailed contemporaneous methods of capturing the work that maintainers do day-to-day, from participant-observation to diary studies to analyses of trace data.

We also acknowledge that we are implicated in the same kinds of systems of F/OSS sustainability as our participants. All authors have direct participant experience in F/OSS projects as contributors or maintainers, which gives us a sensitivity to these topics, but also means that we may lack the analytical distance that some strands of social science value. In particular, the fact that we were funded to study these issues by non-profit foundations that are also direct funders of F/OSS projects — which was public knowledge that we disclosed prior to our interviews — may impact the kinds of responses we received.

6.2 Recommendations and future work

Contributors and maintainers might better manage the difficulties posed by scale if they regularly have conversations about what responsibilities entail, how much time and effort that work takes, and how the distribution of workloads and resources should change when the project changes. Maintainers may benefit from explicitly acknowledging when scalar debt is being taken on, as is sometimes done when technical debt is taken on. Focusing on these questions of scalar labor brings to light how scale is not always a universally good thing — even though there are broad pressures on projects that equate scale with success, as [104] also discusses in science. The benefits of scaling and success may also not be equitably distributed, as we discussed around the less visible and more gendered labor of event organizing, versus the dynamics that lead to microcelebrity maintainers. Finally, because efforts to build capacity and reduce the burdens of maintenance work can themselves compound the amount of work to be done, funders and donors can be mindful of the opportunity costs that projects incur in soliciting resources. This can involve more lightweight funding mechanisms that require less up-front work on the part of maintainers and project leaders.

Many areas in this paper might be expanded in future work. Specifically, we are interested in unpacking the effects of corporate reliance on F/OSS projects on maintainers’ working and emotional lives. Although we brought in value misalignment as one way to interpret maintainers’ reactions when corporations took from but did not give back to F/OSS, we believe more work can be done in this area to understand the political economy of value misalignment and the effects of corporate reliance on maintainers’ mental health and well-being.
This might involve conducting additional interviews that focus on projects’ growth trajectories or focusing on projects that experienced the ‘catastrophic success’ gestured to in the discussion. Further exploring these areas might contribute valuable and actionable insights to improve F/OSS sustainability.

7 ACKNOWLEDGMENTS

The authors would like to thank Alexandra Paxton, Nelle Varoquaux, and Chris Holdgraf for their ongoing feedback, as well as Linwei Lu, Julio Gonzalez, and the CSCW reviewers for their insights. We are thankful to the cohort, advisors, and program managers of the Ford/Sloan Critical Digital Infrastructures Initiative for helping us plan this research. We appreciate the time our anonymous interviewees spent talking with us and reviewing various drafts of this work. We are thankful to Stacey Dorton for administrative support. This work has been financially supported by the Ford and Sloan Foundations through the Critical Digital Infrastructures Initiative (grant G-2018-11354), the National Science Foundation (grant DDRIG #1947213), as well as the Gordon & Betty Moore Foundation (grant GBMF3834) and the Alfred P. Sloan Foundation (grant 2013-10-27) through the Moore-Sloan Data Science Environments grant to UC-Berkeley.

REFERENCES

[1] Morgan G. Ames, Daniela K. Rosner, and Ingrid Erickson. 2015. Worship, faith, and evangelism: Religion as an ideological lens for engineering worlds. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM, New York, 69–81. https://doi.org/10.1145/2675133.2675282

[2] Brian Anderson. 2017. How Rust is tested. https://brson.github.io/2017/07/10/how-rust-is-tested

[3] Karen S. Baker, David Ribes, Florence Millerand, and Geoffrey Bowker. 2005. Interoperability strategies for scientific cyberinfrastructure: Research and practice.
In Proceedings of the American Society for Information Science and Technology (2005). https://doi.org/10.1002/meet.14504201237

[4] Flore Barcellini, Françoise Détienne, and Jean-Marie Burkhardt. 2014. A situated approach of roles and participation in open source software communities. Human–Computer Interaction 29, 3 (2014), 205–255. https://doi.org/10.1080/07370024.2013.812409

[5] Ann Barcomb. 2016. Episodic volunteering in open source communities. In Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering. 1–3. https://doi.org/10.1145/2915970.2915972

[6] Yochai Benkler. 2007. The Wealth of Networks: How Social Production Transforms Markets and Freedom. Yale University Press.

[7] Richard Bentley, John A. Hughes, David Randall, Tom Rodden, Peter Sawyer, Dan Shapiro, and Ian Sommerville. 1992. Ethnographically-informed systems design for air traffic control. In Proceedings of the 1992 ACM Conference on Computer-Supported Cooperative Work. 123–129. https://doi.org/10.1145/143457.143470

[8] Magnus Bergquist and Jan Ljungberg. 2001. The power of gifts: Organizing social relationships in open source communities. Information Systems Journal 11, 4 (2001), 305–320. https://doi.org/10.1046/j.1365-2575.2001.00111.x

[9] Matthew J. Bietz, Eric P. S. Baumer, and Charlotte P. Lee. 2010. Synergizing in cyberinfrastructure development. Computer Supported Cooperative Work (CSCW) 19, 3-4 (2010), 245–281. https://doi.org/10.1007/s10606-010-9114-y

[10] Benjamin J. Birkinbine. 2015. Conflict in the Commons: Towards a Political Economy of Corporate Involvement in Free and Open Source Software. The Political Economy of Communication 2, 2 (2015). http://www.polecom.org/index.php/polecom/article/view/35

[11] Herbert Blumer. 1954. What is wrong with social theory? American Sociological Review 19, 1 (1954), 3–10.

[12] Herbert Blumer. 1969.
Symbolic Interactionism: Perspective and Method. University of California Press, Berkeley.

[13] Pierre Bourdieu. 1973. Cultural reproduction and social reproduction. In Knowledge, Education, and Cultural Change, Richard Brown (Ed.). Tavistock, London.

[14] Geoffrey C. Bowker and Susan Leigh Star. 2000. Sorting Things Out: Classification and Its Consequences. MIT Press.

[15] Daren C. Brabham. 2013. Crowdsourcing. The MIT Press, Cambridge, MA.

[16] Dale A. Bradley. 2006. The divergent anarcho-utopian discourses of the open source software movement. Canadian Journal of Communication 30, 4 (2006).

[17] A. Bruckman and A. Forte. 2008. Scaling consensus: Increasing decentralization in Wikipedia governance. In Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS). 157.

[18] Julia Bullard. 2016. Motivating invisible contributions: Framing volunteer classification design in a fanfiction repository. In Proceedings of the 19th International Conference on Supporting Group Work (Sanibel Island, Florida, USA) (GROUP ’16). ACM, 181–193. https://doi.org/10.1145/2957276.2957295

[19] Brett Cannon. 2017. The give and take of open source. Talk at JupyterCon 2017. O’Reilly Media. https://www.oreilly.com/radar/the-give-and-take-of-open-source/

[20] Andrea Capiluppi and Martin Michlmayr. 2007. From the cathedral to the bazaar: An empirical study of the lifecycle of volunteer community projects. In Open Source Development, Adoption and Innovation. Springer US, 31–44. https://doi.org/10.1007/978-0-387-72486-7_3

[21] Eugenio Capra, Chiara Francalanci, and Francesco Merlo. 2008. An empirical study on the relationship between software design quality, development effort and governance in open source projects. IEEE Transactions on Software Engineering 34, 6 (2008), 765–782.

[22] Adele E. Clarke. 2003.
Situational analyses: Grounded theory mapping after the postmodern turn. Symbolic Interaction 26, 4 (2003), 553–576.

[23] Adele E. Clarke and Susan Leigh Star. 2008. The social worlds framework: A theory/methods package. In The Handbook of Science and Technology Studies. MIT Press, Cambridge, MA, 113–137.

[24] Gabriella Coleman. 2004. The political agnosticism of free and open source software and the inadvertent politics of contrast. Anthropological Quarterly 77, 3 (2004), 507–519.

[25] Gabriella Coleman. 2009. Code is speech: Legal tinkering, expertise, and protest among free and open source software developers. Cultural Anthropology 24, 3 (2009), 420–454.

[26] Gabriella Coleman. 2010. The hacker conference: A ritual condensation and celebration of a lifeworld. Anthropological Quarterly (2010), 47–72.

[27] Gabriella Coleman. 2012. Coding Freedom: The Ethics and Aesthetics of Hacking. Princeton University Press, Princeton.

[28] The Kernel Development Community. 2018. How the development process works. The Linux Kernel documentation. https://www.kernel.org/doc/html/v4.15/process/2.Process.html

[29] Kevin Crowston. 2011. Lessons from volunteering and free/libre open source software development for the future of work. In Researching the Future in Information Systems (IFIP Advances in Information and Communication Technology), Mike Chiasson, Ola Henfridsson, Helena Karsten, and Janice I. DeGross (Eds.). Springer, Berlin, Heidelberg, 215–229. https://doi.org/10.1007/978-3-642-21364-9_14

[30] Kevin Crowston, Robert Heckman, Hala Annabi, and Chengetai Masango. 2005. A structurational perspective on leadership in Free/Libre Open Source Software teams. Proceedings of the First International Conference on Open Source

[31] Kevin Crowston, Qing Li, Kangning Wei, U. Yeliz Eseryel, and James Howison. 2007. Self-organization of teams for free/libre open source software development.
Information and Software Technology 49, 6 (2007), 564–575. https://doi.org/10.1016/j.infsof.2007.02.004

[32] Kevin Crowston, Kangning Wei, James Howison, and Andrea Wiggins. 2012. Free/Libre Open-Source Software Development: What We Know and What We Do Not Know. ACM Computing Surveys (CSUR) 44, 2 (March 2012), 35. https://doi.org/10.1145/2089125.2089127

[33] Ward Cunningham. 1992. The WyCash portfolio management system. In Proceedings of the Object-Oriented Programming Systems, Languages, and Applications (Addendum) (Vancouver, British Columbia, Canada) (OOPSLA ’92). Association for Computing Machinery, 29–30. https://doi.org/10.1145/157709.157715

[34] Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. 2012. Social coding in GitHub: Transparency and collaboration in an open software repository. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. ACM, New York, 1277–1286.

[35] Paul B. de Laat. 2007. Governance of open source software: State of the art. Journal of Management & Governance 11, 2 (2007), 165–177. https://doi.org/10.1007/s10997-007-9022-9

[36] Luiz Felipe Dias, Igor Steinmacher, Gustavo Pinto, Daniel Alencar da Costa, and Marco Gerosa. 2016. How does the shift to GitHub impact project collaboration? In 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 473–477.

[37] Fernando Domínguez Rubio. 2020. Ecologies of the Modern Imagination at the Art Museum. University of Chicago Press, Chicago.

[38] Christina Dunbar-Hester. 2019. Hacking Diversity: The Politics of Inclusion in Open Technology Cultures. Vol. 21. Princeton University Press.

[39] David Edgerton. 2011. Shock of the Old: Technology and Global History Since 1900. Oxford University Press, Oxford.

[40] Paul N. Edwards, Steven J. Jackson, Geoffrey C. Bowker, and Cory P. Knobel. 2007. Understanding infrastructure: Dynamics, tensions, and design.
Report of NSF Workshop on “History & Theory of Infrastructure: Lessons for New Scientific Cyberinfrastructures” (2007). https://deepblue.lib.umich.edu/bitstream/handle/2027.42/49353/UnderstandingInfrastructure2007.pdf

[41] Nadia Eghbal. 2016. Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure. Ford Foundation.

[42] Nadia Eghbal. 2020. Working in Public: The Making and Maintenance of Open Source Software. Stripe Press.

[43] Hamid R. Ekbia and Bonnie A. Nardi. 2017. Heteromation, and Other Stories of Computing and Capitalism. MIT Press.

[44] Nathan Ensmenger. 2008. Fixing things that can never be broken: Software maintenance as heterogeneous engineering. In Proceedings of the SHOT Conference.

[45] Joseph Feller, Patrick Finnegan, Brian Fitzgerald, and Jeremy Hayes. 2008. From Peer Production to Productization: A Study of Socially Enabled Business Exchanges in Open Source Service Networks. Information Systems Research 19, 4 (2008), 475–493. https://doi.org/10.1287/isre.1080.0207

[46] Anna Filippova and Hichang Cho. 2015. Mudslinging and Manners: Unpacking Conflict in Free and Open Source Software. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’15). ACM, 1393–1403. https://doi.org/10.1145/2675133.2675254

[47] Brian Fitzgerald. 2006. The Transformation of Open Source Software. MIS Quarterly 30, 3 (2006), 587–598. https://doi.org/10.2307/25148740

[48] Lee Fleming and David M. Waguespack. 2007. Brokerage, boundary spanning, and leadership in open innovation communities. Organization Science 18, 2 (2007), 165–180.

[49] Karl Fogel. 2005. Producing Open Source Software: How to Run a Successful Free Software Project. O’Reilly Media.

[50] Linux Foundation. [n.d.]. The Bylaws of the Linux Foundation. https://www.linuxfoundation.org/en/bylaws/

[51] Sarah E. Fox, Kiley Sobel, and Daniela K. Rosner. 2019.
Managerial Visions: Stories of upgrading and maintaining the public restroom with IoT. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–15.

[52] Erich Gamma. 2005. Agile, open source, distributed, and on-time: Inside the Eclipse development process. In Proceedings of the 27th International Conference on Software Engineering, Vol. 15. 4–4.

[53] Juan Mateos Garcia, W. Edward Steinmueller, et al. 2003. The Open Source Way of Working: A New Paradigm for the Division of Labour in Software Development? SPRU.

[54] R. Stuart Geiger. 2011. The Lives of Bots. In Wikipedia: A Critical Point of View, G. Lovink and N. Tkacz (Eds.). Institute of Network Cultures, 78–93. http://www.stuartgeiger.com/lives-of-bots-wikipedia-cpov.pdf

[55] R. Stuart Geiger and David Ribes. 2011. Trace ethnography: Following coordination through documentary practices. In 2011 44th Hawaii International Conference on System Sciences. IEEE, 1–10.

[56] Matt Germonprez, Julie E. Kendall, Kenneth E. Kendall, Lars Mathiassen, Brett Young, and Brian Warner. 2016. A Theory of Responsive Design: A Field Study of Corporate Engagement with Open Source Communities. Information

[57] Matt Germonprez, Georg J. P. Link, Kevin Lumbard, and Sean Goggins. 2018. Eight Observations and 24 Research Questions About Open Source Projects: Illuminating New Realities. Proc. ACM Hum.-Comput. Interact. 2, CSCW, Article 57 (2018), 22 pages. https://doi.org/10.1145/3274326

[58] Elihu M. Gerson. 2008. Reach, bracket, and the limits of rationalized coordination: Some challenges for CSCW. In Resources, Co-Evolution and Artifacts. Springer, 193–220.

[59] Paola Giuri, Francesco Rullani, and Salvatore Torrisi. 2008. Explaining leadership in virtual teams: The case of open source software. Information Economics and Policy 20, 4 (2008), 305–315.

[60] Stephen Graham and Nigel Thrift. 2007.
Out of order: Understanding repair and maintenance. Theory, Culture & Society 24, 3 (2007), 1–25.

[61] Kaj Grønbæk, Morten Kyng, and Preben Mogensen. 1992. CSCW challenges in large-scale technical projects — a case study. In Proceedings of the 1992 ACM Conference on Computer-Supported Cooperative Work. 338–345.

[62] Scott Hanselman. 2015. Bring Kindness Back to Open Source. https://www.hanselman.com/blog/bring-kindness-back-to-open-source

[63] Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, and Danny Dig. 2016. Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016). Association for Computing Machinery, New York, NY, USA, 426–437. https://doi.org/10.1145/2970276.2970358

[64] Eric von Hippel. 2001. Innovation by User Communities: Learning from Open-Source Software. MIT Sloan Management Review 42, 4 (2001), 82–82. https://go.gale.com/ps/i.do?p=AONE&sw=w&issn=15329194&v=2.1&it=r&id=GALE%7CA77578225&sid=googleScholar&linkaccess=abs

[65] Arlie Russell Hochschild. 1983. The Managed Heart: Commercialization of Human Feeling. University of California Press, Oakland, CA.

[66] Lara Houston, Steven J. Jackson, Daniela K. Rosner, Syed Ishtiaque Ahmed, Meg Young, and Laewoo Kang. 2016. Values in Repair. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). ACM, 1403–1414. https://doi.org/10.1145/2858036.2858470

[67] Dorothy Howard and R. Stuart Geiger. 2019. Ethnography, Genealogy, and Political Economy in the Post-Market Era of Free & Open-Source Software. In Proceedings of CSCW ’19 Extended Abstracts.

[68] Dorothy Howard and Lilly Irani. 2019. Ways of Knowing When Research Subjects Care. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems.
1–16.

[69] James Howison. 2015. Sustaining scientific infrastructures: Transitioning from grants to peer production. In iConference 2015. https://www.ideals.illinois.edu/handle/2142/73439

[70] Lilly C. Irani and M. Six Silberman. 2013. Turkopticon: Interrupting worker invisibility in Amazon Mechanical Turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 611–620.

[71] Lilly C. Irani and M. Six Silberman. 2016. Stories we tell about labor: Turkopticon and the trouble with "design". In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). ACM, 4573–4586. https://doi.org/10.1145/2858036.2858592

[72] Steven J. Jackson, Syed Ishtiaque Ahmed, and Md. Rashidujjaman Rifat. 2014. Learning, innovation, and sustainability among mobile phone repairers in Dhaka, Bangladesh. In Proceedings of the 2014 Conference on Designing Interactive Systems (Vancouver, BC, Canada) (DIS ’14). Association for Computing Machinery, 905–914. https://doi.org/10.1145/2598510.2598576

[73] Steven J. Jackson, Alex Pompe, and Gabriel Krieshok. 2012. Repair worlds: Maintenance, repair, and ICT for development in rural Namibia. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. 107–116.

[74] C. Jensen and W. Scacchi. 2005. Collaboration, Leadership, Control, and Conflict Negotiation and the Netbeans.org Open Source Software Development Community. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences. 196b. https://doi.org/10.1109/HICSS.2005.147

[75] Helena Karasti, Karen S. Baker, and Florence Millerand. 2010. Infrastructure time: Long-term matters in collaborative development. Computer Supported Cooperative Work (CSCW) 19, 3 (2010), 377–415.
https://doi.org/10.1007/s10606-010-9113-z

[76] Christopher M. Kelty. 2008. Two Bits: The Cultural Significance of Free Software. Duke University Press.

[77] Christopher M. Kelty. 2013. There is no free software. The Journal of Peer Production, Issue 3 (2013). http://peerproduction.net/issues/issue-3-free-software-epistemics/debate/there-is-no-free-software/

[78] Mathias Klang. 2005. Free software and open source: The freedom debate and its consequences. First Monday 10, 3 (2005).

[79] Nolan Lawson. 2017. What it feels like to be an open-source maintainer. Read the Tea Leaves. https://nolanlawson.com/2017/03/05/what-it-feels-like-to-be-an-open-source-maintainer/

[80] Charlotte P. Lee, Paul Dourish, and Gloria Mark. 2006. The human infrastructure of cyberinfrastructure. In Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work (Banff, Alberta, Canada) (CSCW ’06). ACM, New York, NY, USA, 483–492. https://doi.org/10.1145/1180875.1180950

[81] Charlotte P. Lee and Drew Paine. 2015. From The Matrix to a Model of Coordinated Action (MoCA): A conceptual framework of and for CSCW. In Proceedings of the 18th ACM Conference on Computer-Supported Cooperative Work & Social Computing. 179–194.

[82] Yan Li, Chuan-Hoo Tan, and Hock-Hai Teo. 2012. Leadership characteristics and developers’ motivation in open source software development. Information & Management 49, 5 (2012), 257–267.

[83] Yu-Wei Lin, Jo Bates, and Paula Goodale. 2016. Co-observing the weather, co-predicting the climate: Human factors in building infrastructures for crowdsourced data. Science and Technology Studies 29, 3 (2016), 10–27. http://dspace.stir.ac.uk/handle/1893/26101

[84] Arwid Lund. 2017. Wikipedia, Work and Capitalism. Springer, London.

[85] Jennifer Helene Maher. 2015.
Software Evangelism and the Rhetoric of Morality: Coding Justice in a Digital Democracy. Routledge, London. + + +[86] George E. Marcus. 1995. Ethnography in/of the World System: The Emergence of Multi-Sited Ethnography. Annual Review of Anthropology 24, 1 (1995), 95–117. https://doi.org/10.1146/annurev.an.24.100195.000523 + + +[87] M Lynne Markus. 2007. The governance of free/open source software projects: monolithic, multidimensional, or configurational? Journal of Management & Governance 11, 2 (2007), 151–163. + + +[88] Steve Marquess. 2014. Of Money, Responsibility, and Pride. http://veridicalsystems.com/blog/of-money-responsibility-and-pride/ Library Catalog: veridicalsystems.com. + + +[89] Alice Marwick and Danah Boyd. 2011. To see and be seen: Celebrity practice on Twitter. Convergence 17, 2 (2011), 139–158. + + +[90] Ashwin Mathew and Coye Cheshire. 2017. Risky Business: Social Trust and Community in the Practice of Cybersecurity for Internet Infrastructure. IEEE. https://doi.org/10.24251/HICSS.2017.283 + + +[91] Ashwin J. Mathew. 2016. The myth of the decentralised internet. 5, 3 (2016). https://policyreview.info/articles/analysis/myth-decentralised-internet + + +[92] Amanda Menking and Ingrid Erickson. 2015. The heart work of Wikipedia: Gendered, emotional labor in the world’s largest online encyclopedia. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, ACM. 207–210. + + +[93] Robert K Merton. 1968. The Matthew effect in science: The reward and communication systems of science are considered. Science 159, 3810 (1968), 56–63. + + +[94] Audris Mockus, Roy T. Fielding, and James Herbsleb. 2000. A case study of open source software development: the Apache server. In Proceedings of the 22nd International Conference on Software Engineering (Limerick, Ireland, 2000-06-01) (ICSE ’00). Association for Computing Machinery, 263–272. https://doi.org/10.1145/337180.337209 + + +[95] Lauren Morse, Janice M & Clark. 2019. 
The nuances of grounded theory sampling and the pivotal role of theoretical sampling. The SAGE Handbook of Current Developments in Grounded Theory (2019), 145–166. + + +[96] Chandra Mukerji. 1989. A Fragile Power: Scientists and the State. Princeton University Press. + + +[97] Joel Novek. 2002. IT, gender, and professional practice: Or, why an automated drug distribution system was sent back to the manufacturer. Science, Technology, & Human Values 27, 3 (2002), 379–403. https://doi.org/10.1177/016224390202700303 SAGE Publications. + + +[98] Wanda Orlikowski and Susan Scott. 2008. Sociomateriality: Challenging the separation of technology, work and organization. The Academy of Management Annals 2, 1 (2008), 433–474. + + +[99] Julian E Orr. 2016. Talking About Machines: An Ethnography of a Modern Job. Cornell University Press, Ithaca. + + +[100] Mathieu O’Neil, Laure Muselli, Mahin Raissi, and Stefano Zacchiroli. 2020. ‘Open source has won and lost the war’: Legitimising commercial–communal hybridisation in a FOSS project. New Media & Society (2020), 1461444820907022. + + +[101] Elena Parmiggiani. 2017. This Is Not a Fish: On the Scale and Politics of Infrastructure Design Studies. Computer Supported Cooperative Work (CSCW) 26, 1 (2017), 205–243. https://doi.org/10.1007/s10606-017-9266-0 + + +[102] Eric Raymond. 1999. The cathedral and the bazaar. In Readings in Cyberethics, Richard A. Spinello and Herman T. Tavani (Eds.). O’Reilly Press. + + +[103] RedHat. 2020. The State of Enterprise Open Source. https://www.redhat.com/cms/managed-files/rh-enterprise-open-source-report-detail-f21756-202002-en.pdf + + +[104] David Ribes. 2014. Ethnography of scaling, or, how to a fit a national research infrastructure in the room. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. 158–170. + + +[105] David Ribes and Thomas A Finholt. 2007. Tensions across the scales: planning infrastructure for the long-term. 
In Proceedings of the 2007 International ACM Conference on Supporting Group Work. 229–238. + + +[106] David Ribes and Thomas A Finholt. 2009. The long now of infrastructure: Articulating tensions in development. Journal of the Association for Information Systems (JAIS) (2009). +[107] David Ribes, Steven Jackson, R. Stuart Geiger, Matthew Burton, and Thomas Finholt. 2013. Artifacts that organize: Delegation in the distributed organization. +Information and Organization + 23, 1 (2013), 1–14. + + +[108] David Ribes and Charlotte P Lee. 2010. Sociotechnical studies of cyberinfrastructure and e-research: Current themes and future trajectories. +Computer Supported Cooperative Work (CSCW) + 19, 3-4 (2010), 231–244. + + +[109] Dirk Riehle, Philipp Riemer, Carsten Kolassa, and Michael Schmidt. 2014. Paid vs. Volunteer Work in Open Source. In +Proceedings of the 47th Hawaii International Conference on System Sciences +. 3286–3295. https://doi.org/10.1109/HICSS.2014.407 + + +[110] Daniela K. Rosner. 2014. Making Citizens, Reassembling Devices: On Gender and the Development of Contemporary Public Sites of Repair in Northern California. +Public Culture + 26, 1 (2014), 51–77. https://doi.org/10.1215/08992363-2346250 + + +[111] Andrew L. Russell and Lee Vinsel. 2018. After Innovation, Turn to Maintenance. +Technology and Culture + 59, 1 (2018), 1–25. https://doi.org/10.1353/tech.2018.0004 Publisher: Johns Hopkins University Press. + + +[112] Bert M Sadowski, Gaby Sadowski-Rasters, and Geert Duysters. 2008. Transition of governance in a mature open software source community: Evidence from the Debian case. +Information Economics and Policy + 20, 4 (2008), 323–332. + + +[113] Salvatore Sanfilippo. 2019. +The struggles of an open source maintainer +. http://antirez.com/news/129 + + +[114] Trebor Scholz. 2008. Market ideology and the myths of Web 2.0. +First Monday + 13, 3 (2008). + + +[115] Clay Shirky. 2010. +Cognitive Surplus: Creativity and Generosity in a Connected Age +. 
Penguin UK. + + +[116] Susan Leigh Star. 1999. The ethnography of infrastructure. +American behavioral scientist + 43, 3 (1999), 377–391. + + +[117] Susan Leigh Star and Anselm Strauss. 1999. Layers of silence, arenas of voice: The ecology of visible and invisible work. +Computer Supported Cooperative Work (CSCW) + 8, 1-2 (1999), 9–30. + + +[118] Anselm Strauss. 1988. The articulation of project work: An organizational process. +Sociological Quarterly + 29, 2 (1988), 163–178. + + +[119] Anselm Strauss and Juliet Corbin. 1994. Grounded theory methodology. +Handbook of Qualitative Research + 17 (1994), 273–85. + + +[120] Lucy Suchman. 1995. Making work visible. +Commun. ACM + 38, 9 (1995), 56–64. + + +[121] Lucy Suchman. 2007. +Human-machine Reconfigurations: Plans and Situated Actions +. Cambridge University Press. + + +[122] E. Carr Summerson and Michael Lempert. 2016. +Scale: Discourse and Dimensions of Social Life +. University of California Press. + + +[123] Don Tapscott and Anthony D Williams. 2008. +Wikinomics: How Mass Collaboration Changes Everything +. Penguin. + + +[124] Nathaniel Tkacz. 2014. +Wikipedia and the Politics of Openness +. University of Chicago Press. + + +[125] Linus Torvalds and David Diamond. 2002. +Just for Fun: The Story of an Accidental Revolutionary +. Harper Business. + + +[126] Jason Tsay, Laura Dabbish, and James Herbsleb. 2014. Let’s Talk about It: Evaluating Contributions through Discussion in GitHub. In +Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering + (Hong Kong, China) (FSE 2014). ACM, New York, NY, USA, 144–154. https://doi.org/10.1145/2635868.2635882 + + +[127] José Van Dijck and David Nieborg. 2009. Wikinomics and its discontents: A critical analysis of Web 2.0 business manifestos. +New Media & Society + 11, 5 (2009), 855–874. + + +[128] Kazys Varnelis. 2008. +Invisible City: Telecommunication +. Actar Barcelona, New York. + + +[129] Juhani Warsta and Pekka Abrahamsson. 
2003. Is open source software development essentially an agile method. In +Proceedings of the 3rd Workshop on Open Source Software Engineering +. 143–147. + + +[130] Steve Weber. 2004. +The Success of Open Source +. Harvard University Press. + + +[131] Kangning Wei, Kevin Crowston, U. Yeliz Eseryel, and Robert Heckman. 2017. Roles and politeness behavior in community-based free/libre open source software development. +Information & Management + 54, 5 (2017), 573–582. https://doi.org/10.1016/j.im.2016.11.006 + + +[132] Andrea Wiggins. 2013. Free as in puppies: compensating for ICT constraints in citizen science. In +Proceedings of the 2013 Conference on Computer Supported Cooperative Work +. 1469–1480. + + +[133] Susan Winter, Nicholas Berente, James Howison, and Brian Butler. 2014. Beyond the organizational ‘container’: Conceptualizing 21st century sociotechnical work. +Information and Organization + 24, 4 (2014), 250–269. + + +[134] Alexey Zagalsky, Carlos Gómez Teshima, Daniel M German, Margaret-Anne Storey, and Germán Poo-Caamaño. 2016. How the R community creates and curates knowledge: A comparative study of stack overflow and mailing lists. In +Proceedings of the 13th International Conference on Mining Software Repositories +. 441–451. + + +Received June 2020; revised October 2020; accepted December 2020 +---------------------------------------- +------------------------------- +Section 75: +License usage and changes: a large-scale study on GitHub + + +Christopher Vendome1 · Gabriele Bavota2 · Massimiliano Di Penta3 · Mario Linares-Vásquez1 · Daniel German4 · Denys Poshyvanyk1 + + +Published online: 6 June 2016 +© Springer Science+Business Media New York 2016 + + +Abstract Open source software licenses determine, from a legal point of view, under which conditions software can be integrated and redistributed. 
The reason why developers of a project adopt (or change) a license may depend on various factors, e.g., the need for ensuring compatibility with certain third-party components, the perspective towards redistribution or commercialization of the software, or the need for protecting against somebody else's commercial usage of the software. This paper reports a large empirical study aimed at quantitatively and qualitatively investigating when and why developers adopt or change software licenses. Specifically, we first identify license changes in 1,731,828 commits, representing the entire history of 16,221 Java projects hosted on GitHub. Then, to understand the rationale of license changes, we perform a qualitative analysis on 1,160 projects written in seven different programming languages, namely C, C++, C#, Java, Javascript, Python, and Ruby—following an open coding approach inspired by grounded theory—on commit messages and issue tracker discussions concerning licensing topics, and whenever possible, try to build traceability links between discussions and changes. On one hand, our results highlight how, in different contexts, license adoption or changes can be triggered by various reasons. On the other hand, the results also highlight a lack of traceability of when and why licensing changes are made. This can be a major concern, because a change in the license of a system can negatively impact those that reuse it. In conclusion, results of the study trigger the need for better tool support in guiding developers in choosing/changing licenses and in keeping track of the rationale of license changes.


Communicated by: Lin Tan


Christopher Vendome
cgvendome@email.wm.edu


1 The College of William and Mary, Williamsburg, VA, USA
2 Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
3 University of Sannio, Benevento, Italy
4 University of Victoria, British Columbia, Canada
Keywords: Software licenses · Mining software repositories · Empirical studies
----------------------------------------
-------------------------------
Section 76:
1 Introduction


In recent years, the diffusion of Free and Open Source Software (FOSS) projects has increased significantly, along with the availability of forges hosting such projects (e.g., SourceForge(^1) or GitHub(^2)) and foundations supporting and promoting the development and diffusion of FOSS (e.g., the Apache Software Foundation,(^3) the GNU Software Foundation,(^4) or the Eclipse Software Foundation(^5)). The availability of FOSS projects is a precious resource for developers, who can reuse existing assets, extend/evolve them, and in this way create new work productively and reduce costs. For example, a blog post by IBM(^6) outlines the reasons pushing companies to reuse open source code: "Yes, this [the cost factor] is one of the most important factors that attract not only the small companies or start-up's but also the big corporations these days". This can happen not only in the context of open source projects, but is more and more frequent in commercial projects. In a survey conducted by Black Duck,(^7) it was found that 78 % of the companies use open source code (double from 2010), 93 % claimed an increase in open source reuse, 64 % contribute to open source development, and over 55 % indicated a lack of formal guidance when utilizing open source code. The findings by Black Duck demonstrate two key implications: (i) commercial reuse of open source code has been increasing, and (ii) in general, there is a lack of oversight in how this reuse occurs.
Nevertheless, whoever is interested in integrating FOSS code in their software project (and redistributing it along with the project itself), or in modifying existing FOSS projects to create a new work—referred to as "derivative work"—must be aware that such activities are regulated by software licenses, and in particular by the specific FOSS license of the project being reused. In order to license software projects, developers either add a licensing statement to source code files (as a comment at the beginning of each file) and/or include a textual file containing the license statement in the project source code root directory or in its sub-directories.


Generally speaking, FOSS licenses can be classified into restrictive (also referred to as "copyleft" or "reciprocal") and permissive licenses. A restrictive license requires developers to use the same license to distribute new software that incorporates software licensed under such a restrictive license (i.e., the redistribution of the derivative work must be licensed under the same terms); meanwhile, permissive licenses allow re-distributors to incorporate the reused software under a different license (Singh and Phelps 2009; Free Software Foundation 2015). The GPL (in all of its versions) is a classic example of a restrictive license. In Section 5 of the GPL-3.0, the license addresses code modification stating that "You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy" (http://www.gnu.org/licenses/gpl.html). The BSD licenses are examples of permissive licenses.


(^1) http://sourceforge.net
(^2) https://github.com
(^3) https://www.apache.org
(^4) http://www.gnu.org
(^5) http://www.eclipse.org/
(^6) https://www.ibm.com/developerworks/community/blogs/6e6f6d1b-95c3-46df-8a26-b7efd8ee4b57/entry/why_big_companies_are_embracing_open_source119?lang=en
(^7) https://www.blackducksoftware.com/future-of-open-source
For instance, the BSD 2-Clause has two clauses that detail the use, redistribution, and modification of licensed code: (i) the source must contain the copyright notice and (ii) the binary must reproduce the copyright notice and contain the disclaimer in documentation (http://opensource.org/licenses/BSD-2-Clause).


When developers (or organizations) decide to make a project available as open source, they can license their code under one or many different existing licenses. The choice may be dictated by the set of dependencies that the project has (e.g., what libraries it uses), since those dependencies might impose specific licensing constraints on those that reuse them. For instance, if a project links (statically) some GPL code, then it must be released under the same GPL version; failing to fulfill such a constraint could create a potential legal risk. Also, as shown by Di Penta et al. (2010), the choice of the licenses in a FOSS project may have a massive impact on its success, as well as on projects using it. For example—as it happened for the IPFilter project (http://www.openbsd.org/faq/pf)—a highly restrictive license may prevent others from redistributing the project (in the case of IPFilter, this caused its exclusion from the OpenBSD distributions). An opposite case is that of the MySQL connect drivers, originally released under GPL-2.0, whose license was modified with an exception (Oracle, http://www.mysql.com/about/legal/licensing/foss-exception/) to allow the drivers' inclusion in other software released under some open source licenses which would otherwise be incompatible with the GPL (e.g., the original Apache license). In summary, the choice of the license—or even a decision to change an existing license—is a crucial crossroads in the evolution of every FOSS project.
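The static-linking constraint described above can be sketched as a toy check. The license tables and function below are illustrative assumptions for this sketch only (a tiny subset of real licenses), not a legal analysis tool:

```python
# Toy illustration of the dependency constraint discussed above: statically
# linking code under a restrictive (copyleft) license forces the derivative
# work under that same license. The tables are simplified assumptions,
# not legal advice.
RESTRICTIVE = {"GPL-2.0", "GPL-3.0"}
PERMISSIVE = {"MIT", "BSD-2-Clause", "Apache-2.0"}

def project_license_ok(project_license, linked_licenses):
    """Return False when a statically linked dependency carries a
    restrictive license different from the project's own license."""
    for dep in linked_licenses:
        if dep in RESTRICTIVE and dep != project_license:
            return False
    return True

print(project_license_ok("MIT", {"BSD-2-Clause"}))  # True: permissive deps
print(project_license_ok("MIT", {"GPL-2.0"}))       # False: GPL forces GPL-2.0
```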
In order to encourage developers to think about licensing issues early in the development process, some forges (e.g., GitHub) have introduced mechanisms such as the possibility of picking the project license at the time the repository is created. Also, there are some Web sites (e.g., http://choosealicense.com) helping developers to choose a license. Furthermore, there are numerous research efforts aimed at supporting developers in classifying source code licenses (Gobeille 2008; Germán et al. 2010b) and identifying licensing incompatibilities (Germán et al. 2010a). Even initiatives such as the Software Package Data Exchange (SPDX) (http://spdx.org) have been aimed at proposing a formal model to document the license of a system. However, despite the effort put in by the FOSS community, researchers, and independent companies, it turns out that developers usually do not have a clear idea of the exact consequences of licensing (or not licensing) their code under a specific license, or they are unsure, for example, on how to re-distribute code licensed with a dual license, among other issues (Vendome et al. 2015b).


Paper Contributions. This paper reports the results of a large empirical study aimed at quantitatively and qualitatively investigating when and why licenses change in open source projects, and to what extent it is possible to establish traceability links between licensing-related discussions and changes. First, we perform a quantitative analysis conducted on 16,221 Java projects hosted on GitHub. To conduct this study, we first mined the entire change history of the projects, extracting the license name (e.g., GPL or Apache) and version (e.g., v1, v2), when applicable, from each of the 4,665,611 files involved in a total of 1,731,828 commits.
Starting from this data, we provide quantitative evidence on (i) the diffusion of licenses in FOSS systems, (ii) the most common license-change patterns, and (iii) the traceability between the license changes and both the commit messages and the issue tracker discussions. After that, following an open coding approach inspired by grounded theory (Corbin and Strauss 1990), we qualitatively analyze a sample of commit messages and issue tracker discussions likely related to license changes. Such a qualitative analysis has been performed on 1,160 projects written in seven different languages: 159 C, 91 C++, 78 C#, 324 Java, 166 Javascript, 147 Python, and 195 Ruby projects. The results of this analysis provide a rationale on why developers adopt specific license(s), both for initial licensing and for licensing changes.


The study reported in this paper builds on previous work aimed at exploring license incompatibilities (Germán et al. 2010a), license changes (Di Penta et al. 2010), license evolution (Manabe et al. 2010), and integration patterns (Germán and Hassan 2009). Building upon previous work on licensing analysis, this paper:

- Constitutes, to the best of the authors' knowledge, the largest study aimed at analyzing the change patterns in licensing of software systems (earlier work was limited to the analysis of up to six projects; Manabe et al. 2010; Di Penta et al. 2010).
- To the best of our knowledge, is the first work aimed at explaining the rationale of license changes by means of a qualitative analysis of commit notes and issue tracker discussions.

The achieved results suggest that determining the appropriate license of a software project is far from trivial and that a community's usage and expectations can influence developers when picking a license. We also observe that licensing expectations may differ based on the programming language.
Although choosing a license is considered important by developers, even from the early releases of their projects, forges and third-party tools provide little or no support to developers performing licensing-related tasks, e.g., picking a license, declaring the license of a project, changing license from a restrictive one towards a more permissive one (or vice versa), and, importantly, keeping track of the rationale for license changes. For example, during the creation of a new repository, GitHub allows the user to select an initial license from a list of commonly used ones, but offers no guidance on the implications of such a choice, and simply redirects the user to http://choosealicense.com/; aside from this, GitHub offers no support for licensing management. Also, there is a lack of consistency and standardization in the mechanism that should be used for declaring a license (e.g., putting it in source code heading comments, separate license files, README files, etc.). Moreover, the legal nature of the licenses exacerbates this problem, since the implications and grants or restrictions are not always clear to developers even when the license is present. Last, but not least, the currently available Software Configuration Management (SCM) technology provides no support to trace licensing-related discussions and decisions onto actual changes, whereas such traceability links can be useful to understand the impact of such decisions.


Paper Structure. The paper is organized as follows. Section 2 relates this work to the existing literature on licensing analysis. Section 3 describes the study design and details the data analysis procedure. Results are reported and discussed in Section 4. Lessons learned from the study results are summarized in Section 5, while Section 6 discusses the threats to the study's validity. Finally, Section 7 concludes the paper and outlines directions for future work.
2 Related Work


Our work is mainly related to (i) techniques and tools for automatically identifying and classifying licenses in software artifacts, and (ii) empirical studies focusing on different aspects of license adoption and evolution.


2.1 Identifying and Classifying Software Licenses


The problem of license identification was first tackled in the FOSSology project (Gobeille 2008), aimed at building a repository storing FOSS projects and their licensing information and using a machine learning approach to classify licenses. Tuunanen et al. (2009) proposed ASLA, a tool aimed at identifying licenses in FOSS systems; the tool has been shown to determine licenses in files with 89 % accuracy.


Germán et al. (2010b) proposed Ninka, a tool that uses a pattern-matching-based approach for identifying statements that characterize various licenses. Given any text file as an input, Ninka outputs the license name and version. In the evaluation reported by the authors, Ninka achieved a precision of ~95 % while detecting licenses. Ninka is currently considered the state-of-the-art tool for the automatic identification of software licenses.


While the typical license classification problem arises when source code is available, in some cases source code is not available—i.e., only byte code or binaries are available—and the goal is to identify whether the byte code has been produced from source code under a certain license. To this aim, Di Penta et al. (2010) combined code search and textual analysis to automatically determine the license under which jar files were released. Their approach automatically infers the license from decompiled code by relying on the Google Code search engine. Note that, unlike the previous techniques, the approach of Di Penta et al. (2010) is only able to identify the license family (e.g., GPL) without specifying the version (e.g., 2.0).
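The pattern-matching idea behind license identifiers like Ninka can be sketched in a few lines. The two patterns below are rough assumptions invented for this illustration; Ninka's actual rule base is far larger and matches normalized license sentences, not raw text:

```python
import re

# Illustrative sentence patterns for two license families (assumptions for
# this sketch only, not Ninka's real rules).
PATTERNS = [
    (r"GNU General Public License.*?version (\d)", "GPL"),
    (r"Apache License,? Version (\d\.\d)", "Apache"),
]

def identify_license(text):
    """Return 'Family-Version' for the first matching pattern, else 'UNKNOWN'."""
    for pattern, family in PATTERNS:
        match = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
        if match:
            return "{}-{}".format(family, match.group(1))
    return "UNKNOWN"

header = ("This program is free software: you can redistribute it and/or "
          "modify it under the terms of the GNU General Public License as "
          "published by the Free Software Foundation, either version 3 of "
          "the License, or (at your option) any later version.")
print(identify_license(header))                         # GPL-3
print(identify_license("Apache License, Version 2.0"))  # Apache-2.0
print(identify_license("All rights reserved."))         # UNKNOWN
```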
2.2 Empirical Studies on License Adoption and Evolution


Di Penta et al. (2010) investigated—on six open source projects written in C, C++, and Java—the migration of licenses over the course of a project's lifetime. The study suggests that licenses changed version and type during software evolution, but there were no generic patterns generalizable across the six analyzed FOSS projects. Also, Manabe et al. (2010) analyzed license changes in FreeBSD, OpenBSD, Eclipse, and ArgoUML, finding that each project had different evolution patterns.


Germán and Hassan (2009) analyzed 124 open source packages exploited by several applications to understand how developers deal with license incompatibilities. Based on this analysis, they built a model outlining when specific licenses are applicable and what their advantages and disadvantages are. Later, Germán et al. (2010a) presented an empirical study focused on the binary packages of the Fedora-12 Linux distribution aimed at (i) understanding whether licenses declared in the packages were consistent with those present in the source code files, and (ii) detecting licensing issues derived from dependencies between packages; they were able to find some licensing issues, which were confirmed by Fedora.


Germán et al. (2009) analyzed the presence of cloned code fragments between the Linux Kernel and two distributions of BSD, i.e., OpenBSD and FreeBSD. The aim was to verify whether the cloning was performed in accordance with the terms of the licenses. Results show that, in most cases, these code migrations were permitted, since they went from less restrictive licenses towards more restrictive ones.


Wu et al. (2015) investigated license inconsistencies between cloned files. They performed an empirical study on Debian 7.5 to demonstrate the ways in which licensing can become inconsistent between file clones (e.g., the removal of a license in one of the clone pairs).


In our previous work (Vendome et al.
2015a), we focused our analysis only on Java projects. In this work, we expand our analysis to include six new languages—C, C++, C#, Javascript, Python, and Ruby. Also, our new grounded theory analysis features a categorization of commit messages and issue discussions into seven categories, in turn further detailed in a total of 27 sub-categories. In addition to extracting new support and rationale, we also defined new sub-categories and subsequently distilled lessons from this new data. For example, we observed that asserting a license is not standardized or consistent across languages, and it would benefit developers to have a consistent means of documenting and presenting the license of a system within a forge.


Vendome et al. (2015b) conducted a survey with developers who contributed to projects that had experienced changes in licensing to understand the rationale for adopting and changing licenses. The survey results indicated that facilitating commercial reuse is a common reason for license changes. Also, the survey highlighted that, in general, developers lack an understanding of the legal implications of open source licenses, underlining the need for recommenders aimed at supporting them in choosing and changing licenses.


While we share similar goals with prior related work—understanding insights into license usage and migration—our analysis is done on a much larger scale, including (i) a quantitative analysis on 16,221 Java projects, and (ii) a qualitative analysis of a sample of commit messages and issue tracker discussions from 1,160 projects written in seven different programming languages. The latter allowed us to perform an in-depth analysis of the rationale behind license usage and migrations.
----------------------------------------
-------------------------------
Section 77:
3 Design of the Empirical Study


The goal of our study is to investigate license adoption and evolution in FOSS projects, with the purpose of understanding the overall rationale behind picking a particular license or changing licenses, and of determining the underlying license change patterns. The perspective is that of researchers interested in understanding the main factors leading towards specific license adoption and change. The context consists of (i) the change history of 16,221 Java open source projects mined from GitHub, which will be used to quantitatively investigate the goals of the study, and (ii) commit messages and issue tracker discussions from 1,160 projects written in seven different programming languages (i.e., C, C++, C#, Java, JavaScript, Python, and Ruby), which will be exploited for qualitative analysis.


3.1 Research Questions


We aim at answering the following research questions:


RQ1: What is the usage of different licenses by projects in GitHub? This research question examines the proportions of different types of licenses that are introduced by FOSS projects hosted in GitHub. In doing this, we should consider that GitHub is a relatively young forge (launched in April 2008), which has seen exponential growth in the number of projects over the past few years, and that most of the projects it hosts are young in terms of the first available commit or the date that the repository was created.


RQ2: What are the most common licensing change patterns? Our second research question investigates the popular licensing change patterns in the GitHub open source community, with the aim of drawing out—from a qualitative point of view—the rationale behind such change patterns (e.g., satisfying dependency constraints).
RQ3: To what extent are licensing changes documented in commit messages or issue tracker discussions? This research question investigates whether licensing changes in a system can be traced to commit messages or issue discussions.


RQ4: What rationale do these sources contain for the licensing changes? This research question investigates the rationale behind a particular change in license(s) from a developer's perspective.


We address our four research questions by looking at the licensing phenomenon from two different points of view, namely (i) a quantitative analysis of the licenses under which projects were released, their changes across their evolution history, and the ability to match these changes to either commit messages or issue tracker discussions; and (ii) a qualitative analysis of licensing-related discussions made by developers on the issue trackers and of the way in which developers documented licensing changes through commit messages.


For the quantitative analysis of licensing changes, we are interested in analyzing license migration patterns that fall into the following three categories:

- No license → some license(s) (N2L). This reflects the case in which developers realized the need for a license and added a licensing statement to files;
- Some license(s) → no license (L2N). In this case, for various reasons, licensing statements have been removed from source code files, for example because a developer accidentally added a wrong license/license version;
- Some license(s) → some other license(s) (L2L). This is the most general case of a change in licensing between distinct licenses.
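The three migration categories above can be expressed as a small classifier over the before/after license sets of a file; the function name and set representation are our own illustrative choices, following the N2L/L2N/L2L labels:

```python
def migration_category(before, after):
    """Classify a file-level licensing change into the three migration
    patterns (N2L, L2N, L2L); None when the licensing is unchanged.
    `before` and `after` are sets of license identifiers."""
    if before == after:
        return None
    if not before:
        return "N2L"  # no license -> some license(s)
    if not after:
        return "L2N"  # some license(s) -> no license
    return "L2L"      # some license(s) -> some other license(s)

print(migration_category(set(), {"Apache-2.0"}))     # N2L
print(migration_category({"GPL-2.0"}, set()))        # L2N
print(migration_category({"GPL-2.0"}, {"GPL-3.0"}))  # L2L
```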
To address RQ1, RQ2, and RQ3, we perform a quantitative analysis by mining the version history of 16,221 Java projects, while to address RQ4 we perform a qualitative analysis of the commit messages and issue tracker discussions of the 1,160 projects written in seven different programming languages. In the following subsections, we describe the two kinds of analysis in detail.
----------------------------------------
-------------------------------
Section 78:
3.2 Quantitative Analysis

In order to generate the dataset used in the study, we mined the version history of 16,221 Java projects publicly available on GitHub. GitHub hosts over twelve million Git repositories covering many popular programming languages, and provides a public API (https://developer.github.com/v3/) that can be used to query and mine project information. Also, the Git version control system allows for local cloning of the entire repository, which facilitates the comprehensive analysis of the project change history and, thus, of the license changes that happened in each commit.

To extract data for our quantitative analysis, we first identified a comprehensive list of projects hosted on GitHub by implementing a script exploiting GitHub's APIs. The computation of the comprehensive list resulted in over twelve million projects. Since the infrastructure we use for license extraction only supports Java systems (as will be explained later), we filtered out all systems that were not written in Java, obtaining a list of 381,161 Java projects hosted on GitHub. We cloned all 381,161 Git repositories locally, for a total of 6.3 Terabytes of storage space. In our analysis, we randomly sampled 16,221 projects due to the computation time of the aforementioned infrastructure.

Once the Git repositories had been cloned, we used a code analyzer developed in the context of the MARKOS European project (Bavota et al. 2014) to extract license information at commit-level granularity. The MARKOS code analyzer uses the Ninka license classifier (Germán et al. 2010b) to identify and classify licenses contained in all the files hosted under the version control system of each project. For each of the 16,221 projects in our study, the MARKOS code analyzer mined the change log, producing the following information for each commit:

Commit Id: The identifier of the commit that is currently checked out from the Git repository and analyzed;

Date: The timestamp associated with the commit;

Author: The person responsible for the commit;

Commit Message: The message attached to the commit;

File: The path of the files committed;

Change to File: A field indicating whether each file involved in the commit was Added, Deleted, or Modified;

License Changed: A boolean value indicating whether the particular file has experienced a change in license in this commit with respect to its previous version. This feature applies to modified files only; in the case of an addition or deletion of a file, this field is set to false;

License: The name and version (e.g., GPL-2.0) of each license applied to the file.

The computation of such information for all 16,221 projects took almost 40 days, and resulted in the analysis of a total of 1,731,828 developers' commits involving 4,665,611 files. Note that for the BSD and CMU licenses, Ninka was not able to correctly identify their variants (reporting them as BSD var and CMU var). Additionally, the GPL and the LGPL may contain a "+" after the version number (e.g., 3.0+), which represents a clause in the license granting the ability to use future versions of the license (i.e., the GPL-2.0+ would allow for utilization under the terms of the GPL-3.0).
Also, we have values of "no license" and "unknown", which represent the cases in which no license was attached to the file or Ninka was unable to determine the license.

To determine whether there is a trend in the proportions of adopted licenses over the observed years, we used the Augmented Dickey-Fuller (ADF) test (Dickey and Fuller 1979, 1981). This test is widely used to test the stationarity of time series. The test can be used to reject two different null hypotheses: $H_0$: the time series is not significantly stationary, or $H_{0e}$: the time series is not significantly explosive; the latter can be used to determine whether there is a significantly increasing trend in the time series. In our statistical tests, we considered a significance level of 0.05 (i.e., we rejected null hypotheses for $p$-values < 0.05).

We quantitatively analyzed the collected data by presenting descriptive statistics about license adoption and the most common atomic license changes that we found. The latter are defined as the commits in which we detected a specific kind of license change within at least one source code or textual file. For example, given a commit with three files experiencing the licensing change No license → Apache-2.0, and 10 files with GPL-2.0 → GPL-3.0, the atomic license changes from that commit are one No license → Apache-2.0 change and one GPL-2.0 → GPL-3.0 change. We prefer not to count the number of changes at file level, as was done in previous work (Di Penta et al. 2010), to avoid inflating our analysis because of large commits and to make commits performed on small and large projects comparable. It is possible that this coarse-grained analysis may fail to capture some license changes, for example due to a change in the licensing of a dependency, although also in this case, in principle, the licensing changes should be reflected at project level when appropriate.
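The atomic-change counting in the example above can be sketched as follows; this is our reading of the definition, not the authors' analysis scripts:

```python
# Sketch of atomic license change counting: within a single commit, each
# distinct (old_license, new_license) pair counts once, no matter how many
# files it affected. The data below mirrors the example given in the text.

def atomic_changes(file_changes):
    """file_changes: (old_license, new_license) pairs for one commit."""
    return {pair for pair in file_changes if pair[0] != pair[1]}

commit = [("No license", "Apache-2.0")] * 3 + [("GPL-2.0", "GPL-3.0")] * 10
print(len(atomic_changes(commit)))  # 2
```

Thirteen file-level changes thus collapse to two atomic changes, so a single large commit cannot dominate the statistics.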
In the end, we identified a total of 1,833 projects with atomic license changes out of our dataset of 16,221 projects. This subset of projects was used to investigate license change traceability. Intuitively, we require the presence of license changes in order to determine how well changes in licensing are documented in either the commit messages or the issue tracker discussions. Therefore, we used a web crawler to identify, among these 1,833 projects, those using the GitHub issue tracker, finding a total of 1,586 projects having at least one issue. To link the licensing changes to commit messages/issue reports, we performed both string matching and date matching between either the commit messages or the issue tracker discussions and the extracted licensing information (e.g., the license name or the date the license was committed). We decided to rely on commit messages and issue discussions because (i) these two sources of information are publicly available for the considered subject projects; and (ii) both commit messages and issue discussions are likely to report, with different levels of detail, the rationale behind a specific change implemented (or just considered, in the case of issues) by developers, including changes related to software licenses.

3.3 Qualitative Analysis

Our qualitative analysis aims at answering RQ4 and is based on the manual inspection and categorization of commit messages and issue tracker discussions. Since we are not limited in terms of the programming language of the projects to analyze (unlike in the quantitative analysis), we performed our qualitative analysis on commit messages and issue tracker discussions from a set of 1,160 projects written in seven different languages: 159 C, 91 C++, 78 C#, 324 Java, 166 JavaScript, 147 Python, and 195 Ruby projects.
Note that the choice of the languages considered in our study is not random: we focused on seven of the ten most popular programming languages during 2014 and 2015 (Zapponi http://githut.info; Cass http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages).

The considered projects were selected by applying the following procedure. First, from our list of twelve million repositories, we extracted those written in the seven languages of interest. Then, we kept only the repositories satisfying the following two criteria: (i) they were not forks of the main repository, and (ii) they had at least one star (i.e., at least one user expressed appreciation for the repository) or watcher (i.e., at least one user asked to receive notifications about changes made in the repository). These selection criteria were used to exclude from our analysis personal repositories (e.g., the website of a GitHub user) that might have biased our results. However, it is important to note that for Java, we considered the comprehensive list of all 381,161 projects. In our initial investigation of Java projects (Vendome et al. 2015a), we observed a high proportion of false-positive commit messages and issue discussions; we therefore adopted this refinement for the additional six languages, with the goal of improving the generated taxonomy.

Then, we extracted the change log of the cloned projects in order to analyze them and identify the commit messages likely related to licensing. In total, 103,128,211 commits were considered. To identify commit messages likely related to license changes, we adopted a case-insensitive, keyword-based filtering built on the critical words exploited by Ninka during license identification, augmented with license names. The detailed set of keywords used for this matching is reported in Table 1.
In some cases, our keyword filters included bi-grams composed of the license type and version, since some license types (e.g., apache) produced a very large number of false-positive discussions when considered alone (e.g., all the commit messages talking about Apache projects).

In the end, the keyword-based filtering allowed us to identify a total of 746,874 commit messages (742,671 for Java, which amounted to approximately 1% of the overall commits for Java). Given the high number of relevant commits, we sampled 20% of the commits found for each language as the object of our manual inspection. However, we set a minimum threshold of 100 commits per language, and a maximum threshold of 500. These thresholds were adopted to ensure representativeness for each of the studied languages, while keeping the manual analysis effort reasonable. Note that our sampling is statistically significant at a 95% confidence level with a ±10% (or better) confidence interval. This resulted in a total of 1,413 commits to be inspected. It is worth noting that for Java projects, in addition to the 500 sampled commit messages matching the keywords in Table 1, we also considered 224 randomly sampled commit messages from the commits of the 1,833 projects in which we identified (in our quantitative analysis) an instance of an atomic license change, because we were interested in investigating the reasons behind such changes. Clearly, this was not possible for the systems written in the other programming languages, which, as said before, were not part of our quantitative analysis. The number of sampled commits for each programming language is reported in the second column of Table 2.
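The per-language sampling rule (20% of the matched commits, clamped between 100 and 500) might be sketched as below; the function names and the fixed seed are our own assumptions, not taken from the replication package:

```python
# Minimal sketch (assumed) of the sampling rule: 20% of the matched commits
# per language, with a floor of 100 and a ceiling of 500 commits.
import random

def sample_size(n_matched, rate=0.20, lo=100, hi=500):
    # Clamp the 20% target into [lo, hi], never exceeding what is available.
    return min(hi, max(lo, round(n_matched * rate)), n_matched)

def sample_commits(commits, seed=0):
    rng = random.Random(seed)           # fixed seed only for reproducibility
    return rng.sample(commits, sample_size(len(commits)))

print(sample_size(742_671))  # Java: clamped to the 500-commit ceiling -> 500
print(sample_size(400))      # 20% would be 80, raised to the 100 floor -> 100
```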
Table 2 Number of sampled commit messages and issue tracker discussions by language

| Language | # of commits | # of issue tracker discussions |
|------------|--------------|--------------------------------|
| C | 227 | 30 |
| C# | 100 | 6 |
| C++ | 139 | 12 |
| Python | 130 | 41 |
| Java | 724 | 273 |
| JavaScript | 122 | 79 |
| Ruby | 195 | 45 |
| Overall | 1,637 | 486 |

Concerning the issue tracker discussions, we built a Web crawler collecting the information present in all issue trackers of the studied projects. In particular, for each issue, our crawler collected (i) its title and description, (ii) the text of each comment added to it, and (iii) the dates the issue was opened and closed (when applicable). Then, in order to find the relevant issues (i.e., those presenting discussions about software licenses), we used a keyword search mechanism aimed at matching, in the issue title, keywords related to licensing (as previously explained for the commit messages). By applying this procedure, we identified a total of 486 issue discussions potentially related to licensing, as shown in the third column of Table 2.

After collecting the commit messages and issue discussions, in order to analyze and categorize them, we followed an open coding process inspired by the Grounded Theory (GT) principles formulated by Corbin and Strauss (1990). This analysis of commit messages and issue tracker discussions aimed at finding the rationale for licensing changes; in particular, we aimed at answering the following two sub-questions: What are the reasons pushing developers to associate a particular license with their project? and What causes them to migrate licenses or release their project under a new license (i.e., co-licensing)?

To perform the open coding, we distributed the commit messages and the issue tracker discussions among the authors such that two authors were randomly assigned to each message (a message can be a commit message or an entire issue tracker discussion).
After each round of open coding, in which the authors independently created classifications for the messages, we met to discuss the coding identified by each of us, and we refined it into categories. Note that during each round, the categories defined in previous rounds were refined according to the new knowledge created from the additional manual inspections and from the authors' discussions. Overall, the open coding concerned (i) 1,413 randomly selected licensing-related commit messages identified via the keyword-based mechanism; (ii) the 224 commit messages from the Java systems' commits where a licensing change was observed in our quantitative analysis; and (iii) the 486 issue tracker discussions matching licensing-related keywords. The output of our open coding procedure is a set of categories and groups explaining why licenses are adopted and changed. We qualitatively discuss the findings of this analysis in Section 4.4, presenting our category classification and examples of commit messages and issue tracker discussions belonging to the various categories.

8 We looked for the target keywords only in the issue titles, because we found that including the issue descriptions in the search generates a considerable number of false positives.

3.4 Dataset Diversity Analysis

To get an idea of the external validity of our dataset, we measured the diversity metric proposed by Nagappan et al. (2013) for our dataset by matching the list of our mined projects from GitHub against the list of available projects from Boa (Dyer et al. 2013). Given the different datasets exploited in the context of our quantitative and qualitative analysis, we discuss the diversity metrics separately.

3.4.1 Quantitative Analysis

We were able to match by name 1,556 out of the 16,221 projects exploited in our quantitative analysis against the names of the projects in the diversity metric dataset by Nagappan et al. (2013). This subset was used in the computation of the diversity metric, obtaining a score of 0.35, indicating that around 10% of our dataset covers just over a third of the open source projects according to six dimensions: programming language, developers, project age, number of committers, number of revisions, and number of programming languages. The dimensional scores are 0.45, 0.99, 1.00, 0.99, 0.96, and 0.99, respectively, suggesting that our subset covers the relevant dimensions for our analysis. However, the focus on Java projects limits the programming language score, affecting the overall score.

Another important aspect to evaluate is the representativeness of the licenses present in our dataset with respect to those diffused in the FOSS community. The Open Source Initiative (OSI) specifies a list of 70 approved licenses, indicating the ones reported in the first column of Table 3 as the most commonly used in FOSS software (they do not specify any order). The second column of Table 3 reports the top licenses as extracted from the FLOSSmole SourceForge snapshot of December 2009 (Howison et al.), while the third column shows the top licenses as extracted from our sample of GitHub projects exploited for the quantitative analysis.

The licenses declared by OSI as the most commonly used were also the most commonly found in our dataset (BSD 2-Clause and BSD 3-Clause both fall under the BSD type). In the comparison between our dataset and SourceForge, while the order of diffusion of the different licenses is not exactly the same, six of the top eight licenses in SourceForge are also present in our dataset (all but Public Domain and Academic Free License). This analysis, together with the diversity metric, suggests that the dataset we exploited in our quantitative analysis is representative of open source systems.
Table 4 reports the year of the first commit date for each of the 16,221 considered projects. This table clearly shows the exponential growth of GitHub until 2012, confirming what was already observed by people in the GitHub community (Doll http://tinyurl.com/muyxkru). While GitHub also experienced exponential growth in 2013 (https://octoverse.github.com/), our dataset does not mirror this fact. This is due to a design choice we made while randomly choosing the projects to clone. In particular, we cloned projects during January 2014, excluding from the set of 381,161 Java projects those with a commit history shorter than one year (i.e., keeping projects with the first commit performed no later than January 2013). This was needed since, in the context of RQ2, we are interested in observing migration patterns occurring over the projects' history; projects having a very short commit history were not likely to be relevant for the purpose of this study. Moreover, since in RQ1 we are interested in observing license usage in the context of GitHub's drastic expansion, we decided to exclude from our analysis the 60 projects having their first commit in 2013, due to their severe lack of representation in our sample despite the continued growth of GitHub.

Table 3 Most popular licenses according to OSI, SourceForge, and our GitHub dataset

| OSI popular licenses (unordered) | SourceForge (Dec. 2009) | Our GitHub dataset (quant. analysis) |
|----------------------------------|-------------------------|--------------------------------------|
| Apache-2 Lic | GNU Public Lics | GNU Public Lics |
| BSD 2-Clause Lic | Lesser GNU Public Lics | Apache Lics |
| BSD 3-Clause Lic | BSD Lics | Lesser GNU Public Lics |
| GNU Public Lics | Apache Lics | MIT Lic |
| Lesser GNU Public Lics | Public Domain | Eclipse Public Lic |
| MIT Lic | MIT Lic | Comm. Dev. and Dist. Lic |
| Mozilla Public Lic 2 | Academic Free Lic | Mozilla Public Lic |
| Comm. Dev. and Dist. Lic | Mozilla Public Lics | BSD Lics |
| Eclipse Public Lic | | |
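The history-length filter described above can be sketched as follows. This is illustrative only: the exact cloning day within January 2014 is our assumption (only the month is reported), as are the names used here:

```python
# Illustrative sketch of the history-length filter: keep only projects whose
# first commit predates the cloning date by at least one year. The exact
# cloning day in January 2014 is an assumption.
from datetime import date

CLONE_DATE = date(2014, 1, 31)

def has_sufficient_history(first_commit, min_years=1):
    cutoff = CLONE_DATE.replace(year=CLONE_DATE.year - min_years)
    return first_commit <= cutoff

print(has_sufficient_history(date(2012, 6, 1)))   # True
print(has_sufficient_history(date(2013, 7, 15)))  # False: under a year of history
```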
----------------------------------------
-------------------------------
Section 79:
3.4.2 Qualitative Analysis

Similarly, we were able to match by name 471 out of the 1,160 projects (against the names of the projects in the diversity metric dataset (Nagappan et al. 2013)) from which we manually investigated commit messages and issue discussions in our qualitative analysis. As done for the quantitative analysis, we considered the matched subset for the computation of the diversity metric, obtaining a score of 0.32, indicating that ~40% of our dataset covers just under a third of the open source projects according to six dimensions: programming language, developers, project age, number of committers, number of revisions, and number of programming languages. The dimensional scores are 0.43, 0.99, 1.00, 0.99, 0.94, and 1.00, respectively. Intuitively, these scores are directly impacted by the limited number of projects that we were able to match. However, we still observe relatively high diversity scores, suggesting that our qualitative analysis is representative of a substantial portion of open source systems.
----------------------------------------
-------------------------------
Section 80:
3.5 Replication Package

The working dataset of our study is available at: http://www.cs.wm.edu/semeru/data/EMSE15-licensing. It includes (i) the lists of projects and their URLs, (ii) the issue tracker and commit data, (iii) the analysis scripts, and (iv) a summary of the achieved results.
----------------------------------------
-------------------------------
Section 81:
4 Study Results

This section discusses the achieved results, answering the four research questions formulated in Section 3.1.
----------------------------------------
-------------------------------
Section 82:
4.1 RQ1: What is the Usage of Different Licenses in GitHub?
Figure 1 depicts the percentage of licenses that were first introduced into a project in a given year, which we refer to as relative license usage. We only report the first occurrence of each license committed to any file of the project. To ease readability, the bars are grouped into permissive (dashed bars) and restrictive (solid bars) licenses. Additionally, we omit data prior to 2002 due to the limited number of projects created during those years in our sampled dataset (see Table 4).

For the year 2002, we observed that restrictive licenses and permissive licenses had been used approximately equally, with a slight bias towards restrictive licenses. Although the LGPL-2.1 and LGPL-2.1+ variants are restrictive licenses, they are less restrictive than their GPL counterparts. The LGPL specifically aimed at ameliorating licensing conflicts that arose when linking code to a non-(L)GPL library. Instead, the various versions of the GPL license require the system to change its license to the same version of the GPL, or else the component could not legally be redistributed together with the project source code. This suggests a bias toward using less restrictive licenses even among the most used copyleft licenses. By the subsequent year (2003), a clear movement towards less restrictive licenses can be seen with the wider adoption of the MIT/X11 license as well as the Apache-1.1 license. Additionally, we observe that the LGPL was still prominent, while the CMU, CPL-1.0, and GPL-2.0+ licenses were declining.

During the following five years (2004–2008), the Apache-2.0, CDDL-1.0, EPL-1.0, GPL-3.0, LGPL-3.0, and DWTFYW-2 licenses were created. For the same observation period, Bavota et al. found that the Apache ecosystem grew exponentially (Bavota et al. 2013). This observation explains the rapid diffusion of the Apache-2.0 license among FOSS projects.
We observed a growth that resulted in the Apache-2.0 license accounting for approximately 41% of licensing in 2008. Conversely, we observed a decline in the relative usage of both the GPL and LGPL licenses. These two observations suggest a clear shift toward permissive licenses, since ~60% of attributed licenses were permissive starting from 2003 (with small drops in 2007 and 2009).

Another interesting observation was that the newer version of the GPL (GPL-3.0 or GPL-3.0+) had a lower relative usage compared to its earlier version until 2011. Additionally, its adoption rate was more gradual than that of the Apache-2.0 license, which appears to supersede the Apache-1.1 license. However, the LGPL-3.0 and LGPL-3.0+ are not more popular than prior versions in terms of adoption, despite the relative decline of the LGPL-2.1's usage starting in 2010. Our manual analysis of commits highlighted explicit reasons that pushed some developers to choose the LGPL license. For instance, a developer of the hibernate-tools project, when committing the addition of the LGPL-2.1+ license to her project, wrote:

The LGPL guarantees that Hibernate and any modifications made to Hibernate will stay open source, protecting our and your work.

This commit note indicates that the LGPL-2.1+ was chosen as the best option to balance freedom of reuse with the guarantee that the software will remain free.

Conversely, we observed the abandonment of old licenses and old license versions as newer FOSS licenses were introduced. For example, the Apache-1.1 and CPL-1.0 became increasingly less prevalent or no longer used among the projects. In both cases, a newer license appears to replace the former license. While the Apache-2.0 offers increased protections (e.g., protections against patent litigation), the EPL-1.0 and the CPL-1.0 are essentially the same license, the only difference being that IBM is replaced by the Eclipse Foundation as the steward of the license.
Thus, the two licenses are intrinsically the same from a legal perspective, and most likely projects migrated from the CPL to the EPL; this would explain why EPL adoption grew as CPL usage shrank.

Finally, we observed fluctuations in the adoption of the MIT/X11 license. As the adoption of permissive licenses grew with the introduction of the Apache-2.0 license, the MIT/X11 license first declined in adoption and then grew back to approximately its original level. Ultimately, we observed a stabilization of MIT/X11 usage at approximately 10% starting in 2007.

In order to determine whether the proportions for a given license exhibited a stationary trend or a clearly increasing trend over the observed years, we performed ADF tests as explained in Section 3.2. Results are reported in Table 5, where significant $p$-values (shown in bold face) in the second column indicate that the series is stationary ($H_0$ rejected), while significant $p$-values in the third column indicate that the series has an explosive, i.e., clearly increasing, trend ($H_{0e}$ rejected). The results indicate that:

Almost no license exhibits a stationary trend. The results only show a significant difference for the zend-2.0 license, which is not particularly popular, and a marginal significance for CMU, CPL-1.0, and GPL-1.0+.

Confirming the discussion above, we have a clearly increasing trend not only for permissive licenses such as Apache-2.0 and MIT/X11, but also for new versions of restrictive licenses facilitating the integration with other licenses (in particular, GPL-3.0, which eases compatibility with the Apache license, as well as LGPL-2.0, which facilitates compatibility when code is integrated as a library). We also see an increase for DWTFYW-2.0, but, as will be discussed in Section 5, this is likely due to cases in which developers do not have a clear idea about the license to be used.
Summary for RQ1: For the analyzed Java projects, we observed a clear trend towards using permissive licenses like Apache-2.0 and MIT/X11. Additionally, the permissiveness or restrictiveness of a license can impact the adoption of newer license versions, with permissive licenses being more rapidly adopted. Conversely, restrictive licenses seem to maintain a greater ability to survive in usage as compared to permissive licenses, which become superseded. Restrictive (GPL-3.0) or semi-restrictive (LGPL-2.0) licenses that facilitate integration with other licenses also exhibit an increasing trend. Finally, we observed a stabilization in the license adoption proportions of particular licenses, despite the exponential growth of the GitHub code base.

Table 5 The results of the augmented Dickey-Fuller test to determine stationary or explosive trends in the license usage

| License | Stationary trend (p-value) | Explosive trend (p-value) |
|---------------|----------------------------|---------------------------|
| Apache-1.1 | 0.14 | 0.86 |
| Apache-2.0 | 0.98 | **0.02** |
| BSD | 0.73 | 0.27 |
| CDDL v1 | 0.42 | 0.58 |
| CMU | 0.05 | 0.95 |
| CPL-1.0 | 0.43 | 0.57 |
| EPL-1.0 | 0.07 | 0.93 |
| DWTFYW-2.0 | 0.99 | **0.01** |
| MPL-1.0 | 0.90 | 0.10 |
| MPL-1.1 | 0.32 | 0.68 |
| NPL-1.1 | 0.55 | 0.45 |
| svnkit+ | 0.78 | 0.22 |
| zend-2.0 | **0.01** | 0.99 |
| MIT/X11 | 0.97 | **0.03** |
| GPL-1.0+ | 0.05 | 0.95 |
| GPL-2.0 | 0.67 | 0.33 |
| GPL-2.0+ | 0.66 | 0.34 |
| GPL-3.0 | 0.98 | **0.02** |
| GPL-3.0+ | 0.69 | 0.31 |
| LGPL-2.0 | 0.99 | **0.01** |
| LGPL-2.0+ | 0.67 | 0.33 |
| LGPL-2.1 | 0.35 | 0.65 |
| LGPL-2.1+ | 0.54 | 0.46 |
| LGPL-3.0 | 0.63 | 0.37 |
| LGPL-3.0+ | 0.52 | 0.48 |

4.2 RQ2: What are the Most Common Licensing Change Patterns?
We analyzed the commits where a license change occurred with a twofold goal: (i) analyzing license change patterns to understand both the prevalence and types of licensing changes affecting software systems, and (ii) understanding the rationale behind these changes. Overall, we found 204 different atomic license change patterns. To analyze them, we identified the patterns having the highest proportion across projects (i.e., global patterns) and within a project (i.e., local patterns). We sought to distinguish between dominant global patterns (Table 6) and dominant local patterns (Table 7) to study, on one hand, the overall trend of licensing changes and, on the other hand, specific phenomena occurring in certain projects.

The global patterns were extracted by identifying and counting the presence of a pattern only once per project and then aggregating the counts over all projects. For instance, 823 projects in our dataset experienced at least one change (each) from No license → Apache-2.0; thus, the final count (globally) for the pattern is 823. The most dominant global patterns were either a change from no license or an unknown license to a particular license, or a change from a particular license to no license or an unknown license. Table 6 shows the top ten global patterns. We observe that the inclusion of Apache-2.0 was the most common pattern for unlicensed or unknown code. Clearly, this is likely due to the specific programming language (i.e., Java) of the sample of projects we quantitatively analyzed.

Table 6 Top ten global atomic license change patterns

| Top patterns (overall) | Frequency |
|------------------------|-----------|
| no license or unknown → Apache-2.0 | 823 |
| Apache-2.0 → no license or unknown | 504 |
| no license or unknown → GPL-3.0+ | 269 |
| GPL-3.0+ → no license or unknown | 181 |
| no license or unknown → MIT/X11 | 163 |
| no license or unknown → GPL-2.0+ | 113 |
| GPL-2.0+ → no license or unknown | 111 |
| MIT/X11 → no license or unknown | 98 |
| no license or unknown → EPL-1.0 | 94 |
| no license or unknown → LGPL-2.1+ | 91 |

| Top migration patterns between licenses | Frequency |
|-----------------------------------------|-----------|
| GPL-3.0+ → Apache-2.0 | 25 |
| GPL-2.0+ → GPL-3.0+ | 25 |
| Apache-2.0 → GPL-3.0+ | 24 |
| GPL-2.0+ → LGPL-2.1+ | 22 |
| GPL-3.0+ → GPL-2.0+ | 21 |
| LGPL-2.1+ → Apache-2.0 | 16 |
| GPL-2.0+ → Apache-2.0 | 15 |
| Apache-2.0 → GPL-2.0+ | 13 |
| MPL-1.1 → MIT/X11 | 11 |
| MIT/X11 → Apache-2.0 | 11 |

Table 6 also shows the most common global migrations when focusing on the changes that happened between different licenses. We observe that the migration towards the more permissive Apache-2.0 was a dominant change among the top ten atomic license changes for global license migrations. An interesting observation is the license upgrade and downgrade between GPL-2.0+ and GPL-3.0+. GPL-3.0 is considered by the Free Software Foundation to be compatible with the Apache-2.0 license.9 Due to the large usage of the Apache license in Java projects, this pattern is quite expected. However, the migration GPL-3.0+ → GPL-2.0+ is interesting, since it not only still allows the project to be redistributed as GPL-3.0 but also allows usage as GPL-2.0, which is less restrictive, as well.

Regarding the local patterns (Table 7), the frequencies were computed by first identifying the most frequent (i.e., dominant) pattern in each project, and then counting the number of times a specific pattern is the most frequent across the whole dataset. For instance, the GPL-1.0+ → GPL-3.0+ pattern is the most frequent in 36 projects from our dataset. Table 7 summarizes the most common local migrations. The migrations appear to be toward a less restrictive license or license version.

9 http://gplv3.fsf.org/wiki/index.php/Compatible licenses
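Our reading of the two counting schemes can be sketched as follows (illustrative only, not the authors' scripts): a global count tallies a pattern at most once per project, while a local count credits only each project's dominant pattern.

```python
# Sketch of global vs. local pattern frequencies. `projects` maps a project
# name to a Counter of (old, new) license change occurrences; the names and
# data are hypothetical.
from collections import Counter

def global_counts(projects):
    totals = Counter()
    for patterns in projects.values():
        totals.update(set(patterns))     # each pattern counted once per project
    return totals

def local_counts(projects):
    totals = Counter()
    for patterns in projects.values():
        if patterns:
            dominant = patterns.most_common(1)[0][0]
            totals[dominant] += 1        # only the dominant pattern per project
    return totals

projects = {
    "p1": Counter({("GPL-2.0+", "GPL-3.0+"): 5, ("no license", "Apache-2.0"): 1}),
    "p2": Counter({("no license", "Apache-2.0"): 3}),
}
print(global_counts(projects)[("no license", "Apache-2.0")])  # 2
print(local_counts(projects)[("GPL-2.0+", "GPL-3.0+")])       # 1
```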
The low frequency of the atomic license change local patterns indicates that migrating licenses is non-trivial. It can also introduce problems with respect to reuse. For example, we observed a single project where GPL-1.0+ code was changed to LGPL-2.0+ a total of nine times. LGPL is less restrictive than GPL when the code is used as a library. Thus, if parts of the system are GPL, the developer must comply with the more restrictive and possibly incompatible constraints.

Until now, we considered atomic license changes among any file in the repository. This was necessary since most of the analyzed projects lack a specific file (e.g., license.txt) declaring the project license. To extract the declared project license, we considered a file in the top-level directory named license, copying, copyright, or readme. When focusing only on projects including such files, we extracted 24 different change patterns. Table 8 illustrates the top eight licensing changes between particular licenses (i.e., we excluded no license or unknown license from this table) for declared project licenses. We considered only the top eight, since the next group of change patterns was a tie among five other patterns. We observe that the change Apache-2.0 → MIT/X11 was the most prevalent license change pattern, and the co-licensing of MIT/X11 with Apache-2.0 is the second most prevalent one. Interestingly, this pattern was not dominant in our file-level analysis, although the Grounded Theory analysis provided support for it.
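The declared-license lookup described above can be sketched as follows (a minimal Python illustration under the stated naming convention; the paper does not specify whether matching was case-insensitive or extension-agnostic, which we assume here):

```python
from pathlib import Path

# Base names that signal a declared project license in the top-level
# directory; matching is case-insensitive and ignores the extension
# (both are assumptions, not stated in the paper).
DECLARING_NAMES = {"license", "copying", "copyright", "readme"}

def declared_license_files(repo_root):
    """Return the top-level files of a repository checkout whose base
    name suggests they declare the project license."""
    root = Path(repo_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.stem.lower() in DECLARING_NAMES
    )
```

For example, a checkout containing `LICENSE.md`, `COPYING`, and `src/Main.java` would yield `["COPYING", "LICENSE.md"]`; the actual license name would then still have to be extracted from the file contents.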
Table 7 Top ten local atomic license change patterns

| Pattern | Frequency |
|---------|-----------|
| GPL-2.0+ → GPL-3.0+ | 36 |
| GPL-2.0+ → LGPL-3.0+ | 15 |
| LGPL-3.0+; Apache-2.0 → Apache-2.0 | 12 |
| GPL-3.0+; Apache-2.0 → Apache-2.0 | 12 |
| GPL-2.0+ → LGPL-2.1+ | 10 |
| GPL-1.0+ → LGPL-2.0+ | 9 |
| GPL-2.0+ → GPL-3.0+ | 9 |
| GPL-3.0+ → Apache-2.0 | 8 |
| GPL-3.0+ → GPL-2.0+ | 8 |
| GPL-3.0+ → LGPL-3.0+ | 8 |

Table 8 Top eight atomic license change patterns for declared project licenses

| Pattern | Frequency |
|---------|-----------|
| Apache-2.0 → MIT/X11 | 12 |
| Apache-2.0 → MIT/X11; Apache-2.0 | 8 |
| GPL-2.0+ → GPL-3.0+ | 7 |
| MIT/X11 → Apache-2.0 | 6 |
| GPL-3.0+ → Apache-2.0 | 6 |
| MIT/X11; Apache-2.0 → Apache-2.0 | 5 |
| Apache-2.0 → GPL-3.0+ | 5 |
| GPL-3.0+ → MIT/X11 | 3 |

The MIT/X11 license was used to allow commercial reuse, while still maintaining the open source nature of the project.

The pattern GPL-2.0+ → GPL-3.0+ (Top-3 in Table 8) was expected, since it was tied for the most prevalent among the global atomic license changes. Similarly, the patterns MIT/X11 → Apache-2.0, GPL-3.0+ → Apache-2.0, and Apache-2.0 → GPL-3.0+ were also among the top eight global changes. Another notable observation is that license changes frequently happen toward permissive licenses. Excluding the five changes from Apache-2.0 → GPL-3.0+, the remaining changes in the top eight are either a licensing change from a restrictive (or copyleft) license to a permissive license or a licensing change between two different permissive licenses.

Summary for RQ 2
The key insight from the analysis of atomic license change patterns observed in the studied Java projects is that the licenses tend to migrate toward less restrictive licenses.
4.3 RQ 3: To What Extent are Licensing Changes Documented in Commit Notes or Issue Tracker Discussions?

Table 9 reports the results of the identification of traceability links between licensing changes and commit messages/issue tracker discussions. We found a clear lack of traceability between license changes and both the commit message history and the issue tracker. In both data sources, we first extracted the instances (i.e., commit messages and issue tracker discussion comments).

Table 9 Traceability links between license changes and commit messages/issue tracker discussions

| Data source | Linking query | Links |
|-------------|---------------|-------|
| Commit message matching | Commits with the keyword “license” | 70,746 |
| | Commits containing the new license name | 519 |
| | Commits containing the new license name and the keyword “license” | 399 |
| Issue tracker comment matching | Comments from closed issues containing the keyword “license” | 0 |
| | Comments from closed issues containing the new license | 0 |
| | Comments from closed issues containing the new license and the keyword “license” | 0 |
| | Comments from open issues containing the keyword “license” | 68 |
| | Comments from open issues containing the new license | 712 |
| | Comments from open issues containing the new license and the keyword “license” | 16 |
| Issue tracker date-based matching | Comments from closed issues opened before the license change and closed before or at the license change | 197 |
| | Comments from open issues opened before the license change | 2,241 |
| | Comments from closed issues opened before the license change and closed before or at the license change with the keyword “license” | 0 |
| | Comments from open issues opened before the license change with the keyword “license” | 0 |
| Issue–commit matching | Comments in closed issues containing the keyword “Fixed #[issue_num]” | 66,025 |
| | Comments in open issues containing the keyword “Fixed #[issue_num]” |
3,407 |
| | Comments in closed issues containing the commit hash where the license change occurs | 0 |
| | Comments in open issues containing the commit hash where the license change occurs | 1 |

Specifically, we extracted the commit messages and issue discussions where the keyword “license” appears or where a license name was mentioned (e.g., “Apache”). In the former case, we identify potential commits or issues that are related to licensing in general, while the latter attempts to capture those related to specific types of licenses.

Using the first approach, we retrieved 70,746 commits and 68 issues; when looking for license names, we identified 519 commits and 712 issues. However, these numbers are inflated by false positives (e.g., “Apache” can refer to the license or to one of the Apache Software Foundation’s libraries). For this reason, we then looked for commit messages and issue discussions containing both the word “license” and the name of a license. This reduced the number of linked commit messages to 399 and the number of linked issue discussions to zero. These results highlight that license changes are rarely documented by developers in commit messages and issues.

We also investigated whether relevant commits and issues could be linked together. We linked commit messages to issues when the former explicitly mentioned fixing a particular issue (e.g., “Fixed #7” denotes that issue 7 was fixed). This technique resulted in a large number of issue–commit pairs; thus, our observation of a lack of license traceability is not simply an artifact of poor traceability in these projects. To further investigate the linking, we extracted the commit hashes where a license change occurred and attempted to find these hashes in the issue tracker’s comments. Since the issue tracker comments contain the abbreviated hash, we truncated the hashes appropriately prior to linking. Our results indicated only one match for an open issue and zero matches for closed issues.
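The two lexical linking heuristics can be sketched as follows (a minimal Python illustration; the regex generalizes the paper's exact “Fixed #[issue_num]” keyword to common verb variants, and the seven-character abbreviation length is an assumption):

```python
import re

# Commit messages that declare an issue fix; this generalizes the
# paper's "Fixed #[issue_num]" keyword to fix/fixes/fixed (assumption).
FIX_RE = re.compile(r"\bfix(?:ed|es)?\s+#(\d+)", re.IGNORECASE)

def issues_fixed_by(commit_message):
    """Issue numbers a commit message claims to fix."""
    return [int(num) for num in FIX_RE.findall(commit_message)]

def comment_mentions_commit(comment, full_hash, abbrev_len=7):
    """Hash-based linking: issue trackers usually display the abbreviated
    hash, so truncate the full hash before searching the comment text."""
    return full_hash[:abbrev_len] in comment

print(issues_fixed_by("Fixed #7; also fixes #12"))                       # [7, 12]
print(comment_mentions_commit("introduced in e1e273f", "e1e273ff18730d2f8e0d"))  # True
```

Both checks are deliberately permissive; in practice they still produced almost no links to license changes, which is the core finding of RQ 3.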
Finally, we attempted to link changes to issues by matching the date ranges of the issues to the commit date of the license change. The issue had to be open prior to the change, and if the issue had been closed, the closing date had to be after the change. However, we did not find any matches with the date-based approach.

Summary for RQ 3
For the analyzed Java projects, both the issue tracker discussions and the commit messages yielded very minimal traceability to license changes, suggesting that the analysis of licensing requires fine-grained approaches that analyze the source code.

4.4 RQ 4: What Rationale do These Sources Contain for the Licensing Changes?

In this section, we first present the taxonomy that resulted from the open coding of commit messages and issue tracker discussions. As explained in Section 3, this analysis was performed on 1,637 commit messages and 486 issue tracker discussions from 1,160 projects written in seven programming languages, and it aims at modeling the rationale behind license adoption and changes. Second, we present our findings when looking at the commits that introduce atomic license changes in the analyzed Java projects.

4.4.1 Analyzing Commit Messages and Issue Discussions

Table 10 reports the categories obtained in the open coding process. In total, we grouped commit messages and issue tracker discussions into 28 categories, organized into seven groups that are described in detail in the rest of this section. Additionally, 430 commits and 161 issue discussions identified by means of pattern matching as potentially related to licensing were classified as false positives. This is mainly due to the wide range of matching keywords that we used in our filtering (see Section 3) to identify as many commits/issues as possible.
Finally, for 16 commits and two issue discussions that were

Table 10 Categories defined through open coding for the issue tracker discussion comments (I) and commit notes (C)

| Category | C | C++ | C# | Java | Javascript | Python | Ruby | Overall |
|---------------------------|-----|------|-----|------|------------|--------|------|---------|
| | I C | I C | I C | I C | I C | I C | I C | I C |
| Generic license additions | | | | | | | | |
| Choosing license | 1 | 0 | 0 | 0 | 0 | 6 | 0 | 2 | 0 | 1 | 0 | 11 | 0 |
| License added | 1 | 22 | 3 | 19 | 0 | 15 | 25 | 75 | 22 | 34 | 9 | 34 | 1 | 33 | 59 | 232 |
| License change | | | | | | | | |
| License change | 2 | 14 | 1 | 8 | 1 | 5 | 3 | 14 | 4 | 9 | 2 | 6 | 2 | 18 | 15 | 74 |
| License upgrade | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| License rollback | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 3 |
| Removed licensing | 0 | 3 | 0 | 3 | 0 | 4 | 0 | 6 | 1 | 8 | 0 | 2 | 0 | 3 | 1 | 29 |
| Changes to copyright | | | | | | | | |
| Copyright added | 0 | 6 | 0 | 3 | 0 | 2 | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 15 |
| Copyright update | 2 | 24 | 0 | 7 | 1 | 6 | 5 | 89 | 2 | 7 | 2 | 4 | 1 | 8 | 13 | 138 |
| License fixes | | | | | | | | |
| Link broken | 7 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 16 | 0 | 1 | 0 | 19 | 0 | 46 | 0 |
| License mismatch | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Fix licensing | 4 | 2 | 0 | 1 | 0 | 2 | 1 | 3 | 2 | 0 | 0 | 1 | 2 | 1 | 9 | 10 |
| License file modification | 0 | 11 | 0 | 8 | 0 | 14 | 0 | 0 | 1 | 11 | 1 | 7 | 1 | 29 | 3 | 80 |
| Missing licensing | 1 | 1 | 0 | 0 | 0 | 3 | 2 | 0 | 7 | 0 | 12 | 0 | 4 | 1 | 26 | 5 |
| License compliance | | | | | | | | |
| Compliance discussion | 1 | 9 | 0 | 5 | 1 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 0 | 2 | 20 |
| Derivative work inconsistency | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Add compatible library | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 2
| 0 | 0 | 3 | 3 |
| Removed third-party code | 3 | 13 | 1 | 8 | 0 | 1 | 0 | 1 | 0 | 2 | 0 | 4 | 0 | 3 | 4 | 32 |
| License compatibility | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |
| Reuse | 1 | 1 | 1 | 0 | 0 | 17 | 0 | 1 | 0 | 10 | 0 | 1 | 0 | 0 | 21 | 1 |
| Dep. license added | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Dep. license issue | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| Clarifications/Discussions | | | | | | | | |
| License clarification | 2 | 0 | 2 | 1 | 1 | 0 | 19 | 0 | 2 | 1 | 4 | 0 | 2 | 0 | 32 | 2 |
| Terms clarification | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 7 | 0 |
| Verify licensing | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 |
| License agreement | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| Request for a license | | | | | | | | |
| Licensing request | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 6 | 0 | 11 | 0 |
| License output for the end user | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |

related to licensing, it was not possible, based on the available information, to perform a clear categorization. Thus, they were excluded from this study.

In the following, we discuss examples related to the various groups of categories.

Generic License Additions This group of categories concerns cases in which a license was added to a file, project, or component where it was not present, as well as discussions related to choosing the license to be added to a project. One typical example of a commit message, related to the very first introduction of a software license into the repository, mentioned:

“Added a license page to TARDIS.” (https://github.com/tardis-sn/tardis/commit/07b2a072d89d45c386d5f988f04435d76464750e)

Other commit messages falling in this category were even precise in reporting the exact license committed into the repository, e.g.:

“Add MIT license.
Rename README to include rst file extension.” (https://github.com/Schevo/schevorecipe.db/commit/b73bef14adeb7c87c002a908384253c8f686c625)

Finally, commit messages automatically generated by GitHub’s licensing feature were also present, e.g.:

“Created LICENSE.md.”

While commit messages show the addition of a license to a project, they do not provide the rationale behind the specific choice. This rationale can sometimes be found in the discussions carried out by developers in the issue trackers to establish the license under which their project would be released. For example, one of the issue discussions we analyzed was titled “Add LICENSE file” (https://github.com/rosedu/web-workshops/issues/1) in the project web-workshops, and the issue opener explained the need for (i) deciding the license to adopt and (ii) involving all the project’s contributors in such a decision:

“A license needs to be chosen for this repo. All contributors need to agree with the chosen license. A list of contributors is enclosed below.”

Doubts and indecision about which license to adopt were also evident in several of the issue discussions that we manually analyzed:

“What license to use? BSD, GNU GPL, or APACHE?” (https://github.com/kovmarci86/d3-armory/issues/5)

Interestingly, one developer submitted an issue for the project InTeX entitled “Dual license under LGPL and EPL” (https://github.com/mtr/intex/issues/1) that related to adding a new license to balance code reuse of the system, while avoiding “contagious” licensing (the term “contagious” was used by the original developer of the system). The developer commented:

“Your package is licensed under GPL. I’m not a lawyer but as far as I understand the intention of the GPL, all LaTeX documents compiled with the InTeX package will have to be made available under GPL, too. [...] I think, you want users to publish changes they did at your code.
A dual license under LGPL and EPL would ensure that a) changes on your code have to be published along with a binary publication and b) that your code can be used in GPL and non-GPL projects. See JGraphT’s relicensing for more background.”

This response demonstrates a potential lack of understanding regarding the license implications for compiled LaTeX documents and proposes dual-licensing as a solution. However, the original developer also indicates a lack of legal background and is not willing to offer a dual license based on his understanding, stating:

“Thank you for your interest. I not a lawyer myself either, but my intentions are:

- I want changes to the source code of InTeX to be made available so that others can benefit from them too.

- I do not want any “contagious” copyright of documents compiled with InTeX. However, I’ve always thought of InTeX as a (pre)compiler, and given this GPL FAQ answer, I think licensing the compiler’s source code under GPL does not limit or affect the copyright of the documents it is used to process.

Unless you can prove me wrong about this, I will close this issue.”

Thus, the developer responds by providing his understanding of the GPL, referencing a response by GNU regarding compiled Emacs. However, the developer does indicate an openness to adding a new license if the GPL would in fact apply to generated LaTeX documents. This example is particularly interesting, since it shows the original developer’s rationale for picking the GPL as well as the difficulty that developers have with respect to licensing.
License Change This group of categories concerns cases in which (i) a licensing statement was changed from one license to a different one; (ii) a license was upgraded to a new version, e.g., from GPL-2.0 to GPL-3.0; (iii) a license was rolled back (i.e., when a license was erroneously changed, and a rollback to the previous license was needed to ensure legal compliance); and (iv) developers removed a previously added license for various reasons.

Most commit messages briefly document the performed change, e.g., “Switched to a BSD-style license” or “Switch to GPL”. Others partially report the rationale behind the change:

“The NetBSD Foundation has granted permission to remove clause 3 and 4 from their software”

The commit message explains that permission for the license change has been granted by the NetBSD Foundation. However, the committer does not explain the reason for the removal of the two clauses. Other commits are instead very detailed in providing a full picture of what happened in terms of licensing:

“Relicensed CZMQ to MPLv2 - fixed all source file headers - removed COPYING/COPYING.LESSER with GPLv3 and LPGv3 + exceptions - added LICENSE with MPLv2 text - removed ztree class which cannot be relicensed - (that should be reintroduced as foreign code wrapped in CZMQ code).”

The commit message from the project CZMQ (https://github.com/zeromq/czmq/commit/eabe063c2588cde0af90e5ae951a2798b7c5f7e4) is very informative, reporting the former license (i.e., GPL-3.0 and LGPL-3.0), the new license (i.e., MPL-2.0), and the changes applied in the repository to ensure compliance with the new licensing terms (e.g., the removal of the ztree class). This license change demonstrates a move towards a more permissive license, which we showed to be prevalent in our study of Java projects.
10 http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#CanIUseGPLToolsForNF

We also found commit messages reporting the rationale behind specific license changes, such as the following commit from the project nimble (https://github.com/bradleybeddoes/nimble/commit/e1e273ff18730d2f8e0d7c2af1951970e676c8d1):

“Change in project License from AGPL 3.0 to Apache 2.0 prior to first public release. Several factors influenced this decision the largest being community building and making things as easy as possible for folks to get started with the project. We don’t however believe Open Source == Free and will continue to investigate the best way to commercialize this. Restrictive copy-left licenses aren’t however the answer.”

While the developers want to enable external developers to reuse the system, they are also interested in commercializing the software product. The developers acknowledge that copy-left licenses do not meet their needs.

For License Rollback, we observed that the project PostGIS reverted its licensing to a custom license (https://github.com/postgis/postgis/commit/4eb4127299382c971ea579c8596cc41cb1c089bc). The commit does not offer rationale, since it simply states:

“Restore original license terms.”

From the analysis of the commit, it emerged that the author had re-licensed the system under the GPL earlier and subsequently reverted the licensing to his own custom license. However, it is not clear whether this rollback was due to a misappropriation of the GPL, an incompatibility in the system, or other factors.

Additionally, we found commit messages illustrating that license removals do not necessarily indicate that the licensing of the system was removed. For instance:

“Removing license as it is declared elsewhere” (https://github.com/ros/ros_comm/commit/e451639226e9fe4eebc997962435cc454687567c)

“Remove extra LICENSE files
One repository, one license.
No need to put these on the box either.” (https://github.com/openatv/enigma2/commit/b4dfdf09842b3dcacb2a6215fc040f7ebbbb3c03)

“Remove licenses for unused libraries” (https://github.com/ttop/cuanto/commit/a1e58f2c93de40ab304c494e05853957c549fd44)

In these cases, the system contains redundant or superfluous license files that can be removed. This observation highlights that strictly analyzing the license changes that have happened in the history of a software system could (wrongly) suggest that the system has migrated toward closed-source. The third commit message, instead, indicates that licenses were removed together with the unused code that required them. Such cases, in which a project retains unnecessary licenses due to third-party libraries that are no longer needed, should be carefully managed, since they may discourage other developers from reusing the project, especially if the unnecessary licenses are restrictive.

Changes to Copyright This group of categories includes commits/issues related to simple changes/additions applied to the copyright statement, such as the copyright year or the authors. Changes to the list of author names occur to indicate the people who provided a substantial contribution to the project, thereby claiming their ownership. Previous work indicated that such additions often occur in correspondence with large changes performed by contributors whose names are not yet mentioned in the copyright statement (Di Penta and Germán 2010). Changes to copyright years have also been previously investigated, and are often added to allow claiming rights on source code modified in a given year (Di Penta et al. 2010).
License Fixes This group of categories is related to changes in the license mainly due to various kinds of mistakes or formatting issues, as well as to cases in which a licensing statement was accidentally missing (note that this is different from cases of license addition, in which the license was originally intended to be absent from the project).

For example, in this group we observed cases of issues discussing a license mismatch, where developers found conflicting headers or conflicts between the declared license and the license headers. In the former case, a developer posted an issue to the project gtksourcecompletion’s issue tracker (https://github.com/chuchiperriman/gtksourcecompletion/issues/1):

“The license states that this is all LGPL-3, but the copyright headers of the source files say otherwise (and some are missing). Is this intentional, or should these all be under the same license? I’ve included licensecheck output below.”

Subsequently, the issue poster listed files in the system with GPL, LGPL, and no copyright. Additionally, he indicated cases where the Free Software Foundation address was incorrect as well. We observed a similar situation in another project: a developer opened the issue “LICENSE file doesn’t match license in header of svgeezy.js” (https://github.com/benhowdle89/svgeezy/issues/20) in svgeezy’s issue tracker and stated:

The LICENSE file specifies the MIT license, but the header in svgeezy.js says it’s released under the WTFPL. Which is the correct license?

In this second case, we observe that the declared license and the source header are not consistent. However, the issue had not been resolved at the time of writing this paper, so we cannot report the resolution or any feedback offered by the original developers of the system.

Other interesting cases are the ones related to the fix of missing licenses.
Often developers are made aware of missing licenses via the issue tracker, by project users reporting the issue. Sometimes the complete project may be unlicensed, leading to discussions like the one titled “GNU LGPL license is missing” from the project rcswitch-pi (https://github.com/r10r/rcswitch-pi/issues/17):

Under which license is this source code published? This project is heavily based on wiring-pi and rc-switch: rc-switch: GNU Lesser GPL wiring-pi: GNU Lesser GPL
The GNU Lesser GPL should be added: http://www.gnu.org/licenses/lgpl.html

Based on the project’s characteristics (i.e., its foundations on previously existing projects), the developer recommends the addition of the missing LGPL license.

The commits and issues falling in the License File Modification category are related to changes applied to the license file type or name. For example, developers may change the license file from the default LICENSE.md file generated by GitHub to a .txt or .rtf. Additionally, developers often change the file name to make it more meaningful, as illustrated in this commit message of the project Haml (https://github.com/haml/haml/commit/537497464612f1f5126a526e13e661698c86fd91):

“Renamed the LICENSE to MIT-LICENSE so you don’t have to open the file to find out what license the software is released under. Also wrapped it to 80 characters because I’m a picky [edited]” (quote edited for language)

Other typical changes concern the renaming of the COPYRIGHT file to LICENSE or the move of the license file to the project’s root directory. These cases do not indicate changes towards a different license or, in general, any change to the license semantics, but only to the way in which the license is presented.

License Compliance This group of categories is probably the most interesting to analyze, and concerns categories related to discussions and changes motivated by license compliance.
Specifically, other than generic compliance discussions, there are cases in which (i) a derivative work’s legal inconsistency was spotted or discussed; (ii) a compatible library was added to replace another library that was incompatible from a licensing point of view; (iii) third-party code was completely removed when no legally-compliant alternative was possible; (iv) license compatibility was discussed in the context of reuse; and (v) an added dependency or an existing dependency had conflicts with the current license.

A very interesting example is the issue discussion entitled “Using OpenSSL violates GPL licence” in the project SteamPP (https://github.com/seishun/SteamPP/issues/1). Surprisingly, the developer of the project initially commented:

gnutls and libnss have terrible documentation and I don’t consider this a priority issue anyway. If you would like to submit a pull request, then be my guest.

Despite this initial reaction, the OpenSSL library was replaced by Crypto++ within a week in order to meet the licensing requirements.

Examples of third-party libraries removed due to licensing issues are also prevalent in commit messages, e.g.:

“Remove elle(1) editor, due to an incompatible license.” (https://github.com/booster23/minixwall/commit/342171fa9e9d769ce4aa48525142a569b34962f7)

The incompatibility in this case was due to elle’s clause explicitly stating: “NOT be sold or made part of licensed products.” Additionally, we saw a commit from the project wkhtmltopdf-qt-batch, where files were removed due to a recommendation by the project’s legal staff: “Remove some files as instructed by Legal department” (https://github.com/alexkoltun/wkhtmltopdf-qt-batch/commit/9b142a07a7576afa15ba458e97935aac5921ef8d).
This shows that license compliance may not always be straightforward for developers and that they may need to rely on legal counsel to determine whether licensing terms have been met.

We also observed changes in a system’s licensing aimed at satisfying compliance with third-party code, as in the project gubg (https://github.com/gfannes/gubg.deprecated/commit/4d291ef433f0596dbd09d5733b25d27b3a921cf4):

Changed the license to LGPL to be able to use the msgpack implementation in GET nv.

Similarly, we found issue tracker discussions about conflicting licenses or about the compatibility of licenses between the project and third-party libraries. Interestingly, there was an issue opened by a non-contributor of the project android-sensorium (https://github.com/fmetzger/android-sensorium/issues/11), stating:

Google Play Services (GMS) is proprietary, hence not compatible with GNU LGPL. (The jar inside the Android library referred to in the project.properties).
F-Droid.org publishes the ....o3gm package, but we cant publish this without removing this library.

Thus, the license incompatibility not only created a potential license violation for the project but also prevented the non-contributor from cataloging the system among projects hosted on F-Droid (https://f-droid.org/), a well-known forge of open source Android apps.

Additionally, we observed issues related to reuse, where one contributor suggested a dual license to allow for greater reuse in other applications. The contributor of the project python-hpilo (https://github.com/seveas/python-hpilo/issues/85) stated:

Due to incompatibility between GPLv3 and Apache 2.0 it is hard to use python-hpilo from, for instance, OpenStack.
It would therefore be helpful if the project code could also be released under a more permissive license, like for instance Apache 2.0 (which is how OpenStack is licensed)

The other contributors subsequently used the thread to vote and ultimately agreed upon the dual license. Not only does this example indicate consideration for reuse, but it also demonstrates that licensing decisions are made by all copyright holders and not by a single developer. It is also important to note that GPL-3.0 and Apache-2.0 are not considered incompatible by the Free Software Foundation.

Conversely, we also observed an interesting discussion in which the poster of an issue in the project patchelf (https://github.com/NixOS/patchelf/issues/37) asked “Is it possible for you to change GPL to LGPL? It would help me using your software.”. The developer posting the question was developing a system licensed under the BSD license, with which the GPL would not be compatible. A contributor refused to change the licensing by stating: “GPL would not be compatible”. Moreover, one of the contributors explained that changing licensing is non-trivial by responding:

It wouldn’t be easy to change the license, given that it contains code from several contributors, who would all need to approve of the change.

Again, this response highlights the importance of all contributors approving a license change. However, reaching an agreement among all contributors might be far from trivial, due to personal biases developers could have with respect to licensing (Vendome et al. 2015b).

We also observed a case related to derivative work, where the license differed from the original system’s licensing (category: Derivative Work Inconsistency).
A developer created the issue “Origin and License Issue” for the project tablib (https://github.com/kennethreitz/tablib/issues/114), to which he offered support, but first noted:

While tablib is MIT-licensed, there are several potential provenance and license issues with Oo, XLS and XLSX formats that tablib embeds. I have collected some of these potential issues here. This is at best ... byzantine. [...] https://bitbucket.org/ericgazoni/openpyxl/ is reported as being derived from PHPExcel which is LGPL-licensed at https://github.com/PHPOffice/PHPExcel but openpyxl is not LGPL but MIT-licensed. If this is really derived then there is a possible issue as the license may be that of the original not of the derivative.

The issue poster lists the various components used with their licensing to point out incompatibility issues, in particular those related to the derivative code that the system utilizes.

Clarifications/Discussions This group of categories contains issues related to clarifying the project’s licensing, the terms or implications of the licensing, and the agreement between contributors made in a Contributor License Agreement (CLA). License Clarifications were about the actual license of the project and typically occurred when the system did not contain a license file (i.e., a declared project license). For example, one project user created the issue “Please add a LICENSE file” for Mozilla’s project 123done (https://github.com/mozilla/123done/issues/139), stating:

The repo is public, but it’s not easy to find out how I’m allowed to use or share the code.
Could you add a LICENSE file to make it easier for users to understand how you’d like it to be used?

Similarly, another project, pyelection, has the issue “What license is this code released under?” (https://github.com/alex/pyelection/issues/1) with no further comments from the poster.
Thus, we observe that developers use the issue tracker as a means to understand the licensing and to request an explicit licensing file.

Another surprising issue discussion relates to understanding the terms of a license. The issue was posted to neunode’s issue tracker (https://github.com/snakajima/neunode/issues/5) by an external developer looking to reuse the code, who asked:

We are impressed with what you’ve done with neu.Node and are interested in using it for offline mapping applications. However, we work at a company that has more than 1M$ in revenue. Your license terms say MIT for companies with less than 1M$ in revenue (which is not an approach I’ve seen before). Please could you clarify the license terms for a company that is larger that that? We’re trying to make some decisions on our direction at the moment, so a quick response would be appreciated if possible.

Interestingly, the license terms set conditions based on the monetary value of the company looking to reuse the code. In this case, the external developer’s company exceeds the threshold. The original developer indicates that his software is intended to benefit the developer community as a whole, and more specifically students and individuals. The original developer gave two options: (i) a large check without maintenance support, or (ii) a detailed description of the product, a compelling argument for granting a free license to reuse the system, and acknowledgment in the description. Thus, the original developer is not interested in financial gain (though he could reasonably be convinced at the right price), but rather wants to support the open source community and receive credit for his work.

We identified a category of License Agreement.
This scenario arises when a developer external to the project submits a code contribution, and the project contributors require that developer to complete a Contributor License Agreement (CLA) to avoid licensing/copyright disputes. We observed a discussion related to updating the textual information of the project’s CLA with respect to country designations (http://github.com/adobe/brackets/issues/8337). Similarly, in our previous Java study (Vendome et al. 2015a), a developer submitted a patch, but it could not be merged into the system until that developer filled out the CLA (https://github.com/FasterXML/jackson-module-jsonSchema/issues/35). A CLA makes it explicit that the author of a contribution grants the recipient project the right to reuse and further distribute that contribution (Brock 2010). Thus, it prevents the contributed code from becoming grounds for a potential lawsuit.

Request for a License This group contains issue discussions in which a developer asks for a license or a license file. While these requests are similar to those related to reuse, they differ in that the developers do not necessarily state that they want to reuse the system; it is possible that they want to contribute as well. Thus, these are more generic requests for the developer to attribute a license to the system, without explaining the reason for the request. For example, we found the issue titled “No license included in repository” for the project jquery-browserify (https://github.com/jmars/jquery-browserify/issues/20) in which the poster commented:

Would you consider adding a license to the repository? It’s currently missing one and according to TOS.

[Not posting a license] means that you retain all rights to your source code and that nobody else may reproduce, distribute, or create derivative works from your work. This might not be what you intend.
Even if this is what you intend, if you publish your source code in a public repository on GitHub, you have accepted the Terms of Service which do allow other GitHub users some rights. Specifically, you allow others to view and fork your repository.

If you want to share your work with others, we strongly encourage you to include an open source license.

If you don’t intend on putting a license up that’s fine, but if you do want to use an open source license please do so. I’d be happy to fork/PR for you if you just let me know which license you want to put in (MIT/BSD/Apache/etc.)

This comment demonstrates that licensing also impacts derivative work and can prevent other developers from contributing to a system. This is an important distinction, since prior findings (Vendome et al. 2015a, b; Sojer and Henkel 2010) demonstrated that licensing could be an impediment to reuse, but not that it could be an impediment to contributing to a project/system.

License Output for the End User This category describes a unique case in which an issue was posted regarding the output of the license to the end user. The issue stated:

“This output could be read by monitoring tools, for example to automatically warn about expiration (although Phusion also emails expiration warnings, the desired upfront time for the warning is not configurable like that).” (http://github.com/phusion/passenger/issues/1482)

Unlike the previous categories, this issue relates to licensing the software to the end user. A contributor of the system suggests the inclusion of a feature to aid in monitoring license expiration. Interestingly, this category shows that developers also consider licensing in terms of its impact on the “client” using the system. This aspect of understanding the impact of licensing on the “client” or end user has been unexplored in prior studies.
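The issue discussions categorized above, like the commit messages analyzed next, were pre-filtered by matching licensing-related keywords (the paper’s Table 1). A minimal sketch of such a keyword filter is shown below; the keyword list here is only an illustrative approximation of the actual table, not a reproduction of it:

```python
import re

# Illustrative keyword list; the paper's actual keyword set (Table 1) differs.
LICENSE_KEYWORDS = ["license", "licensing", "copyright", "gpl", "lgpl",
                    "apache", "mit", "bsd", "cla"]

# Word-boundary matching so that, e.g., "mit" does not match inside "commit".
PATTERN = re.compile(r"\b(" + "|".join(LICENSE_KEYWORDS) + r")\b", re.IGNORECASE)

def is_license_related(text: str) -> bool:
    """Return True if the commit message or discussion mentions a keyword."""
    return PATTERN.search(text) is not None

messages = [
    "Changed license to Apache v2",
    "Fix NPE in parser",
    "Please add a LICENSE file",
]
candidates = [m for m in messages if is_license_related(m)]
# candidates == ["Changed license to Apache v2", "Please add a LICENSE file"]
```

As the paper notes, keyword matching of this kind yields false positives, which is why the retained candidates still required manual open coding.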
4.4.2 Analysing Commits Implementing Atomic License Changes in Java Systems

In this analysis, we specifically targeted commit messages where a licensing change occurred, so that we could understand the rationale behind the change. We did not apply keyword filtering to these commit messages, since we knew they were commits related to changes in licensing. When reading these commits, we also included the atomic license change pattern observed at that particular commit to add context. We observed new support for the existing categories, and the results are reported in Table 11. By new support we mean commit messages indicating new rationale for the existing categories.

Table 11 Categories defined through open coding for the commit messages in which a license change occurred

| Category                        | Commits |
|---------------------------------|---------|
| Generic license additions       |         |
| Choosing license                | 0       |
| License added                   | 63      |
| License change                  |         |
| License change                  | 9       |
| License upgrade                 | 1       |
| License rollback                | 1       |
| License removal                 | 19      |
| Changes to copyright            |         |
| Copyright added                 | 0       |
| Copyright update                | 1       |
| License fixes                   |         |
| Link broken                     | 0       |
| License mismatch                | 0       |
| Fix missing licensing           | 9       |
| License file modification       | 0       |
| Missing licensing               | 1       |
| License compliance              |         |
| Compliance discussion           | 0       |
| Derivative work inconsistency   | 0       |
| Add compatible library          | 0       |
| Removed third-party code        | 1       |
| License compatibility           | 0       |
| Reuse                           | 0       |
| Dep. license added              | 0       |
| Dep. license issue              | 0       |
| Clarifications/Discussions      |         |
| License clarification           | 0       |
| Terms clarification             | 0       |
| Verify licensing                | 0       |
| License agreement               | 0       |
| Request for a license           |         |
| Licensing request               | 0       |
| License output for the end user |         |
| Output licensing                | 0       |

As for the License Change group of categories, we observed general messages indicating that a license change occurred and, in some cases, explicitly stating the new license, such as the following commit messages:

“Rewrite to get LGPL code.”

“Changed license to Apache v2”

These two commit messages do not offer rationale, but they at least indicate the new license attributed to the system. Thus, a developer inspecting the change history would be able to accurately understand the particular license change.

Since we observed many instances of no license → some license, the prevalence of License Added was expected. However, these License Added commit messages resembled the License Change messages, since they often did not include a clear rationale (i.e., while being part of the License Added category, their level of detail was similar to the License Change category). For example, a developer applied the Apache-2.0 license to the headers of the source files across his project, but his commit message simply stated:

“Enforce license”

In the case of License Removal, we observed that licenses were removed due to code cleanup, file deletion, and dependency removal. For example, we observed the removal of the GPL-2.0 license with the following commit message:

“No more smoketestclientlib”

It indicates the removal of a previously used library. Additionally, licenses were removed as developers cleaned up the project.

Fix Missing Licensing is related to a license addition, but it occurred when the author intended to license the file, but forgot to do so either in the initial commit or in the commit introducing the licensing.
For example, one commit message stated:

“Added missing Apache License header.”

This indicates that the available source code may inaccurately appear unlicensed.

Additionally, License Upgrade refers to a license change in which the version of the license is updated to a more recent one. In this particular case, we observed a change from GPL-2.0+ to GPL-3.0+. The commit message stated:

“...Change copyright header to refer to version 3 of the GNU General Public License and to point readers at the COPYING3 file and the FSF’s license web page.”

While the commit message describes the version change, it does not supply rationale. Instead, the message is a log of the changes.

An important observation from the second round of our analysis was the ambiguity of commit messages. For example, we observed a commit classified as Copyright Update stating:

“Updated copyright info.”

However, this commit corresponded to a change in licensing from GPL-2.0 to LGPL-2.1+. This case both illustrates the lack of detail offered by developers in commit messages and shows that an update can be more significant than adding a header or changing a copyright year.

Since we sampled commits from all Java projects, it was infeasible to sample a larger representative number of commit messages. Thus, augmenting the second round by considering commits in which an atomic license change occurred benefited the taxonomy by better targeting relevant commits. However, we were able to sample statistically representative sample sizes in this work thanks to pre-filtering the projects. The results corroborate this representativeness, since we observed the same categories.

Another important observation, which appears to support the supposition from our traceability analysis that developers remove licensing-related issues from the issue tracker, is that we found links that had been removed in the period of time between our crawling and our data analysis.
These were categorized as Link Broken and amounted to 45 of the overall issues. It is also possible that these cases represent developers who utilize external bug tracking systems as well.

Summary for RQ4 While our open coding analysis, based on grounded theory, indicated some lack of documentation (e.g., prevalence of false positives) and poor quality of documentation with respect to licensing in both issue tracker discussions and commit notes, we formally categorized the available rationale. We also found that the rationale may be incomplete or may ambiguously describe the underlying change (e.g., “Updated copyright info” representing a change between different licenses). Finally, we observed that issue trackers also served as conduits for project authors and external developers to discuss licensing.

5 Lessons and Implications

The analysis of the commit messages and issue tracker discussions highlighted that the information offered with respect to licensing choice/change is very often quite limited. A developer interested in reusing code would be forced to check the source code of the component to understand the exact licensing, or to ask for clarification (using the issue tracker, for example). Additionally, the reason behind a change is not usually well documented. This detail is particularly important when a system uses external/third-party libraries, since a license may change during the addition or removal of those libraries.

An important observation from our open coding analysis also stresses the need for better licensing traceability and for aid in explaining license grants/restrictions. We found several instances in which the issue tracker was used by external developers, who sought to reuse the code, to ask for clarifications regarding licensing.
For example, we observed that developers interpret the implications of licensing differently, which generates misunderstandings in terms of reuse. This suggests that code reuse is problematic for developers due to licensing. Therefore, our study demonstrates a need for clear and explicit licensing information for the projects hosted on a forge.

Similarly, we observed that external developers would request a license, since the projects appeared to be unlicensed; however, a subset of these requests were due to licensing being attributed in a different manner than external developers expected (e.g., as part of the gemspec file for Ruby projects rather than a LICENSE file). We also observed developers adding license files to parent directories as opposed to headers in the source code, as well as appending the license name to the license file (e.g., LICENSE would be renamed LICENSE.MIT). This way of declaring a license is particularly common in GitHub projects, where the system asks the developer(s) to choose a license when a project is created and then creates the LICENSE file in the project’s root directory.

These observations indicate a lack of standardization in how licensing is expressed, among both projects in the same language and projects across different languages. They suggest that developers need a standardized mechanism to declare the license of a software project. Third-party tools or forges could support developers by maintaining this standardized documentation automatically.

Another important observation is the type of difficulty that developers have with the licensing of third-party code and the ways in which they achieve compliance. We observe in both the issue discussions and commit messages that libraries are removed due to incompatible licensing terms. Conversely, libraries are also chosen due to the particular license of the source code.
This can be important for open source developers who aim for wide adoption of their systems, since the choice of license can directly impact the adoption of their libraries. Therefore, we foresee that library/code recommenders based on open source code should be license-aware. This consideration applies, for example, to approaches recommending code examples or libraries by sensing the developers’ context (Cubranic et al. 2005; Holmes and Murphy 2005; Ponzanelli et al. 2013, 2014). In other words, on the one hand the project’s license should be a relevant part of the context; on the other hand, code search engines (e.g., Grechanik et al. 2010; McMillan et al. 2012a, b, c, 2011, 2013) should consider the target code’s license as a constraint in the search.

The lack of traceability of licensing changes is also important for researchers investigating software licensing on GitHub. While we cannot generalize to other features, it does suggest that commit message analysis may be largely incomplete with respect to the details of the licensing-related changes made during a commit. One way for developers to improve this documentation is to take advantage of summarization tools such as ARENA (Moreno et al. 2014) and ChangeScribe (Cortés-Coy et al. 2014; Linares-Vásquez et al. 2015). While ARENA analyzes and documents licensing changes at the release level, ChangeScribe automatically generates commit messages; however, using ChangeScribe would require extending it to analyze licensing changes at the commit level. Another option is for forges (and software tools in general) to verify that every file contains a license and that every project properly documents its license (this feature could be optional). In summary, such support would greatly improve traceability between license changes and their rationale, and enforce consistency among repositories.
Also, it would be beneficial for developers using another project to be informed when a licensing change occurs. For example, a developer could mark specific projects as dependencies and receive automated notifications when particular changes occur. This would be especially beneficial for licensing, since a change in the license of a dependency could result in license incompatibilities.

The open coding of commit messages and issue tracker discussions also suggests that commercial usage of code is a concern in the open source community. Currently, the MIT/X license and the Apache license seem to be the most prominent licenses for this purpose. Indeed, the quantitative analysis of Java projects also showed a trend towards the use of permissive licenses. The lack of a license is an important consideration in open source development, since it suggests that the code may in fact be closed source (or copyrighted by the original author). We observed such issues in discussions related to the lack of licensing, since it hindered reuse. Indeed, developers sometimes initiate an open source project without attributing a license to it. This is partly because they lack a deep knowledge of the importance of licensing in (dis)allowing certain types of reuse of their code (Vendome et al. 2015b), but also because there is limited support for the task of choosing the most suitable license for a project. Existing tool support, such as Choose A License, helps users in choosing a license, but the tool is completely context-insensitive with respect to the constraints imposed. Better, context-sensitive tool support is provided by the Markos project (Bavota et al. 2014), but it mainly provides the list of compatible licenses for a given component.
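At its core, the kind of context-sensitive support discussed above reduces to querying pairwise license compatibility for a project and its dependencies. The sketch below illustrates the idea only; the compatibility table is a greatly simplified assumption (it does encode the well-known facts that Apache-2.0 code can be incorporated into GPL-3.0 but not GPL-2.0 projects), the dependency names are hypothetical, and real compatibility involves many conditions omitted here:

```python
# Greatly simplified, illustrative "inbound" compatibility table: which
# licenses of third-party code a project under a given license can
# incorporate. This is an assumption for illustration, not legal advice.
COMPATIBLE_INBOUND = {
    # project license -> licenses of code it can incorporate
    "GPL-3.0":    {"GPL-3.0", "LGPL-2.1", "Apache-2.0", "MIT", "BSD-3-Clause"},
    "GPL-2.0":    {"GPL-2.0", "LGPL-2.1", "MIT", "BSD-3-Clause"},
    "Apache-2.0": {"Apache-2.0", "MIT", "BSD-3-Clause"},
}

def incompatible_deps(project_license, dep_licenses):
    """Return (dependency, license) pairs that the project cannot
    incorporate under the simplified table above."""
    allowed = COMPATIBLE_INBOUND.get(project_license, set())
    return [(dep, lic) for dep, lic in dep_licenses if lic not in allowed]

# Hypothetical dependencies for illustration.
deps = [("somelib", "Apache-2.0"), ("otherlib", "MIT")]
print(incompatible_deps("GPL-2.0", deps))  # flags ("somelib", "Apache-2.0")
```

A forge-integrated checker of this shape could run on dependency changes and notify developers, which is exactly the notification scenario sketched in the preceding paragraph.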
11 http://choosealicense.com

6 Threats to Validity

Threats to construct validity concern the relationship between theory and observation, and relate to possible measurement imprecision when extracting the data used in this study. In mining the Git repositories, we relied on both the GitHub API and the git command line utility. These are both tools under active development with a community supporting them. Additionally, the GitHub API is the primary interface for extracting project information. We cannot exclude imprecision due to the implementation of this API. In terms of license classification, we rely on Ninka, a state-of-the-art approach that has been shown to have 95% precision (Germán et al. 2010b); however, it is not always capable of identifying the license (15% of the time in that study). As for the open coding performed in the context of RQ4, we identified, through stratified sampling, a sample of commit messages and issue tracker discussions large enough to ensure an error of ±10% with a confidence level of 95%. This sample was drawn from candidate commit messages and discussions identified by means of pattern matching, using the keywords of Table 1. Although we aimed to build a comprehensive set of licensing-related keywords, it is possible that we missed licensing-related discussions not matching any of these keywords.

Threats to internal validity can be related to confounding factors, internal to our study, that could have affected the results. For the atomic licensing changes, we reduced the threat of project size acting as a confounding factor by representing the presence of a particular change at each commit. A license change is typically handled in a single instance rather than repeatedly. By using commit-level analysis, we prevent the number of files from inflating the results, so that they do not inappropriately suggest that large numbers of changes occurred in a project.
To analyze the changes across projects, we took a binary approach of analyzing the presence of a pattern. Therefore, a particular project could not dominate our results due to its size. To limit the subjectiveness of the open coding, classifications were always performed by two of the authors, and every case of conflicting classification was then discussed, as explained in Section 3.3.

Threats to external validity concern the ability to generalize the observations of our study. Our quantitative study is based on the analysis of over 16K Java projects. This makes us confident that our findings generalize well for Java systems, while they cannot be extended to systems written in other programming languages. Our qualitative study was instead performed on commit messages and issue discussions extracted from software systems written in seven different languages. However, the generalizability of our qualitative results is limited to the seven considered languages and constrained by the relatively low number of considered systems (i.e., 1,160), due to the manual effort required to identify the rationale behind licensing decisions (as well as the limited number of potential repositories with license-related commit messages or issue discussions).

GitHub’s exponential growth and popularity as a public forge indicate that it represents a large portion of the open source community. While the exponential growth or relative youth of projects can be seen as impacting the data, these two characteristics represent the growth of open source development and should not be discounted. Additionally, GitHub contains a large number of repositories, but it may not necessarily be a comprehensive set of all open source projects or even all Java projects. However, the large number of projects in our dataset (and the relatively high diversity metric values, as shown in Section 3.4) gives us enough confidence in the obtained findings.
Further evaluation of projects across other open source repositories (and other programming languages for the quantitative part) would be necessary to validate our observations in a more general context. It is also important to note that our observations only consider open source projects. Since we need to extract licenses from source code, we did not consider any closed source projects, and we cannot assert that any of our results would be representative of closed source projects.

7 Conclusions

This paper reported an empirical study aimed at analyzing, from quantitative and qualitative points of view, the adoption and change of licenses in open source projects hosted on GitHub. The study consists of (i) a quantitative part, in which we studied license usage and licensing changes in a set of 16,221 Java projects hosted on GitHub, and (ii) a qualitative analysis, in which we analyzed commit messages and issue tracker discussions from 1,160 projects hosted on GitHub and developed using the seven most popular programming languages (i.e., C, C++, C#, Java, Javascript, Python, and Ruby).

The quantitative analysis of the Java projects aimed at (i) providing an overview of the kinds of licenses being used over time by different projects, (ii) analyzing licensing changes, and (iii) identifying traceability links between licensing changes and licensing-related discussions. Results indicated that:

– New license versions were quickly adopted by developers.
Additionally, in the case of restrictive licenses (e.g., GPL-3.0 vs GPL-2.0), earlier versions survived longer alongside the new versions, unlike earlier versions of permissive licenses, which seem to disappear;
– Licensing changes are predominantly toward or between permissive licenses, which ease certain kinds of derivative work and redistribution, e.g., within commercial products;
– There is a clear lack of traceability between discussions and related license changes.

The qualitative analysis was based on an open coding procedure inspired by grounded theory (Corbin and Strauss 1990), and aimed at categorizing licensing-related discussions and commits. The results indicate that:

– Developers post questions to the issue tracker to ascertain the project’s license and/or the implications of the license, suggesting that licensing is difficult to understand;
– There is a lack of standardization or consistency in how licensing is attributed to a system (both within the same programming language and across different programming languages), which causes misunderstandings or confusion for external developers looking to reuse a system;
– Developers, in general, neither supply detailed rationale nor document changes in the commit messages or issue tracker discussions;
– License compatibility can impact both the adoption and the removal of a third-party library due to issues of license compliance.

This work is mainly exploratory in nature, as it is aimed at empirically investigating license usage and licensing changes from both quantitative and qualitative points of view. Nevertheless, there are different possible uses of the results of this paper. Our results indicate that developers frequently deal with licensing-related issues, highlighting the need for developing (semi-)automatic recommendation systems aimed at supporting license compliance verification and management.
Additionally, tools compatible with or integrated into the forge to support licensing documentation, change notification, education (i.e., picking the appropriate license), and compatibility checking would benefit developers attempting to reuse code. While working in this direction, one should be aware of the possible factors that could influence the usage of specific licenses and the factors motivating licensing changes. This paper provides solid empirical results on and analysis of such factors from real developers.

Future work in this area should aim at (i) extending the study by performing a larger quantitative and qualitative analysis on more projects, and (ii) performing a deeper investigation into the rationale for licensing changes, for example by analyzing dependencies in software projects and relating such analysis to the changes being performed. Last, but not least, as discussed in Section 5, it would be useful to incorporate licensing analysis into existing software recommender systems. Such recommenders could not only rely on the local project’s context, but also exploit rationale from previous licensing changes to produce recommendations.

Acknowledgments This work is supported in part by NSF CAREER CCF-1253837 grant. Massimiliano Di Penta is partially supported by the Markos project, funded by the European Commission under Contract Number FP7-317743. Any opinions, findings, and conclusions expressed herein are the authors’ and do not necessarily reflect those of the sponsors.
References

123done issue 139. https://github.com/mozilla/123done/issues/139
android-sensorium issue 11. https://github.com/fmetzger/android-sensorium/issues/11
Bavota G, Canfora G, Di Penta M, Oliveto R, Panichella S (2013) The evolution of project inter-dependencies in a software ecosystem: the case of Apache, pp 280–289
Bavota G, Ciemniewska A, Chulani I, De Nigro A, Di Penta M, Galletti D, Galoppini R, Gordon TF, Kedziora P, Lener I, Torelli F, Pratola R, Pukacki J, Rebahi Y, Villalonga SG (2014) The market for open source: an intelligent virtual open source marketplace. In: 2014 software evolution week - IEEE conference on software maintenance, reengineering, and reverse engineering, CSMR-WCRE 2014, Antwerp, Belgium, February 3–6, 2014, pp 399–402
brackets issue 8337. http://github.com/adobe/brackets/issues/8337
Brock A (2010) Project harmony: inbound transfer of rights in FOSS projects. Intl Free and Open Source Software Law Review 2(2):139–150
Cass S. The 2015 top ten programming languages. http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages
Corbin J, Strauss A (1990) Grounded theory research: procedures, canons, and evaluative criteria. Qual Sociol 13(1):3–21
Cortés-Coy LF, Linares-Vásquez M, Aponte J, Poshyvanyk D (2014) On automatically generating commit messages via summarization of source code changes. In: 2014 IEEE 14th international working conference on source code analysis and manipulation (SCAM), IEEE, pp 275–284
Cuanto commit. https://github.com/ttop/cuanto/commit/a1e58f2c93de40ab304c494e05853957c549fd44
Cubranic D, Murphy GC, Singer J, Booth KS (2005) Hipikat: a project memory for software development. IEEE Trans Softw Eng 31(6):446–465
Czmq commit. https://github.com/zeromq/czmq/commit/eabe063c2588cde0af90e5ae951a2798b7c5f7e4
d3-armory issue 5. https://github.com/kovmarci86/d3-armory/issues/5
Di Penta M, Germán DM, Antoniol G (2010) Identifying licensing of jar archives using a code-search approach. In: Proceedings of the 7th international working conference on mining software repositories, MSR 2010 (co-located with ICSE), Cape Town, South Africa, May 2–3, 2010, pp 151–160
Di Penta M, Germán DM, Guéhéneuc Y, Antoniol G (2010) An exploratory study of the evolution of software licensing. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering - volume 1, ICSE 2010, Cape Town, South Africa, 1–8 May 2010, pp 145–154
Dickey DA, Fuller WA (1979) Distributions of the estimators for autoregressive time series with a unit root. J Am Stat Assoc 74:427–431
Dickey DA, Fuller WA (1981) Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica 49(4):1057–1072
Doll B. The octoverse in 2012. http://tinyurl.com/muyxkru. Last accessed: 2015/01/15
Dyer R, Nguyen HA, Rajan H, Nguyen TN (2013) Boa: a language and infrastructure for analyzing ultra-large-scale software repositories. In: 35th international conference on software engineering, ICSE ’13, San Francisco, CA, USA, May 18–26, 2013, pp 422–431
enigma2 commit. https://github.com/openatv/enigma2/commit/b4dfdf09842b3dcacb2a6215fc040f7ebbbb3c03
Free Software Foundation (2015) Categories of free and nonfree software. https://www.gnu.org/philosophy/categories.html. Last accessed: 2015/01/15
F-Droid. https://f-droid.org/. Last accessed: 2015/01/15
Germán DM, Hassan AE (2009) License integration patterns: addressing license mismatches in component-based development. In: 31st international conference on software engineering, ICSE 2009, May 16–24, 2009, Vancouver, Canada, pp 188–198
Germán DM, Di Penta M, Guéhéneuc Y, Antoniol G (2009) Code siblings: technical and legal implications of copying code between applications. In: Proceedings of the 6th international working conference on mining software repositories, MSR 2009 (co-located with ICSE), Vancouver, BC, Canada, May 16–17, 2009, pp 81–90
Germán DM, Di Penta M, Davies J (2010a) Understanding and auditing the licensing of open source software distributions. In: The 18th IEEE international conference on program comprehension, ICPC 2010, Braga, Minho, Portugal, June 30–July 2, 2010, pp 84–93
Germán DM, Manabe Y, Inoue K (2010b) A sentence-matching method for automatic license identification of source code files. In: ASE 2010, 25th IEEE/ACM international conference on automated software engineering, Antwerp, Belgium, September 20–24, 2010, pp 437–446
GitHub API. https://developer.github.com/v3/. Last accessed: 2015/01/15
GNU General Public License (2015). http://www.gnu.org/licenses/gpl.html. Last accessed: 2015/01/15
gtksourcecompletion issue 1. https://github.com/chuchiperriman/gtksourcecompletion/issues/1
Gobeille R (2008) The FOSSology project. In: Proceedings of the 2008 international working conference on mining software repositories, MSR 2008 (co-located with ICSE), Leipzig, Germany, May 10–11, 2008, pp 47–50
Grechanik M, Fu C, Xie Q, McMillan C, Poshyvanyk D, Cumby C (2010) A search engine for finding highly relevant applications. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering - volume 1, ICSE ’10, New York, NY, USA, ACM, pp 475–484
gubg commit. https://github.com/gfannes/gubg.deprecated/commit/4d291ef433f0596dbd09d5733b25d27b3a921cf4
Holmes R, Murphy GC (2005) Using structural context to recommend source code examples. In: 27th international conference on software engineering (ICSE 2005), 15–21 May 2005, St. Louis, Missouri, USA, pp 117–125
Howison J, Conklin M, Crowston K. FLOSSmole: a collaborative repository for FLOSS research data and analyses. IJITWE’06 1:17–26
Haml commit. https://github.com/haml/haml/commit/537497464612f1f5126a526e13e661698c86fd91
Intex issue 1. https://github.com/mtr/intex/issues/1
jackson-module-jsonSchema issue 35. https://github.com/FasterXML/jackson-module-jsonSchema/issues/35
jquery-browserify issue 20. https://github.com/jmars/jquery-browserify/issues/20
Linares-Vásquez M, Cortés-Coy LF, Aponte J, Poshyvanyk D (2015) ChangeScribe: a tool for automatically generating commit messages. In: 37th IEEE/ACM international conference on software engineering (ICSE’15), formal research tool demonstration, to appear
Manabe Y, Hayase Y, Inoue K (2010) Evolutional analysis of licenses in FOSS. In: Proceedings of the joint ERCIM workshop on software evolution (EVOL) and international workshop on principles of software evolution (IWPSE), Antwerp, Belgium, September 20–21, 2010, ACM, pp 83–87
McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C (2011) Portfolio: finding relevant functions and their usage. In: Proceedings of the 33rd international conference on software engineering, ICSE ’11, New York, NY, USA, ACM
McMillan C, Grechanik M, Poshyvanyk D (2012a) Detecting similar software applications, pp 364–374
McMillan C, Grechanik M, Poshyvanyk D, Fu C, Xie Q (2012b) Exemplar: a source code search engine for finding highly relevant applications. IEEE Trans Softw Eng 38(5):1069–1087
McMillan C, Hariri N, Poshyvanyk D, Cleland-Huang J, Mobasher B (2012c) Recommending source code for use in rapid software prototypes. In: Proceedings of the 34th international conference on software engineering, ICSE ’12, Piscataway, NJ, USA, IEEE Press, pp 848–858
McMillan C, Poshyvanyk D, Grechanik M, Xie Q, Fu C (2013) Portfolio: searching for relevant functions and their usages in millions of lines of code. ACM Trans Softw Eng Methodol 22(4):37:1–37:30
minixwall commit. https://github.com/booster23/minixwall/commit/342171fa9e9d769ce4aa48525142a569b34962f7
Moreno L, Bavota G, Di Penta M, Oliveto R, Marcus A, Canfora G (2014) Automatic generation of release notes. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering (FSE-22), Hong Kong, China, November 16–22, 2014, pp 484–495
Nagappan M, Zimmermann T, Bird C (2013) Diversity in software engineering research. In: Joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, ESEC/FSE ’13, Saint Petersburg, Russian Federation, August 18–26, 2013, pp 466–476
neunode issue 5. https://github.com/snakajima/neunode/issues/5
Nimble commit. https://github.com/bradleybeddoes/nimble/commit/e1e273ff18730d2f8e0d7c2af1951970e676c8d1
Oracle MySQL - FOSS License Exception. http://www.mysql.com/about/legal/licensing/foss-exception/. Last accessed: 2015/01/15
Passenger issue 1482. http://github.com/phusion/passenger/issues/1482
patchelf issue 37. https://github.com/NixOS/patchelf/issues/37
Penta MD, Germán DM (2009) Who are source code contributors and how do they change? In: 16th working conference on reverse engineering, WCRE 2009, 13–16 October 2009, Lille, France, pp 11–20
PF: The OpenBSD Packet Filter. http://www.openbsd.org/faq/pf. Last accessed: 2015/01/15
Ponzanelli L, Bacchelli A, Lanza M (2013) Leveraging crowd knowledge for software comprehension and development. In: 17th European conference on software maintenance and reengineering, CSMR 2013, Genova, Italy, March 5–8, 2013, pp 57–66
Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014) Mining StackOverflow to turn the IDE into a self-confident programming prompter.
In: 11th working conference on mining software repositories, MSR 2014, Proceedings, May 31 - June 1 Hyderabad, India, pp 102–111 + + +Postgis commit https://github.com/postgis/postgis/commit/4eb4127299382c971ea579c8596cc41cb1c089bc + + +pyelection issue 1 https://github.com/alex/pyelection/issues/1 + + +python-hpilo issue 85 https://github.com/seveas/python-hpilo/issues/85 + + +rcswitch-pi issue 17 https://github.com/r10r/rcswitch-pi/issues/17 + + +Ros-comm commit https://github.com/ros/ros_comm/commit/e451639226e9fe4eebc997962435cc454687567c + + +schevorecipe.db commit https://github.com/Schevo/schevorecipe.db/commit/b73bef14adeb7c87c002a908384253c8f686c625 + + +Singh P, Phelps C (2009) Networks, social influence, and the choice among competing innovations: Insights from open source software licenses. Inf Syst Res 24(3):539–560 + + +Sojer M, Henkel J (2010) Code reuse in open source software development: Quantitative evidence, drivers, and impediments. J Assoc Inf Syst 11(12):868–901 + + +Software Package Data Exchange (SPDX) http://spdx.org Last accessed: 2015/01/15 + + +State of the Octoverse in 2012 https://octoverse.github.com/ Last accessed: 2015/01/15 + + +Steampp issue 1 https://github.com/seishun/SteamPP/issues/1 + + +svgeezy issue 20 https://github.com/benhowdle89/svgeezy/issues/20 + + +tablib issue 114 https://github.com/kennethreitz/tablib/issues/114 + + +Tardis commit https://github.com/tardis-sn/tardis/commit/07b2a072d89d45c386d5f988f04435d76464750e + + +The BSD 2-Clause License. http://opensource.org/licenses/BSD-2-Clause. Last accessed: 2015/01/15 + + +Tuunanen T, Koskinen J, Kärkkäinen T (2009) Automated software license analysis. Softw Autom Eng 16(3-4):455–490 + + +Vendome C, Linares-Vásquez M, Bavota G, Di Penta M, Germán DM, Poshyvanyk D (2015a) License usage and changes: A large-scale study of Java projects on GitHub. In: The 23rd IEEE international conference on program comprehension, ICPC 2015, Florence, Italy, May 18–19, 2015. 
Christopher Vendome is a fourth-year Ph.D. student at the College of William & Mary. He is a member of the SEMERU Research Group and is advised by Dr. Denys Poshyvanyk. He received a B.S. in Computer Science from Emory University in 2012 and an M.S. in Computer Science from The College of William & Mary in 2014. His main research areas are software maintenance and evolution, mining software repositories, software provenance, and software licensing. He is a member of the IEEE and ACM.
 + + +
Gabriele Bavota is an assistant professor at the Free University of Bolzano-Bozen. He received the Laurea (cum laude) in Computer Science from the University of Salerno (Italy) in 2009, defending a thesis on Traceability Management advised by Prof. Andrea De Lucia. He received the PhD in Computer Science from the University of Salerno in 2013. From January 2013 to October 2014 he was a research fellow at the Department of Engineering of the University of Sannio. His research interests include software maintenance and evolution, refactoring of software systems, mining software repositories, empirical software engineering, and information retrieval.
+Massimiliano Di Penta has been an associate professor at the University of Sannio, Italy, since December 2011. Before that, he was an assistant professor at the same university from December 2004. His research interests include software maintenance and evolution, mining software repositories, empirical software engineering, search-based software engineering, and service-centric software engineering.
He is currently involved as principal investigator for the University of Sannio in a European Project about code search and licensing issues (MARKOS - www.markosproject.eu).
 + + +
Mario Linares-Vásquez is a Ph.D. candidate at the College of William and Mary advised by Dr. Denys Poshyvanyk, and co-founder of liminal ltda. He received his B.S. in Systems Engineering from Universidad Nacional de Colombia in 2005, and his M.S. in Systems Engineering and Computing from Universidad Nacional de Colombia in 2009. His research interests include software evolution and maintenance, software architecture, mining software repositories, application of data mining and machine learning techniques to support software engineering tasks, and mobile development. He is a member of the IEEE and ACM.
+Daniel M. German is an Associate Professor at the University of Victoria in Victoria, Canada. He received his Ph.D. degree in Computer Science from the University of Waterloo in Canada. His research interests are in software engineering, in particular software evolution, open source, and intellectual property.
 + + +
Denys Poshyvanyk is an Associate Professor at the College of William and Mary in Virginia. He received his Ph.D. degree in Computer Science from Wayne State University in 2008. He also obtained his M.S. and M.A. degrees in Computer Science from the National University of Kyiv-Mohyla Academy, Ukraine, and Wayne State University in 2003 and 2006, respectively. His research interests are in software engineering, software maintenance and evolution, program comprehension, reverse engineering, software repository mining, source code analysis and metrics. He is a member of the IEEE and ACM.
+----------------------------------------
+-------------------------------
+Section 88:
+Maintaining interoperability in open source software: A case study of the Apache PDFBox project
 + + +
Simon Butler\textsuperscript{a}, Jonas Gamalielsson\textsuperscript{a}, Björn Lundell\textsuperscript{a,*}, Christoffer Brax\textsuperscript{b}, Anders Mattsson\textsuperscript{c}, Tomas Gustavsson\textsuperscript{d}, Jonas Feist\textsuperscript{e}, Erik Lönroth\textsuperscript{f}
 + + +
\textsuperscript{a}University of Skövde, Skövde, Sweden
\textsuperscript{b}Combitech AB, Linköping, Sweden
\textsuperscript{c}Husqvarna AB, Huskvarna, Sweden
\textsuperscript{d}PrimeKey Solutions AB, Stockholm, Sweden
\textsuperscript{e}RedBridge AB, Stockholm, Sweden
\textsuperscript{f}Scania IT AB, Södertälje, Sweden
 + + +
\textbf{A B S T R A C T}
 + + +
Software interoperability is commonly achieved through the implementation of standards for communication protocols or data representation formats. Standards documents are often complex, difficult to interpret, and may contain errors and inconsistencies, which can lead to differing interpretations and implementations that inhibit interoperability. Through a case study of two years of activity in the Apache PDFBox project we examine the day-to-day decisions made concerning implementation of the PDF specifications and standards in a community open source software (OSS) project. Thematic analysis is used to identify semantic themes describing the context of observed decisions concerning interoperability. Fundamental decision types are identified, including emulation of the behaviour of dominant implementations and the extent to which to implement the PDF standards. Many factors influencing the decisions are related to the sustainability of the project itself, while other influences result from decisions made by external actors, including the developers of dependencies of PDFBox.
This article contributes a fine-grained perspective of decision-making about software interoperability by contributors to a community OSS project. The study identifies how the decisions made support the continuing technical relevance of the software, and the factors that motivate and constrain project activity.
 + + +
© 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
 + + +
Introduction
 + + +
Many software projects seek to implement one or more standards to support interoperability with other software. For example, interconnected systems implement standardised communications protocols, such as the open systems interconnect stack, and web standards, including the hypertext transfer protocol (HTTP) and the secure sockets layer (SSL), to support information exchange and commercial activities (Wilson, 1998; Treese, 1999; Ko et al., 2011).
 + + +
As businesses and civil society — governments at national and local level, and the legal system — move away from paper documents (Lundell, 2011; Rossi et al., 2008) to rely increasingly on digitised systems, the implementation of both communication protocols and document standards becomes ever more crucial (Rossi et al., 2008; Wilson et al., 2017; Lehtonen et al., 2018). Standards are written by humans, and despite the care taken in their creation they are imperfect, vague, ambiguous and open to interpretation when implemented in software (Allman, 2011; Egyedi, 2007). Furthermore, practice evolves so that implementations, often seen as the de facto reference for a standard, can diverge from the published standard, as has been the case with the JPEG image format (Richter and Clark, 2018).
Indeed, practice can repeatedly deviate from standards, as with HTML, CSS and JavaScript, sometimes with the intention of locking users in to specific products (W3C, 2019a; Bouvier, 1995; Phillips, 1998), and with the consequence that web content becomes challenging to implement and access (Phillips, 1998), and to archive (Kelly et al., 2014).
 + + +
While software interoperability relies on standards, different software implementations of a given standard are interpretations of the standard that may not be fully interoperable (Egyedi, 2007). Consequently, the developers of software implementations will become involved in a discourse to find a common understanding of the standard that supports interoperability, as illustrated by Allman (2011), Lehtonen et al. (2018), and Watteyne et al. (2016). The means by which interoperability is achieved varies. The Internet Engineering Task Force (IETF) (IETF, 2019a), for example, uses a process, often summarised as “Rough consensus and running code” (Davies and Hoffmann, 2004), that requires that interoperability between independent implementations be achieved early in the standardisation process (Wilson, 1998). An increasing proportion of software that implements communication and data standards, particularly where it is non-differentiating, is developed through collaboration by companies working in community open source software (OSS) projects (Lundell et al., 2017; Butler et al., 2019). By community OSS project we mean OSS projects that are managed by foundations or collectively organised (Riehle, 2011), where many of the developers are directed by companies and other organisations, and collaborate to create high-quality software (Fitzgerald, 2006). Examples of this process include OSS projects under the umbrella of the Eclipse Internet of Things Working Group (Eclipse IoT Working Group, 2019), and LibreOffice (The Document Foundation, 2019).
In many cases and domains both OSS and proprietary solutions are available for the same standard and need to interoperate to remain relevant products. While the literature documents the process of standardisation, and the technical challenges of implementing standards compliant software, there is little research that focuses on how participants in OSS projects decide how to implement a standard, and how to revise their implementation to correct or improve its behaviour. To explicate the challenges facing community OSS projects developing standards compliant software and the day-to-day decisions made by contributors, this study investigates the following research question:
 + + +
How does a community OSS project maintain software interoperability?
 + + +
We address the research question through a case study (Gerring, 2017; Walsham, 2006) of two years of contributions to the Apache PDFBox OSS project. The PDFBox project is governed by the Apache Software Foundation (ASF) (ASF, 2019a) and develops and maintains a mature (Black Duck, 2019) Java library and tools to create and process Portable Document Format (PDF) documents (Lehmkuhler, 2010). PDFBox is used in other OSS projects (Apache Tika, 2019; CEF Digital, 2019; Khudairi, 2017), and as a component in proprietary products and services. PDFBox is described further in Section 3.2.
 + + +
Developed in the 1990s, PDF is a widely-used file format for distributing documents, which are created, processed and read by many different applications on multiple platforms. Versions of PDF are defined in a number of specifications and standards documents, including formal (ISO) standards, that implementers need to follow to ensure the interoperability of their software.
There is evidence that the PDF standards are challenging to implement (Bogk and Schöpl, 2014; Endignoux et al., 2016), that the quality of PDF documents varies (Lehtonen et al., 2018; Lindlar et al., 2017), and that the dominance of Adobe’s software products creates user expectations that need to be met by the developers of other PDF software (Gamalielsson and Lundell, 2013; Endignoux et al., 2016; Amiouny, 2017; 2016). In the following section we provide a background description of PDF, and also review the related academic literature. + + +Section 3 details reasons for the purposeful sampling (Patton, 2015) of PDFBox as the case study subject. We also identify the data sources investigated for the case study and give an account of the application of thematic analysis (Braun and Clarke, 2006) to identify semantic themes in the types of decisions concerning the interoperability of PDFBox made by contributors to the project and the factors influencing those decisions. + + +Through the analysis of the data we identified four fundamental types of decision made concerning the interoperability of PDFBox related to compliance with published PDF specifications and standards. The types of decision and the technical circumstances in which they are made are described in Section 4. We also provide an account of the factors identified that influence those decisions including resources, knowledge, and the influence of external actors, such as the developers of other PDF software, and the creators of documents. We discuss the challenges faced by the PDFBox project in Section 5 including the technical challenges faced by the developers of PDF software, and potential solutions. Thereafter, we consider how the behaviour of contributors to the PDFBox project sustains the project in the long term. Lastly, we present the conclusions in Section 6 and identify the contributions made by this study. + + + + +Background and related work + + + + +2.1. 
Standards development and interoperability
 + + +
The development of standards for information and communications technologies is undertaken by companies and other organisations using a range of approaches, e.g. whether the technology is implemented before the standard is developed, and the working practices of the standards body involved. One perspective is that standards have two different types of origin. Some standards are specified by standards bodies, e.g. ISO and ITU, while others arise through extensive or widespread use of a particular technology, regardless of whether it was developed by one company or collaboratively (Treese, 1999). Another perspective is that standards are either requirement-led or implementation-led (Phipps, 2019). Phipps, a director (and sometime President) of the Open Source Initiative, argues that the primary use of the requirement-led model is where standardisation is used to create a market, for example the development of 5G (Nikolich et al., 2017). In contrast, implementation-led standards are developed to support an innovation in software or data format that has been adopted by a wider audience than the creating company, and standardisation is necessary to support interoperability. A third view is provided by Lundell and Gamalielsson (2017), who identify standards that are developed before software, software that is implemented and then forms the basis of a standardisation process (including that of PDF), and the development of standards in parallel with software. The latter process is identified as being of increasing importance in the telecommunications industry (Wright and Druta, 2014), and examples can be found in the standardisation process for internet protocols managed by the IETF (IETF, 2019a). The IETF emphasises interoperability at an early stage of protocol development, rather than technical perfection (Bradner, 1996; Wilson, 1998; Bradner, 1999).
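The IETF's emphasis on running code comes down, in the end, to bit-exact agreement on wire formats. As a concrete illustration (ours, not from the article; the constants and function name are our own), the following Python sketch packs the 4-byte fixed header of a CoAP message as defined in RFC 7252, an IETF protocol that appears again in the IoT examples below.

```python
import struct

# CoAP fixed header fields (RFC 7252, Section 3):
# Ver (2 bits) | Type (2 bits) | Token Length (4 bits) | Code (8 bits) | Message ID (16 bits)
COAP_VERSION = 1
TYPE_CONFIRMABLE = 0   # a CON message
CODE_GET = 0x01        # request code 0.01 (GET)

def coap_header(msg_type: int, code: int, message_id: int, token: bytes = b"") -> bytes:
    """Build the fixed header (plus token) of a CoAP message."""
    if len(token) > 8:
        raise ValueError("token length is at most 8 bytes")
    first = (COAP_VERSION << 6) | (msg_type << 4) | len(token)
    return struct.pack("!BBH", first, code, message_id) + token
```

Two independent implementations stop interoperating the moment any of these bit fields is packed differently, which is why protocol standards specify them exhaustively and why early interoperability testing between implementations matters.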
The process of developing interoperability between low-powered devices in the IoT domain is described by Ko et al. (2011). They record the development of the internet protocol (IP) in 6LoWPAN to provide interoperable communications stacks for two IoT operating systems, Contiki-OS and TinyOS. The interoperable implementations are then used to determine whether the solutions achieved are practicable for the types of IoT devices expected to use them (Ko et al., 2011).
 + + +
A further approach to interoperability is the development of implementations of standards, particularly communication protocols, in OSS projects. Companies participating in the Eclipse IoT Working Group (2019), for example, collaborate, sometimes with competitors, in OSS projects to develop implementations of open communications standards used in the IoT domain that then support their products (Butler et al., 2019). Examples include the implementation of the Open Mobile Alliance’s (OMA, 2019) lightweight machine-to-machine (LWM2M) protocol in Leshan (Eclipse Foundation, 2019b) and Wakaama (Eclipse Foundation, 2019c), and the constrained application protocol (CoAP) (Shelby et al., 2014) in Californium (Eclipse Foundation, 2019a). Additionally, the collaborative OSS project serves to identify and document cogent misinterpretations and misunderstandings of the standard (Butler et al., 2019).
 + + +
2.2. PDF standards and interoperability
 + + +
Adobe Systems developed PDF as a platform-independent interchange format for documents that can preserve presentation independently of the application and operating system. In 1993, the first PDF specification was made freely available and a number of revisions of the specification have been published since (see Table 1). Some versions of the specification have been published as ISO standards (e.g. ISO 32000-1:2008 and ISO 32000-2:2017), including specialised subsets of the PDF format for the print industry (e.g.
ISO 15929:2002 and ISO 15930-1:2001), and engineering applications (e.g. ISO 24517-1:2008). + + +PDF documents vary in size and complexity from single page tickets, receipts and order summaries, through academic papers, to very large documents, such as Government reports, and books. Consequently, PDF documents may have short lifespans, or have a significantly longer life as business and legal records, particularly as organisations move away from paper. Many different software packages exist to create, display, edit and process PDF files. Further, a significant problem for long-term use of PDF is that many documents will outlive the software used to create them (Gamalielsson and Lundell, 2013), so will require standards compliant software that can faithfully reproduce the documents to be available at some arbitrary point in the future. + + +PDF software, therefore, does not work in isolation; it must interoperate with other software to the extent that implementations need to be able to process documents created by other software, regardless of how long ago, and to create documents that other implementations can read. Furthermore there is the requirement that those documents be readable many years in the future, particularly in the case of documents such as contracts and official documentation issued by governmental agencies. These requirements are not a theoretical exercise, they are practical requirements that already pose problems for organisations and businesses. For example, in the dataset examined for this article there is evidence that contractors for the Government of the Netherlands have created many thousands of official academic transcripts as PDF documents that do not comply with the PDF specifications and are, at best, problematic to process (see mailing list thread Users-1, Table 5 on p 8). 
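To make the structural requirements of the format concrete, the sketch below (our illustration; it is not code from PDFBox or from the standards) assembles a minimal, empty single-page PDF by hand: a header, numbered objects, a cross-reference table of byte offsets, and a trailer. Corrupting or omitting any of these parts produces exactly the kind of non-conforming document described above.

```python
def minimal_pdf() -> bytes:
    """Assemble a minimal one-page PDF to show the file skeleton
    (header, body objects, xref table, trailer) every reader must parse."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\nendobj\n",
    ]
    header = b"%PDF-1.4\n"
    body = b""
    offsets = []                      # byte offset of each object, for the xref table
    pos = len(header)
    for obj in objects:
        offsets.append(pos)
        body += obj
        pos += len(obj)
    # The xref table maps object numbers to byte offsets; entry 0 is the
    # mandatory free-list head, and every entry is exactly 20 bytes long.
    xref = b"xref\n0 %d\n" % (len(objects) + 1) + b"0000000000 65535 f \n"
    for off in offsets:
        xref += b"%010d 00000 n \n" % off
    trailer = (b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
               + b"startxref\n%d\n" % pos
               + b"%%EOF\n")
    return header + body + xref + trailer
```

A reader works backwards: it finds startxref at the end of the file, jumps to the xref table, and from there locates each object, so an incorrect offset anywhere silently breaks every compliant consumer of the file.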
+ + +PDF is a complex file format that is used to create documents with a rich variety of content including text, images, internal document links, indexes, fillable forms, and digital signatures. Each version of the PDF standard cites normative references — other standards — that form part of the standard and are described as “... indispensable for the application of this document” in ISO 32000-1:2008 (ISO, 2008). The normative references include standards for fonts, image formats, and character encodings. In addition, several normatively referenced standards include normative references themselves (and so on). For example, among the normative references of ISO 32000-2:2017 is part 1 of an early revision of the JPEG 2000 ISO standard (ISO/IEC 15444-1:2004) which in turn has 13 normative references, including 10 IEC, ISO and ISO/IEC standards. The specifications and standards also define the declarative programming language that describes PDF documents, as well as the expected behaviours and capabilities of programs that create and process PDF documents. The size and complexity of the PDF specifications and ISO standards themselves pose a daunting challenge for software developers implementing them. The recently published ISO 32000-2:2017 standard, for example, consists of 984 pages and has 90 normative references (ISO, 2017). Further challenges complicate the development of software that works with PDF files. A key challenge is the common perception that the Adobe Reader family of software applications are the de facto reference implementations of the PDF specifications and standards to which the performance of other implementations is compared (Amiouny, 2016; Lehtonen et al., 2018). Another source of difficulty is the Robustness Principle (Allman, 2011), otherwise known as Postel’s Law, which is applied in Adobe’s Reader products, and stated by Postel, in the context of communication protocols, as, “... 
be conservative in what you do, be liberal in what you accept from others.” (Postel, 1981). In practice, PDF reading and processing software implements repair mechanisms to allow malformed files to be read, within limitations. The limitations, however, are only documented in the behaviour of Adobe’s products.
 + + +
2.3. Related work
 + + +
A key aspect of software interoperability is the agreement and documentation of data formats and communication protocols in specifications and standards. There are many practical challenges to the standardisation process, and a number of approaches have been tried. Ahlgren et al. (2016) argue that open standardisation processes are needed to support interoperability in the IoT domain. An example can be found in the development of implementations of the 6TiSCH communications protocol for low-power devices (Watteyne et al., 2016). Watteyne et al. describe an iterative process of interoperability testing between implementations and how the lessons learnt through testing inform further iterations of the standardisation process. Another example is the standardisation of the QUIC protocol. Originally implemented by Google, QUIC has been in use for some six years and a standard is being developed by an IETF committee (IETF, 2019b; 2019c; Piraux et al., 2018).
 + + +
Table 1
Selected PDF versions and ISO standards.

| Version | ISO Standard | Year | Comment |
|---------|--------------|------|---------|
| PDF v1.0 | | 1993 | First published PDF specification. |
| PDF v1.4 | | 2001 | Improved encryption, added XML metadata, and pre-defined CMaps. |
| PDF v1.5 | | 2003 | Added JPEG 2000 images and improved encryption. |
| PDF/A-1 | ISO 19005-1:2005 | 2005 | An archive format for standalone PDF documents based on PDF v1.4. |
| PDF v1.7 | ISO 32000-1:2008 | 2008 | Extended range of support for encryption. |
| PDF/A-2 | ISO 19005-2:2011 | 2011 | An archive format for standalone PDF documents based on ISO 32000-1:2008. |
| PDF/A-3 | ISO 19005-3:2012 | 2012 | An extension of PDF/A-2 to support file embedding. |
| PDF v2.0 | ISO 32000-2:2017 | 2017 | Revision of ISO 32000-1:2008. |

Piraux et al. (2018) evaluated the interoperability of fifteen implementations of QUIC, finding some shortcomings in all. The tests developed by Piraux et al. have since been incorporated in the test suites of some of the implementations tested (Piraux et al., 2018).
 + + +
Standardisation processes can take a long time, and consequently may be seen by some as an inhibitor of innovation. De Coninck et al. (2019), for example, cite the slowness of the QUIC standardisation process as motivation for a proposed plugin mechanism to extend QUIC. They have proposed, implemented and investigated a flexible approach where applications communicating with QUIC negotiate which extensions to QUIC to use during connection set-up (De Coninck et al., 2019).
 + + +
Standards are also long-lived, and require review and revision in response to developments in both practice and technology. The Joint Photographic Experts Group (JPEG) have initiated a number of standardisation efforts to update the 25-year-old JPEG standards for image files, including the JPEG XT project (JPEG, 2019). Richter and Clark (2018) identify how JPEG implementations differ from the standard, and the difficulties of applying the JPEG conformance testing protocol published in ISO 10918-5:2013 (ISO, 2013) to current implementations. Richter and Clark identify two key issues. Firstly, the evolution of a body of practice building on the standard during the 25 years since it was made available, which motivates the standardisation review. Secondly, parts of the current standard are not used in practice, and may no longer need to be part of any revised standard (Richter and Clark, 2018).
 + + +
The standardisation of HTML and CSS, and other web technologies followed a different path.
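The Robustness Principle described in Section 2.2 can be sketched in a few lines. In this illustrative Python fragment (ours, not from the article; the function names are invented), a strict reader requires the %PDF- header at byte 0, while a lenient reader accepts it anywhere in the first 1024 bytes, a widely reported repair behaviour of Adobe's readers that other implementations end up emulating.

```python
def strict_header(data: bytes) -> str:
    # Strict reading of the specification: the file begins with "%PDF-" at byte 0.
    if not data.startswith(b"%PDF-"):
        raise ValueError("not a PDF: no header at byte 0")
    return data[5:8].decode("ascii")

def lenient_header(data: bytes) -> str:
    # Lenient reading: accept a header anywhere in the first 1024 bytes,
    # mimicking the repair behaviour users expect from dominant readers.
    idx = data[:1024].find(b"%PDF-")
    if idx == -1:
        raise ValueError("no PDF header in the first 1024 bytes")
    return data[idx + 5 : idx + 8].decode("ascii")
```

The catch, as the article notes, is that such repair limits are documented only in the behaviour of the dominant implementation, so every other reader must discover and emulate them case by case.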
Standards for both HTML and CSS have been developed by the World Wide Web Consortium (W3C) (W3C, 2019b) since the 1990s (W3C, 2019a), initially under the auspices of the IETF (Bouvier, 1995). During the browser wars (Bouvier, 1995) companies would add functionality to their browsers to extend the standard, and encourage web site developers to create content specifically for innovative features found in one browser. The process of developing websites to support variations in HTML became so onerous for developers that practitioners campaigned for Microsoft and Netscape to adhere to W3C standards (Phillips, 1998; WaSP, 2019). + + +Previous research on the development of PDF software in two OSS projects found developers adopted specific strategies to support interoperability (Gamalielsson and Lundell, 2013). Specifically, developers would exceed the specification, and mimic a dominant implementation so that their software complied with that implementation. In addition, the study illuminated difficulties developers had interpreting the PDF standard. One issue identified was the lack of detail in parts of the specification that made software implementation imprecise, and unreliable. Another concern expressed was that the complexity of the specification inhibited implementation (Gamalielsson and Lundell, 2013). Indeed, analyses of PDF from the perspective of creating parsers have found the task to be challenging (Bogk and Schöpl, 2014; Endignoux et al., 2016). As part of their investigation of PDF, Endignoux et al. (2016) identify ambiguities in the file structures that were used to discover bugs in a number of PDF readers. Bogk and Schöpl (2014) describe the experience of trying to create a formally verified parser for PDF. They advise that the creators of future file format definitions should ensure that the format is “... 
complete, unambiguous and doesn’t allow unparseable constructions.” (Bogk and Schöpl, 2014). In practice, the complexity of PDF specifications can lead to significant security vulnerabilities in software implementations (Mladenov et al., 2018a; 2018b).
 + + +
The PDF/A standards (see Table 1) are used in document preservation. An area of concern is the management of documents that do not comply with the PDF/A standards. Lehtonen et al. (2018) identify the complexity of the problems faced by those handling documents, and explore mechanisms through which documents might be repaired so that they are “well-formed and valid PDF/A files.” The team behind the development of veraPDF, a PDF/A validator, identify difficulties interpreting the PDF/A standard (Wilson et al., 2017) to be able to create validation tests representing a clear understanding of the standards. Additionally, Wilson et al. (2017) record the need to limit the scope of the validation tests implemented in veraPDF because of the scale of the task, particularly in the validation of normative references such as JPEG 2000. Lindlar et al. (2017) record the development of a test set of PDF documents to test the conformance of PDF files with the structural and syntactic requirements of ISO 32000-1:2008. The authors argue that a test set used to examine basic well-formedness requirements is helpful in digital preservation, as it simplifies the detection of specific problems as a precursor to the application of document repair techniques (Lindlar et al., 2017).
 + + +
In summary, previous research shows the necessity of standardisation for software interoperability, and details approaches to standardisation. Research has also identified how practice can deviate from standards, and in the case of PDF the practical difficulties of developing software, and the challenges of creating mechanisms to evaluate standards compliance. The challenges of implementing standards have also been recorded.
However, there is a lack of research that examines the nature of the day-to-day practical decision-making of software developers when implementing a standard.

3. Research approach

We undertake a case study (Gerring, 2017; Walsham, 2006) of a single, purposefully-sampled (Patton, 2015) community OSS project that focuses on the challenges contributors face when creating and maintaining interoperable software and how they collaborate to resolve problems.

3.1. Case selection

Apache PDFBox was selected as a relevant subject for the case study for several reasons. Firstly, for PDFBox to have any value for users it must be able to interoperate with other software that reads and writes PDF documents. As such, it must implement enough of the PDF specifications and standards to be perceived as a viable solution. Secondly, the PDF specifications and standards are complex and documented as challenging to implement, with the additional requirement that implementations need to process a wide variety of conforming and non-conforming documents to emulate the functionality of a dominant implementation. Thirdly, though the software produced by the OSS project is most likely to be used in a business setting, PDFBox is an ASF project and is independent of direct company control. Consequently, contributors to PDFBox are obliged to rely on cooperation with others in the community to achieve their goals. Fourthly, the PDFBox project actively develops and maintains software, responds to reports of issues with the software, and releases revisions of the software frequently.

The scope of the investigation is the publicly documented work contributing to nine releases of Apache PDFBox between the release of v2.0.3 in September 2016 and the release of v2.0.12 in October 2018. The period investigated was specifically chosen to include the publication of the ISO 32000-2:2017 standard, also known as PDF v2.0, in August 2017.

3.2.
Case description

The Apache PDFBox project develops a Java library and command line tools that can create and process PDF files. The library is relatively low level and can be used to create and process PDF documents conforming to different versions of the PDF specifications and ISO standards (see Table 1 for examples). In development since 2002, and an ASF governed project since 2008, PDFBox is maintained by a small group of core developers and an active community of contributors. PDFBox is a dependency of some other ASF projects, including Apache Tika (Apache Tika, 2019), and other OSS projects, including the European Union funded Digital Signature Services project (CEF Digital, 2019). PDFBox is used to parse documents in one version of the veraPDF validator (veraPDF, 2019), as well as being used in proprietary software products and services. PDFBox was also part of the software suite used by journalists to extract information from PDF files amongst the documents collectively known as the Panama Papers (Khudairi, 2017; ICJ, 2019).

At the time of the study, the most recent major revision of PDFBox, v2.0.0, had been released in March 2016 and maintenance releases have generally been made approximately every two to three months since. In addition, the project maintains an older version, v1.8, in which bugs are fixed, and releases made less often. The overwhelming majority of bug fixes for the 1.8.x series are backported to the 2.0.x series. The project is also working towards a major revision in v3.0.

3.3. Data collection

The core data for the case study consists of the online archives of activity in the PDFBox project. Using the PDFBox website (Apache PDFBox, 2019) we identified the communication channels available for making contributions, and the resources available for users of the software and contributors (see Table 2).
Three public communication channels can be used to make contributions: the Jira issue tracker, and the developers and users mailing lists. In addition, there is a commits mailing list that reports the commits made to the PDFBox source code repository through messages generated by the version control system. A read-only mirror of the PDFBox source code is also provided on GitHub.

The mailing list archives identified were downloaded from the ASF mail archives (ASF, 2019b) and the GrimoireLab Perceval component (Bitergia, 2019) was used to parse the Mbox format files and convert them into JSON format files. The JSON files were then processed using Python scripts to reconstruct the email threads and write the threads out in emacs org-mode files for analysis (org-mode is a plain text format for emacs that supports text folding and annotation). The Jira issue tracker tickets were retrieved in JSON format using the Jira REST API (Atlassian, 2019). The JSON records for each ticket were then aggregated and processed by Python scripts to create org-mode files containing the problem description and the comments on the ticket.

3.4. Data analysis

The data gathered from the PDFBox project was analysed using the thematic analysis framework (Braun and Clarke, 2006).

Initially, the first author worked systematically through all the collected data to identify the email threads and issue tracker tickets that address the topic of interoperability in any regard. The mailing list threads and issue tracker tickets cover a wide range of topics including project administration as well as help requests, and potential bug reports. Key factors considered included reference to the capabilities of PDFBox in comparison to other PDF processing software and mention of any PDF specification or standard or any of its normative references, such as font and image formats.
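The mbox-to-thread step described in Section 3.3 can be sketched with the Python standard library (the study used GrimoireLab Perceval and custom scripts; the `mailbox` module and the grouping logic below are a simplified stand-in, not the authors' code):

```python
import mailbox
from collections import defaultdict

def reconstruct_threads(mbox_path):
    """Group messages in an mbox file into threads using the
    Message-ID and In-Reply-To headers described in footnote 3."""
    parent_of = {}  # message-id -> id of the message it replies to (or None)
    for msg in mailbox.mbox(mbox_path):
        msg_id = msg.get("Message-ID")
        if msg_id is None:
            continue
        parent_of[msg_id] = msg.get("In-Reply-To")

    def root(msg_id):
        # Walk up the reply chain to the first message of the thread,
        # guarding against cycles in malformed archives.
        seen = set()
        while parent_of.get(msg_id) in parent_of and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent_of[msg_id]
        return msg_id

    threads = defaultdict(list)
    for msg_id in parent_of:
        threads[root(msg_id)].append(msg_id)
    return threads
```

As the paper notes, the In-Reply-To reference is sometimes omitted, which is why conversations can fragment into separate threads that later need to be rejoined by subject line.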
During this phase, email threads were reconstructed where parts of conversations with the same subject line had been recorded in the archives as separate threads.3

The set of candidate email threads and issue tracker tickets was then examined in more detail to identify discussions in which decisions were made concerning the implementation of functionality related to the PDF specifications and standards, and their normative references, in PDFBox and other software. Mailing list threads and issue tracker tickets where no clear decision was articulated were ignored for analytical purposes, as were discussions where it was judged that insufficient information was given for any decisions made to be clearly understood.

The conversations recorded in mailing list threads and issue tracker tickets contain the technical opinions and judgements of domain experts, including the core developers, and often contain explicit reference to PDF specifications and standards. Where there was no specific reference to a standard in a conversation, the topic of the discussion was used to determine relevance through comparison with other conversations on the topic explicitly linked to the PDF standards by contributors. At the end of the process, 111 mailing list threads and 394 issue tracker tickets had been identified for further analysis. Coding was also used at this stage to annotate the discussions, and particularly the decisions made, to help identify the nature of the problems being addressed, the relationship between the problems and the PDF standards and other PDF software, and the outcome of the decision-making process.

The corpus of 505 mailing list and issue tracker discussions was then analysed in depth by the first author to identify candidate semantic themes to describe the types of decision being made, and to identify candidate thematic factors influencing the decisions made.
The coding from the previous phase supported the grouping of decision types and the development of semantic themes. Additional coding undertaken at this stage was used to identify factors influencing decisions and to develop a set of candidate thematic factors.

In the subsequent phase, all authors discussed the candidate decision types and factors alongside illustrative discussions taken from the corpus. A set of four semantic themes and seven thematic factors was agreed, and their consistency with the larger body of evidence was reviewed by the first author.

4. Findings

This section describes the semantic themes identified through thematic analysis that categorise the decisions made by contributors to PDFBox regarding maintenance of its interoperability. Each decision type is illustrated with examples. Thereafter we provide

3 Each email header contains a reference to the message it replies to. Sometimes the reference can be omitted when replying to a mailing list message.

Table 3
Types of software development decisions related to the PDF specifications and standards in the Apache PDFBox project.

| Decision Type | Description |
|---------------------------------------------------|-----------------------------------------------------------------------------|
| Improve to match de facto reference implementation | A decision taken in the context of improving or correcting PDFBox to match the de facto reference implementation. |
| Degrade to match de facto reference implementation | A decision taken in the context of degrading the compliance of PDFBox with a PDF specification or standard so that the behaviour matches that of an Adobe implementation. |
| Improve to match standard | A decision taken in the context of improving or correcting the behaviour of PDFBox to meet a PDF specification or standard. |
| Scope of implementation | A decision taken about the extent of the PDFBox implementation. |

Table 4
Apache PDFBox JIRA issue tracker tickets referenced in Section 4.1.

| Decision type | Issue tracker ticket |
|---------------------------------------------------|----------------------|
| Improve to match de facto reference implementation| PDFBOX-3513 |
| | PDFBOX-3589 |
| | PDFBOX-3654 |
| | PDFBOX-3687 |
| | PDFBOX-3738 |
| | PDFBOX-3745 |
| | PDFBOX-3752 |
| | PDFBOX-3781 |
| | PDFBOX-3789 |
| | PDFBOX-3874 |
| | PDFBOX-3875 |
| | PDFBOX-3913 |
| | PDFBOX-3946 |
| | PDFBOX-3958 |
| Degrade to match de facto reference implementation| PDFBOX-3929 |
| | PDFBOX-3983 |
| Improve to match standard | PDFBOX-3914 |
| | PDFBOX-3920 |
| | PDFBOX-3992 |
| | PDFBOX-4276 |
| Scope of implementation | PDFBOX-3293 |
| | PDFBOX-4045 |
| | PDFBOX-4189 |

an account of the main factors that motivate and constrain the outcomes of the types of decision made.

4.1. Decision types

We identified four major types of decision related to the implementation of the PDF specifications and standards in the PDFBox project (see Table 3), each of which is described below with illustrative examples. We also provide descriptions of the thematic factors identified that, in combination, influence the decisions made.

4.1.1. Improve to match de facto reference implementation

Much of the work of PDFBox contributors is focused on trying to match the behaviour of Adobe’s PDF software. The PDFBox core developers and many contributors treat the Adobe PDF readers as de facto reference implementations of the PDF specifications and standards (e.g. PDFBOX-3738 and PDFBOX-3745 – PDFBox JIRA issue tracker tickets referred to in Section 4.1 are listed in Table 4), and use the maxim that PDFBox should be able to process any document the Adobe PDF readers can. As one core developer explains:

“There is the PDF spec and there are real world PDFs.
Not all real world PDFs are correct with regards to the spec. Acrobat, PDFBox and many other libraries try to do their best to provide workarounds for that. We typically try to match Acrobat ...” (PDFBOX-3687).

The ISO 32000-2:2017 standard (ISO, 2017, pp. 18-19) identifies two classifications of PDF processing software: PDF readers and PDF writers. Accordingly, developers trying to match the Adobe implementations face two major challenges. The first is to be able to process the same input that Adobe software does. The second is to create output of similar quality to that produced by Adobe software. There are also two types of output of PDF software: the document created, and how a given document is rendered on screen or in print. To “try to match Acrobat” (PDFBOX-3687), documents created by PDFBox should, insofar as is possible, match those output by Adobe software so that they are rendered consistently by other software, and the expectation is that PDFBox, and software created using it, should also render documents with similar quality to the Adobe implementations (e.g. PDFBOX-3589 & PDFBOX-3752).

The convention in software that reads PDF files is to apply the Robustness Principle (Allman, 2011; Postel, 1981) so that documents that are not compliant with PDF specifications and standards can be processed and rendered, insofar as is possible (e.g. PDFBOX-3789). Exactly what incorrect and malformed content should, or can, be parsed into a working document is not specified by the PDF specifications and standards. The exemplar for developers is the behaviour of the Adobe readers, as well as the behaviour of other PDF software.

PDF documents consist of four parts: a header, a body, a cross-reference table, and a trailer.
The header consists of the string “%PDF–” and a version number, followed, on a second line, by a minimum of four bytes with a value of 128 or greater so that any tool trying to determine what the file contains will treat it as binary data, and not text. The trailer consists of the string “%%EOF” on a separate line, immediately preceded by a number on one line representing the offset of the cross-reference table and the string “startxref” on the line before that (see Fig. 1). A PDF parser reads the first line of a file and then searches for the “%%EOF” marker and works backwards to find the cross-reference table using the offset on the preceding line, and to read the trailer that confirms the number of objects referenced in the table, and the object reference of the root object of the document tree. The parser should then be able to read all the objects in the PDF file.

Where the cross-reference table is missing or damaged, PDF parsers may, according to the ISO 32000-1:2008 standard (ISO, 2008, p. 650), try to reconstruct the table by searching for objects in the file (see Fig. 2). In practice, Adobe software appears to apply the Robustness Principle more widely so that a wide range of problems, for example with fonts, are also tolerated by the parser.

4 The PDFBox JIRA issue tracker tickets referenced have URLs of the form https://issues.apache.org/jira/browse/PDFBOX-‘NNNN’ where ‘NNNN’ is the four-digit number of the ticket. For example, PDFBOX-3738 has the URL https://issues.apache.org/jira/browse/PDFBOX-3738.

5 There are also ‘linearised’ PDF files intended for network transmission where the trailer and cross-reference tables precede the body.
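The backward scan through the trailer described above can be sketched as follows (a minimal illustration of the startxref lookup only; a real parser such as PDFBox handles far more malformed variants and would then fall back to repair by scanning for objects):

```python
import re

def find_startxref(data: bytes) -> int:
    """Locate the cross-reference table offset from a PDF trailer laid out as:
    ... startxref\n<offset>\n%%EOF"""
    # Work backwards from the end of the file to the last %%EOF marker.
    eof = data.rfind(b"%%EOF")
    if eof == -1:
        raise ValueError("no %%EOF marker: file may need repair")
    # The lines immediately before %%EOF are 'startxref' and the byte offset.
    head = data[:eof].rstrip()
    match = re.search(rb"startxref\s+(\d+)\s*$", head)
    if match is None:
        raise ValueError("no startxref before %%EOF: file may need repair")
    return int(match.group(1))
```

The `ValueError` branches correspond to the situation in which, per ISO 32000-1:2008, a parser may attempt to reconstruct the cross-reference table instead of failing outright.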
6 The repair mechanism is why, sometimes, Adobe software applications offer the opportunity for the user to save a newly opened document.

The work required to resolve issues of this nature varies in scope. Sometimes the source code revision is relatively trivial; a simple change to make the parser more lenient because the document author’s intention is clear. For example, in PDFBOX-3874 a small change is made to a font parser so that it will accept field names in the font metadata that are capitalised differently from the specification. Similarly, in PDFBOX-3513, the PDFBox core developers identify an error in the ISO 32000-1:2008 standard as the underlying cause of an observed problem with PDFBox. One column of a table specifies two types (a name and a dictionary) for the value of an encoding dictionary for Type 3 fonts (ISO, 2008, p. 259), while the next column of the table clearly specifies that the field must be a dictionary. The contributor who encountered the document proposes a revision to the parser to accommodate the error (PDFBOX-3513). One core developer comments that “...we’ve never encountered a file with the problem you’ve presented.” Another core developer points out that there is no guidance in the specification on how to treat a Type 3 font that does not have an encoding dictionary. Instead of improvising a fallback encoding, the core developers argue that there may be a case to ignore the font specified in the document as it cannot be reliably used, and the parser is not revised given the rarity of the problem.

Adobe and other PDF software sometimes exceed the specifications and standards. In PDFBOX-3654, for example, a file is found that renders in many other applications, but not in PDFBox.
The problem is a font that is encoded in a hexadecimal format, and the standard is unequivocal on the subject:

“Although the encrypted portion of a standard Type 1 font may be in binary or ASCII hexadecimal format, PDF supports only the binary format.” (ISO, 2017, p. 351)

The source code is revised to support the font encoding and the core developer processing the issue observes:

“So the font is incorrectly stored. But obviously, Adobe supports both, so we should too.” (PDFBOX-3654)

In some cases the Adobe software extends the specifications and standards through the implementation of additional functionality that reflects wider practice. Often the only documentation of the additional functionality is in the implementation, and other implementers only discover the change when differences in behaviour are reported to them. For example, a report in PDFBOX-3913 shows that Adobe software and PDF.js process and render a Japanese URI, which PDFBox cannot. The ISO 32000-2:2017 standard specifies that the targets of URIs (links) should be encoded in UTF-8. In both applications the URI is encoded in UTF-16, which is necessary to represent some Japanese characters used in domain names, but exceeds the standard. Revisions are made to PDFBox (documented in PDFBOX-3913, PDFBOX-3946, and PDFBOX-3958) to support UTF-16 for URIs and implement the same functionality as both Adobe and PDF.js.

PDFBox contributors also find instances where documents created by the software are not rendered as expected by Adobe’s software. In these cases, typically, there is a difference between the model in documents created by PDFBox and the model that Adobe expects. In some cases a great deal of work is required to understand how Adobe and other readers interpret the PDF document. In PDFBOX-3738 work is undertaken to understand how the output of digitally signed files is interpreted by Adobe and other reader products.
The acquired knowledge is then applied so that PDFBox can create documents that can be read and rendered, with the digital signature displayed, by other PDF software. The developers also identify a related problem, documented in PDFBOX-3781, that affects documents with forms and digital signatures.

Merging PDF files can be a difficult problem for implementers to solve. PDFBOX-3875 records the challenges faced when merging two documents where the internal bookmarks are structured using slightly different representations in the document model. In the merged document some of the bookmarks do not work as expected. The initial assessment by one of the core developers is that the cause is within the PDFBox source code and is “…probably a bug. Not the kind that will be fixed quickly ...”. One approach used by the core developers to evaluate how best to solve the problem is to merge the documents using other applications, including Adobe software, and to examine the document created following the merge. Work is started to try to create a viable solution by emulating the document resulting from merging the files using Adobe software, but further problems are encountered and the work is not completed.

4.1.2. Degrade to match de facto reference implementation

As noted already, developers of PDF software, including the PDFBox developers, tend to view Adobe PDF software implementations as a gold standard. However, Adobe’s software developers do not always implement the PDF specifications and standards in the way that others might and, on occasion, implement solutions that can be seen as incorrect. Consequently, developers of PDF software then need to determine how they might degrade the adherence of their software to the PDF specifications and standards to match Adobe’s implementations.
PDFBOX-3929 begins in a discussion on the PDFBox users mailing list where a user observes that PDF documents created by PDFBox with floating point numbers used for field widget border widths are rendered by Adobe XI and Adobe DC without a border (Users-2 and Users-3 in Table 5). The borders of other annotation types are unaffected.

7 PDF.js is a widely used open source PDF reader implemented in JavaScript, see https://mozilla.github.io/pdf.js/.

The width of borders drawn around annotations, such as form fields, is defined in PDF documents in two ways: a border array holding three or four values, or in some cases a border style dictionary (an associative array) that includes a value for the width of the border in points. In both cases the value to specify the width is defined as a number. PDF specifications and standards define two numeric types: integer objects and real objects. The ISO 32000 standards then say “... the term number refers to an object whose type may be integer or real.” (ISO, 2008, p. 14; ISO, 2017, p. 24). ISO 32000-2:2017, for example, is explicit where fields are required to hold integer values, and uses the term number for other numeric fields.

Both versions of the ISO 32000 standard define the border array using the following sentence:

“The array consists of three numbers defining the horizontal corner radius, the vertical corner radius, and border width, all in default user space units.” (ISO, 2008, p. 384; ISO, 2017, p. 465)

Accordingly, the interpretation of the standards used in PDFBox agrees with the standard: the border width can be specified with a floating point number. However, the Adobe reader software expects an integer, and ignores non-integer values, such as 3.0, by treating them as having a value of zero. Consequently, the PDFBox implementation was revised so that annotations in documents created by PDFBox will be rendered with borders by Adobe DC.
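The observed behaviour can be captured in a toy model (a hypothetical illustration, not PDFBox code: in PDF syntax a real object is written with a decimal point, e.g. 3.0, while an integer object is not):

```python
def adobe_effective_border_width(token: str) -> int:
    """Toy model of the observed Adobe reader behaviour described in the
    text: a width written as a real object (contains '.') is treated as
    zero, so no border is drawn; an integer object is honoured."""
    if "." in token:        # real object, e.g. "3.0" -> border suppressed
        return 0
    return int(token)       # integer object, e.g. "3" -> used as-is

def standard_border_width(token: str) -> float:
    """A reading of the standard's 'number' type accepts either form."""
    return float(token)
```

The gap between the two functions is precisely the interoperability problem: a writer that follows `standard_border_width` semantics produces documents that a reader following `adobe_effective_border_width` semantics renders without borders.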
A bug report was also made to Adobe support, saying that the standard had been interpreted incorrectly.

A closely related issue is found in a thread on the users mailing list (Users-4) where a developer reports that the Adobe reader implementations behave in an unexpected way. This time the concern is the border drawn around a URI action annotation, or a link. The border is defined in the standard as described above, but the Adobe reader implementations interpret the values 1, 2, and 3 as meaning a thin, medium and thick border respectively. The PDFBox API documentation is updated to describe how the Adobe reader implementations interpret the border width value.

A contributor reports in PDFBOX-3983 that Acrobat Reader fails to display some outlines and borders where the miter limit is set to a value of zero or less. The miter limit indicates how junctions between lines should be drawn. The ISO 32000-1:2008 standard states:

“Parameters that are numeric values, such as the current colour, line width, and miter limit, shall be forced into valid range, if necessary.” (ISO, 2008, p. 124)

The statement was revised in ISO 32000-2:2017 by the replacement of “forced” with “clipped” (ISO, 2017, p. 157).

Accordingly, one interpretation might be that a compliant PDF reader would be able to display a document correctly regardless of the value of the miter limit recorded, because it would automatically correct the value. However, Adobe implementations appear not to correct the value. The user reporting the problem supplies a patch so that documents created by PDFBox will contain miter limit values that are positive, and the simple fix allows Adobe software to display the document. OpenPDFtoHTML, another OSS project, has also encountered the same problem and takes similar action.6

4.1.3.
Improve to match standard

The PDFBox implementation is also revised to meet the requirements of the PDF standards and normative references, independently of the need to match the performance of Adobe products.

The use of multi-byte representations of characters in Unicode character encodings such as UTF-16 requires some careful processing by PDF parsers because some single byte values can be misinterpreted. The single byte value 0x20 represents the space character in fonts encoded in one byte. In multi-byte character encodings the byte 0x20 may be part of a character and so should not be treated as a single byte. Two kinds of operator can be used in PDF documents to position text, one of which should be used with multi-byte font encodings so that single byte values that form part of multi-byte characters are not misinterpreted. A patch is contributed in PDFBOX-3992 so that PDFBox fully supports the operator used to justify multi-byte encoded text to comply with the ISO 32000-1:2008 standard.

The PDF/A group of standards defines an archive format for PDF. The demands of the standards are high, and compliance requires a great deal of attention to detail during document preparation. In general, the PDF/A standards constrain the types of content that can be present in compliant files, and sometimes make very precise demands on the quality of embedded resources. The veraPDF Project develops a freely available validator for PDF/A files. PDFBox also implements ‘preflight’ functionality to validate documents against the requirements of PDF/A-1b (the ISO 19005-1:2005 standard) and there are examples where the implementation is revised to match the performance of the veraPDF validator when differences are found. For example, a bug in the preflight validator is found in PDFBOX-4276 and the functionality corrected so that the incorrect output is now detected, as it would be by veraPDF.
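The multi-byte pitfall described earlier in this subsection can be demonstrated directly: encoded in UTF-16BE, a character such as the horizontal ellipsis (U+2026) contains the byte 0x20, which a parser must not confuse with an ASCII space.

```python
# In single-byte encodings, 0x20 is the space character:
assert " ".encode("ascii") == b"\x20"

# In UTF-16BE, the same byte value can appear *inside* a character.
# U+2026 (horizontal ellipsis) encodes to the bytes 0x20 0x26:
ellipsis = "\u2026"
encoded = ellipsis.encode("utf-16-be")
assert encoded == b"\x20\x26"
assert b"\x20" in encoded  # naive byte-wise space handling would split this

# Decoding whole two-byte code units recovers the text intact:
assert encoded.decode("utf-16-be") == ellipsis
```

This is why, as the text notes, the text-positioning operator intended for multi-byte font encodings must be used: it operates on whole code units rather than individual bytes.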
In PDFBOX-3920 a user reports that font subsets created by PDFBox do not include all the data required by the PDF/A-2 standard (ISO 19005-2:2011). The PDFBox source code is modified so that the output meets the standard.

The number of revisions to the PDF specifications and standards means that occasionally it is found that PDFBox does not implement a particular feature or capture all the data in a PDF document. A contributor reports a problem with PDFBox where a field is ignored during parsing, which leads to content being rendered that is supposed to be hidden. The user provides a patch in PDFBOX-3914 which forms the basis of an update to the source code so that the field is imported and the document rendered correctly.

6 https://github.com/danfickle/openhtmltopdf/issues/135.

4.1.4. Scope of implementation

The core developers also make decisions about the scope of the software implemented by the PDFBox project. The question of what functionality forms the scope of the PDFBox implementation arises in some bug reports and feature requests, and has multiple dimensions.

PDFBox is not intended to be a comprehensive solution for creating, processing or rendering PDF documents. The project charter, or mission statement, says:

“The Apache PDFBox library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. Apache PDFBox also includes several command-line utilities. Apache PDFBox is published under the Apache License v2.0.” (Apache PDFBox, 2019)

PDFBox relies on some external libraries to provide functionality, especially in the area of image processing. There is no need for the PDFBox project to reinvent the wheel, particularly in technically demanding domains.
A further difficulty is that image processing provision within the core Java libraries is incomplete, and varies between Java versions. Some functionality, such as the JPEG 2000 codec, is no longer maintained and is difficult for OSS implementers to adopt because of the licence used and potential patent issues (discussed further in Section 4.2.6). Java provision for image processing is changing and, with Java v9, functionality is gradually being returned to the core libraries. However, the JPEG 2000 codec remains outside the main Java libraries. Further, PDFBox core developers often recommend the use of the Twelve Monkeys plugin9 for image processing, in particular because it processes CMYK images that PDFBox does not.

Some areas of work are outside the current scope of PDFBox, including the implementation of rendering for complex scripts. There is some provision, and some developers have contributed code for non-European languages where they have expertise (for example Users-5). In some cases the layout of the languages is sufficiently close to Latin scripts that there is no need for additional provision if the fonts are correct, as shown in PDFBOX-3293. However, for many languages, including Arabic and those from the Indian subcontinent, there is a need to implement code to position the glyphs using GSUB and GPOS tables. In PDFBOX-4189 a user provides much of the functionality to support GSUB tables for Bengali. The complexity of the task is clear from the discussions reviewing and accepting the source code.

Decisions are also made about the cause of observations and whether what is observed is the result of a problem with PDFBox. Where the issue lies with PDFBox, decisions are then made about resolving the problem. Sometimes the erroneous observation results from other software. A user reports a difference between the assessments by Adobe preflight and PDFBox concerning a document’s compliance with the PDF/A-1b standard in PDFBOX-4045.
Adobe XI identifies inconsistencies in the glyph widths for one font in the document. After investigation the core developers determine that there is no error in PDFBox and that Adobe X agrees that the document is compliant. Given the inconsistent assessments made by Adobe X and XI, and that inspection of the font does not show the issue reported by Adobe XI, the PDFBox core developers conclude there is a problem with the implementation of preflight in the particular version of Adobe XI used.

9 https://github.com/haraldk/TwelveMonkeys.

4.2. Factors influencing decision-making

Common to the decision types observed is a set of considerations or factors that influence the outcome of the decision-making process (see Table 6).

| Factor | Description |
|-------------------------|-----------------------------------------------------------------------------|
| Workforce | The availability of contributors to do work. |
| Maintenance Risk | The maintenance burden for the project of a feature implementation. |
| Expertise | The collective expertise of the contributors to the project. |
| Sustainable Solution | The long-term viability of a technical solution. |
| Capability | The ability to make relevant and meaningful changes in a given context. |
| Intellectual Property Rights | Matters pertaining to copyright, patents and licensing. |
| Java Interoperability | The consequences for interoperability of revisions to Java. |

4.2.1. Workforce

Companies choose to use the PDFBox software and, where appropriate for their needs, contribute to its improvement through the work of their developers. As noted, the core developers of PDFBox are few in number and are, as they emphasise, not paid for their work on PDFBox:

“The project is a volunteer effort and we’re always looking for interested people to help us improve PDFBox.
There are a multitude of ways that you can help us depending on your skills.” (Apache PDFBox, 2019)


With limited time available to them (Targett, 2019), the PDFBox core developers concentrate their efforts (Khudairi, 2019) in areas of the software where work is a priority, unless other developers in the community are able to contribute.


The example given previously, of work on a solution for a document merging problem (PDFBOX-3875)\textsuperscript{10} that halts, may be explained by the limited workforce being focused on other, more achievable tasks, as illustrated by a core developer’s comment on another task:


“I had hoped to implement that but given current commitments I have it is unlikely that I’m able to do it in the short term (I’m trying to concentrate on resolving AcroForms related stuff in my spare time for the moment[1]).” (PDFBOX-3550)


Another example of the influence of the available workforce on decision making can be found in PDFBOX-3875, where a developer working for a company wants a problem resolved. The problem is challenging and will take time to understand and resolve. The developer reporting the problem is given three choices: to adopt and use another OSS application; implicitly, to buy a licence for Adobe Professional; or to contribute the fix themselves, either directly or by commissioning other developers to do the work.


4.2.2. Maintenance risk


The notion of a maintenance risk can be related to the factors of expertise and workforce. Core developers will sometimes express, or imply, concerns that make them unwilling to accept a solution. For example, in PDFBOX-3962 a user proposes a solution that repairs the Unicode mappings in one PDF document so that it can be rendered. The core developers identify that the solution resolves a special case, and that further work would be required to develop a viable solution for the Java 9 libraries.


\textsuperscript{10} Issue tracker tickets referenced in Section 4.2 are given in Table 7.
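To make concrete what the Unicode mappings discussed in PDFBOX-3962 provide, the following toy sketch (hypothetical names, not PDFBox API) models a ToUnicode-style table from character codes to the text they represent; repairing a broken document amounts to restoring missing entries in such a table.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration (not PDFBox code) of what a font's ToUnicode mapping
// provides: a table from the character codes used in a page's content
// stream to the Unicode text they represent. When the table in a document
// is broken or missing, PDFBox cannot reliably map the character codes.
public class ToUnicodeSketch {

    static String decode(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder out = new StringBuilder();
        for (int code : codes) {
            // A repair such as the one proposed in PDFBOX-3962 amounts to
            // restoring missing entries in this table for one document.
            out.append(toUnicode.getOrDefault(code, "\uFFFD")); // U+FFFD if unmapped
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> toUnicode = new HashMap<>();
        toUnicode.put(0x01, "P");
        toUnicode.put(0x02, "D");
        toUnicode.put(0x03, "F");
        System.out.println(decode(new int[] {0x01, 0x02, 0x03}, toUnicode)); // PDF
    }
}
```

The core developers' concern is visible even in this toy form: a fix that hard-codes the missing entries for one document resolves a special case, while a general solution must work for arbitrary fonts and encodings.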
Another concern articulated in some requests for support for complex scripts is that the core developers do not have the skills to maintain the functionality. A lengthy discussion of the issue can be found in PDFBOX-3550, where the core developers identify some central challenges to creating a solution. The main concern in both cases is that by providing additional functionality that cannot be maintained, or is a challenge to maintain, either in terms of the effort required or the necessary expertise, there is a risk to the utility of the software and, perhaps, the viability of the project.


4.2.3. Expertise


The implementation of PDF software requires expertise in a wide range of areas in addition to PDF itself. Limitations to the available expertise help determine what work can be done by contributors. One implication, already noted, is the reluctance to maintain source code in areas where there is no or limited expertise amongst the core developers. Another is that some areas of functionality cannot be developed. For example, a user asks about compressing CMYK JPEG images in PDFBOX-3844. A core developer responds:


“There is no JPEG compression from CMYK BufferedImage objects out of the box, i.e. Java ImageIO doesn’t support it, and we don’t have the skills, so that I’ll have to close as “won’t fix” this time.” (PDFBOX-3844)


The alternative suggested in PDFBOX-3844 is to investigate the Twelve Monkeys project that builds on the Java ImageIO functionality.


There is also a great deal of expertise within the PDFBox community which can enable the implementation of solutions. In PDFBOX-4095 one contributor provides a proposed solution to a challenging problem. After some initially unsuccessful work evaluating the proposed change, another contributor suggests a simple revision that resolves the problems.
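The dependence on Java ImageIO noted in PDFBOX-3844 stems from ImageIO's plugin model: codecs are discovered at run time through a service-provider registry, which is how plugins such as Twelve Monkeys or jai-imageio extend what a Java application can process simply by being on the classpath. A minimal sketch (not PDFBox code) that lists the codecs available to the running JVM:

```java
import javax.imageio.ImageIO;

import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Lists the image formats the running JVM can read via ImageIO. Plugins
// such as Twelve Monkeys or jai-imageio extend this set by registering
// additional codecs with ImageIO's service-provider registry.
public class ImageIoSupport {

    static Set<String> readerFormats() {
        Set<String> names = new TreeSet<>();
        for (String n : ImageIO.getReaderFormatNames()) {
            names.add(n.toLowerCase(Locale.ROOT));
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> formats = readerFormats();
        System.out.println("Readable formats: " + formats);
        // A stock JDK bundles JPEG, PNG, GIF, etc., but typically reports
        // false here unless a JPEG 2000 plugin is on the classpath:
        System.out.println("jpeg2000 available: " + formats.contains("jpeg2000"));
    }
}
```

This registry is why the core developers can close PDFBOX-3844 as "won't fix" while still pointing users at a plugin: no change to PDFBox itself is needed for an ImageIO plugin to supply the missing codec.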
Similarly, a complex image rendering problem is solved with the help of advice from a contributor in PDFBOX-4267, and another contributor implements code to process YCbCr CMYK JPEG images in PDFBOX-4024.


Expertise alone, however, is not sufficient to provide a solution to a problem in all cases. The discussion in PDFBOX-4189 shows there is considerable expertise within the user community and the core developers about fonts and how to render complex scripts. Key factors that have prevented the work from being done previously are not only a shortage of available workforce, but also a lack of expertise in the target language that would provide sufficient understanding to distinguish between good and bad solutions:


“Many complex scripts (such as Arabic) require shaping engines which require deep knowledge of the languages in order to follow the rules in the OpenType tables.” (PDFBOX-3550)


4.2.4. Sustainable solution


There are often implementation choices to be made when resolving a problem. A longer-term solution is generally preferred to a short-term fix or workaround. In PDFBOX-3300 concerns are reported about the way that a font subset has been created prior to embedding it in a document. A specific solution is proposed that provides a way of resolving the problem. Another developer identifies that the optimal solution is to resolve some problems in the CMap\textsuperscript{11} parser. It is a more sustainable solution than a patch that provides a specific workaround. In this case the developers are able to create a generic solution that better addresses the font standards, and thereby the PDF standards, and provides a longer-lived solution.


4.2.5. Capability


A key factor in decisions concerns whether the project is able to correct the problem that is causing the observed behaviour.
The examples given in Section 4.1.2, where the PDFBox implementation was degraded from meeting the standard to match the behaviour of Adobe’s software, illustrate one aspect of capability as a factor. In those cases the ‘incorrect’ implementation could not be revised, and only a revision to PDFBox could ensure documents created would be rendered as expected by Adobe’s implementations. In other cases bugs are found in external libraries or infrastructure that have an impact on PDFBox. Often a workaround will be found, or an alternative library recommended. For example, PDFBOX-3641 describes a situation in which PDFBox uses a core Java library in a way that triggers a bug in the Java implementation. The code in PDFBox is revised to prevent the bug being triggered. The Java bug is also reported\textsuperscript{12}.


4.2.6. Intellectual property rights


PDF documents can include technologies and artifacts whose use is constrained by copyright, patents or licences. In addition, PDFBox is implemented in Java, which during its lifetime has moved from closed source, to largely open source, to some variants (e.g. OpenJDK and derivatives like Amazon Corretto) that are entirely open source. An implementation of the JPEG 2000 codec was included in extensions to the Java libraries. During Sun Microsystems’ process to make Java open source the codec, along with other image codecs, was released as a separate library known as ImageIO. The licence used for the implementation of the JPEG 2000 codec is not an Open Source Initiative (OSI) approved open source licence, and some consider the licence used to be incompatible with OSS licences such as the GPL v3 and the Apache Licence v2.0.\textsuperscript{13} In addition, there are concerns amongst OSS developers about the potential for patent claims related to JPEG 2000, though the concerns are diminishing with the passage of time.
Most of the image codecs in the ImageIO library have been reincorporated into the Java libraries in OpenJDK since v9, but the JPEG 2000 codec has not. Consequently, JPEG 2000 support in PDFBox, where it is required by users, relies on the jai-imageio\textsuperscript{14} implementation of the codec, which is no longer maintained.


\textsuperscript{11} A CMap is a table in a font file that maps character encodings to the glyphs that represent them.


\textsuperscript{12} https://bugs.openjdk.java.net/browse/JDK-8175984


\textsuperscript{13} For example the opinion expressed at: https://github.com/jai-imageio/jai-imageio-jpeg2000.


\textsuperscript{14} https://github.com/jai-imageio/jai-imageio-jpeg2000.


A user reports using the OpenJPEG implementation of JPEG 2000 in PDFBOX-4320. However, OpenJPEG is implemented in C and can only be used as native code, which may not be suitable for some deployment contexts. The development of a replacement OSS JPEG 2000 codec is inhibited by the resources, including expertise and finance, required to implement a large and complex standard.\footnote{JPEG 2000 is defined in ISO/IEC 15444 which consists of 14 parts (see Lundell et al., 2018).}


The ISO 19005-1:2005 standard (ISO, 2005, p. 11) for archival PDF documents mandates the embedding of fonts, including the standard 14 fonts,\footnote{PDF specifications require 14 fonts to be present on systems that render documents, e.g. ISO 32000-1:2008 (ISO, 2008, p. 256).} or substitute fonts, in files so that the document contains all the resources required to render it. The requirement is stated as: “Only fonts that are legally embeddable in a file for unlimited, universal rendering shall be used.” (ISO, 2005, p. 10). The requirement can be problematic because many fonts have licences that do not permit redistribution. The matter is discussed in PDFBOX-3618. The legality of the embedded fonts is the responsibility of the document creator.
Both the PDF/A-1 and PDF/A-2 standards include a note that clarifies the need for the legal use of any font to be clearly and verifiably stated:


“This part of ISO 19005 precludes the embedding of font programs whose legality depends upon special agreement with the copyright holder. Such an allowance places unacceptable burdens on an archive to verify the existence, validity and longevity of such claims.” (ISO, 2005, p. 11; ISO, 2011, p. 15).


4.2.7. Java interoperability


In addition, there is a set of problems concerning interoperability with Java that influences the solutions implemented in PDFBox. Some relate to areas, such as the image processing required by the PDF standards, where Java is used to provide supporting functionality. An example is found in PDFBOX-3549, where Java versions have differing capabilities to process ICC colour spaces, and some versions contain bugs that affect the handling of ICC colour spaces. During the period of PDFBox activity investigated three new major versions of Java were released, and many revisions made to each version. There is also some evidence in the mailing lists and Jira tickets that some users are still using Java 5, which was already obsolete at the start of the period investigated.


4.3. Summary


Through analysis of two years of activity in the PDFBox project related to implementation of the PDF specifications and standards, we have identified four decision types related to development of the project software and seven factors that influence those decisions. The decision types concern adapting the software to emulate the behaviour of Adobe’s PDF software, implementing the PDF standards, and the scope of the PDFBox implementation. The seven factors act in combination to facilitate and constrain development activity, especially the interplay between expertise and workforce.
5. Analysis


Much of the work of PDFBox contributors consists of trying to match the implementation of Adobe PDF reader software. The reasons for matching Adobe implementations are mostly clear, yet emulating Adobe’s software is clearly challenging, and solutions that might reduce the extent of the challenges and the risks, including validators, are themselves challenging to create.


5.1. The challenges of developing PDF parsers


The PDF specifications and standards specify that PDF software may try to reconstruct files where the cross-reference table is incorrect or has been omitted. In practice the Principle of Robustness is applied in Adobe’s PDF software so that PDF files that are not well-formed can often be rendered. The developers of other PDF applications are obliged to follow Adobe’s lead. If the developers of non-Adobe PDF software did not implement parsers that behaved similarly to Adobe’s, then their products would quickly become irrelevant, because PDF users often believe that documents which can be read and rendered by Adobe software must meet the standard (Amiouny, 2016; Lehtonen et al., 2018). The extent to which PDF applications and libraries are expected to tolerate errors in documents is, in effect, defined by the behaviour of Adobe’s software, which creates a number of challenges for developers of PDF software.


Firstly, non-Adobe developers are left with the time-consuming puzzle of trying to match the Adobe implementations. Indeed, the puzzle includes an element of chance because differences in behaviour are only discovered when a PDF document including a triggering problem is processed. Secondly, there are clearly security concerns in this approach. Parsing is arguably one of the more challenging software engineering tasks.
In the case of PDF, the core specifications and standards are extensive and complex, and include a large number of normative references for component file and media types, all of which need to be parsed by either a PDF implementation or its dependencies. PDFBox has been the subject of Common Vulnerabilities and Exposures (CVE) notices related to parser implementation\footnote{For example CVE-2018-8036 and CVE-2018-11797.}, as have other PDF software implementations. The core developers are therefore making decisions about security as part of those around the viability of the software when trying to match the behaviour of Adobe’s software.


Some practitioners argue that a small revision made in the ISO 32000-2:2017 standard concerning the structure of the file, which more precisely defines the relationship between the header and the end-of-file marker, largely puts an end to the need to apply the Principle of Robustness in PDF parsing (Amiouny, 2017). However, though the changes in the standard are important and may ease some of the burden on developers, we do not share the optimism, because the changes only apply to the structure of documents that are, or claim to be, PDF v2.0 compliant. Of course, there remain in circulation all the documents created during some 25 years of PDF usage, as well as those documents that will continue to be created which are compliant with earlier specifications and standards. Further, the Principle of Robustness is applied to tolerate non-conformance with normative standards of PDF, such as fonts and images, as well as minor PDF implementation errors. Given the history of malformed PDF files and the challenges of standards compliance, the fact that a document claims to be PDF v2.0 and complies with the structural requirements of ISO 32000-2:2017 does not guarantee that either the document or its components comply with the standard. Consequently, the need for tolerant parsing remains.
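One widely reported instance of such tolerant parsing concerns the file header itself: the specifications expect the %PDF marker at the start of the file, but Acrobat is commonly reported to accept a header that appears within roughly the first 1024 bytes, and other parsers follow suit. A minimal sketch of that single rule (an illustration of the Principle of Robustness, not PDFBox's actual parser):

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch of one tolerant-parsing rule: accept a PDF whose "%PDF-"
// header does not sit at byte 0, as long as it appears near the start of
// the file. The 1024-byte window reflects widely reported Acrobat
// behaviour; this is an illustration, not PDFBox code.
public class TolerantHeader {

    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final int SEARCH_WINDOW = 1024;

    /** Returns the offset of the header within the search window, or -1. */
    static int findHeader(byte[] file) {
        int limit = Math.min(file.length, SEARCH_WINDOW + MARKER.length) - MARKER.length;
        for (int i = 0; i <= limit; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (file[i + j] != MARKER[j]) { match = false; break; }
            }
            if (match) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] wellFormed = "%PDF-1.7\n...".getBytes(StandardCharsets.US_ASCII);
        byte[] junkPrefix = "GARBAGE\n%PDF-1.4\n...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findHeader(wellFormed));  // 0
        System.out.println(findHeader(junkPrefix));  // 8
    }
}
```

Each such rule enlarges the set of byte sequences a parser must treat as a document, which is precisely where the security concerns noted above arise.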
One improvement might be the creation of reference implementations and validation tools, practices that have been adopted in the development of open standards, for example in the IoT domain as noted in Section 2.3 (e.g. Watteyne et al., 2016). Validation tools for fonts could help ensure that font creators build font files that contain sufficient, accurate information for other software to use the font file, and that implementers of font parsers have a means by which to evaluate their software. Further, validation tools for PDF documents and a reference implementation for PDF would help the developers of PDF software create more interoperable applications with less effort and, possibly, reduce the security risks arising from the need to parse malformed documents. However, in practice PDF validators are difficult and expensive to implement. The veraPDF (veraPDF, 2019) PDF/A validator, for example, was created during a European Union funded project, and the PDFTools validator is proprietary licensed software.\textsuperscript{19} The problem remains, also, that solutions such as validators are forward-looking, and cannot address the challenge of processing non-compliant PDF files created during the last 25 years that still need to be read. There is, though, a case for introducing validators and reference implementations to help ensure that PDF files created in the future pose fewer problems for software developers (Lundell and Gamalielsson, 2018). Furthermore, tools such as validators provide a reference point against which to try to improve the quality of existing documents, exemplified by the work of Lehtonen et al. (2018) with applications in PDF file preservation.


5.2. Practice vs standard


Other challenges for PDFBox contributors arise from the development of practice, particularly by Adobe, and where that moves away from the standards.
PDFBOX-3913 records the discovery that Adobe’s PDF software and PDF.js exceed the ISO 32000-1:2008 standard by implementing UTF-16 encoding for destination URIs in links. The bug report dates from August 2017 and is contemporary with the publication of ISO 32000-2:2017, which specifies the use of UTF-8 encoding (ISO, 2017, p. 515). Given the use of UTF-16 encoded URIs, which have been part of HTML 5 since 2011,\textsuperscript{20} it is outwardly reasonable for Adobe and others to follow practice. However, it remains an open question why UTF-16 encoding for URIs was not part of the ISO 32000-2:2017 standard.


A further issue found in some PDFBox Jira issue tickets is a grey area between the standard and how a document is presented. The PDF specifications and standards apply to the quality of the document, and the manner in which some parts of the document are to be rendered (for example character spacing). However, the standard does not specify how software might render all of the document. The examples given above to illustrate degradation of compliance with the standard to match Adobe’s implementation are of particular interest. The ISO 32000-1:2008 and ISO 32000-2:2017 standards are clear on how the value of the border width should be represented in a compliant PDF document. As the PDFBox core developers identified, the representation of the values of border widths within the document does not comply with the PDF specifications and standards because valid non-integer values are not accepted by Adobe software. However, the presentation on screen by Adobe software of border widths defined in the document is an interpretation of the values in the document, and one that may not need to be followed slavishly.


5.3. Project sustainability


The PDFBox core developers generally act to improve the functionality of the project software. However, there are times when their actions appear to be constrained by the long-term interests of the project.
Some decisions, for example around the support for complex scripts and graphics processing, have ready explanations in that the core developers do not always have the necessary skills, or time, to implement the required solutions. There are also some activities where there may not be a clear decision stated, but the core developers, and some other contributors, do not complete tasks because they have run out of time, or have other, higher priority, tasks to attend to. It may be inferred that the developers are acting in the long-term interests of the project to create software that works and can be maintained. The concern is that if the project contributors overreach their collective abilities and their capacity to develop and maintain good quality software, then there is a risk the project may cease to be viable. There are parallels to be drawn between the decision-making of the core developers, where they reflect on their capacity to make and maintain specific changes, and the decisions made within a business to maintain itself as a going concern. Implicit is the idea that the PDFBox software remains marketable, i.e. that the software is sufficiently compliant with the PDF specifications and standards that it is useful to many users, and the project will therefore continue to attract users and contributors without the need to take risks by making unsustainable changes.


It should be recognised that this observed process of self-regulation is precisely that. There is no company or group of companies driving the development of PDFBox and making strategic decisions. There are no dedicated managers making strategic decisions. Instead, what appear to be sensible, level-headed strategic decisions that might be made by a business are being made in the small by a small collective of individuals and company developers collaborating on the development and maintenance of PDFBox.


5.4.
Limitations


The case study reported in this article describes and analyses the activity of practitioners collaborating in an OSS community to develop software that can create and process PDF documents. We acknowledge the limitations to the transferability of our findings that arise from the nature of the study. However, we conjecture that the findings may be representative of the challenges faced and decision types made in other OSS projects and, perhaps, businesses implementing standards-based interoperable software, in particular where a dominant implementation contributes to the discourse on the meaning of interoperability. Further, the factors informing the decisions made relate to technical and resource concerns that appear to be relevant for other businesses and organisations.


6. Conclusions


The study reports findings from an investigation of the practical decisions concerning interoperability made during a two-year period by contributors to a community open source software project (Apache PDFBox). The PDFBox project develops and maintains software that can be used to create and process documents that conform to multiple PDF specifications, some of which have been published as ISO standards. Four types of decision made by contributors to maintain the interoperability of the PDFBox software were identified through thematic analysis. Decisions on software interoperability concern compliance with the PDF specifications and ISO standards, and matching or mimicking the behaviour of the de facto reference implementation where that is unrelated to the standards or in conflict with them. In conjunction, contributors also make decisions about the scope of the PDFBox implementation.
Contributors to the PDFBox project are able to deliver high-quality software through a careful and, at times, conservative decision-making process that allows an often agile response to the discovery of problems with the project’s software and to changes in the dominant proprietary implementation. At the same time, the decisions made are informed by factors including resource and technical considerations which contribute towards the longer-term viability of the project and the software created.


\textsuperscript{19} PDFTools 3-Heights Validator: https://www.pdf-tools.com/pdf20/en/products/pdf-converter-validation/pdf-validator/.


\textsuperscript{20} https://www.w3.org/TR/2011/WD-html5-20110525/urls.html.


In summary, the study makes the following contributions to the existing body of knowledge in this area:


A rich and detailed account of types of decisions made within a community OSS project to maintain software interoperability;


An account of technical and non-technical factors that motivate and constrain software development activity in the project and support project sustainability.


This study provides a rich illustration and analysis of the challenges faced by contributors to a community OSS project to implement and maintain interoperable, standards-based software. The study has shown how the contributors to PDFBox are able to meet challenges arising from the demands of the technical specifications and standards, and the performance of a de facto reference implementation. The study also finds that, through awareness of the resources available to the project, the project is able to maintain interoperable software of continuing technical relevance. A topic for future research is to understand the extent to which the challenges and the decision types identified, and the factors influencing those decisions, are representative of those faced by other organisations — businesses and OSS projects — developing standards-based implementations.
Declaration of competing interest


None.


Acknowledgements


This research has been financially supported by the Swedish Knowledge Foundation (KK-stiftelsen) and participating partner organisations in the LIM-IT project. The authors are grateful for the stimulating collaboration and support from colleagues and partner organisations.


References


Ahlgren, B., Hidell, M., Ngai, E.C.-H., 2016. Internet of things for smart cities: interoperability and open data. IEEE Internet Comput. 20, 52–56. doi:10.1109/MIC.2016.124.


Allman, E., 2011. The robustness principle reconsidered. Commun. ACM 54, 40–45. doi:10.1145/1978542.1978557.


Amiouny, D., 2016. Buggy PDF Files, Should We Try to Fix Them?. Amyuni Technologies Inc. http://blog.amyuni.com/?p=1627. Accessed: 2019-05-15.


Amiouny, D., 2017. PDF 2.0 and the Future of PDF: Takeaways from PDF Days Europe 2017. Amyuni Technologies Inc. http://blog.amyuni.com/?p=1702. Accessed: 2019-05-14.


Apache PDFBox, 2019. Apache PDFBox: a Java PDF Library. The Apache Software Foundation. https://pdfbox.apache.org/. Accessed: 2019-09-17.


Apache Tika, 2019. Apache Tika — a Content Analysis Toolkit. Apache Software Foundation. https://tika.apache.org/. Accessed: 2019-06-05.


ASF, 2019. The Apache Software Foundation. The Apache Software Foundation. http://www.apache.org/. Accessed: 2019-06-05.


ASF, 2019. Apache Software Foundation Public Mailing List Archives. Apache Software Foundation. http://mail-archives.apache.org/. Accessed: 2019-06-05.


Atlassian, 2019. Jira REST APIs. Atlassian. https://developer.atlassian.com/jira/devnet/jira-apis/jira-rest-apis. Accessed: 2019-04-15.


Bitergia, 2019. GrimoireLab. Bitergia. https://chaoss.github.io/grimoirelab/. Accessed: 2019-08-03.


Black Duck, 2019. Apache PDFBox. Black Duck Software Inc. https://www.openhub.net/p/pdfbox/. Accessed: 2019-03-08.


Bogk, A., Schöpl, M., 2014.
The pitfalls of protocol design: attempting to write a formally verified PDF parser. In: 2014 IEEE Security and Privacy Workshops, pp. 198–203. doi:10.1109/SPW.2014.36.


Bouvier, D.J., 1995. Versions and standards of HTML. SIGAPP Appl. Comput. Rev. 3, 9–15. doi:10.1145/238228.238232.


Bradner, S., 1996. The internet standards process — revision 3. Internet Engineering Task Force. https://www.rfc-editor.org/rfc/rfc2026.html. Accessed: 2019-09-19.


Bradner, S., 1999. The internet engineering task force. In: DiBona, C., Ockman, S., Stone, M. (Eds.), OpenSources: Voices from the Open Source Revolution. O'Reilly & Associates, pp. 28–30.


Braun, V., Clarke, V., 2006. Using thematic analysis in psychology. Qual. Res. Psychol. 3, 77–101. doi:10.1191/1478088706qp063oa.


Butler, S., Gamalielsson, J., Lundell, B., Brax, C., Sjöberg, J., Mattsson, A., Gustavsson, T., Feist, J., Lönnroth, E., 2019. On company contributions to community OSS projects. IEEE Trans. Softw. Eng. 1–11 (early access). doi:10.1109/TSE.2019.2910305.


CEF Digital, 2019. Start Using Digital Signature Services (DSS). CEF Digital. https://ec.europa.eu/cefdigital/wiki/pages/viewpage.action?pageId=77177034. Accessed: 2019-04-29.


Davies, E.B., Hofmann, J., 2004. IETF Problem Resolution Process. Internet Engineering Task Force. https://www.rfc-editor.org/rfc/rfc3844.html. Accessed: 2019-09-19.


De Coninck, Q., Michel, F., Piraux, M., Rochet, F., Given-Wilson, T., Legay, A., Pereira, O., Bonaventure, O., 2019. Pluginizing QUIC. In: Proceedings of the ACM Special Interest Group on Data Communication. ACM, New York, NY, USA, pp. 59–74. doi:10.1145/3341302.3342078.


Eclipse Foundation, 2019. Californium (Cf) CoAP framework. Eclipse Foundation. https://www.eclipse.org/cf/. Accessed: 2019-10-03.


Eclipse Foundation, 2019. Eclipse Leshan. The Eclipse Foundation. https://www.eclipse.org/leshan/. Accessed: 2019-10-03.


Eclipse Foundation, 2019. Eclipse Wakaama.
The Eclipse Foundation. https://www.eclipse.org/wakaama/. Accessed: 2019-10-03.


Eclipse IoT Working Group, 2019. Open Source for IoT. Eclipse IoT Working Group. https://www.eclipse.org/iot/. Accessed: 2018-08-29.


Egyedi, T.M., 2007. Standard-compliant, but incompatible?. Comput. Standards Interfaces 29, 605–613. doi:10.1016/j.csi.2007.04.020.


Endignoux, G., Levillain, O., Migeon, J.Y., 2016. Caradoc: a pragmatic approach to PDF parsing and validation. In: 2016 IEEE Security and Privacy Workshops (SPW), pp. 125–139. doi:10.1109/SPW.2016.39.


Fitzgerald, B., 2006. The transformation of open source software. Manage. Inf. Syst. Q. 30, 587–598.


Gamalielsson, J., Lundell, B., 2013. Experiences from implementing PDF in open source: Challenges and opportunities for standardisation processes. In: Proceedings of the 8th International Conference on Standardization and Innovation in Information Technology (SIIT) 2013, pp. 1–11. doi:10.1109/SIIT.2013.6774572.


Gerring, J., 2017. Case Study Research: Principles and Practices, second ed. Cambridge University Press, Cambridge, UK.


ICIJ, 2019. The Panama Papers: Exposing the Rogue Offshore Finance Industry. https://www.icij.org/investigations/panama-papers/. Accessed: 2019-05-29.


IETF, 2019. Internet Engineering Task Force. Internet Engineering Task Force. https://www.ietf.org/. Accessed: 2019-09-27.


IETF, 2019. QUIC (quic) — about. Internet Engineering Task Force. https://datatracker.ietf.org/wg/quic/about/. Accessed: 2019-09-24.


IETF, 2019. QUIC (quic) — documents. Internet Engineering Task Force. https://datatracker.ietf.org/wg/quic/documents/. Accessed: 2019-09-24.


ISO, 2005. Document Management — Electronic Document File Format for Long-Term Preservation — Part 1: Use of PDF 1.4 (PDF/A-1) (ISO 19005-1:2005), first ed. International Organization for Standardisation, Geneva, Switzerland.


ISO, 2008.
Document Management — Portable Document Format — Part 1: PDF 1.7 (ISO 32000-1:2008), first ed. International Organization for Standardisation, Geneva, Switzerland.


ISO, 2011. Document Management — Electronic Document File Format for Long-Term Preservation — Part 2: Use of ISO 32000-1 (PDF/A-2) (ISO 19005-2:2011), first ed. International Organization for Standardisation, Geneva, Switzerland.


ISO, 2013. Digital Compression and Coding of Continuous-Tone Still Images: JPEG File Interchange Format (JFIF) (ISO/IEC 10918-5:2013), first ed. International Organization for Standardisation, Geneva, Switzerland.


ISO, 2017. Document Management — Portable Document Format — Part 2: PDF 2.0 (ISO 32000-2:2017), first ed. International Organization for Standardisation, Geneva, Switzerland.


JPEG, 2019. Overview of JPEG XT. International Organization for Standardisation. https://jpeg.org/jpegxt/. Accessed: 2019-04-01.


Kelly, M., Nelson, M.L., Weigle, M.C., 2014. The archival acid test: Evaluating archive performance on advanced HTML and JavaScript. In: IEEE/ACM Joint Conference on Digital Libraries, pp. 25–28. doi:10.1109/JCDL.2014.6970146.


Khudairi, S., 2017. The Apache Software Foundation Recognizes Apache Innovations to the Pulitzer Prize-winning Panama Papers investigation. Apache Software Foundation. https://blogs.apache.org/foundation/entry/the-apache-software-foundation-recognizes. Accessed: 2019-02-14.


Khudairi, S., 2019. Apache in 2018 — by the Digits. Apache Software Foundation. https://blogs.apache.org/foundation/entry/apache-in-2018-by-the-. Accessed: 2019-01-02.


Ko, J., Eriksson, J., Tsiftes, N., Dawson-Haggerty, S., Vasseur, J., Durvy, M., Terzis, A., Dunkels, A., Culler, D., 2011. Industry: Beyond Interoperability: Pushing the Performance of Sensor Network IP Stacks. In: Proceedings of the 9th ACM Conference on Embedded Networked Sensor Systems. ACM, New York, NY, USA, pp. 1–11. doi:10.1145/2070942.2070944.


Lehmkuhler, A., 2010.
Apache PDFBox — Working with PDFs for Dummies. The Apache Software Foundation. https://people.apache.org/lehman/apachecon/ ApacheConPDFBox.pdf. Accessed: 2019-06-04. + + +Lehtonen, J., Helin, H., Kylander, J., Koivunen, K., 2018. PDF mayhem: is broken really broken? In: Proceedings of the 15th International Conference on Digital Preservation (IPRES 2018) doi:10.17615/IRRES-1228649. + + +Lindlar, M., Tunnat, Y., Wilson, C., 2017. A test-set for well-formedness validation in JHOVE — the good, the bad and the ugly. In: Proceedings of the 15th International Conference on Digital Preservation (IPRES 2017) doi:10.5281/zenodo.1228649. +Lundell, B., 2011. e-Governance in public sector ICT procurement: what is shaping practice in Sweden? Eur. J. ePractice 12, 66–78. https://joinup.ec.europa.eu/sites/default/files/document/2014-06/ePractice220Journal%20Vol.%205%202012-March_April%202011.pdf. + + +Lundell, B., Gamalielsson, J., 2017. On the potential for improved standardisation through use of open source work practices in different standardisation organisations: how can open source projects contribute to development of IT-standards? In: Jakobs, K. (Ed.) Digitalisation: Challenge and Opportunity for Standardisation. Proceedings of the 22nd EURAS Annual Standardisation Conference, EURAS Contributions to Standardisation Research, Vol. 12. Verlag Mainz, Aachen, pp. 137–155. + + +Lundell, B., Gamalielsson, J., 2018. Sustainable digitalisation through different dimensions of openness: How can lock-in, interoperability, and long-term maintenance of IT systems be addressed? In: Proceedings of OpenSym ’18. ACM, New York, NY, USA doi: 10.1145/3233391.3235527. + + +Lundell, B., Gamalielsson, J., Katz, A., 2018. On challenges for implementing ISO standards in software: Can both open and closed standards be implemented in open source software? In: Jakobs, K. (Ed.) Corporate and Global Standardization Initiatives in Contemporary Society. IGI Global, Hershey, PA, USA, pp. 219–251. 
doi: 10.4018/978-1-5225-5320-5. + + +Lundell, B., Gamalielsson, J., Tengblad, S., Yousefi, B.H., Fischer, T., Johansson, G., Rödung, B., Mattsson, A., Oppmark, J., Gustavsson, T., Feist, J., Landemo, S., Lönnroth, E., 2017. Addressing lock-in, interoperability, and long-term maintenance challenges through open source: How can companies strategically use open source? In: Open Source Systems: Towards Robust Practices – Proceedings of the 13th IFIP WG 2.13 International Conference on Open Source Systems, OSS 2017. Springer, pp. 80–88. doi: 10.1007/978-3-319-57735-7_9. + + +Mladenov, V., Mainka, C., Meyer zu Selhausen, K., Grothe, M., Schwenk, J., 2018a. 1 Trillion dollar refund — how to spoof PDF signatures. https://www.pdf-insecurity.org/download/paper.pdf. Accessed: 2019-05-09. + + +Mladenov, V., Mainka, C., Meyer zu Selhausen, K., Grothe, M., Schwenk, J., 2018b. How to break PDF signatures. https://pdf-insecurity.org/. Accessed: 2019-05-14. + + +Nikolich, P., I. C. L., Korhonen, J., Marks, R., Tye, B., Li, G., Ni, J., Zhang, S., 2017. Standards for 5G and beyond: their use cases and applications. https://futurenetworks.ieee.org/tech-focus/june-2017/standards-for-5g-and-beyond. Accessed: 2019-10-03. + + +OMA, 2019. OMA SpecWorks. Open Mobile Alliance. https://www.omaspecworks.org/. Accessed: 2019-10-03. + + +Patton, M.Q., 2015. Qualitative Research and Evaluation Methods, fourth ed. Sage Publications Inc, Thousand Oaks, California, USA. + + +Phillips, B., 1998. Designers: the browser war casualties. Computer 31, 14–16. doi: 10.1109/2.722269. + + +Phipps, S., 2019. Open Source and FRAND: Why Legal Issues are the Wrong Lens. Open Forum Academy. http://www.openforumeuropa.org/wp-content/uploads/2019/03/OFA_- +Opinion_Paper +- +Simon_Phipps +-_OSS_and_FRAND.pdf. Accessed: 2019-10-03. + + +Piraux, M., De Coninck, Q., Bonaventure, O., 2018. Observing the evolution of QUIC implementations. 
In: Proceedings of the Workshop on the Evolution, Performance, and Interoperability of QUIC. ACM, New York, NY, USA, pp. 8–14. doi: 10.1145/3248850.3248487. + + +Postel, J., 1981. RFC 793, Transmission Control Protocol. Internet Engineering Task Force. https://tools.ietf.org/html/rfc793. Accessed: 2019-04-15. + + +Richter, T., Clark, R., 2018. Why JPEG is not JPEG — testing a 25 years old standard. In: 2018 Picture Coding Symposium (PCS), pp. 1–5. doi: 10.1109/PCS.2018.8456260. + + +Riehle, D., 2011. Controlling and steering open source projects. IEEE Comput. 44, 93–96. doi: 10.1109/MC.2011.206. + + +Rossi, B., Russo, B., Succi, G., 2008. Analysis about the diffusion of data standards inside European public organizations. In: 2008 3rd International Conference on Information and Communication Technologies: From Theory to Applications, pp. 1–6. doi: 10.1109/ICTTA.2008.4529953. + + +Shelby, Z., Hartke, K., Bormann, C., 2014. The Constrained Application Protocol (CoAP). Internet Engineering Task Force. https://www.rfc-editor.org/rfc/rfc7252. html. Accessed: 2019-10-03. + + +Targett, E., 2019. Meet the Apache Software Foundations Top 5 code Committers. Computer Business Review. https://www.chrononline.com/feature/apache-top-5. Accessed: 2019-10-04. + + +The Document Foundation, 2019. LibreOffice. The Document Foundation. https://www.libreoffice.org/. Accessed: 2019-09-26. + + +Treese, W., 1999. Putting it together: Engineering the Net: The IETF. netWorker 3, 13–19. doi: 10.294562.294634. + + +veraPDF, 2019. Industry supported PDF/A validation. veraPDF Consortium. http://verapdf.org/. Accessed: 2019-06-03. + + +W3C, 2019. The history of the web. World Wide Web Consortium. https://www.w3.org/History/the-history-of-the-web. Accessed: 2019-09-18. + + +W3C, 2019. World wide web consortium (W3C). World Wide Web Consortium. https://www.w3.org/. Accessed: 2019-09-18. + + +Walsham, G., 2006. Doing interpretive research. Eur. J. Inf. Syst. 15, 320–330. 
doi: 10.1057/palgrave.esi.3000585. + + +WaSP, 2019. History of the Web Standards Project. The Web Standards Project. https://www.webstandards.org/about/history/. Accessed: 2019-09-27. + + +Watteyne, T., Handziski, V., Vilajosana, X., Duquennoy, S., Hahn, O., Baccelli, E., Wolisz, A., 2016. Industrial wireless IP-based cyber-physical systems. Proc. IEEE 104, 1025–1038. doi: 10.1109/JPROC.2015.2509186. + + +Wilson, C., McGuinness, R., Jung, J., 2017. veraPDF: Building an open source, industry supported PDF/A validator for cultural heritage institutions. Digital Lib. Perspect. 33, 156–165. + + +Wilson, J., 1998. The IETF: Laying the Net’s asphalt. Computer 31, 116–117. doi: 10.1109/2.707624. + + +Wright, S.A., Druta, D., 2014. Open source and standards: the role of open source in the dialogue between research and standardization. In: 2014 IEEE Globecom Workshops (GC Wkshps), pp. 650–655. doi: 10.1109/GLOCOMW.2014.7063506. + + +Simon Butler received a Ph.D. from The Open University in 2016. He is a researcher in the Software Systems Research Group at the University of Skövde in Sweden. His research interests include software engineering, open source software, program comprehension, software development tools and practices, and software maintenance. + + +Jonas Gamalielsson received a Ph.D. from Heriot Watt University in 2009. He is a senior lecturer at the University of Skövde and is a member of the Software Systems Research Group. He has conducted research related to free and open source software in a number of projects, and his research is reported in publications in a variety of international journals and conferences. + + +Professor Björn Lundell received a Ph.D. from the University of Exeter in 2001, and leads the Software Systems Research Group at the University of Skövde. 
Professor Lundell’s research contributes to theory and practice in the software systems domain, in the area of open source and open standards related to the development, use, and procurement of software systems. His research addresses socio-technical challenges concerning software systems, and focuses on lock-in, interoperability, and longevity of systems. Professor Lundell is active in international and national research projects, and has contributed to guidelines and policies at national and EU levels.


Christoffer Brax received the M.Sc. degree from the University of Skövde in 2000, and a Ph.D. from Örebro University in 2011. He is a consultant with Combitech AB working in systems engineering, requirements management, systems design and architecture, and IT security. Christoffer has 18 years' experience as a systems engineer.


Anders Mattsson received the M.Sc. degree from Chalmers University of Technology, Sweden, in 1989 and a Ph.D. in software engineering from the University of Limerick, Ireland in 2012. He has almost 30 years' experience in software engineering and is currently R&D manager for Information Products and owner of the software development process at Husqvarna AB. Anders is particularly interested in strengthening software engineering practices in organizations. Special interests include software architecture and model-driven development in the context of embedded real-time systems.


Tomas Gustavsson received the M.Sc. degree in Electrical and Computer Engineering from KTH Royal Institute of Technology in Stockholm in 1994. He is co-founder and current CTO of PrimeKey Solutions AB. Tomas has been researching and implementing public key infrastructure (PKI) systems for more than 24 years, and is founder and developer of the open source enterprise PKI project EJBCA, contributor to numerous open source projects, and a member of the board of Open Source Sweden.
His goal is to enhance Internet and corporate security by introducing cost-effective, efficient PKI.


Jonas Feist received the M.Sc. degree in Computer Science from the Institute of Technology at Linköping University in 1988. He is a senior executive and co-founder of RedBridge AB, a computer consultancy business in Stockholm.


Erik Lönnroth holds an M.Sc. in Computer Science and is technically responsible for the high-performance computing area at Scania IT AB. He has led the technical development of four generations of supercomputing initiatives at Scania and their supporting subsystems. Erik frequently lectures on the development of supercomputer environments for industry, open source software governance, and HPC-related topics.
----------------------------------------
-------------------------------
Section 95:
An empirical study on downstream workarounds for cross-project bugs


Hui Ding Wanwangying Ma Lin Chen Yuming Zhou Baowen Xu
State Key Laboratory for Novel Software Technology
Nanjing University, China
dinghui85@gmail.com, wwyma@smail.nju.edu.cn, {lchen, zhouyuming, bwxu}@nju.edu.cn


Abstract—GitHub has fostered complicated and enormous software ecosystems, in which projects depend on and co-evolve with each other. An error in an upstream project may affect its downstream projects through inter-dependencies, forming cross-project bugs. Though the upstream developers should fix such bugs on their side, proposing a workaround, i.e., a temporary solution in the downstream project, is a common practice for downstream developers. In this study, we empirically investigated the characteristics of downstream workarounds in the scientific Python ecosystem. Combining statistical comparisons with manual inspection, we obtained the following three main findings. First, in general, the workarounds and the corresponding upstream fixes are significantly different in code size and code structure.
Second, there are three kinds of cross-project bugs that the downstream developers usually work around. Last, four types of common patterns are identified from the investigated workarounds. The findings of this study lead to a better understanding of cross-project bugs and the practices of developers in software ecosystems.


Keywords—GitHub ecosystems; cross-project bugs; workarounds; practices


I. INTRODUCTION


Benefiting from the social coding capabilities of GitHub, software development on GitHub has evolved beyond single projects into socio-technical ecosystems [1]. Projects rely on the infrastructure or functional components provided by other projects, forming complex inter-project dependencies. In this way, some bugs in upstream projects may affect their downstream projects through these dependencies. This phenomenon was confirmed by Ma et al. [2]. In their study, they investigated cross-project correlated bugs, i.e., causally related bugs reported to different projects in the scientific Python ecosystem on GitHub, focusing on how developers coordinate to triage and fix this kind of bug.


In the context of cross-project bugs, there is no doubt that the upstream project where the bug originates should provide the radical cure. However, the affected downstream projects usually offer a workaround, i.e., a local, temporary solution to bypass the upstream error. Ma et al. posted a questionnaire in which they asked what downstream developers usually did to deal with cross-project bugs. The result indicated that 89.3% of the respondents chose to propose a temporary workaround, making it the most common practice [2].


Workarounds are important for two reasons [2]. First, a workaround can be used to avoid the long-lasting impact of an upstream bug. A workaround must be implemented if the upstream team is not willing or able to fix the bug quickly, and it allows the downstream project to temporarily suppress the upstream bug.
Second, adding a workaround for an upstream bug enables the downstream project to support buggy upstream versions without affecting the end users. As many users may still use an old version of the upstream project, the downstream developers cannot rely on a fix in the next upstream release. Therefore, the downstream developers have to work around bugs regardless of whether they have already been fixed upstream.


Despite the wide use and importance of workarounds for cross-project bugs, little work has paid attention to this issue. Studying workarounds helps to understand not only the fixing process of cross-project bugs, but also the coordination between projects in a software ecosystem. Therefore, we conduct this study to investigate the characteristics of downstream workarounds in the context of cross-project bugs.


We base our study on the scientific Python ecosystem on GitHub. For a cross-project bug, we refer to the patch injected into the buggy upstream project as the upstream fix, and the temporary solution provided for the affected downstream project as the downstream workaround. We investigate the workarounds from three aspects. First, we compare the code size and design of the workarounds with those of the corresponding upstream fixes. Second, we inspect whether the cross-project bugs that were worked around in downstream projects have something in common. Third, we investigate whether software practitioners developed the workarounds in some common ways.


The main contributions of this study are as follows. First, we extract 60 downstream workarounds in the scientific Python ecosystem. Second, we identify three kinds of cross-project bugs that the downstream developers usually work around. Third, we summarize four common workaround patterns. Last, we provide several design requirements for workaround-supporting tools.


The rest of the paper is organized as follows. Section II describes related work.
Section III presents our research methodology, and Section IV shows our empirical results. We propose further discussions of our findings in Section V.
II. RELATED WORK


A. Cross-project Bugs


With the development of software ecosystems, more and more cross-project bugs appear and attract the attention of an increasing number of researchers.


Some existing studies showed that cross-project bugs brought many troubles to ecosystem developers. Decan et al. [3] reported that developers in R ecosystems found it increasingly painful when upstream packages broke. Adams et al. [4] indicated that the core integration activity for open source distributions was synchronizing with newer upstream versions. To avoid cross-project bugs, developers had to pay great attention to the synchronizing process. Bavota et al. [5] found that upstream upgrades had strong effects on downstream projects when there were general dependencies between them. Their study showed that a large amount of downstream code had to be modified when the upstream project changed if the downstream project depended on the upstream framework or general services. In that case, upstream bugs would leave a wide impact on the downstream projects.


Other studies focused on the coordination between developers of different projects during the fixing of cross-project bugs. Villarroel et al. [6] leveraged the reviews of app users to help developers realize the downstream demand. They classified and prioritized the downstream reviews so that the upstream developers were able to catch the important bugs quickly. Ma et al. [2] studied how developers fixed cross-project correlated bugs in the scientific Python ecosystem. Combining manual inspection and the results of an online survey, they revealed how developers, especially those on the downstream side, tracked the root cause of cross-project bugs and dealt with them to eliminate their bad effects.
Our study builds on and extends that work. We focus on a specific but common practice of downstream developers when facing cross-project bugs, i.e., proposing a workaround.


B. Blocking Bugs


Another special type of bug is the blocking bug, which is to some extent similar to cross-project bugs. Blocking bugs prevent other bugs (in the same or other projects) from being fixed. This often happens because of a dependency relationship among software components. In such situations, developers cannot fix their bugs because the modules they are fixing depend on other modules that have unresolved bugs. Due to their severe impact, some researchers have turned their eyes to blocking bugs.


Garcia and Shihab [7] found that it took two to three times longer to fix blocking bugs than non-blocked bugs. They then employed decision trees to predict whether a bug is a blocking bug or not. They extracted 14 kinds of features to construct the predictor and evaluated which features were most influential in indicating blocking bugs.


Later, Xia et al. [8] proposed a novel method named ELBlocker to identify blocking bugs with the class imbalance phenomenon taken into account. ELBlocker utilized more features and combined multiple classifiers to learn an appropriate imbalance decision boundary. ELBlocker outperformed the method in [7] by up to 14.7% in F-measure.


Unlike blocking bugs, which prevent the fixing of bugs in dependent modules, cross-project bugs occur in upstream projects but affect the normal operation of the downstream projects. For the affected downstream modules/projects, the developers attempt to take some action to free themselves from the blocking/cross-project bugs in other components. In this paper, we investigate the downstream practices when facing cross-project bugs.


C. Design of Bug Fixes


Fixing software bugs is an important activity during software maintenance.
Developers devote substantial effort to designing bug fixes, which reflect the developers’ expertise and experience. Various studies investigated the nature and design of bug fixes. Zhong and Su [9] extracted and analyzed more than 9000 real-world bug fixes from six Java projects. They obtained 15 findings which could give insights into automatic program repair. Pan et al. [10] explored the underlying bug fix patterns and identified 27 bug fix patterns that were amenable to automatic detection. Park et al. [11] analyzed bugs which were fixed more than once to understand the characteristics of incomplete patches. They revealed that predicting supplementary patches was a difficult problem. Jiang et al. [12] conducted a study on the characteristics of Linux kernel patches that could explain patch acceptance and reviewing/integration time. Misirli et al. [13] proposed a measure to study the impact of fix-inducing changes. They found that the lines of code added, the number of developers who worked on a change, and the number of prior modifications on the files modified during a change were the best indicators of high-impact fix-inducing changes. Echeverria et al. [14] evaluated developers’ performance on fixing bugs and propagating the fixes to other products in an industrial Software Product Line.


According to the different characteristics of bug fixes, researchers developed various automatic tools to support bug repair. Le Goues et al. [15,16] used genetic programming to repair bugs in C programs, and evaluated what fraction of bugs could be repaired automatically. They generated a large, indicative benchmark set for systematic evaluations. Mechtaev et al. [17] presented a semantics-based repair method applicable to large-scale real-world software. Gu et al. [18] considered the bad-fix problem and implemented a prototype that automatically detects bad fixes for Java programs.


When fixing bugs, developers may have different options for designing the bug fix. Leszak et al.
[19] pointed out that some defects were not fixed by correcting the real error-causing component, but rather by a workaround injected at another location. An online resource gives a clear description of the workaround [20]: “A workaround is a far less elegant solution to the problem. Typically, a workaround is not viewed as something that is designed to be a panacea, or cure-all, but rather as a crude solution to the immediate problem. As a temporary fix, a workaround will do very well until a suitable permanent fix can be implemented by project management personnel.” Murphy-Hill et al. [21] studied why a developer might choose a workaround instead of a fix at the real error location. They summarized six factors: risk management, interface breakage, consistency, user behavior, cause understanding, and social factors. Some other studies also paid attention to the phenomenon of workarounds. Ko et al. [22] found that if a bug had a known workaround, developers often focused on more severe bugs. Berglund [23] indicated that bugs could be worked around and that workarounds were relevant in early stages of the bug fixing process.


Different from most existing studies, which investigated the design of fixes for within-project bugs, our study concentrates on the characteristics of downstream workarounds in the context of cross-project bugs.


III. RESEARCH METHODOLOGY


In this section, we first introduce how we collected the data used in the study. Then we present the research questions. Finally, we describe the research methods used to investigate the questions.


A. Data Source


The cross-project bugs under investigation were collected by Ma et al. [2]. The data are available online1. The dataset contains 271 pairs of cross-project bugs gathered from the scientific Python ecosystem on GitHub. Every pair includes an upstream issue reported to the root-cause project and a downstream issue reported to the affected project.
Specifically, these cross-project bugs involve 204 projects including seven core libraries in the ecosystem, that is, IPython2, NumPy3, SciPy4, Matplotlib5, Pandas6, Scikit-learn7, and Astropy8.


Since our study focuses on the workarounds, we are only interested in the cross-project bugs for which the downstream developers have provided a workaround. In order to extract the data we needed, we manually read all the bug reports on the downstream side of the 271 pairs of bugs. If the downstream developers were willing to propose a workaround, they were very likely to leave related information in the issue reports. For example, a developer of IPython suffering from a bug in Setuptools commented, “I’ll open an Issue on setuptools to deal with this, and figure out what the best workaround in IPython should be.” (ipython/ipython#8804) Two of the authors of this paper carried out this task and found 60 pairs of cross-project bugs to investigate further in this study.


For the 60 pairs of bugs, we concentrated on their downstream workarounds and the corresponding upstream fixes. Usually, the upstream issue links to the bug-fix commits once the bug has been repaired. Likewise, if the downstream issue was worked around, the commits containing the workaround are indicated. By manually inspecting the issue reports, the two authors linked every pair of closed cross-project bugs with the commits containing the fix/workaround. Note that nine cross-project bugs have not been fixed by the upstream projects. Therefore, in total, we collected 60 downstream workarounds and 51 upstream fixes.


B. Research Questions


The aim of this study is to investigate the characteristics of downstream workarounds in the context of cross-project bugs. In particular, we attempt to answer the following three research questions:


RQ1: Are there differences between downstream workarounds and the corresponding upstream fixes?
Compared with the upstream fix, the workaround is injected in a different project and serves a different purpose. Therefore, is the design of the workaround different from that of the fix? We compared them in two aspects: code size and code structure.


RQ2: Do the cross-project bugs that downstream developers work around have some common features?


As stated, not all of the cross-project bugs have workarounds. What, then, do these 60 bugs with workarounds have in common? In RQ2, we sought the answer.


RQ3: Do the workarounds have some common patterns?


In RQ3, we attempted to find out whether downstream developers worked around the upstream bugs in some common ways.


C. Research Methods


1) Quantitative analysis methods


In RQ1, the Wilcoxon signed-rank test and Cliff’s δ served to compare the code size of the upstream fixes and the downstream workarounds.


The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used to compare whether two matched groups of data are identical [24]. The paired samples in our study are the sizes (in terms of the number of modified files or the number of changed lines of code) of the downstream workarounds and upstream fixes. We set the null hypothesis $H_0$ and its alternative hypothesis $H_1$ as follows:


$H_0$: The number of modified files / the number of changed lines of code in the downstream workarounds is the same as that in the upstream fixes.


$H_1$: The number of modified files / the number of changed lines of code in the downstream workarounds is significantly different from that in the upstream fixes.


We assessed the test results at the significance level of 0.05. If the p-value obtained from the Wilcoxon signed-rank test was lower than 0.05, the sizes of workarounds and fixes were considered significantly different.
1 https://github.com/njuap/ICSE2017
2 http://ipython.org, https://github.com/ipython/ipython
3 http://www.numpy.org, https://github.com/numpy/numpy
4 http://www.scipy.org/scipylib, https://github.com/scipy/scipy
5 http://matplotlib.org, https://github.com/matplotlib/matplotlib
6 http://pandas.pydata.org, https://github.com/pydata/pandas
7 http://scikit-learn.org, https://github.com/scikit-learn/scikit-learn
8 http://www.astropy.org, https://github.com/astropy/astropy


Together with the median values of the sizes, we were able to decide whether the size of a workaround was smaller than the size of its corresponding fix.


Furthermore, we used the Cliff’s $\delta$ effect size to measure the magnitude of the difference between the sizes of workarounds and fixes. Cliff’s $\delta$ provides a simple way of quantifying the practical difference between two groups [25]. Of all kinds of effect sizes, Cliff’s $\delta$ is the most direct and simple non-parametric variety [26]. By convention, the magnitude of the difference is considered either trivial ($|\delta| < 0.147$), small (0.147–0.33), moderate (0.33–0.474), or large (> 0.474) [27].


2) Qualitative analysis


For RQ2, RQ3, and part of RQ1, we performed a qualitative analysis to investigate the questions. Two authors manually inspected the issue reports and the code of the fixes/workarounds for the cross-project bugs.


The two authors first individually completed the task following the same procedure and criteria. They reviewed the issue reports and code carefully, then executed the existing test cases provided by the developers to keep track of traces and to observe the input/output. During this procedure, they wrote down the necessary information: the bug information (bug type, root cause, bug impact, and participants), the bug context (related methods, test cases, traces, and input/output), and the workaround and fix strategies. They also wrote down their findings.
After individual investigation, they came together to discuss their findings and draw conclusions.


IV. RESEARCH RESULTS


A. RQ1: Differences Between Fixes and Workarounds


In order to compare the upstream fixes and the downstream workarounds, we first statistically compared their sizes in terms of the number of modified files and the number of modified lines of code. Then, we inspected the code structure of the fixes and workarounds to see whether they were different.


Among the 60 pairs of cross-project bugs, nine have not yet been fixed in the upstream projects. Therefore, we could not compare their workarounds with upstream fixes. In RQ1, we only investigated the remaining 51 pairs of cross-project bugs.


1) Statistical comparison of the size


TABLE I shows the minimum, the maximum, and the average values, as well as the 25th, 50th, and 75th percentiles of the workaround/fix sizes. To facilitate a visual comparison, we also use boxplots to illustrate the size distributions (Fig. 1). It is clear that the number of modified files and the number of modified lines of code in workarounds are both smaller than those in fixes.


We also adopted the Wilcoxon signed-rank test and Cliff’s $\delta$ effect size to statistically compare the workarounds and fixes. The results are shown in TABLE II. The p-values less than 0.05 indicate that the number of modified files and the number of modified lines of code are significantly different between the workarounds and fixes. The values of Cliff’s $\delta$ mean that the difference in the number of changed files between them is small, but the difference in the number of modified lines of code is large.


| TABLE II.  | #Files | #SLOC |
|------------|--------|-------|
| P-value    | 0.019  | 0.014 |
| $|\delta|$ | 0.232  | 0.771 |


Combining the boxplots and the results of the statistical tests, we conclude that the size of the workaround is significantly smaller than the size of the corresponding upstream fix.
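The quantitative procedure used above (a paired Wilcoxon signed-rank test plus Cliff's $\delta$ with the conventional magnitude thresholds) can be sketched in Python. The SLOC values below are illustrative made-up numbers, not the study's data; in practice the p-value would come from scipy.stats.wilcoxon on the paired samples.

```python
# Sketch of the RQ1 comparison. The sample values are hypothetical, not the
# paper's data. For the paired test itself, scipy.stats.wilcoxon(fixes_sloc,
# workarounds_sloc) would supply the p-value reported in TABLE II.

def cliffs_delta(xs, ys):
    """Cliff's delta: (#(x > y) - #(x < y)) / (len(xs) * len(ys))."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def magnitude(d):
    """Conventional thresholds [27]: trivial < 0.147, small < 0.33,
    moderate < 0.474, large otherwise."""
    a = abs(d)
    if a < 0.147:
        return "trivial"
    if a < 0.33:
        return "small"
    if a < 0.474:
        return "moderate"
    return "large"

# Hypothetical per-bug SLOC counts for four matched fix/workaround pairs
fixes_sloc = [19, 36, 105, 829]
workarounds_sloc = [10, 26, 45, 662]

d = cliffs_delta(workarounds_sloc, fixes_sloc)
print(d, magnitude(d))  # -0.25 small
```

A negative $\delta$ here means the workaround sizes tend to be smaller than the fix sizes, matching the direction of the paper's result.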
2) Inspection of code


After statistically comparing the sizes of the downstream workarounds and the corresponding upstream fixes, we looked into their code to make a further investigation.


TABLE I. THE SIZES OF THE UPSTREAM FIXES AND DOWNSTREAM WORKAROUNDS

|             | Min. | Max. | Avg. | 25th | 50th | 75th |
|-------------|------|------|------|------|------|------|
| #Files      |      |      |      |      |      |      |
| Fixes       | 1    | 8    | 3    | 2    | 2    | 4    |
| Workarounds | 1    | 6    | 2    | 1    | 2    | 3    |
| #SLOC       |      |      |      |      |      |      |
| Fixes       | 1    | 829  | 93   | 19   | 36   | 105  |
| Workarounds | 1    | 662  | 61   | 10   | 26   | 45   |


In general, for eight out of the 51 cross-project bugs, the upstream fix and the corresponding downstream workaround were designed in the same manner. The developers from both sides had a similar idea of how to modify their own projects when facing the bug. For example, using the Astropy normalizer led to a TypeError in Sunpy when playing a mapcube peek animation (sunpy/sunpy#1532). It was caused by a bug in the ImageNormalize class of Astropy, which did not include a call to the inherited method autoscale_None() (astropy/astropy#4117). To address this problem, both Sunpy and Astropy used an explicit call to autoscale_None(). Fig. 2 shows the downstream workaround and upstream fix for this bug. Additionally, it is worth noting that the fix and the workaround were proposed by the same developer. Another example is shown in astropy/astropy#3052, which was caused by numpy/numpy#5251. The downstream workaround was simply a copy of the upstream fix for the cross-project bug.


For the remaining 43 of the 51 cross-project bugs, the downstream developers worked around them in a different way from what the upstream developers did to fix the bugs. This accords with our intuition. Whether for within-project or cross-project bugs, a workaround is a short-term solution injected in a place other than the true root-cause location.
For cross-project bugs, the workaround is placed in the downstream project where the upstream buggy method is called, while the ultimate fix is to repair the buggy method itself. Intuitively, the two kinds of modification are usually different, which is confirmed by our observations.

In Section IV.C, we will discuss the workaround patterns in detail.

B. RQ2: Common Bug Features

By manually inspecting the issue reports of the 60 cross-project bugs, we found that some bugs did have something in common. In total, we identified three kinds of common features. Forty-nine of the investigated bugs could be classified into these three kinds, while the remaining 11 bugs have distinct characteristics of their own and cannot be put into any category.

1) Emerging cases

A cross-project bug was reported when the downstream project encountered an emerging case that the upstream method did not cover. Thirty-nine of the 60 cross-project bugs could be classified into this kind. More specifically, we divided the 39 bugs into two subcategories.

First, the original upstream method could not process certain types or forms of data. For example, astropy/astropy#3052 reported that a method in NumPy did not use a suitable format for Unicode data (numpy/numpy#5251). Astropy/astropy#4658 was caused by np.median from NumPy, which could not handle masked arrays (numpy/numpy#7330). Luca-dex/pyTSA#18 worked around an upstream bug in which Pandas could not read csv files if the column separator was not a comma (pandas-dev/pandas#2733).

Second, the upstream method might not consider the processing of edge cases. For example, the method utilities.autowrap.ufuncify in Sympy failed when the length of the symbol list was larger than 31 (sympy/sympy#9593). The failure resulted from an error in the method frompyfunc of NumPy, which did not check the number of arguments (numpy/numpy#5672).
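A fail-fast guard in the spirit of the Sympy example might look like the following sketch; the function name and the "+1 output slot" accounting are assumptions for illustration, not code from sympy/sympy#9593.

```python
# Hypothetical guard for the NumPy frompyfunc argument limit.
# NPY_MAXARGS is a NumPy compile-time constant; the helper below is invented.
NPY_MAXARGS = 32

def check_ufuncify_args(symbols):
    """Raise a clear error instead of tripping the upstream frompyfunc bug."""
    if len(symbols) + 1 > NPY_MAXARGS:  # +1 for the ufunc's output argument
        raise ValueError(
            f"ufuncify supports at most {NPY_MAXARGS - 1} symbols "
            f"(got {len(symbols)}); see numpy/numpy#5672"
        )
    return symbols
```

Raising a descriptive downstream error is one of the two reactions to uncovered cases observed in the studied workarounds (Section IV.C, Pattern 2).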
2) Wrong outputs

Sometimes, the upstream methods might produce wrong results with specific inputs, which could break their downstream projects. Six of the studied upstream bugs were caused by wrong outputs.

The wrong outputs are partly caused by the incorrect design of the functionality. Blaze/odo#331 was caused by the wrong output of a datetime64 series in Pandas: the method should return NaT instead of NaN for an empty series (pandas-dev/pandas#11245). In NumPy, np.log1p(inf) returned NaN while it should return Inf (numpy/numpy#4225), which led to an undesired result in Nengo (nengo/nengo#260).

(a) The downstream workaround:

@@ -203,7 +205,11 @@ def updatefig(i, im, annotate, ani_data, removes):
     im.set_array(ani_data[i].data)
     im.set_cmap(self.maps[i].plot_settings['cmap'])
-    im.set_norm(self.maps[i].plot_settings['norm'])
+    norm = deepcopy(self.maps[i].plot_settings['norm'])
+
+    # The following explicit call is for bugged versions of Astropy's ImageNormalize
+    norm.autoscale_None(ani_data[i].data)
+    im.set_norm(norm)

(b) The upstream fix:

@@ -67,5 +67,8 @@ def __call__(self, values, clip=None):
     values = np.array(values, copy=True, dtype=float)
+
+    # Set default values for vmin and vmax if not specified
+    self.autoscale_None(values)
     # Normalize based on vmin and vmax
     np.subtract(values, self.vmin, out=values)

Fig. 2. The comparison of the code for the downstream workaround and the corresponding upstream fix.

Some other unexpected outputs of the upstream methods were introduced by incompatible changes carelessly made when the upstream developers fixed another bug or developed a new feature. For example, the method combine_first in a new version of Pandas performed an unwanted conversion of dates to integers (pandas-dev/pandas#3593), which made some modules of Clair unusable (eike-welk/clair#43).
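A downstream guard for a wrong-output case such as np.log1p(inf) could be sketched as below; `safe_log1p` is a hypothetical wrapper name (modern NumPy and the stdlib already return inf here), shown only to illustrate output correction.

```python
import math

def safe_log1p(x):
    # Hypothetical output-correcting wrapper: affected NumPy versions
    # returned NaN for log1p(inf) instead of inf (numpy/numpy#4225).
    if math.isinf(x) and x > 0:
        return math.inf
    return math.log1p(x)
```

This anticipates Pattern 4 in Section IV.C: keep calling the buggy routine, but convert its wrong result into the expected one.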
3) Python 3 incompatibility

Some upstream methods could not perform correctly under Python 3 even though they worked perfectly under Python 2. When downstream projects ran under Python 3, the original upstream method then resulted in a bug. For example, the method loadtxt in NumPy failed with complex data in Python 3 (numpy/numpy#5655), which affected its downstream project msmtools (markovmodel/msmtools#18). In total, four of the 60 cross-project bugs are due to Python 3 incompatibility.

C. RQ3: Workaround Patterns

After investigating the characteristics of cross-project bugs with workarounds, we summarized the common patterns among the studied workarounds. In general, we found four workaround patterns covering the workarounds for 37 cross-project bugs.

1) Pattern 1: Using a different method

When an upstream method that the downstream project uses has a bug, a simple way out is to replace the buggy method with a similar one.

Example: The Obspy developers experienced segmentation faults on certain systems when constructing a NumPy array (obspy/obspy#536). After investigation, this bug was found to be caused by an error in np.array (numpy/numpy#3175). The downstream developers worked around the cross-project bug by using np.frombuffer instead of np.array. Fig. 3 shows the downstream workaround.

Ten of the 60 workarounds were designed to adopt another method that could provide the same functionality. However, most of the replacements were provided by the original upstream projects. As in the example above, np.frombuffer and np.array come from the same project, NumPy. This phenomenon implies two things. First, some libraries may tend to develop multiple methods with overlapping capabilities. Second, the downstream projects are not willing to change their dependencies.
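The kind of replacement used in the Obspy example might look like this sketch; the byte payload is invented for illustration, not the actual record buffer from obspy/obspy#536.

```python
import numpy as np

raw = bytes(range(8))  # stand-in for the record buffer in the real case

# Buggy path on the affected systems (hypothetical trigger):
# arr = np.array(list(raw), dtype=np.uint8)

# Workaround: build the array directly from the underlying buffer with the
# sibling API instead of the buggy constructor.
arr = np.frombuffer(raw, dtype=np.uint8)
```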
This is reasonable, since adding a new dependency means that more effort must be spent by the downstream project to understand the release cycle of the new upstream project and to coordinate with it.

The main challenge in proposing this kind of workaround lies in two aspects. The first is to find a replacement method, preferably one provided by the same upstream project or at least by a stable project. The second is to carefully modify the parameters to fit the new method, since it may require different kinds of parameters than the buggy method. This challenge also indicates that an automatic tool that recommends similar APIs and adapts parameters would be useful for developers working around a cross-project bug.

2) Pattern 2: Conditionally using the original method

As stated in Section IV.B, most of the cross-project bugs are caused by one or more uncovered cases of the upstream methods. Therefore, an intuitive way to work around such a bug is to use the method only in the cases that will not result in a failure.

Example: Scipy/scipy#3596 recorded a bug in which scipy.signal.fftconvolve did not work well in multithreaded environments. After digging into this issue, the developers found that scipy.signal.fftconvolve made use of numpy.fft.rfftn/irfftn for non-complex inputs, and it was NumPy’s FFT routines that were actually not thread safe. Though numpy/numpy#4655 later fixed the bug in NumPy, the SciPy developers still thought that they should work around it on their side, because they support older NumPy versions that did not have the fix. Fig. 4 shows the downstream workaround. For pre-1.9 NumPy, if there are non-complex inputs, SciPy calls numpy.fft.rfftn/irfftn from only one thread at a time to be thread safe. In other cases, it uses its own FFT method instead.

However, though this workaround helped the users get out of trouble, it seemed a little complex.
A developer proposed that the easiest workaround would be to convert the non-complex inputs to complex inputs (by adding 0j) so that they were processed by SciPy’s FFT routine instead of NumPy’s buggy RFFT method. This idea was rejected by the other developers: because NumPy’s RFFT method is significantly faster, it is better to use it whenever possible. As another SciPy developer commented, “Whatever fix is done on the SciPy side, it would be nice if it didn’t prevent someone who had a new enough (fixed) NumPy from using the newer RFFT method multithreaded.”

Fig. 3. The downstream workaround injected in Obspy.

Fifteen of the 60 workarounds were designed to restrict the use of the buggy upstream method to its covered cases. There are two key points in proposing a workaround of this kind. First, the developers should determine under what conditions the originally used upstream method would fail, i.e., the uncovered cases. Usually, developers can find the answer during the process of diagnosing the bug. After that, it is important to decide how to deal with the failing cases. While inspecting these workarounds, we found that the developers either made use of another method or simply raised an error or an exception (e.g., sympy/sympy#9593).

3) Pattern 3: Adapting the inputs to use the original method

To avoid the failure caused by the uncovered cases, developers may also choose to convert their inputs into a processable form that can be correctly handled by the buggy upstream method.

Example: Pyhrf/pyhrf#146 reported a test failure that seemed to come from scipy.misc.fromimage. When trying to open 1-bit images, the SciPy method would produce a segmentation fault.
In order to avoid the failure, the Pyhrf developers decided to first convert the 1-bit image into an 8-bit image, which could be dealt with by the SciPy method. Fig. 5 shows the downstream workaround.

Nine of the 60 studied workarounds conform to this pattern. Though converting an uncovered case into a covered case in order to keep using the original upstream routine seems straightforward, it is not always feasible.

4) Pattern 4: Converting the outputs of the original method

To work around buggy upstream methods that produce wrong outputs with certain inputs, the downstream developers may choose to convert the wrong results into their desired ones.

Example: The method combine_first in Pandas falsely converted dates to integers (pandas-dev/pandas#3593). To bypass the bug, its downstream project Clair explicitly called pd.to_datetime to convert the time-related data from integers back to dates (eike-welk/clair#43). Fig. 6 shows the downstream workaround.

Apart from this example, two other downstream projects worked around cross-project bugs in this way.

V. DISCUSSION

In this section, we discuss the findings about downstream workarounds.

A. Workaround Generation

Ma et al. found that the workaround was the most common practice that downstream developers used to cope with cross-project bugs [2]. Workarounds play a significant role since they can bypass the bad impact of bugs while waiting for upstream fixes, as well as shield end users from being affected even when they use a buggy upstream version [2]. Therefore, when suffering from a cross-project bug, it is of great use if the downstream developers can propose a workaround in a timely manner.

In Section IV, we summarized the 60 cross-project bugs with workarounds into three main categories. The largest number of bugs were new cases that the upstream method could not process.
To temporarily handle the problem, the downstream developers may adopt another method with similar functionality, limit the use of the buggy method to the cases that it can handle, or convert the emerging case into a form that the buggy method can deal with. When facing cross-project bugs that produce wrong results with certain inputs, the downstream developers may continue to use the original method, but then explicitly transform the outputs into the correct form.

Summarizing the bug types and common workaround patterns will help developers efficiently develop a suitable workaround. At the same time, it can also guide the design of (automatic) workaround generation tools. From the discussion in Section IV.C, such a tool should perform the following tasks. First, it can search for alternative methods that have the same functionality as the buggy method. Second, it can extract the conditions under which the upstream methods do not work correctly. Third, it can adapt the input data into suitable forms that the upstream methods are able to process.

In our opinion, a preferred workaround should follow three principles, whether generated by hand or by a tool. First, the workaround should suppress or bypass the upstream bug so that the downstream project runs normally. Second, the workaround should make as few code changes as possible. Ma et al. indicated that workarounds are often removed afterwards [2]. Therefore, the workaround is preferably designed in a way that does not affect other modules and makes it easy to deprecate. Third, the workaround should use efficient methods in order not to reduce the performance of the project.

B. Workaround Recommendation

In a software ecosystem, some central projects are used by multiple other projects. For example, in the scientific Python ecosystem, NumPy is the basic tool, and nearly all projects within this ecosystem depend on it.
Therefore, an error in a popular project like NumPy may break more than one downstream project. All of them may need to work around the cross-project bug while waiting for an upstream fix. Under this circumstance, a downstream project could benefit from a responsive sibling project that has already proposed a workaround for the same bug.

Dask/dask#297 shows an example. The project Dask was affected by a NumPy bug (numpy/numpy#3484). A developer then found that another project, Scikit-learn, was suffering from the same bug. After digging into the code of Scikit-learn, he indicated that Dask could learn from Scikit-learn. He commented, “Possible solution would be to add a function for python 3 compatibility, as scikit-learn did: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/fixes.py#L8.” Dask then copied the solution of Scikit-learn into its own code as the workaround for the bug.

An existing workaround in a sibling project reduces the workload of the developers suffering from the same bug. However, finding a suitable workaround in another project is a non-trivial task. First, the developers should find out what other projects are also affected by the cross-project bug. Then, they should learn how these affected projects deal with the bug. Last, they have to select an appropriate workaround from these projects and adapt it to their own project. Therefore, a workaround recommendation tool that automates the process could be useful.

This tool should provide at least three functionalities. First, it can predict which other projects may be influenced by the same bug and from which a workaround can be learned. Second, it can inspect code changes to extract downstream workarounds. Last, it can compare the context of the affected modules in different projects to rank the workarounds. Developing such a tool poses several technical challenges, which deserve further study.

C.
Workaround Removal

As stated before, a downstream workaround is a temporary solution injected into the downstream project to cope with a cross-project bug. Unlike the corresponding upstream fix, which is an ultimate and permanent solution, the workaround may be modified or discarded later [2]. We indeed found some cases showing that the developers intended to remove or change the workarounds in the future.

Materialsinnovation/pymks#132 reported that Pymks broke down due to a bug in Scikit-learn (scikit-learn/scikit-learn#3984). The downstream developer added the keyword argument size as a short-term solution to the dimension requirement of the buggy method from Scikit-learn. He then wrote in the commit, “Sklearn developers have already removed the dimension requirement on development version of the code. Once this version is released, this keyword argument should be removed.” In pandas-dev/pandas#9276, the Pandas developers proposed a workaround for a NumPy bug (numpy/numpy#5562) with a comment that they would reconsider that decision once the upstream project fixed the bug. Sympy/sympy#9593 included a workaround for another NumPy bug (numpy/numpy#5672). The developer left a comment in the code: “maxargs is set by numpy compile-time constant NPY_MAXARGS. If a future version of numpy modifies or removes this restriction, this variable should be changed or removed.”

From these examples, we see that the downstream developers could not decide the exact time to modify or remove the workarounds, because the time depends on when the responsible upstream projects accomplish certain tasks (e.g., releasing a new version or modifying specific variables). Consequently, the downstream developers need to track the progress of the upstream projects they depend on in order to maintain their workarounds accordingly. This adds to the burden of the downstream maintainers, which is confirmed by the respondents of the survey conducted by Ma et al. [2].
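The removal condition in such cases is often a version check; a minimal, hypothetical sketch of a version gate follows (the helper name and the fixed-in version are assumptions, loosely modeled on SciPy's pre-1.9 NumPy gating described earlier).

```python
def needs_workaround(upstream_version, fixed_in=(1, 9)):
    """Apply the downstream patch only while the installed upstream
    version predates the release that ships the fix (hypothetical gate)."""
    parts = tuple(int(p) for p in upstream_version.split(".")[:2])
    return parts < fixed_in

# e.g. gate the patched code path on needs_workaround(numpy.__version__);
# once the project requires the fixed release, the gate and the patch
# can be deleted together, keeping the workaround easy to deprecate.
```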
In order to reduce the maintenance burden of the downstream developers, an automatic workaround modification or removal tool is desirable. The tool should detect the occurrence of upstream events that may influence a workaround and notify the developers. Another key function of the tool is to (semi-)automatically remove workarounds once they can be deprecated.

Additionally, the time to remove the workarounds is also worth studying. The workaround is a landmark of the coordination between the upstream and downstream projects during the fixing process of cross-project bugs. Studying the lifecycle of a workaround will help us understand how developers on both sides collaborate to fix cross-project bugs and how developers from different projects cooperate within a software ecosystem.

VI. THREATS TO VALIDITY

In this section, we discuss the threats to the validity of our study.

The first threat concerns the accuracy of the identification of workarounds and fixes. Kim et al. pointed out that high-quality bug-fix information is needed to avoid superficial conclusions, but many bug-fixes are polluted [28]. In order to identify the workarounds and fixes, two authors individually reviewed the issue reports and manually inspected the related commits indicated in the reports. They then cross-checked each other’s results to maximize the accuracy of the data under investigation.

The second threat concerns the unknown effect of the deviation of the variables under statistical tests (the size of the workaround/fix) from the normal distribution. To mitigate this threat, we chose the Wilcoxon signed-rank test and Cliff’s $\delta$ effect size, because they are nonparametric tests that do not require any assumption about the underlying data distribution.

The third threat concerns the researchers’ preconceptions.
The two authors who conducted the manual analysis followed the same procedure and criteria in collecting the studied dataset, identifying and comparing fixes and workarounds, and summarizing bug features and workaround patterns. However, it is in general difficult to completely eliminate the influence of researchers’ preconceptions. In order to minimize personal bias, they discussed the results, especially the unclear cases, together.

The last threat concerns the generalization of our empirical results. We conducted our study on the scientific Python ecosystem. However, cross-project bugs and downstream workarounds do not occur only within this specific ecosystem. We cannot assume that our results generalize beyond the specific environment in which the study was conducted. Further validation on other ecosystems is desirable.

VII. CONCLUSION AND FUTURE WORK

In previous work, proposing a workaround was shown to be a common practice for downstream developers to bypass the impact of a cross-project bug. In this study, we studied the characteristics of downstream workarounds. First, we manually identified 60 cross-project bugs with a workaround among 271 cross-project bugs in the scientific Python ecosystem. Then, with these data, we empirically compared each workaround with its corresponding upstream fix and summarized the bug features and workaround patterns. The main findings of this study are as follows:

In general, the size of the workaround is significantly smaller than that of the corresponding fix. The fix and the workaround usually have different code structures.

The cross-project bugs that the downstream developers worked around are usually caused by an emerging case that the upstream method cannot process, by a wrong output with certain inputs, or by Python 3 incompatibility.
Four patterns of workarounds are identified: using another method with similar functionality, restricting the buggy method to the range it can process, converting the inputs into a processable form, and correcting the outputs after using the buggy method.

The findings in this study also indicate the need for, and the feasibility of, developing tools supporting workaround generation, recommendation, maintenance, and removal. In future work, we will continue to develop these supporting tools, as well as investigate the lifecycle of workarounds in more kinds of software ecosystems.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (61472175, 61472178, 91418202) and the Natural Science Foundation of Jiangsu Province (BK20130014).

REFERENCES

[1] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, "An in-depth study of the promises and perils of mining GitHub", Empirical Software Engineering, pp. 1–37, 2015.

[2] W. Ma, L. Chen, X. Zhang, Y. Zhou, and B. Xu, "How do developers fix cross-project correlated bugs? A case study on the GitHub scientific Python ecosystem", in Proceedings of the 39th International Conference on Software Engineering, 2017, accepted.

[3] A. Decan, T. Mens, M. Claes, and P. Grosjean, "When GitHub meets CRAN: an analysis of inter-repository package dependency problems", in Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering, 2016, pp. 493–504.

[4] B. Adams, R. Kavanagh, A. E. Hassan, and D. M. German, "An empirical study of integration activities in distributions of open source software", Empirical Software Engineering, vol. 21, no. 3, pp. 960–1001, Jun. 2016.

[5] G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, and S. Panichella, "How the Apache community upgrades dependencies: an evolutionary study", Empirical Software Engineering, vol. 20, no. 5, pp. 1275–1317, Oct. 2015.

[6] L.
Villarroel, G. Bavota, B. Russo, R. Oliveto, and M. Di Penta, "Release planning of mobile apps based on user reviews", in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 14–24. + + +[7] H. Valdivia Garcia and E. Shihab, "Characterizing and predicting blocking bugs in open source projects", in Proceedings of the 11th Working Conference on Mining Software Repositories, 2014, pp. 72–81. + + +[8] X. Xia, D. Lo, E. Shihab, X. Wang, and X. Yang, "ELBlocker: Predicting blocking bugs with ensemble imbalance learning", Information and Software Technology, vol. 61, pp. 93–106, May 2015. + + +[9] H. Zhong and Z. Su, "An empirical study on real bug fixes", in Proceedings of the 37th International Conference on Software Engineering, 2015, vol. 1, pp. 913–923. + + +[10] K. Pan, S. Kim, and E. J. Whitehead, "Toward an understanding of bug fix patterns", Empirical Software Engineering, vol. 14, no. 3, pp. 286–315, Jun. 2009. + + +[11] J. Park, M. Kim, and D.-H. Bae, "An empirical study of supplementary patches in open source projects", Empirical Software Engineering, vol. 22, no. 1, pp. 436–473, May 2016. + + +[12] Y. Jiang, B. Adams, and D. M. German, "Will my patch make it? and how fast?: case study on the Linux kernel", in Proceedings of the 10th Working Conference on Mining Software Repositories, 2013, pp. 101–110. + + +[13] A. T. Misirli, E. Shihab, and Y. Kamei, "Studying high impact fix-inducing changes", Empirical Software Engineering, vol. 21, no. 2, pp. 605–641, Apr. 2016. + + +[14] J. Echeverria, F. Perez, A. Abellanas, J. I. Panach, C. Cetina, and O. Pastor, "Evaluating bug-fixing in Software Product Lines: an industrial case study", in Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016, pp. 1–6. + + +[15] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, "GenProg: a generic method for automatic software repair", IEEE Transactions on Software Engineering, vol. 38, no. 
1, pp. 54–72, Jan. 2012.

[16] C. Le Goues, M. Dewey-Vogt, S. Forrest, and W. Weimer, "A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each", in Proceedings of the 34th International Conference on Software Engineering, 2012, pp. 3–13.

[17] S. Mechtaev, J. Yi, and A. Roychoudhury, "Angelix: scalable multiline program patch synthesis via symbolic analysis", in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 691–701.

[18] Z. Gu, E. T. Barr, D. J. Hamilton, and Z. Su, "Has the bug really been fixed?", in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, 2010, vol. 1, p. 55.

[19] M. Leszak, D. E. Perry, and D. Stoll, "A case study in root cause defect analysis", in Proceedings of the 22nd International Conference on Software Engineering, 2000, pp. 428–437.

[20] "Workaround - Project Management Knowledge". [Online]. Available: https://project-management-knowledge.com/definitions/w/workaround/. [Accessed: 08-Apr-2017].

[21] E. Murphy-Hill, T. Zimmermann, C. Bird, and N. Nagappan, "The design of bug fixes", in Proceedings of the 35th International Conference on Software Engineering, 2013, pp. 332–341.

[22] A. J. Ko, R. DeLine, and G. Venolia, "Information needs in collocated software development teams", in Proceedings of the 29th International Conference on Software Engineering, 2007, pp. 344–353.

[23] E. Berglund, "Communicating bugs: global bug knowledge distribution", Information and Software Technology, vol. 47, no. 11, pp. 709–719, 2005.

[24] J. D. Gibbons and D. A. Wolfe, Nonparametric Statistical Inference. 2003.

[25] E. A. Freeman and G. G. Moisen, "A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa", Ecological Modelling, vol. 217, no. 1–2, pp. 48–58, 2008.

[26] G. MacBeth, E. Razumiejczyk, and R.
Ledsema, "Cliff’s Delta calculator: a non-parametric effect size program for two groups of observations", Universitas Psychologica, vol. 10, no. 2, pp. 545–555, 2012. + + +[27] Y. Yang, Y. Zhou, H. Lu, L. Chen, Z. Chen, and B. Xu, "Are slice-based cohesion metrics actually useful in effort-aware post-release fault-proneness prediction? An empirical study", IEEE Transactions on Software Engineering, vol. 41, no. 4, pp. 331–357, 2015. + + +[28] S. Kim, H. Zhang, R. Wu, and L. Gong, "Dealing with noise in defect prediction", in Proceedings of the 33rd International Conference on Software Engineering, 2011, pp. 481–490. +---------------------------------------- +------------------------------- +Section 96: +Who, What, Why and How? Towards the Monetary Incentive in Crowd Collaboration: A Case Study of Github’s Sponsor Mechanism + + +Xunhui Zhang, Tao Wang +, Yue Yu +, Qiubing Zeng, Zhixing Li, Huaimin Wang +{zhangxunhui,taowang2005,yuyue,lizhixing15}@nudt.edu.cn,qiubingzeng@gmail.com,whm_w@163.com +National University of Defense Technology +Changsha, Hunan, China + + +ABSTRACT +While many forms of financial support are currently available, there are still many complaints about inadequate financing from software maintainers. In May 2019, GitHub, the world’s most active social coding platform, launched the Sponsor mechanism as a step toward more deeply integrating open source development and financial support. This paper collects data on 8,028 maintainers, 13,555 sponsors, and 22,515 sponsorships and conducts a comprehensive analysis. We explore the relationship between the Sponsor mechanism and developers along four dimensions using a combination of qualitative and quantitative analysis, examining why developers participate, how the mechanism affects developer activity, who obtains more sponsorships, and what mechanism flaws developers have encountered in the process of using it. 
We find a long-tail effect in the act of sponsorship, with most maintainers’ expectations remaining unmet, and sponsorship has only a short-term, slightly positive impact on development activity but is not sustainable. While sponsors participate in this mechanism mainly as a means of thanking the developers of OSS that they use, in practice, the social status of developers is the primary influence on the number of sponsorships. We find that both the Sponsor mechanism and open source donations have certain shortcomings and need further improvements to attract more participants. + + +CCS CONCEPTS +• Computer systems organization → Embedded systems; Redundancy; Robotics; • Networks → Network reliability. + + +KEYWORDS +sponsor, donation, GitHub, open source, financial support + + +1 INTRODUCTION +Open source development has brought prosperity to software ecosystems. Its characteristics of distributed coordination, free participation, and convenient sharing have led to the emergence of myriad open source projects, large-scale participation of developers, and continuous development of high-quality projects. However, the expansion of project scales has also brought challenges for software maintenance, such as continuously and rapidly increasing feature requests and bug fix reports [37] and an increasing pull request review workload [69]. Although there are many continuous integration (CI) tools and continuous deployment (CD) tools to help reduce the workload of project managers, the complicated and high-pressure maintenance work still subjects them to stress [66]. Past studies have shown that most current open source work is still spontaneously performed by volunteers [22]. They engage in open source work as a hobby, to improve their personal reputations or to learn new technologies. These intrinsic benefits motivate volunteers to make open source contributions [21]. 
However, many core managers and software maintainers would like to secure funding from others for their open source work because of the aforementioned challenges, thereby alleviating the related mental pressure and financial burdens [5, 57, 67].

At present, there are many ways in which the open source sphere obtains financial support, such as crowdfunding on Kickstarter, project donations on OpenCollective, and issue rewards on BountySource and IssueHunt [49]. However, these are mainly web portals serving open source contributors active in other social coding communities. The separation of development activities and financial support brings problems. First, it is difficult for sponsors to find active developers and open source projects in the open source community. Second, open source contributors need to spend considerable effort on maintaining the financial support platform. In May 2019, GitHub, the world’s most popular software hosting platform, launched the Sponsor mechanism, characterized by deep integration of financial support and the social coding platform. While the Sponsor mechanism supports sponsorship of organizations and projects, it targets mainly individual contributors in the GitHub community. Therefore, unlike past related studies [52, 53], we can explore the donation mechanism in the open source sphere from the perspective of individual developers. In this context, this paper aims to explore donations in the open source sphere using the Sponsor mechanism as an example. We conducted an empirical study based on mixed methods and answered the following research questions.

RQ1 Why do individuals participate or not in the Sponsor mechanism?

From the feedback of GitHub developers, we summarized eight reasons for participation among sponsored developers, six reasons for participation among sponsors, and six reasons for not participating in the mechanism among other individuals.
The main reason that participants used the Sponsor mechanism was its relationship with open source software (OSS) usage. The main reason for not participating was that developers did not need sponsorship or that they were driven to participate in open source development because of its nonmonetary character. Our findings can help optimize the Sponsor mechanism and attract more participants by satisfying the different motivations of contributors.

RQ2 How effective is sponsorship in motivating developer OSS activity?

We find through quantitative analysis that the Sponsor mechanism has provided only a short-term, subtle boost to contributors' activities. According to the results of the qualitative analysis, most developers agree that sponsorship can provide them with motivation but are not satisfied with the amounts received. In contrast, most sponsors are satisfied with the current mechanism. Our findings shed light on the application of the Sponsor mechanism in the open source sphere and the problems surrounding it. This work helps to rationalize the mechanism to promote greater participation in open source contributions among developers.

RQ3 Who is likely to receive more sponsorship?

The questionnaire results show that making useful OSS contributions and being active are the most critical factors for obtaining more sponsorship. However, according to the quantitative data analysis results, the factor that most affects sponsorship is the developer's social status in the community. Our findings can provide actionable suggestions for developers seeking more sponsorships, while the conflicting results also illuminate the problems with OSS donations.

RQ4 What are the shortcomings of the Sponsor mechanism?

The research reveals that problems with the mechanism include usage deficiencies, object orientation with supported functions, and personalization. Many developers complain that the donations do not apply to open source ecosystems.
A more relevant mechanism is needed to promote the healthy and sustainable development of the ecosystem.

The contributions of this paper are as follows:

• To the best of our knowledge, this is the first in-depth study that comprehensively analyzes the GitHub Sponsor mechanism.

• We quantitatively and qualitatively analyze the Sponsor mechanism along four dimensions, including developers' motivation to participate (why), the mechanism's effectiveness (how), the characteristics of developers who obtain more sponsorships (who), and the mechanism's shortcomings (what).

• We provide actionable suggestions to help developers participating in the Sponsor mechanism obtain more sponsorship and feasible advice for improving the mechanism's effectiveness.

The remainder of this paper is organized as follows. Section 2 presents the related work, and Section 3 describes the background of the GitHub Sponsor mechanism. Section 4 presents the study design of this paper. In Section 5, we describe the results for each research question. Then, we discuss the findings in Section 6 and describe the threats in Section 7. Finally, in Section 8, we conclude the paper and describe future work.

2 RELATED WORK

Open Innovation in Science (OIS) is a concept that unifies the two domains of open and collaborative practices in science, i.e., open science (OS) and open innovation (OI) [6]. For OS, the three pillars are accessibility, transparency, and inclusivity, among which inclusivity (e.g., citizen science) is directly related to the knowledge production process. For OI, various forms of collaborative practice exist, including crowdsourcing, OSS development, etc. Regarding these open initiatives, the motivation and incentives of participation have always been the focus of continuous research [4, 70].
Although there are different views on the relationship between citizen science, crowdsourcing, and OSS development, we follow the relationships described above and present the related work on participation motivation and monetary incentives for the three areas separately.

2.1 Citizen science

For traditional citizen science, the motivation of participants varies greatly depending on age [2], gender [48], educational background [46], and level of involvement [63]. In many cases, both monetary and nonmonetary incentives have a positive effect on participation [9]. However, Wiseman et al. found that nonmonetary incentives alone were better at eliciting high-quality data from participants in online HCI projects [71]. Knowles [38] also confirmed that although monetary incentives enhanced participation, they undermined sustained participation in volunteering initiatives. For some specific projects (e.g., the conservation of species), monetary incentives can even have the opposite effect [55].

Because participants act as sensors to collect data or volunteer their idle computers or brainpower to classify large data sets in citizen science projects [71], their motivation to participate is primarily intrinsic [15, 43]. However, as motivation to participate varies across projects, the imposition of monetary incentives can have different effects. Unlike traditional citizen science, OSS development is an open innovation activity requiring deep involvement and a great deal of experience, so the motivation and incentives for participation may vary considerably.
2.2 Crowdsourcing

As a type of online activity, crowdsourcing offers participants the satisfaction of a given kind of need, be it economic, social recognition, self-esteem, or the development of individual skills [16]. Hossain [34] classified the motivators into extrinsic and intrinsic motivators, where extrinsic motivators include financial motivators (e.g., cash), social motivators (e.g., peer recognition), and organizational motivators (e.g., career development). Intrinsic motivators are directly related to participants' satisfaction with the task (e.g., enjoyment, fun). Considering the related incentives, Liang et al. [45] highlighted that both intrinsic and extrinsic incentives could increase the effort of participation; however, extrinsic incentives weaken the impact of intrinsic motivation. By comparing paid and unpaid tasks, Mao et al. [47] concluded that monetary incentives speed up task processing but reduce quality. Based on this, Feyisetan et al. [18] made paid microtasks more engaging by including sociality features or other game elements. MTurk is a typical and popular crowdsourcing platform based on financial incentives and gamification, where participants are recruited, paid, and rated for their participation in microtasks, which ensures speed and quality at the same time [10]. Unlike MTurk, contribution to Wikipedia is not incentivized by monetary rewards. Content contribution is driven more by reciprocity and self-development, while community participation relies on altruism, self-belonging, etc. [73].

As can be seen from the related work above, there are many crowdsourcing situations and different forms of motivation and incentive. However, unlike OSS development, traditional crowdsourcing tasks are mostly microtasks, which are relatively simple and require less time.
Moreover, OSS contributors have clearly distinguished roles, i.e., core developers and external contributors. Contribution types include code contribution, code review, repository maintenance, management, etc.

2.3 Open source software development

Successful OSS initiatives can effectively change the method of software development [30, 39], improve software development efficiency [31, 60], and ensure software quality through effective management [1, 58]. Many projects have emerged along with the increasing number of users participating in the development of the OSS community [28]. In this context, many companies are involved in contributing to open source projects [32]. However, they have limited control and influence in day-to-day OSS work and decision processes [35], and OSS still relies on the voluntary participation of crowd labor [17].

Many studies have focused on analyzing individuals' motivations and the incentives for participating in OSS projects [14, 20, 33, 42, 59, 72]. Von Krogh et al. [68] classified contributors' motivations into three categories, namely, intrinsic motivation (e.g., ideology and fun), internalized extrinsic motivation (e.g., reputation and own use), and extrinsic motivation (e.g., career and pay). Among developers who volunteer to contribute to open source projects, their motivation is mainly intrinsic or internalized extrinsic [68]. They have full-time jobs and spend some spare time making open source contributions [21]. However, Hars et al. [3] found that being paid can promote continuous contribution from developers with all types of motivation.

Currently, there are many ways to obtain financial support for open source initiatives, e.g., through donations or bounties. Many studies have focused on the characteristics, impact, or effectiveness of each form of financial support. For example, regarding bounties, Zhou et al.
[77] studied the relation between issue resolution and bounty usage and found that adding bounties increases the likelihood of issue resolution. As a way of recruiting developers, setting bounties attracts developers who want to make money through open source contributions, which facilitates the completion of complex tasks. However, unlike bounties, donations are a way of passively obtaining financial support. Regarding open source donations, Krishnamurthy et al. [40] studied donations to OSS platforms and found a relation between donation level and both the length of platform association and relational commitment. For donations to OSS, Nakasai et al. [50, 51] analyzed the incentives of individual donors and found that benefits for donors and software releases can promote donations, whereas bugs in software negatively affect the number of donations. However, they focused only on Eclipse projects. Overney et al. [53] studied the impact of donations from a broader perspective, examining open source projects on GitHub that correspond to npm packages and explicitly mention a donation method in their README.md files. They found that only a small fraction of projects (mainly active ones) asked for donations and that the number of donations received was mainly associated with project age. Most donations are requested and eventually used for engineering activities. However, donations had only a slight influence on project activities. Although Overney et al. performed a thorough analysis of project-level donations, an analysis of donations to individual open source developers is lacking. Also, we believe that adding qualitative analysis from the users' perspective can confirm the quantitative findings and help in understanding the pros and cons of system design and use.
3 BACKGROUND

3.1 Terminology

To help the reader understand the rest of the article, we introduce key terms related to the Sponsor mechanism.

• Sponsor: an entity who provides donations to others.
• Maintainer: an entity who can be sponsored (a developer who has set up a Sponsor profile).
• Nonmaintainer: an entity who has not set up a Sponsor profile.
• Sponsorship: the donation relationship between a sponsor and a maintainer.
• AccountSetUpTime: the time when a maintainer sets up the Sponsor profile for their account.
• FirstSponsorTime: the time when a maintainer receives their first sponsorship.

3.2 Introduction of the Sponsor mechanism

Currently, in GitHub, the workflow and key elements of sponsorship are shown in Figure 1, where a sponsorship is constructed on the maintainer's sponsor page by clicking the "select" button for a specific amount. The sponsor page is preset by the maintainer when setting up a Sponsor profile in the related GitHub account and mainly consists of the following elements.
• Personal description: maintainers are free to add text and modify it at any time. The main content can cover basic personal information, project information, why they need to be sponsored, other ways of donating, etc.
• Preset goal: maintainers are allowed to set the number of sponsors or sponsorships that they want to obtain from the Sponsor mechanism and add related descriptions of the goal.
• Featured projects: this part lists the related projects that the maintainer currently works on or that are the most popular.
• Preset tiers & description: this part contains the tiers set by the maintainer. Sponsors can choose which tier to pay according to the amount and the related description.
• Payment choices: sponsors can choose a monthly payment or a one-time customized payment.
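For readers who want to inspect these page elements programmatically, such fields are exposed through GitHub's GraphQL API. Below is a minimal sketch in Python (our illustration, not the paper's tooling) that only builds a query string; the field names (hasSponsorsListing, sponsorsListing, tiers, monthlyPriceInDollars) are assumptions based on GitHub's published schema and should be verified against the current API documentation.

```python
# Sketch: building a GraphQL query for one maintainer's Sponsor profile.
# Field names are assumed from GitHub's public GraphQL schema; verify
# against the current schema before use. No network call is made here.

def build_sponsor_listing_query(login: str) -> str:
    """Return a GraphQL query string for a user's Sponsor profile elements."""
    return """
    query {
      user(login: "%s") {
        hasSponsorsListing
        sponsorsListing {
          fullDescription          # personal description shown on the page
          tiers(first: 10) {       # preset tiers & their descriptions
            nodes {
              monthlyPriceInDollars
              description
            }
          }
        }
      }
    }
    """ % login

# To execute, POST this query to https://api.github.com/graphql with an
# authentication token (e.g., via the `requests` library; not done here).
query = build_sponsor_listing_query("octocat")
```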
After constructing a sponsorship in one of these ways, sponsors receive a sponsor badge and future updates from the sponsored maintainer.

3.3 Preliminary analysis

We conducted a statistical analysis of the usage trends of the Sponsor mechanism (Figure 2 shows the number of developers who set up a Sponsor account and how the number of sponsorships changes over time). We can see that the number of developers who set up an account increased sharply around October 2019 (new things inspire people's interest). At other times, the growth rate shows a downward trend. Meanwhile, the absolute number of participants in this mechanism increased steadily, although the growth rate shows a slight upward trend. Compared to GitHub itself, which has shown a strong increase in its user base [74], the Sponsor mechanism has not attracted as much attention. In this context, we formulate RQ1: Why do individuals participate (or not) in the Sponsor mechanism?

According to our manual observation of GitHub developers' sponsorship pages, we find that developers can spend more time on their open source work if sponsored by others (with examples of this trend being Tim Condon [64] and Super Diana [61]). In short, we consider how the Sponsor mechanism may affect developers' open source activities. In this context, we ask RQ2: How effective is sponsorship in motivating developer OSS activity?

There are some very successful cases of individuals receiving support under the GitHub Sponsor mechanism (e.g., Caleb Porzio, who was sponsored by 1,314 sponsors as of 7 August 2021). However, most Sponsor participants have not been successful, and many have not received any sponsorships at all. According to Figure 3, only 14.1% of maintainers are sponsored at least once. Most people do not receive any sponsorships, despite setting up a Sponsor account. Among sponsors, most (76.3%) sponsor others just one time.
Based on the statistical analysis results, we consider which developer characteristics lead to more sponsorships. In this vein, we ask RQ3: Who is likely to receive more sponsorships?

Currently, there are many ways to obtain financial support for open source initiatives, e.g., through donations or bounties. The different types of financial support each have advantages and disadvantages [49]. It falls to participants (especially those who have participated in multiple financial support mechanisms) to judge the reasonableness and effectiveness of each. To better understand users' perceptions of the Sponsor mechanism and thus enrich and improve it, we propose RQ4: What are the shortcomings of the Sponsor mechanism?

4 STUDY OVERVIEW

4.1 Overall research methodology

The overall framework of this paper is shown in Figure 4, with the research methodology consisting of two main parts: data collection and research methods.

4.1.1 Data collection. The data were collected using the GitHub API. The goal was to find different kinds of GitHub users (maintainers, sponsors, and nonmaintainers) and gather their related basic information and activities. Here, we focus on how to distinguish different kinds of users. The acquisition of relevant basic information and details on activities is described in the subsequent section (see Section 4.2), where we introduce each research method in detail. We acquired the different types of users through the following steps.

(1) We used the RESTful API [27] to obtain all users. After that, we queried maintainers using the field hasSponsorsListing of the GraphQL API [26]. We obtained 60,732,250 users who had not deleted their accounts, among which 7,992 users were individual maintainers.
(2) We used the field sponsorshipsAsMaintainer of the GraphQL API [26] to look up all the sponsorships that maintainers had received and the corresponding sponsors.

(3) Using the list of sponsors queried in step (2), we used the field sponsorshipsAsSponsor of the GraphQL API [26] to query all the related maintainers. This step supplemented the information on maintainers who had set up Sponsor profiles beyond those identified during the query process in step (1).

(4) We repeated steps (2) and (3) until no new maintainers or sponsors appeared.

Through the above steps, we obtained 20,579 users, among which 8,028 are maintainers and 13,555 are sponsors (1,004 users are both maintainers and sponsors of others). We also obtained 22,315 sponsorships. All users except maintainers were marked as nonmaintainers.

4.1.2 Research methods. To answer the research questions, we used a combination of quantitative and qualitative analysis. Regarding the why (RQ1) and what (RQ4) questions, since it was difficult to capture everyone's reasons for participation or nonparticipation and to summarize the shortcomings of the mechanism based on just the platform information, we asked relevant people to complete a questionnaire. For the how (RQ2) and who (RQ3) questions, we collected maintainer-related data, quantitatively analyzed the impact of sponsorship behavior on maintainers' open source activity, and explored the correlation between various factors and the amount of sponsorship. On this basis, we again conducted a qualitative analysis using a questionnaire. This combination of quantitative and qualitative analysis led to our conclusions. Next, we describe each research method in detail.

4.2 Detailed introduction of research methods

4.2.1 Questionnaire.
Since three types of users interact with the Sponsor mechanism, namely, sponsors, maintainers, and nonmaintainers (see Section 3.1), we designed three different online surveys [75]. The surveys for both sponsors and maintainers relate to their expectations for and satisfaction with the Sponsor mechanism. The survey for nonmaintainers relates to their reasons for not setting up the Sponsor feature for their account. All the surveys start with an introduction to the research background and purpose. There are two types of questions in each survey.

• Demographic questions designed to obtain participants' information, including their role in and experience with OSS development (the predefined answers were inspired by prior research [44]).
• Main questions designed to gather users' views on the Sponsor mechanism.

Among the main questions, there are three kinds.

• Open-ended questions aimed at gathering free-form answers.
• Rating scale questions soliciting users' satisfaction and agreement levels.
• Multiple-choice questions with "Other" text field options aimed at gathering large-scale user feedback while allowing additional answers.

We provide a final, open-ended question to allow participants to talk freely about the Sponsor mechanism. We discussed the questions with software engineering researchers to ensure that the items were well designed for our study and clear enough for participants to answer. Finally, we used SurveyMonkey [62] to deploy our online surveys.

There were two rounds of each survey: 1) the pilot stage, aimed at gathering answers to the open-ended questions from a limited number of participants, and 2) the full-scale stage, aimed at gathering votes for each answer from a larger population. The statistics on the two stages can be seen in Table 1.

Participant recruitment.
To recruit participants for the two rounds of the three different surveys, we took the following steps:

Table 1: Statistics on the two-stage survey

| Stage | Statistic items | Maintainers | Sponsors | Nonmaintainers |
|-------|----------------|-------------|----------|----------------|
| Pilot | #selected participants | 400 | 400 | 400 |
| | #successful invitations | 394 | 388 | 390 |
| | #response (%) | 45 (11.4%) | 24 (6.2%) | 9 (2.3%) |
| | Date for collection | June 8, 2021 - June 15, 2021 | | |
| Full-scale | #selected participants | 6,104 | 6,359 | 7,500 |
| | #successful invitations | 5,951 | 6,224 | 7,343 |
| | #response (%) | 467 (7.8%) | 396 (6.4%) | 202 (2.8%) |
| | Date for collection | June 29, 2021 - July 13, 2021 | | |

# means the number, e.g., #response implies the number of responses

(1) For all three types of users (maintainers, sponsors, and nonmaintainers), we filtered out those whose email or name information could not be openly accessed, as these users might not want to receive questionnaires.

(2) For all three types of users, we filtered out those who had not been active in the last month (since May 3, 2021), as they might not have focused on open source work on GitHub in recent days. In this step, we used the GitHub API to obtain users' recent activity, including the top repositories to which they had contributed in the last month and their last update time (field "updatedAt") on GitHub [26].

(3) For nonmaintainers, we selected only users who may be eligible to set up a Sponsor profile based on their location information and the list of countries or regions included under the GitHub Sponsor mechanism [25].

(4) After completing the above three steps, we randomly selected 400 unique individuals of each type, without overlap, as participants in the pilot stage.
(5) For the full-scale stage, we selected all other maintainers (6,104) and sponsors (6,359) as participants. For nonmaintainers, due to the low response rate in the pilot stage, we filtered users according to the total number of stars of the projects they owned (collected on 23 June 2021). We selected those with at least ten stars (we assumed that developers with popular projects are more likely to be interested in the Sponsor mechanism and use GitHub very often). After that, we randomly selected 7,500 participants.

Response and analysis. After selecting the participants, we published the questionnaire online and sent the web address to participants via email. The email invitation contained the basic information of the questionnaire publisher, the reason for the release, the number of questions, and the estimated time required to fill out the questionnaire.

Based on the participants' feedback from the pilot stage, we designed the questionnaires for the full-scale stage. We removed 1 question for maintainers, 1 question for sponsors, and 2 questions for nonmaintainers because their answers repeated the content of answers to other questions. We extracted the essential information from all responses and turned some open questions into multiple-choice questions (3 for maintainers, 3 for sponsors, and 1 for nonmaintainers) through open coding via the card sorting method [78], performed by the first, second, and fifth authors together. To avoid disturbing the participants, we extended the response collection time in this stage relative to that in the pilot stage but did not send a second email reminder. At the same time, because different types of participants dedicate different amounts of attention to the Sponsor mechanism, the response rate varies greatly. Nonmaintainers, who do not participate in the Sponsor mechanism, may not care about it and may not want to reply to the email.
When analyzing the multiple-choice questions, we first calculated the voting rate for each preset option. After that, we manually mapped the textual responses for the "Other" option onto the preset taxonomy, where possible, via the closed coding method [78]. If a new topic emerged, we integrated it into the existing taxonomy. When analyzing the last open question ("Do you have anything else to tell us about the Sponsor mechanism?"), we extracted the essential information from the textual responses for qualitative analysis. To facilitate analysis, we use [MCx], [SCx], and [OCx] to represent the textual responses in the questionnaires for maintainers, sponsors, and nonmaintainers, respectively, where x indicates the serial number of the comment.

Through the first two questions of each questionnaire, we collected participants' demographic information, including their status and experience with open source development. For the full-scale stage, the results are shown in Table 2. More than 70% of participants in each category have more than three years of OSS development experience. More than 10% of sponsors have no OSS development experience, which indicates that many sponsors sponsor others solely to support OSS development or maintenance.

Table 2: Demographic information of participants in the full-scale stage

| Questions | Answers | M (%) | S (%) | NM (%) |
|-----------|---------|-------|-------|--------|
| Q1: How would you best describe yourself? | Developer working in industry | 62.3 | 80.0 | 65.5 |
| | Full time independent developer | 16.6 | 10.0 | 8.0 |
| | Student | 11.6 | 6.9 | 6.5 |
| | Academic researcher | 3.7 | 3.6 | 16.0 |
| Q2: How many years of OSS development experience do you have? | Never | 1.1 | 10.2 | 3.0 |
| | <1 year | 2.2 | 4.6 | 6.5 |
| | 1-3 years | 10.1 | 14.5 | 12.6 |
| | 3-5 years | 21.9 | 22.6 | 23.1 |
| | 5-10 years | 33.6 | 26.9 | 27.1 |
| | >10 years | 31.2 | 21.3 | 27.6 |

M: maintainer; S: sponsor; NM: nonmaintainer

4.2.2 ITS analysis. The aim of this analysis was to determine when to treat sponsorship as an intervention and how it influences the trends in maintainers' activities (development and discussion activities) from a long-term perspective. Therefore, following the guidelines of previous studies [53, 65, 76], we used the interrupted time series (ITS) method. The settings of the ITS analysis are as follows.

Interventions: We set both accountSetUpTime and firstSponsorTime (see Section 3.1) as separate interventions. We assumed that maintainers may increase their activity after accountSetUpTime to attract others' attention for future sponsorship or be motivated to increase their open source contributions after firstSponsorTime.

Responses: We set the number of commits (development activity) and the number of discussions (discussion activity) as responses, as they indicate different kinds of activities on GitHub.

Unstable period: Similar to previous studies [53, 65, 76], we set the 15 days before and after an intervention as the unstable period.

Before & after intervention periods: To retain enough analyzable data, we selected maintainers with at least six months of activity before and after interventions in addition to the unstable period. Therefore, each maintainer has at least $15 \times 2 + 6 \times 2 \times 30 = 390$ days of activity on GitHub.

Time window: Each month in the before & after intervention periods is a time window, and the unstable period is also a time window. Therefore, there are $6 \times 2 + 1 = 13$ time windows in all.

The independent variables are as follows.

Basic items.
• intervention: Binary variable indicating an intervention
• time: Continuous variable indicating the time by month from the start of an observation to each time window, with a value range of $[0, 12]$
• time after intervention: Continuous variable indicating how many months have passed after an intervention (if there is no intervention, $time\ after\ intervention = 0$; otherwise, $time\ after\ intervention = time - 6$)

Developer characteristics.

• number of stars before: Continuous variable measured as the total number of stars of maintainer-owned repositories before the start of each time window
• in company: Binary variable indicating whether company information exists at data collection time
• has goal: Binary variable indicating whether a maintainer sets a goal for sponsorship at data collection time
• has another way: Binary variable indicating whether a maintainer sets other methods for receiving donations at data collection time
• is hireable: Binary variable indicating whether a maintainer declares a hireable status at data collection time

Developer activities.

• number of commits before: Continuous variable measured as the number of commits before the start of each time window
• number of discussions before: Continuous variable measured as the number of discussions before the start of each time window

We built a mixed effect linear regression model for the ITS analysis, with a maintainer identifier as the random effect and all the measured factors as fixed effects. A major advantage of the mixed effect model is that it can account for correlated observations within a subject [19]; here, the time windows for the same maintainer tend to have a similar trend. We used the lmer function of the lmerTest package in R [41] to fit models for the maintainer's commit and discussion activities.
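As a minimal illustration of the basic items above (our sketch in Python, not the paper's R code), the three ITS terms can be derived for the 13 time windows of a single maintainer, with the intervention placed at the seventh window:

```python
# Sketch of the basic ITS (interrupted time series) terms per time window,
# following the definitions above: 13 windows (0..12), intervention at
# window 6, time in [0, 12], and time_after_intervention = time - 6 from
# the intervention on (0 before it). Names and layout are ours.

def its_basic_terms(n_windows: int = 13, intervention_window: int = 6):
    """Return one dict of (time, intervention, time_after_intervention)
    per time window for a single maintainer."""
    rows = []
    for t in range(n_windows):
        after = t >= intervention_window
        rows.append({
            "time": t,                       # months since start of observation
            "intervention": int(after),      # 0 before, 1 from the intervention on
            "time_after_intervention": (t - intervention_window) if after else 0,
        })
    return rows

rows = its_basic_terms()
# e.g. rows[0]  -> {'time': 0,  'intervention': 0, 'time_after_intervention': 0}
#      rows[12] -> {'time': 12, 'intervention': 1, 'time_after_intervention': 6}
```

Each maintainer's rows, joined with the developer characteristics and activity counts, would then form the fixed-effect design matrix, with the maintainer identifier as the random effect.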
For better model performance, we transformed the continuous variables to make them approximately normal and on a comparable scale through log transformation (plus 0.5) and standardization (mean 0, standard deviation 1) [56]. To reduce the multicollinearity problem, we excluded factors with variance inflation factor (VIF) values $\geq 5$ using the vif function of the car package in R [11]. We report the coefficients and the related $p$ values obtained in this way. We also report the explained variance of each factor, which can be interpreted as its effect size relative to the total variance explained by all the factors. For the fitness of the models, we report both marginal ($R^2_m$) and conditional ($R^2_c$) R-squared values using the r.squaredGLMM function of the MuMIn package in R [7].

Together with the ITS analysis, we visually present how the responses change over time to show the activity change more intuitively. Since the unstable period is set aside in the ITS analysis, we analyze this period separately using the Wilcoxon paired test method, which is presented in the following section.

4.2.3 Wilcoxon paired test. For the ITS analysis, the unstable period is ignored. However, the Sponsor mechanism involves a small amount of money, which may influence maintainer behavior in the short term only. We assume that maintainers may show great fluctuations in OSS activity during the unstable period. We therefore used a paired, nonparametric test method called the Wilcoxon paired test [8]. Through one-sided tests (both alternative=greater and alternative=less) [12], we can see whether an intervention increases or decreases a maintainer's activity. We considered three kinds of interventions, including accountSetUpTime, firstSponsorTime, and before and after each sponsorship.
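The paired comparison described here can be sketched as follows; this is our illustration in Python rather than the paper's R analysis, the activity counts are fabricated, and the Cliff's delta helper implements the standard pairwise definition used for the effect size:

```python
# Sketch: paired one-sided Wilcoxon tests on activity before vs. after an
# intervention, plus Cliff's delta as effect size. Illustrative only; the
# data below are made up.
from scipy.stats import wilcoxon

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs of observations."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

# commits in the 15 days before/after an intervention (fabricated numbers)
before = [3, 5, 2, 8, 4, 6, 1, 7, 5, 10]
after  = [4, 7, 5, 12, 9, 13, 9, 16, 15, 21]

# one-sided tests: does the intervention increase (or decrease) activity?
_, p_greater = wilcoxon(after, before, alternative="greater")
_, p_less    = wilcoxon(after, before, alternative="less")
delta = cliffs_delta(after, before)  # |delta| >= 0.474 would count as "large"
```

Here every fabricated pair increases, so `p_greater` is small while `p_less` is near 1, and `delta` is positive.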
We used Cliff's delta ($\delta$) to measure the effect size [29], with $|\delta| < 0.147$ indicating a negligible effect size, $0.147 \leq |\delta| < 0.33$ a small effect size, $0.33 \leq |\delta| < 0.474$ a medium effect size, and $|\delta| \geq 0.474$ a large effect size.

4.2.4 Hurdle regression analysis. The first step of the hurdle regression analysis is to create a dataset linking maintainer characteristics to the amount of sponsorship received. Therefore, we heuristically collected different characteristics of each maintainer, including basic information, social characteristics, Sponsor mechanism characteristics, developer activities, and project characteristics. For the amount of sponsorship, we used the number of times that a maintainer is sponsored. Next, we present detailed descriptions of the collected variables.

Developer basic information.

• user age: Continuous variable measured as the time interval in months from the creation of the user account in the GitHub community until the data collection time
• in company: Binary variable indicating whether a maintainer describes their work situation in detail
• has email: Binary variable indicating whether a maintainer publicly provides contact information
• has location: Binary variable indicating whether the maintainer discloses geographical location information
• is hireable: Binary variable indicating whether a maintainer indicates availability for hire

Social characteristics.

• followers: Continuous variable measured as the number of followers
• followings: Continuous variable indicating how many users the maintainer follows

Sponsor mechanism characteristics.
- min tier: Continuous variable measured as the minimum number of dollars set by the maintainer for donations
- max tier: Continuous variable indicating the maximum donation tier
- has goal: Binary variable indicating whether a maintainer sets a goal for sponsorship
- has another way: Binary variable indicating whether a maintainer lists other modes of receiving donations. We identified other donation modes by finding links to other funding platforms in the description on the sponsorship page. The other platforms are shown in Table 9, compiled from the collection by Overney et al. [53] and the external links supported by GitHub [24].
- introduction richness: Continuous variable measured as the length of the introduction on the personal sponsorship page
- user age after sponsor account: Continuous variable indicating the time interval in months since the Sponsor account was set up (to capture how time influences the amount of sponsorship)

Activity characteristics.

- number of commits: Continuous variable measured as the total number of commits on GitHub from accountSetUpTime until the data collection time
- number of discussions: Continuous variable measured as the number of comments, including issue comments, pull request comments, and commit comments, from accountSetUpTime until the data collection time

Project characteristics.

- sum star number: Continuous variable measured as the total number of stars of repositories created by a maintainer
- sum fork number: Continuous variable indicating the corresponding number of forks
- sum watch number: Continuous variable indicating the corresponding number of watchers
- sum top repository star number: Continuous variable measured as the total number of stars of the top repositories that a maintainer contributed to in the four months before data collection [23]
- number of dependents: Continuous variable measured as the number of repositories that depend on the project with the most watchers among all projects owned by the maintainer

When building the hurdle regression models, we removed maintainers with less than 3 months of activity after accountSetUpTime to reduce the impact of time on sponsorship, reasoning that sponsors need time to find maintainers to donate to. To reduce the zero-inflation in the response variable, we used hurdle regression [36], splitting the sample into two parts:

(1) maintainers who have not received any donations from others, to examine which factors influence whether a maintainer receives donations at all; and

(2) maintainers with at least 1 sponsorship, to examine how the amount of donations received is influenced by the aforementioned characteristics.

To mitigate multicollinearity and to report the results, we use the same methods as before (see Section 4.2.2).

5 RESULTS

5.1 RQ1: Why do individuals participate or not in the Sponsor mechanism?

For this research question, the questionnaire had a dedicated item (Q3) for each of the three types of participants, i.e., maintainers, sponsors, and nonmaintainers. Table 3 shows the motivations or reasons given by the different types of developers in the full-scale stage and the percentage of votes for each option.

5.1.1 Related motivations.

From the results, we find that some of the motivations of maintainers and sponsors are related.

Project use relationship. RM1 and RS1 both indicate that use of the related projects leads to sponsorship.
Some 64.9% of maintainers and 85.8% of sponsors cite this factor as one motivation for participating in the Sponsor mechanism; this consensus puts it in first place on both groups’ motivation lists. People think that users should give back to contributors in various ways, among which the Sponsor mechanism serves as a “nice way to say thanks” [MC23] and “allow[s] people to easily fund their projects” [MC20]. Sponsors, for their part, are grateful for the OSS that they use and hope to express their gratitude, e.g., to “show support for OSS, which I heavily rely on in my daily work. Without OSS, I could not have built a career in data science” [SC3].

Promotion of continuous OSS contributions. RM2 and RS2 reflect participants’ shared motivation to enable further OSS contributions. Some 63.1% and 78.4% of maintainers and sponsors, respectively, cite this factor as a motivation; it thus ranks 2nd among all the enumerated reasons for participation. Developers who want to devote themselves to open source projects need to cover daily living costs and open source maintenance costs (e.g., “I believe in the open source and good-for-humanity idea. I need to get paid only to live a decent life” [MC37]). The emergence of the Sponsor mechanism may help them solve these problems to a certain extent and thus let them invest more time in open source projects (e.g., “I was really hoping to get sponsorship so I could spend more time focusing on developing open source projects” [MC11]). Sponsors, in turn, hope to inspire contributors to continue making outstanding contributions (e.g., “motivate them to do the awesome work” [SC5]).

Recognition of OSS work. RM4 and RS3 both reflect sponsors’ recognition of maintainers.
A total of 39.9% of maintainers and 49% of sponsors cite this factor as a motivation for participation; it ranks 4th and 3rd for the two groups, respectively. For some maintainers, sponsorship matters more as a form of recognition from sponsors than as income.

Support for specific features. For RM7 and RS5, 18.8% of maintainers and 9.4% of sponsors hope that the Sponsor mechanism can help set the agenda for issue resolution priorities, although many people think that OSS should not be tied to money (e.g., “If there was money given by others involved, I would feel pressed to implement whatever they want (like in industry projects). I want FLOSS to be completely independent from corporate requests” [OC5]).

5.1.2 Motivation across different user types.

In addition to the motivations above, which concern the sponsor and maintainer relationship, there are other motivations or reasons specific to each kind of user.

Maintainers: More than 60% of these participants chose RM3, but only 13% chose RM8. In the Other option, 4 participants mentioned that they hope sponsorship can cover some of their infrastructure costs. Moreover, 28.9% of participants even chose RM5 Just for fun.
This indicates that different people participate with different motivations.

Table 3: Reasons for participating or not participating in the Sponsor mechanism

| Reason_maintainers | Votes (%) | Reason_sponsors | Votes (%) | Reason_non-maintainers | Votes (%) |
|--------------------|-----------|-----------------|-----------|------------------------|-----------|
| RM1 It allows users of my projects to express thanks/appreciation | 64.9 | RS1 Because I benefit from the developer’s projects | 85.8 | RO1 No need to be sponsored | 39.3 |
| RM2 Sponsorship can motivate my future OSS contributions | 63.1 | RS2 To encourage the developer to continue the contribution | 78.4 | RO2 I contribute to OSS not for money | 38.3 |
| RM3 Side income for OSS contribution | 60.6 | RS3 To show my recognition of the developer’s work | 69.5 | RO3 My work is not worth being sponsored | 28.4 |
| RM4 It can reflect community recognition for my work | 39.9 | RS4 Because I’m interested in the developer’s projects | 49.0 | RO4 Never heard of it | 26.4 |
| RM5 Just for fun | 28.9 | RS5 To motivate the developer to work harder on a specific feature | 9.4 | RO5 It’s cumbersome | 8.5 |
| RM6 I deserve to be rewarded for my past OSS contribution | 21.8 | RS6 Because I know the developer | 8.9 | RO6 Not available in my region | 2.0 |
| RM7 I am able to prioritize the requirements of sponsors (e.g., fixing bugs) | 18.8 | Other | | Other | 10.4 |
| RM8 It’s a way for me to make a living | 13.1 | | | | |
| Other | 1.9 | | | | |

The main reason cited for participation is to obtain or express appreciation for the use of open source projects or to recognize the maintainer’s OSS contribution. In turn, such support may promote better contributions. Maintainers seeking to make money tend to pursue extra income rather than a full livelihood through sponsorship. For nonmaintainers, in addition to personal reasons, the mixing of open source projects and money is another critical consideration preventing them from participating.
5.2 RQ2: How effective is sponsorship in motivating developer OSS activity?

For this research question, we used the following methods: statistical analysis (visualization), ITS analysis, analysis of the unstable period based on the Wilcoxon paired test, and qualitative analysis based on the questionnaire survey. We explored two kinds of interventions, namely, accountSetUpTime and firstSponsorTime.

5.2.1 Visualization. Figures 5-8 present the change in activities over time. Both commit and discussion activities remain stable before and after the intervention; during the unstable period, however, developers tend to be more active than usual. In response to this phenomenon, we analyzed the persistent and transient effects of the interventions using the ITS method and the Wilcoxon paired test, respectively.

5.2.2 ITS analysis. Table 4 shows the results of the ITS analysis. The factor with the strongest correlation to OSS activity is the associated historical activity (i.e., number of commits before for the Commit Model, number of discussions before for the Discussion Model); in all four models, it explains more than 80% of the total variance. The variance explained by other funding sources does not exceed 1.1% in any of the four models, so the existence of funding sources other than the Sponsor mechanism does not affect our exploration of the association between this mechanism and open source activity.

For the number of commits, we find that for both accountSetUpTime and firstSponsorTime, there is a slight growth trend before the intervention. After the intervention, both show a negative growth trend ($\beta(t) + \beta(t \text{ after intervention}) < 0$).
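The ITS terms in Table 4 (time, intervention, time after intervention) can be illustrated with a minimal segmented regression on synthetic data. The paper’s actual models are mixed-effects models with control covariates; this is a plain-OLS sketch, and all numeric values are assumed for illustration only:

```python
import numpy as np

# Synthetic monthly activity series: 24 months, intervention at month 12.
t = np.arange(24, dtype=float)
intervention = (t >= 12).astype(float)        # level change at the intervention
t_after = np.where(t >= 12, t - 12.0, 0.0)    # slope change after the intervention

# Assumed ground-truth segmented trend (illustrative values only):
# pre-intervention slope +0.2, level drop -1.0, slope change -0.3.
y = 0.5 + 0.2 * t - 1.0 * intervention - 0.3 * t_after

# Design matrix [1, time, intervention, time after intervention],
# mirroring the terms reported in Table 4.
X = np.column_stack([np.ones_like(t), t, intervention, t_after])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The post-intervention slope is beta(time) + beta(time after intervention);
# here it is negative, mirroring the paper's finding that the trend turns downward.
post_slope = beta[1] + beta[3]
```

Since the synthetic series is noise-free, the fit recovers the assumed coefficients exactly, making the interpretation of each term easy to check.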
Additionally, we find that the intervention itself is negatively correlated with the number of commits ($\beta(\text{intervention}) < 0$).

For the number of discussions, the results resemble those for commit activity: the intervention of the Sponsor mechanism changes the originally slowly increasing dynamics and reduces discussion activity. Specifically, the intervention has no effect at accountSetUpTime but a slightly negative effect at firstSponsorTime.

Given these results, it is surprising that setting up the Sponsor mechanism or receiving the first sponsorship does not contribute to growth in the maintainer’s commit or discussion activity; on the contrary, there is a slight inhibitory effect. To illuminate this situation, we followed up with a questionnaire exploring maintainers’ subjective satisfaction with the Sponsor mechanism and its motivating effect (see Section 5.2.4).

5.2.3 Wilcoxon paired test analysis. Table 5 shows the results of the Wilcoxon paired test and Cliff’s delta.

For the number of commits, when a maintainer sets up the Sponsor account, is sponsored for the first time, or receives a new sponsorship, the number of commits after the intervention is significantly higher. For the number of discussions, we find no significant changes around the three kinds of interventions.

This result indicates that sponsorship leads to a short-term increase in commit activity but no short-term change in discussion activity. Complementing the ITS analysis, the Wilcoxon paired test examines activity changes during the unstable period, further demonstrating that the Sponsor mechanism can give a short-term boost to development activity.

5.2.4 Questionnaire survey.
To further explore the effectiveness of the Sponsor mechanism, we conducted independent research with maintainers and sponsors to uncover their subjective judgments about its efficacy. To this end, we asked maintainers (Q4 “How satisfied are you with the income from sponsors?”) and sponsors (Q4 “As a sponsor, to what extent does your sponsorship meet your expectations?”). We also asked the maintainers directly about their internal perceptions of the effectiveness of sponsorship incentives (Q5 “To what extent can sponsorship motivate you?”). The results are shown in Figure 9.

For sponsors, we find that 53.7% think that sponsorship meets their expectations fully or a great deal, and only 14.1% report that their expectations are hardly met or not met at all. For maintainers, we find that 50.4% consider that sponsorship motivates them fully or a great deal, but 22.5% think that it does not bring any motivating effect. However, in terms of the amount of sponsorship, only 20.7% of maintainers are satisfied or very satisfied with their income from sponsorship, and 30.1% are dissatisfied or very dissatisfied with the amount.

Table 4: Results of ITS analysis. Commit Model dependent variable: scale(log(number of commits + 0.5)); Discussion Model dependent variable: scale(log(number of discussions + 0.5)).

| | accountSetUpTime | | firstSponsorTime | |
|---|---|---|---|---|
| | Coeffs (Err.) | Chisq | Coeffs (Err.) | Chisq |
| Intercept | -0.10 + (0.01) | 0.01 + (0.01) | 0.01 + (0.01) | 0.01 + (0.01) |
| scale(log(number of commits before + 0.5)) | 0.59 + (0.01) | 5190.72 + | 0.58 + (0.02) | 1185.38 + |
| scale(log(number of discussions before + 0.5)) | -0.02 (0.01) | 3.45 + | -0.03 (0.02) | 2.29 |
| scale(log(number of stars before + 0.5)) | -0.06 + (0.01) | 55.23 + | -0.07 + (0.01) | 22.71 + |
| has goal (TRUE) | 0.06 + (0.01) | 17.43 + | 0.07 + (0.03) | 5.97 + |
| has other way (TRUE) | 0.16 + (0.05) | 8.22 + | 0.14 (0.09) | 2.36 |
| in company (TRUE) | 0.89 + (0.01) | 38.56 + | 0.11 + (0.03) | 15.60 + |
| is hireable (TRUE) | 0.00 (0.01) | 0.02 | 0.01 (0.03) | 0.22 |
| time | 0.02 + (0.00) | 96.11 + | 0.03 + (0.00) | 61.22 + |
| intervention (TRUE) | -0.02 + (0.01) | 5.66 | -0.09 + (0.02) | 25.54 + |
| time after intervention | -0.04 + (0.00) | 245.92 + | -0.05 + (0.00) | 97.38 + |
| Number of Observations | 75,516 | 20,148 | 75,516 | 20,148 |
| $R^2$ | 0.64 | 0.64 | 0.66 | 0.65 |

∗∗∗ p < 0.001, ∗∗ p < 0.01, ∗ p < 0.05, . p < 0.1

Figure 9: Results of 5-point Likert scale questions

We think that the main reason for this difference is that sponsors’ main motivation to participate is to display their gratitude, inspire others, etc., by giving funds; most sponsors are therefore satisfied with their own behavior. For maintainers, although more than half think that sponsorship can be stimulating, only approximately 20% are satisfied with the amount of sponsorship received. This shows that open source sponsorship has a positive effect on some developers, but the amount of monetary reward that can be received through sponsorship is relatively small and unlikely to meet the expectations of maintainers.
In terms of short-term effects, the Sponsor mechanism makes a slightly positive contribution to development activity but has no significant impact on discussion activity. However, this impact is not sustained. One possible reason is that the actual amount of support does not meet maintainers’ expectations, making it difficult for maintainers to rely on sponsorship income to keep investing in open source contributions.

5.3 RQ3: Who is likely to receive more sponsorships?

For this research question, we tried to identify the important factors influencing the amount of sponsorship and to provide further advice to maintainers. We again analyzed and verified the results through a combination of quantitative and qualitative analysis. For the qualitative analysis, we surveyed both maintainers and sponsors and explored the consistency of their perceptions of sponsorship.

5.3.1 Hurdle regression. From an overall perspective (see Table 6), the hurdle regression models fit well, with $R^2 = 34\%$ and $R^2 = 39\%$, respectively. Even though 7,465 maintainers have more than 3 months of activity after setting up their Sponsor profile, only 2,750 (36.8%) of them receive at least one sponsorship. Moreover, only 6% receive sponsorships more than 10 times, and only 25 maintainers receive more than 100 sponsorships. Therefore, although many people want to obtain sponsorship, only a small number succeed.

When we consider whether a maintainer receives any sponsorship (columns 2 and 3 of Table 6), the followers factor, representing social status, has the most substantial positive effect, explaining 45.8% of the total variance. However, the followings factor is negatively correlated with the likelihood of receiving sponsorship (effect size: 3.1%).
It is likely that, compared to followings, followers better represents the centrality of maintainers in the community, while maintainers with large followings tend to learn more from others in the community. Discussion activity is positively correlated with the likelihood of sponsorship (number of discussions, effect size: 22.7%), while commit activity explains only 0.3% of the variance. A possible explanation is that sponsored developers tend to focus more on issues or pull requests submitted by sponsors, to give back or to attract the attention of others, whereas commit activity is common among GitHub developers, many of whom may focus only on their own issues. For sponsor tiers, the min tier is negatively correlated with the likelihood of acquiring sponsorship (effect size: 12.3%), whereas max tier is positively correlated and explains 5% of the variance. Both tiers have sizable effects but opposite directions of influence. It is likely that many sponsors tend to donate only a little money, so setting a high min tier may cause them to abstain from sponsoring. However, maintainers who want to obtain sponsorships should not undervalue themselves: raising the max tier can increase the possibility of being sponsored. Maintainers should also note the importance of the introduction text when setting up their Sponsor account: maintainers who introduce themselves at greater length are more likely to be sponsored (effect size: 5.1%). Other factors have negligible effects, with explained variances of less than 5%.

When we consider the amount of sponsorship received by maintainers (columns 4 and 5 of Table 6), the social status of maintainers is again positively correlated with the response (followers, effect size: 65.3%), while followings is negatively correlated with it (effect size: 10.7%). The factor number of discussions explains 9.6% of the total variance.
The min tier variable becomes nonsignificant, unlike in the receive sponsorship model. A possible explanation is that setting the min tier is not a long-term lever for securing more sponsorship; developers need to focus more on their status and daily activities in the community. Other factors have negligible effects.

5.3.2 Questionnaire. We asked related questions of maintainers (Q6 “In which way do you think you can obtain more sponsorships?”) and sponsors (Q5 “What kind of developer do you prefer to sponsor?”) separately. Table 7 presents the results.

For maintainers. The results reveal that, from the maintainers’ perspective, producing useful projects and tools (WM1, WM4) is seen as more likely to draw sponsorships than just participating in projects (WM5, WM6, WM7, WM8, WM9). One possible reason is that the Sponsor mechanism credits funds to individual accounts, and the sponsorship button on a project homepage must be configured by the owner. Some sponsors who want to donate to a project through the Sponsor mechanism (e.g., those reporting that “I prefer to sponsor projects, not a specific developer” [SC167]) may thus end up sponsoring only the project’s owner.

Some 54.5% of maintainers think that by working hard, they can obtain more sponsorships (WM2). However, some maintainers said sponsorship is simply a matter of popularity (e.g., “Purely popularity basically... OSS Creators from YouTube earn a ton of money” [MC292]; “I think it is mostly a function of being a celebrity so it operates on the same rules” [MC262]). This is probably why 54.1% of the maintainers chose WM3.

More than 1 option was chosen by 85.6% of the sponsored participants. Moreover, 20.5% chose at least 5 options, which shows that the options we offered are, in fact, feasible ways for maintainers to promote sponsorship.
Some relevant participants indicated that “Donations just don’t work” [MC284] or “It doesn’t matter; people take when it’s free” [MC281]. These responses suggest that the reasons preventing most people from obtaining sponsorship that meets their expectations are not limited to individual participation characteristics and platform mechanism design; the act of sponsorship itself may not be suitable for the open source sphere. Indeed, 10 participants who selected WM11 indicated that there was no way to obtain more sponsorship.

For sponsors, the vast majority (85.1%) chose WS1, which suggests that most sponsors support developers involved in the open source projects that they use. This corresponds to the top-ranked way of obtaining sponsorship (WM1) selected by the maintainers, suggesting that, in the opinion of both maintainers and sponsors, the best way to obtain more sponsorship is to create projects that more people use. Similarly, more than half of the participants wanted to sponsor projects of personal interest (WS2) and developers who had made significant contributions (WS3). We find that 31.1% of the sponsors chose to sponsor independent developers (WS5). However, some sponsors said that just being an independent developer is not enough; the development and maintenance of good open source projects or tools is also needed (e.g., “Independent developers with nice tools” [SC30]).

Most sponsors do not consider the act of sponsorship a form of charity: few reported sponsoring simply because the person being rewarded was in hardship (WS7) or had not received many rewards (WS6). Likewise, sponsors do not want to reward another developer simply because they know one another (only 15.4% chose WS8; e.g., “It is usually a library I am using in my own project and I know the developer in person” [SC168]).
Table 6: Results for factors influencing sponsorship

| Dependent variable: receive sponsorship | Coeffs (Err.) | Chisq |
|----------------------------------------|---------------|-------|
| (Intercept) | −0.53∗∗∗ (0.09) | 1.80∗ (0.07) |
| scale(log(user age + 0.5)) | −0.10∗ (0.03) | 8.62∗∗ |
| in company (TRUE) | −0.26∗∗∗ (0.06) | 18.08∗∗∗ |
| has email (TRUE) | −0.03 (0.06) | 0.31 |
| has location (TRUE) | −0.11 (0.09) | 1.41 |
| is hireable (TRUE) | −0.19∗∗ (0.06) | 9.70∗∗ |
| scale(log(followings + 0.5)) | 0.96∗∗∗ (0.04) | 545.36∗∗∗ |
| scale(log(min tier + 0.5)) | −0.19∗∗∗ (0.03) | 37.39∗∗∗ |
| scale(log(max tier + 0.5)) | −0.42∗∗∗ (0.04) | 146.89∗∗∗ |
| has goal (TRUE) | 0.23∗∗∗ (0.03) | 59.82∗∗∗ |
| has other way (TRUE) | 0.18∗ (0.06) | 8.32∗ |
| scale(log(user age after sponsor account + 0.5)) | 0.28 (0.22) | 1.54 |
| scale(log(number of commits + 0.5)) | 0.02 (0.03) | 0.40 |
| scale(log(number of discussions + 0.5)) | 0.08 (0.04) | 3.42 |
| scale(log(sum star number + 0.5)) | 0.73∗∗∗ (0.05) | 270.29∗∗∗ |
| scale(log(sum top repository star number + 0.5)) | −0.10∗∗ (0.04) | 7.48∗∗ |
| scale(log(introduction richness + 0.5)) | −0.13∗∗ (0.04) | 9.55∗∗ |
| scale(log(number of dependents + 0.5)) | 0.25∗∗∗ (0.03) | 60.84∗∗∗ |
| Number of Observations | 7,465 | 2,790 |
| delta R² | 0.34 | 0.39 |

Table 7: Ways of obtaining more sponsorship

| Way_maintainers | Votes (%) | Who_sponsors | Votes (%) |
|-----------------|-----------|--------------|-----------|
| WM1 Producing useful projects | 62.6 | WS1 Developers whose projects I benefit from | 85.1 |
| WM2 Staying active and contributing more in the community | 54.5 | WS2 Developers whose projects I’m interested in | 60.3 |
| WM3 Advertising myself or my work to the community | 54.1 | WS3 Developers who make important contributions | 50.9 |
| WM4 Producing valuable code | 38.5 | WS4 Developers who are active in the community | 42.0 |
| WM5 Getting involved in popular projects | 29.1 | WS5 Independent developers | 31.1 |
| WM6 Getting involved in projects adopted by companies | 25.5 | WS6 Developers who haven’t received much sponsorship | 24.1 |
| WM7 Getting involved in long-term projects | 21.6 | WS7 Developers who are in hardship | 18.7 |
| WM8 Getting involved in less maintained yet important projects | 19.1 | WS8 Developers who I know | 15.4 |
| WM9 Getting involved in projects led by companies | 8.8 | WS9 Other | 1.0 |
| WM10 Providing localized content | 7.4 | | |
| WM11 Other | 3.6 | | |

Most maintainers and sponsors think that sponsorship builds on relationships forged through using OSS. Active and meaningful participation in open source contributions can also help maintainers gain more attention. However, the quantitative analysis reveals that the maintainer’s social popularity in the community is the decisive factor in obtaining more sponsorships.

5.4 RQ4: What are the shortcomings of the Sponsor mechanism?

For this research question, we investigated the shortcomings participants found while using the Sponsor mechanism. We asked the question “What are the shortcomings of the Sponsor mechanism?” of both maintainers (Q7) and sponsors (Q6) separately. Table 8 presents the results.

Among maintainers, 13.1% thought that the Sponsor mechanism was perfect (SM6) and could meet their personal needs well, while among sponsors, 33.1% thought so (SS2). This indicates that satisfaction varies greatly across the types of mechanism participants, especially maintainers: the current Sponsor mechanism does not meet maintainers’ needs well. The shortcomings include the following main aspects (some of these were resolved by GitHub during the research process).

Discoverability of maintainers.
The results reveal that 51.3% of maintainers found it difficult to be discovered by sponsors (SM1); among sponsors, however, only 19.6% found it difficult to determine whom they should sponsor (SS3), while a larger share (40.1%) found it difficult to assess who urgently needed sponsorship (SS1).

Interactivity of participants. Among maintainers, 29.4% thought that the current Sponsor mechanism does not support good direct communication with sponsors (SM2), while among sponsors, 11.8% wanted communication support (SS5). Some thought that they should not burden developers by interrupting their normal development process (“I don’t want to burden the developers [by asking them] to communicate with sponsors. The sponsor should be string-free” [SC195]).

Payments. Many people, both maintainers and sponsors, highlighted existing payment problems with the Sponsor mechanism, including limited payment options (25.1% of maintainers, SM3), limited sponsorship tiers, inconvenient tax payments (19.3% of maintainers, SM5), and limited payment providers. Some of these shortcomings, e.g., the limited payment options, may have been resolved by GitHub during the research process.

User distinction. A total of 20.7% of maintainers (SM4) and 10.5% of sponsors (SS6) mentioned the lack of distinction between sponsors and others in project development activities.

Geographical restrictions. From SM7 and SS4, we see that 11% of maintainers and 13.2% of sponsors thought that limited regional support restricts the popularity of participation. As of 27 July 2021, only 37 regions were supported, leaving many people unable to participate in the mechanism (RO6) and sponsors unable to sponsor as many people as they want (e.g., “Not all organizations I want to support joined GitHub sponsors” [SC192]).

Lack of contribution indicators. Five participants noted that valid OSS contribution indicators are lacking.
OSS contributions are not limited to commits and pull requests, and a sponsor who is not involved in a project can hardly know who has played a significant role in its development (e.g., “It is not easy to measure my OSS contribution. Sometimes it is just filing issues; other times, it is documentation PRs” [MC350]). Moreover, contributions of small patches to large projects are difficult for others to find and thus unlikely to gain sponsorship (e.g., “In my case, you will be hard-pressed to get anything for your work when you are making just a little addition to a massive piece of software” [MC379]). Among sponsors, some want to sponsor a project rather than individual maintainers (e.g., “I prefer to sponsor projects, not a specific developer” [SC167]).

OSS donations. The Sponsor mechanism itself is an act of donation; on GitHub, sponsorship primarily targets users or organizations that have created a GitHub account. We find from the results that 16 participants thought that the donation mechanism itself was not suitable for the current open source sphere, for several reasons: people take open source projects for granted, and no one wants to pay for them (e.g., “People still do not like to pay for software” [MC355]); companies that use open source initiatives to gain revenue do not want to give back to the open source projects (e.g., “Most companies don’t fund any of their open source dependencies” [MC354]); and donations are passive income, so without a regular income, developers have little motivation to work full-time on open source projects (e.g., “Donation makes far less revenue than charging for things” [OC78]).

To address the problems mentioned above, we offer the following actionable suggestions, taking the participant feedback into account.

Discoverability of maintainers.
- Add “Sponsor” buttons for the relevant project or people on the release webpage (“Recognition of sponsors in release of the repository would be something I can think of” [SC217]).
- Add support for integrated development environments (IDEs), allowing developers to discover package dependencies and quickly jump to sponsor pages while developing with IDEs (“Better discoverability and integration with other developer tooling” [SC65]).
- Provide a more straightforward way to show personal OSS contributions (e.g., “Promote efforts like a dashboard” [MC126]).

Interactivity among participants.

- Allow maintainers to configure whether they wish to communicate directly with sponsors. The interaction could be set up in different groups for different sponsors, similar to Patreon’s integration with Discord [54] (e.g., “Lack of integration with the payment tiers like the Discord integration with Patreon” [MC337]).
- Allow maintainers to configure their own thank-you emails that are sent automatically when they receive a sponsorship (e.g., “Some kind of thank-you setup where I can send notes, etc.” [MC109]).
- Allow maintainers to upload statements disclosing expenses related to sponsorship proceeds (“Distribution of the money, especially in FOSS [free and open source software] projects” [MC88]).

Payments.

- Provide clear income and expense statements to the sponsor and maintainer automatically.
- Integrate as many payment providers as possible while meeting tax requirements.

User distinctions.

- Let maintainers decide, through a configurable form in their personal settings, whether they want to treat sponsors differently from nonsponsors.
- In addition to an option to show distinctions, add configuration options such as which development activities to show and whether to distinguish between sponsors with different sponsorship amounts (e.g., “Developers should be allowed to set permission levels based on sponsorship. E.g., you can only comment or make requests if you’re a sponsor (or if the developer directly opts you in, or if you’ve made contributions to the project, things like that). This would really positively change the culture of GitHub collaboration” [SC212]).

Geographical restrictions. Provide support for more regions.

Lack of contribution indicators. Set up a multidimensional indicator of contributions, and ensure rational allocation of project sponsorship funds.

OSS donations. Future research should synthesize feedback from all types of open source participants and reconsider how to improve the sponsorship mechanism or design a more appropriate form of open source financial support.
----------------------------------------
-------------------------------
Section 109:
Table 8: Shortcomings of the Sponsor mechanism

| Shortcoming (maintainers) | Votes (%) | Shortcoming (sponsors) | Votes (%) |
|---------------------------|-----------|------------------------|-----------|
| S_M1 It’s hard for others to discover me for sponsorship | 51.3 | S_S1 I cannot assess how urgently a developer needs to be sponsored | 40.1 |
| S_M2 I can’t interact with my sponsors on GitHub (e.g., for expressing appreciation) | 29.4 | S_S2 None. It’s perfect | 33.1 |
| S_M3 Lack of a wide range of payment options (e.g., one-time/yearly/quarterly payment) | 25.1 | S_S3 It’s hard for me to find the developer I should sponsor | 19.6 |
| S_M4 GitHub does not distinctly mark my sponsors (e.g., I cannot easily tell whether an issue submitter is my sponsor) | 20.7 | S_S4 It is not supported in many regions | 13.2 |
| S_M5 I have to pay taxes | 19.3 | S_S5 I can’t interact with the developer I sponsored on GitHub | 11.8 |
| S_M6 None. It’s perfect to me | 13.1 | S_S6 I’m not distinctly marked in the projects whose maintainers have been sponsored by me (e.g., when I submit an issue) | 10.5 |
| S_M7 It is not supported in many regions | 11.0 | S_S7 Other | 8.1 |
| S_M8 I can’t declare how I dealt with the received money | 10.1 | | |
| S_M9 Other | 9.4 | | |

During the research process, GitHub addressed some of these shortcomings, e.g., by adding a one-time payment option.

The shortcomings of the Sponsor mechanism relate to three main aspects. Usage deficiencies: participants have difficulty finding each other, and there is a lack of good interaction support, promotion, and adequate payment and billing support. Object orientation and supported functions: despite support for organizations and projects, the mechanism mainly targets individuals; sponsors need better support for corporate sponsorship, and maintainers need better support for multicontributor projects. Personalization: the Sponsor mechanism needs to be configurable to reflect variation in participant types and motivations.
----------------------------------------
-------------------------------
Section 110:
6 DISCUSSION

Through this study of the integrated sponsorship mechanism on the world’s most popular open source platform (GitHub), we found that participation in the mechanism has not shown the same rapid growth as participation in open source projects.
Meanwhile, there is a long-tail effect in the number of sponsorships obtained by maintainers; i.e., most maintainers obtain few sponsorships or none at all. Compared to the work of Overney et al. [53], this research brings us one step closer to understanding the incentive effect of sponsorship on individual developers by collecting feedback from participants in open source donations, taking GitHub Sponsors as an example.

This article considers only the Sponsor mechanism and does not offer a comparative analysis of all open source sponsorship platforms. Nevertheless, we think it still provides some guidance for improving the mechanism itself and for exploring the essence of open source donation.

This paper explored four aspects of the Sponsor mechanism: its who, what, why, and how. The main findings and insights are as follows.

Why do individuals participate or not in the Sponsor mechanism?

Not all open source contributors endorse open source donation; there were more nonparticipants than participants. As with the motivations for participating in traditional citizen science [15, 43] and information-sharing crowdsourcing systems like Wikipedia [73], developers are primarily intrinsically motivated to contribute to open source [21]. However, because open source development activities are complex and require significant maintenance effort, many contributors are looking for financial support [5, 57, 67]. Among those who support and use the mechanism, relationships built through the use of specific software generally form the backbone of sponsorship behavior. Indeed, many users want development activities to reflect the difference between sponsors and nonsponsors and, in this way, to change how people collaborate on open source and participate in open source donation.
Such a change might not be entirely welcome and could push the open source sphere toward being money driven. We think that making the format personalized and configurable may meet the needs of more people without changing the nature of the open source sphere.

System designers should consider regional support and make the Sponsor mechanism accessible to more people who want to participate, while improving the user experience (e.g., better access to billing for tax purposes).

How effective is sponsorship in motivating developer OSS activity?

In a study of donations to projects, Overney et al. [53] found that donation did not improve engineering activity. In our study, we similarly found that sponsorship has only a short-term positive stimulating effect on maintainers’ development activity. The impact does not last, and there is even a slight negative effect in the long term. A possible reason is that most maintainers do not receive enough sponsorship through the Sponsor mechanism to be motivated to contribute continuously. This may reflect a characteristic of open source donations: the maintainer passively receives sponsorship from the sponsor, and nothing compels the act of sponsorship to occur. Situations may thus arise like that of one of our questionnaire participants, who created heavily used tools but received no sponsorships. When such an outcome is compared horizontally with the results of other maintainers, it may deal a blow to maintainers and reduce their enthusiasm for making open source contributions.

For system designers, it is worth considering conjunctive mechanisms, such as adding a ranking list based on the number of sponsorships received or given to the annual report or other locations.
In this way, the sponsorship mechanism could become a more continuous driving force, enhancing the impact of sponsorship on developer activities.

Who is more likely to receive sponsorships?
Participants’ subjective perceptions conflict with the actual phenomenon. Participants believe that creating useful open source projects should lead to more sponsorships. However, we find that the most significant factor influencing the amount of sponsorship is social status. This inconsistency illustrates that participants want to express gratitude or receive appreciation through the software usage relationship, yet those who develop sufficiently useful tools do not necessarily receive substantive sponsorship. Given the feedback from our questionnaire participants, this situation is likely to cause maintainers to complain about a lack of publicity or about the fact that their work does not lead to more sponsorships. At the same time, developers who make minor contributions to popular projects or outstanding contributions to niche projects may be overlooked under this mechanism. Unlike project-oriented donation platforms such as Open Collective and Patreon [53], the Sponsor mechanism targets developers, which allows external contributors who are actively involved in popular projects without owning them to receive donations. Our results show, however, that sponsors prefer project-oriented donation; i.e., the core developers or owners of popular, widely used projects are more likely to receive sponsorship. Since some of the money donated to projects is spent on travel and food [53], we think the share of each contributor’s contributions should be considered to achieve greater equity.
For now, we think that open source developers who want more sponsorship should increase their community visibility through self-promotion and attract more attention by building open source projects that more people use.

What are the shortcomings of the Sponsor mechanism?
The defects of the Sponsor mechanism fall into three main aspects: usage defects, object orientation and support mechanisms, and personalization settings. At the same time, many developers believe that sponsorship is not well suited to the open source ecosystem: the free nature of OSS leads to an unwillingness to pay. This finding shows that beyond the problems with the mechanism itself, donations are not perfectly adapted to the open source ecosystem. The passivity, uncertainty, and instability inherent to donations make it difficult for maintainers to rely on them and to keep making open source contributions over the long term. At the same time, the lack of reasonable evaluation of contributions and of funding allocation makes it difficult for sponsors to determine whom to sponsor and by how much. For these reasons, some people prefer the bounty approach of “getting paid to do more” over the donation approach, since it pays immediately for concrete work with more precise goals [77]. How to keep the advantages of bounties while avoiding making money the guiding force of open source development may be the goal of future monetary incentive system design. For more specific system design recommendations, see Section 5.4.

Overall, the Sponsor mechanism is a good attempt and an essential step toward reasonable and effective open source financial support. As of now, the mechanism still needs further improvement to meet the needs of more developers.
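The short-term versus long-term effect discussed in this section was estimated with an interrupted time series design (detailed in Section 7). As a minimal, self-contained sketch of how such a segmented regression reads off the two effects, consider synthetic weekly activity counts around the week sponsorship starts (all variable names and numbers below are illustrative, not the paper's data or exact model):

```python
import numpy as np

# Segmented-regression sketch of an interrupted time series (ITS).
# Synthetic weekly activity counts for one maintainer; t0 is the week
# sponsorship starts. Model: y = b0 + b1*t + b2*after + b3*(t - t0)*after,
# where b2 is the immediate level change and b3 is the post-event slope change.
rng = np.random.default_rng(42)
t = np.arange(24, dtype=float)        # 24 weekly observations
t0 = 12                               # intervention point (sponsorship begins)
after = (t >= t0).astype(float)

# Simulated truth: mild upward trend, a +2 jump at t0, then a -0.3 weekly drift.
y = 5 + 0.1 * t + 2.0 * after - 0.3 * (t - t0) * after + rng.normal(0, 0.5, t.size)

# Ordinary least squares on the segmented design matrix.
X = np.column_stack([np.ones_like(t), t, after, (t - t0) * after])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
level_change, slope_change = beta[2], beta[3]
print(f"level change: {level_change:.2f}, slope change: {slope_change:.2f}")
```

A positive estimated level change together with a negative slope change mirrors the paper's qualitative finding: a short-term boost in activity that fades, and slightly reverses, over the long term.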
----------------------------------------
-------------------------------
Section 111:
7 THREATS TO VALIDITY

For the questionnaire, we did not screen for carelessly invalid responses [13]. First, the number of questions is small, the time required to answer is short, and there is no overlap between questions, so it is not feasible to judge the validity of responses from the results alone. Second, we did not add attention-check items, in order to keep participation time short. However, since users had to click our questionnaire link in the email and jump to the SurveyMonkey site to respond, we think this ensured the validity of the responses to some extent. When conducting the second round of the questionnaire survey, to avoid disturbing participants excessively, we sent it only once and did not send second or third reminder emails. At the same time, people who have not set up a Sponsor account may not care about the mechanism. As a result, the response rate was low.
For the ITS analysis, data should ideally be collected for each factor in every time window. However, because the GitHub API does not expose historical timestamps for some attributes, some factors were measured only at their values at the time of data collection (e.g., being employed by a company), as they do not change frequently.

For the hurdle regression, the factors included in the models covered several aspects related to developer sponsorship. However, other factors may influence whether a developer obtains sponsorship or how much funding is received. Moreover, the number of sponsorships does not accurately indicate the amount of money a developer receives from donations, as there are different tiers and sponsors can withdraw their monthly sponsorship at any time. We do not have access to data on the actual donations received by each developer. Developers may also obtain donations from other platforms to maintain related projects.
We did not consider all such funding in total, nor developers’ activities on other platforms.

This paper explored the effectiveness of the Sponsor mechanism only for individual users, although the mechanism can also be used by organizational accounts. To avoid our analysis being confounded by such users, we processed our data accordingly; therefore, the results do not apply to GitHub’s organizational accounts. According to our statistics, 92% of users who set up sponsorship are individual users.
----------------------------------------
-------------------------------
Section 112:
8 CONCLUSION AND FUTURE WORK

This paper took GitHub’s Sponsor mechanism as a case study and used a mixed qualitative and quantitative method to investigate four dimensions of the mechanism. Regarding why developers participate in the Sponsor mechanism, we found that participation is mainly related to the use of OSS. Regarding the mechanism’s effectiveness, we found that sponsorship has only a short-term positive effect on development activities and that in the long term there is a slight decrease. We studied who obtains more sponsorships and found that the maintainer’s social status in the community correlates most strongly with this outcome (the more followers, the more sponsorships a developer acquires). Regarding the drawbacks of the mechanism, we found that beyond the shortcomings in its use, participants felt that the Sponsor mechanism should better attract and support corporate sponsors. Some thought that the open source donation method itself needed improvement to attract more developers. Overall, we have explored the relationship between donation behavior and developers in open source communities through the GitHub Sponsor mechanism.
In future work, we will further explore the following aspects: 1) the advantages and disadvantages of different open source donation platforms and the effectiveness of their incentives for open source activities, and 2) different types of open source financial support and the reasonableness and effectiveness of each mode.

ACKNOWLEDGMENTS

This work is supported by the China National Grand R&D Plan (Grant No. 2020AAA0103504). Thanks to all GitHub users who responded to the questionnaire.

REFERENCES

[1] Mark Aberdour. 2007. Achieving quality in open-source software. IEEE Software 24, 1 (2007), 58–64.
[2] Bethany Alender. 2016. Understanding volunteer motivations to participate in citizen science projects: a deeper look at water quality monitoring. Journal of Science Communication 15, 3 (2016), A04.
[3] Alexander Hars and Shaosong Ou. 2002. Working for free? Motivations for participating in open-source projects. International Journal of Electronic Commerce 6, 3 (2002), 25–59.
[4] Maria J Antikainen and Heli K Vaataja. 2010. Rewarding in open innovation communities—how to motivate members. International Journal of Entrepreneurship and Innovation Management 11, 4 (2010), 440–456.
[5] Ashe Dryden. 2013. The ethics of unpaid labor and the OSS community. https://www.ashedryden.com/blog/the-ethics-of-unpaid-labor-and-the-oss-community. [Online; accessed June 8, 2021].
[6] Susanne Beck, Carsten Bergenholtz, Marcel Bogers, Tiare-Maria Brasseur, Marie Louise Conradsen, Diletta Di Marco, Andreas P Distel, Leonard Dobusch, Daniel Dörler, Agnes Effert, et al. 2020. The Open Innovation in Science research field: a collaborative conceptualisation approach. Industry and Innovation (2020), 1–50.
[7] Kenneth P. Burnham and David R. Anderson. 2002. Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach (2nd ed.). Springer.
[8] G. Canfora, L. Cerulo, M. Cimitile, and MD Penta. 2014. How changes affect software entropy: an empirical study.
Empirical Software Engineering 19, 1 (2014), 1–38.
[9] Francesco Cappa, Jeffrey Laut, Maurizio Porfiri, and Luca Giustiniano. 2018. Bring them aboard: rewarding participation in technology-mediated citizen science projects. Computers in Human Behavior 89 (2018), 246–257.
[10] Krista Casler, Lydia Bickel, and Elizabeth Hackett. 2013. Separate but equal? A comparison of participants and data gathered via Amazon’s MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior 29, 6 (2013), 2156–2160.
[11] Jacob Cohen, Patricia Cohen, Stephen G West, and Leona S Aiken. 2013. Applied multiple regression/correlation analysis for the behavioral sciences. Routledge.
[12] The SciPy community. 2008. API Reference of scipy.stats.wilcoxon. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html. [Online; accessed July 31, 2021].
[13] Paul G Curran. 2016. Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology 66 (2016), 4–19.
[14] Paul A David and Joseph S Shapiro. 2008. Community-based production of open-source software: What do we know about the developers who participate? Information Economics and Policy 20, 4 (2008), 364–398.
[15] Margret C Domroese and Elizabeth A Johnson. 2017. Why watch bees? Motivations of citizen science volunteers in the Great Pollinator Project. Biological Conservation 208 (2017), 40–47.
[16] Enrique Estellés-Arolas and Fernando González-Ladrón-de-Guevara. 2012. Towards an integrated crowdsourcing definition. Journal of Information Science 38, 2 (2012), 189–200.
[17] Yulin Fang and Derrick Neufeld. 2009. Understanding sustained participation in open source software projects. Journal of Management Information Systems 25, 4 (2009), 9–50.
[18] Oluwaseyi Feyisetan, Elena Simperl, Max Van Kleek, and Nigel Shadbolt. 2015. Improving paid microtasks through gamification and adaptive furtherance incentives.
In Proceedings of the 24th International Conference on World Wide Web. 333–343.
[19] Andrzej Gałecki and Tomasz Burzykowski. 2013. Linear mixed-effects model. In Linear Mixed-Effects Models Using R. Springer, 245–273.
[20] Rishab Aiyer Ghosh. 2005. Understanding free software developers: Findings from the FLOSS study. Perspectives on Free and Open Source Software 28 (2005), 23–47.
[21] GitHub. 2016. Getting Paid for Open Source Work. https://opensource.guide/getting-paid/. [Online; accessed June 8, 2021].
[22] GitHub. 2017. Open Source Survey. https://opensourcesurvey.org/2017/. [Online; accessed June 8, 2021].
[23] GitHub. 2021. About your personal dashboard. https://docs.github.com/en/github/setting-up-and-managing-your-github-user-account/managing-user-account-settings/about-your-personal-dashboard#finding-your-top-repositories-and-teams. [Online; accessed May 24, 2021].
[24] GitHub. 2021. Displaying a sponsor button in your repository. https://docs.github.com/en/github/administering-a-repository/managing-repository-settings/displaying-a-sponsor-button-in-your-repository. [Online; accessed May 22, 2021].
[25] GitHub. 2021. Invest in the software that powers your world. https://github.com/sponsors. [Online; accessed July 30, 2021].
[26] GitHub. 2021. Reference of GraphQL User API. https://docs.github.com/en/graphql/reference/objects#user. [Online; accessed July 30, 2021].
[27] GitHub. 2021. Reference of RESTful List users API. https://docs.github.com/en/rest/reference/users#list-users. [Online; accessed August 1, 2021].
[28] GitHub. 2021. The 2020 State of the Octoverse. https://octoverse.github.com/. [Online; accessed February 4, 2021].
[29] R. J. Grissom and J. J. Kim. 2007. Effect Sizes for Research: A Broad Practical Approach.
[30] Carl Gutwin, Reagan Penner, and Kevin Schneider. 2004. Group awareness in distributed software development.
In Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work. ACM, Chicago, Illinois, USA, 72–81.
[31] Stefan Haefliger, Georg Von Krogh, and Sebastian Spaeth. 2008. Code reuse in open source software. Management Science 54, 1 (2008), 180–193.
[32] Cynthia Harvey. 2017. 35 Top Open Source Companies. https://www.datamation.com/open-source/35-top-open-source-companies. [Online; accessed February 5, 2021].
[33] Andrea Hemetsberger. 2002. Fostering cooperation on the Internet: Social exchange processes in innovative virtual consumer communities. ACR North American Advances 29 (2002), 354–356.
[34] Mokter Hossain. 2012. Users’ motivation to participate in online crowdsourcing platforms. In 2012 International Conference on Innovation Management and Technology Research. IEEE, 310–315.
[35] Javier Luis Cánovas Izquierdo and Jordi Cabot. 2018. The role of foundations in open source projects. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Society. ACM, Gothenburg, Sweden, 3–12.
[36] A. Zeileis, C. Kleiber, and S. Jackman. 2008. Regression Models for Count Data in R. Journal of Statistical Software 27, 8 (2008), 1–25.
[37] Jaweria Kanwal and Onaiza Maqbool. 2012. Bug Prioritization to Facilitate Bug Report Triage. Journal of Computer Science and Technology 27 (2012), 397–412.
[38] Bran Knowles. 2013. Cyber-sustainability: towards a sustainable digital future. Lancaster University (United Kingdom).
[39] Bruce Kogut and Anca Metiu. 2001. Open-source software development and distributed innovation. Oxford Review of Economic Policy 17, 2 (2001), 248–264.
[40] Sandeep Krishnamurthy and Arvind K Tripathi. 2009. Monetary donations to an open source software platform. Research Policy 38, 2 (2009), 404–414.
[41] Alexandra Kuznetsova, Per B. Brockhoff, and Rune H. B. Christensen. 2017. lmerTest Package: Tests in Linear Mixed Effects Models.
Journal of Statistical Software 82, 13 (2017), 1–26. https://doi.org/10.18637/jss.v082.i13
[42] Karim Lakhani and Robert G. Wolf. 2005. Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects. MIT Press, Cambridge.
[43] Lincoln R Larson, Caren B Cooper, Sara Futch, Devyani Singh, Nathan J Shipley, Kathy Dale, Geoffrey S LeBaron, and John Y Takekawa. 2020. The diverse motivations of citizen scientists: Does conservation emphasis grow as volunteer participation progresses? Biological Conservation 242 (2020), 108428.
[44] Zhixing Li, Yue Yu, Tao Wang, Gang Yin, Shanshan Li, and Huaimin Wang. 2021. Are You Still Working on This? An Empirical Study on Pull Request Abandonment. IEEE Transactions on Software Engineering (2021), 1–1. https://doi.org/10.1109/TSE.2021.3053403
[45] Debra J Mesch, Patrick M Rooney, Kathryn S Steinberg, and Brian Denton. 2006. The effects of race, gender, and marital status on giving and volunteering in Indiana. Nonprofit and Voluntary Sector Quarterly 35, 4 (2006), 565–587.
[46] Nadia Eghbal. 2015. A handy guide to financial support for open source. https://github.com/nayafia/lemonade-stand/blob/master/README.md. [Online; accessed June 8, 2021].
[47] Keitaro Nakasai, Hideaki Hata, and Kenichi Matsumoto. 2018. Are donation badges appealing? A case study of developer responses to Eclipse bug reports. IEEE Software 36, 3 (2018), 22–27.
[48] Keitaro Nakasai, Hideaki Hata, Saya Onoue, and Kenichi Matsumoto. 2017. Analysis of donations in the Eclipse project. In 8th International Workshop on Empirical Software Engineering in Practice (IWESEP). IEEE, Tokyo, Japan, 18–22.
[49] Cassandra Overney. 2020. Hanging by the Thread: An Empirical Study of Donations in Open Source. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings (Seoul, South Korea) (ICSE ’20). Association for Computing Machinery, New York, NY, USA, 131–133.
https://doi.org/10.1145/3377812.3382170
[50] Cassandra Overney, Jens Meinicke, Christian Kästner, and Bogdan Vasilescu. 2020. How to Not Get Rich: An Empirical Study of Donations in Open Source. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea) (ICSE ’20). Association for Computing Machinery, New York, NY, USA, 1209–1221. https://doi.org/10.1145/3377811.3380410
[51] Patrícia Tiago, Maria João Gouveia, César Capinha, Margarida Santos-Reis, and Henrique M Pereira. 2017. The influence of motivational factors on the frequency of participation in citizen science activities. Nature Conservation 18 (2017), 61.
[52] Cassandra Overney. 2020. Become a sponsor to Super Diana. https://github.com/sponsors/alphacentauri2. [Online; accessed May 26, 2021].
[53] SurveyMonkey. 1999. https://www.surveymonkey.com/. [Online; accessed May 26, 2021].
[54] Andrew Schofield and Grahame S. Cooper. 2006. Participation in Free and Open Source Communities: An Empirical Study of Community Members’ Perceptions. In Open Source Systems, Ernesto Damiani, Brian Fitzgerald, Walt Scacchi, Marco Scotto, and Giancarlo Succi (Eds.). Springer US, Boston, MA, 221–231.
[55] Manuel Sojer and Joachim Henkel. 2010. Code reuse in open source software development: Quantitative evidence, drivers, and impediments. Journal of the Association for Information Systems 11, 12 (2010), 2.
[56] Diana Super. 2020. Become a sponsor to Super Diana. https://github.com/sponsors/0xTim. [Online; accessed May 26, 2021].
[57] Asher Trockman, Shurui Zhou, Christian Kästner, and Bogdan Vasilescu. 2018. Adding Sparkle to Social Coding: An Empirical Study of Repository Badges in the Npm Ecosystem. In Proceedings of the 40th International Conference on Software Engineering (Gothenburg, Sweden) (ICSE ’18). Association for Computing Machinery, New York, NY, USA, 511–522. https://doi.org/10.1145/3180155.3180209
[58] Liam Tung. 2020.
Redis database creator Sanfilippo: Why I’m stepping down from the open-source project. https://www.zdnet.com/article/redis-database-creator-sanfilippo-why-im-stepping-down-from-the-open-source-project/. [Online; accessed June 8, 2021].
[59] Steven J. Vaughan-Nichols. 2021. Hard work and poor pay stresses out open-source maintainers. https://www.zdnet.com/article/hard-work-and-poor-pay-stresses-out-open-source-maintainers/. [Online; accessed June 8, 2021].
[60] Georg Von Krogh, Stefan Haefliger, Sebastian Spaeth, and Martin W. Wallin. 2012. Carrots and Rainbows: Motivation and Social Practice in Open Source Software Development. MIS Quarterly 36, 2 (June 2012), 649–676.
[61] Jing Wang, Patrick C. Shih, and John M. Carroll. 2015. Revisiting Linus’s law: Benefits and challenges of open source software peer review. International Journal of Human-Computer Studies 77 (2015), 52–65. https://doi.org/10.1016/j.ijhcs.2015.01.005
[62] John Willinsky. 2005. The unacknowledged convergence of open source, open access, and open science. First Monday 10, 8 (Aug. 2005). https://doi.org/10.5210/fm.v10i8.1265
[63] Sarah Wiseman, Anna L Cox, Sandy JJ Gould, and Duncan P Brumby. 2017. Exploring the effects of non-monetary reimbursement for participants in HCI research. Human Computation (2017).
[64] Bo Xu, Donald R. Jones, and Bingxia Shao. 2009. Volunteers’ involvement in online community based software development. Information & Management 46, 3 (2009), 151–158. https://doi.org/10.1016/j.im.2008.12.005
[65] Bo Xu and Dahui Li. 2015. An empirical study of the motivations for content contribution and community participation in Wikipedia. Information & Management 52, 3 (2015), 275–286.
[66] Yue Yu, Gang Yin, Huaimin Wang, and Tao Wang. 2014. Exploring the Patterns of Social Behavior in GitHub. In Proceedings of the 1st International Workshop on Crowd-Based Software Development Methods and Technologies (Hong Kong, China) (CrowdSoft 2014).
Association for Computing Machinery, New York, NY, USA, 31–36. https://doi.org/10.1145/2666539.2666571
[67] Xunzhao Zhang, Tao Wang, Yue Yu, Quheng Zeng, Zhiying Li, and Huaimin Wang. 2022. Questionnaire design for GitHub Sponsor mechanism. (2022). https://doi.org/10.5281/ZENODO.5715824
[68] Yangyang Zhao, Alexander Serebrenik, Yuming Zhou, Vladimir Filkov, and Bogdan Vasilescu. 2017. The impact of continuous integration on other software development practices: A large-scale empirical study. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

A OTHER PLATFORMS BESIDES THE SPONSOR MECHANISM

Table 9: Other platforms for obtaining OSS financial support

| Name | URL |
|------|-----|
| Bountysource | https://www.bountysource.com |
| Flattr | https://flattr.com |
| IssueHunt | https://issuehunt.io |
| Kickstarter | https://www.kickstarter.com |
| Liberapay | https://liberapay.com |
| Gratipay (formerly Gittip) | https://gratipay.com |
| OpenCollective | https://opencollective.com |
| Otechie | https://otechie.com |
| Patreon | https://www.patreon.com |
| PayPal | https://www.paypal.com |
| Tidelift | https://tidelift.com |
| Tip4Commit | https://tip4commit.com |
| LFX Mentorship (formerly CommunityBridge) | https://lfx.linuxfoundation.org/tools/mentorship |
| Ko-fi | https://ko-fi.com |
----------------------------------------
-------------------------------
Section 113:
Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects

Michael Hilton
Oregon State University, USA
hiltonm@eecs.oregonstate.edu

Timothy Tunnell
University of Illinois, USA
tunnell2@illinois.edu

Kai Huang
University of Illinois, USA
khuang29@illinois.edu

Darko Marinov
University of Illinois, USA
marinov@illinois.edu

Danny Dig
Oregon State University, USA
digd@eecs.oregonstate.edu

ABSTRACT
Continuous integration (CI) systems automate the
compilation, building, and testing of software. Despite CI rising as a big success story in automated software engineering, it has received almost no attention from the research community. For example, how widely is CI used in practice, and what are some costs and benefits associated with CI? Without answering such questions, developers, tool builders, and researchers make decisions based on folklore instead of data. + + +In this paper, we use three complementary methods to study the usage of CI in open-source projects. To understand which CI systems developers use, we analyzed 34,544 open-source projects from GitHub. To understand how developers use CI, we analyzed 1,529,291 builds from the most commonly used CI system. To understand why projects use or do not use CI, we surveyed 442 developers. With this data, we answered several key questions related to the usage, costs, and benefits of CI. Among our results, we show evidence that supports the claim that CI helps projects release more often, that CI is widely adopted by the most popular projects, as well as finding that the overall percentage of projects using CI continues to grow, making it important and timely to focus more research on CI. + + +CCS Concepts +• Software and its engineering → Agile software development; Software testing and debugging; + + +Keywords +continuous integration; mining software repositories + + + + +INTRODUCTION +Continuous Integration (CI) is emerging as one of the biggest success stories in automated software engineering. CI systems automate the compilation, building, testing and deployment of software. For example, such automation has been reported [22] to help Flickr deploy to production more than 10 times per day. Others [40] claim that by adopting CI and a more agile planning process, a product group at HP reduced development costs by 78%. + + + + +These success stories have led to CI growing in interest and popularity. 
Travis CI [17], a popular CI service, reports that over 300,000 projects are using Travis. The State of Agile industry survey [48], with 3,880 participants, found 50% of respondents use CI. The State of DevOps report [49] finds CI to be one of the indicators of "high performing IT organizations". Google Trends [11] shows a steady increase of interest in CI: searches for “Continuous Integration” increased 350% in the last decade.

Despite the growth of CI, the only published research paper related to CI usage [53] is a preliminary study, conducted on 246 projects, which compares several quality metrics of projects that use or do not use CI. However, the study does not present any detailed information on how projects use CI. In fact, despite some folkloric evidence about the use of CI, there is no systematic study about CI systems.

Not only do we lack basic knowledge about the extent to which open-source projects are adopting CI, but also we have no answers to many important questions related to CI. What are the costs of CI? Does CI deliver on the promised benefits, such as releasing more often, or helping make changes (e.g., to merge pull requests) faster? Do developers maximize the usage of CI? Despite the widespread popularity of CI, we have very little quantitative evidence on its benefits. This lack of knowledge can lead to poor decision making and missed opportunities. Developers who choose not to use CI can be missing out on the benefits of CI. Developers who do choose to use CI might not be using it to its fullest potential. Without knowledge of how CI is being used, tool builders can be misallocating resources instead of having data about where automation and improvements are most needed by their users. By not studying CI, researchers have a blind spot which prevents them from providing solutions to the hard problems that practitioners face.

In this paper we use three complementary methods to study the usage of CI in open-source projects.
To understand the extent to which CI has been adopted by developers, and which CI systems developers use, we analyzed 34,544 open-source projects from GitHub. To understand how developers use CI, we analyzed 1,529,291 builds from Travis CI, the most commonly used CI service for GitHub projects (Section 4.1). To understand why projects use or do not use CI, we surveyed 442 developers.

With this data, we answer several research questions that we grouped into three themes:

Theme 1: Usage of CI

RQ1: What percentage of open-source projects use CI?
RQ2: What is the breakdown of usage of different CI services?
RQ3: Do certain types of projects use CI more than others?
RQ4: When did open-source projects adopt CI?
RQ5: Do developers plan on continuing to use CI?

We found that CI is widely used, and the number of projects which are adopting CI is growing. We also found that the most popular projects are most likely to use CI.

Theme 2: Costs of CI

RQ6: Why do open-source projects choose not to use CI?
RQ7: How often do projects evolve their CI configuration?
RQ8: What are some common reasons projects evolve their CI configuration?
RQ9: How long do CI builds take on average?

We found that the most common reason why developers are not using CI is lack of familiarity with CI. We also found that the average project makes only 12 changes to its CI configuration file and that many such changes can be automated.

Theme 3: Benefits of CI

RQ10: Why do open-source projects choose to use CI?
RQ11: Do projects with CI release more often?
RQ12: Do projects which use CI accept more pull requests?
RQ13: Do pull requests with CI builds get accepted faster (in terms of calendar time)?
RQ14: Do CI builds fail less on master than on other non-master branches?
We first surveyed developers about the perceived benefits of CI, then we empirically evaluated these claims. We found that projects that use CI release twice as often as those that do not use CI. We also found that projects with CI accept pull requests faster than projects without CI.

This paper makes the following contributions:

Research Questions: We designed 14 novel research questions. We are the first to provide in-depth answers to questions about the usage, costs, and benefits of CI.

Data Analysis: We collected and analyzed CI usage data from 34,544 open-source projects. Then we analyzed in-depth all CI data from a subset of 620 projects and their 1,529,291 builds, 1,503,092 commits, and 653,404 pull requests. Moreover, we surveyed 442 open-source developers about why they chose to use or not use CI.

Implications: We provide practical implications of our findings from the perspective of three audiences: researchers, developers, and tool builders. Researchers should pay attention to CI because it is not a passing fad. For developers, we list several situations where CI provides the most value. Moreover, we discovered several opportunities where automation can be helpful for tool builders.

More details about our data sets and results are available at http://cope.eecs.oregonstate.edu/CISurvey

2. OVERVIEW OF CI

2.1 History and Definition of CI

The idea of Continuous Integration (CI) was first introduced in 1991 by Grady Booch [26], in the context of object-oriented design: “At regular intervals, the process of continuous integration yields executable releases that grow in functionality at every release...” This idea was then adopted as one of the core practices of Extreme Programming (XP) [23].
However, the idea began to gain broader acceptance after a blog post by Martin Fowler [37] in 2000. The motivating idea of CI is that the more often a project can integrate, the better off it is. The key to making this possible, according to Fowler, is automation. Automating the build process should include retrieving the sources, compiling, linking, and running automated tests. The system should then give a “yes” or “no” indicator of whether the build was successful. This automated build process can be triggered either manually or automatically by other developer actions, such as checking new code into version control.

These ideas were implemented by Fowler in CruiseControl [9], the first CI system, which was released in 2001. Today there are over 40 different CI systems; some of the most well-known ones include Jenkins [12] (previously called Hudson), Travis CI [17], and Microsoft Team Foundation Server (TFS) [15]. Early CI systems usually ran locally, and this is still widely done for Jenkins and TFS. However, CI as a service has become more and more popular; e.g., Travis CI is only available as a service, and even Jenkins is offered as a service via the CloudBees platform [6].

2.2 Example Usage of CI

We now present an example of CI that comes from our data. The pull request we are using can be found here: https://github.com/RestKit/RestKit/pull/2370. A developer named “Adlai-Holler” created pull request #2370, named “Avoid Flushing In-Memory Managed Object Cache while Accessing”, to work around an issue titled “Duplicate objects created if inserting relationship mapping using RKInMemoryManagedObjectCache” for the project RestKit [13]. The developer made two commits and then created a pull request, which triggered a Travis CI build. The build failed because of failing unit tests.
A RestKit project member, “segiddins”, then commented on the pull request and asked Adlai-Holler to look into the test failures. Adlai-Holler then committed two new changes to the same pull request. Each of these commits triggered a new CI build. The first build failed, but the second was successful. Once the CI build passed, the RestKit team member commented “seems fine” and merged the pull request.

3. METHODOLOGY

To understand the extent to which CI is used and which CI systems developers use, we analyzed 34,544 open-source projects from GitHub with our breadth corpus. To understand how developers use CI, we analyzed 1,529,291 builds on the most popular CI system in our depth corpus. To understand why projects use or do not use CI, we surveyed 442 developers.

3.1 Breadth Corpus

The breadth corpus has a large number of projects, and information about which CI services each project uses. We use the breadth corpus to answer broad questions about the usage of CI in open-source projects. We collected the data for this corpus primarily via the GitHub API. We first sorted GitHub projects by their popularity, using the star rating (whereby users can mark, or “star”, projects that they like, and hence each project can accumulate stars). We started our inspection from the top of the list, first by manually looking at the top 50 projects. We collected all publicly available information about how these projects use CI. We then used what we learned from this manual inspection to write a script to programmatically classify which CI service (if any) a project uses. The four CI services that we were able to readily identify manually and later by our script are (sorted in the order of their usage): Travis CI [17], CircleCI [5], AppVeyor [2], and Wercker [18]. All of these services provide public APIs, which we queried to determine if a project is using that service.
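The classification script itself is not shown in the paper. As a rough sketch, such a classifier might probe each service's public API for a GitHub "owner/name" slug; the endpoint URLs and the injected `probe` function below are illustrative assumptions, not the authors' code:

```python
# Sketch of the breadth-corpus classification step: given a GitHub
# "owner/name" slug, probe each CI service for a matching project.
# Endpoint URLs are hypothetical placeholders.
from typing import Callable, Dict, List

CI_ENDPOINTS: Dict[str, str] = {
    "Travis CI": "https://api.travis-ci.org/repos/{slug}",
    "CircleCI": "https://circleci.com/api/v1.1/project/github/{slug}",
    "AppVeyor": "https://ci.appveyor.com/api/projects/{slug}",
    "Wercker": "https://app.wercker.com/api/v3/applications/{slug}",
}

def classify_ci(slug: str, probe: Callable[[str], bool]) -> List[str]:
    """Return every CI service whose API reports a project for `slug`.

    `probe(url)` performs the HTTP request and returns True on a 2xx
    response; injecting it keeps the sketch testable offline.
    """
    return [name for name, url in CI_ENDPOINTS.items()
            if probe(url.format(slug=slug))]

# Offline usage example with a fake probe that "knows" one Travis project.
fake_known = {"https://api.travis-ci.org/repos/rails/rails"}
services = classify_ci("rails/rails", lambda url: url in fake_known)
```

Injecting the HTTP call also makes it easy to swap in a real client (e.g., `urllib.request`) later.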
Moreover, we wanted to ensure that our data was as complete as possible. When we examined the data by hand, we found that several projects were using CloudBees [6], a CI service powered by the Jenkins CI. However, given a list of GitHub projects, there is no reliable way to programmatically identify from the GitHub API which projects use CloudBees. (In contrast, Travis CI uses the same organization and project names as GitHub, making it easy to check correspondence between Travis CI and GitHub projects.) We contacted CloudBees, and they sent us a list of open-source projects that have CloudBees builds set up. We then wrote a script to parse that list, inspect the build information, and search for the corresponding GitHub repository (or repositories) for each build on CloudBees. We then used this data to identify the projects from our breadth corpus that use CloudBees. This yielded 1,018 unique GitHub repositories/projects. To check whether these projects refer to CloudBees, we searched for (case-insensitive) “CloudBees” in the README files of these projects and found that only 256 of them contain “CloudBees”. In other words, had we not contacted CloudBees directly, using only the information available on GitHub, we would have missed a large number of projects that use CloudBees.

Overall, the breadth corpus consists of 34,544 projects. For each project, we collected the following information: project name and owner, the CI system(s) that the project uses (if any), popularity (as measured by the number of stars), and primary programming language (as determined by GitHub).

3.2 Depth Corpus

The depth corpus has fewer projects, but for each project we collect all the information that is publicly available. For this subset of projects, we collected additional data to gain a deeper understanding of the usage, costs, and benefits of CI.
Analyzing our breadth corpus, as discussed in Section 4.1, we learned that Travis CI is by far the most commonly used CI service among open-source projects. Therefore, we targeted projects using Travis CI for our depth corpus. First, we collected the top 1,000 projects from GitHub, ordered by their popularity. Of those 1,000 projects, we identified 620 that use Travis CI, 166 that use CircleCI, 37 that use AppVeyor, and 3 that use Wercker. We used the Travis CI API(^1) to collect the entire build history for each project in our depth corpus, for a total of 1,529,291 builds. Using GHTorrent [39], we collected the full history of pull requests for each project, for a total of 653,404 pull requests. Additionally, we cloned every project in our corpus to access the entire commit history and source code.

3.3 Survey

Even after collecting our diverse breadth and depth corpora, we were still left with questions that we could not answer from the online data alone. These questions were about why developers chose to use or not use CI. We designed a survey to help us answer a number of such “why” questions, as well as to provide us another data source to better understand CI usage. We deployed our survey by sending it to all the email addresses publicly listed as belonging to the organizations of the top 1,000 GitHub projects (again ranked by popularity). In total, we sent 4,508 emails.

Our survey consisted of two flows, each with three questions. The first question in both flows asked if the participant used CI or not. Depending on the answer to this question, the second question asked the reasons why they use or do not use CI. These questions were multiple-choice, multiple-selection questions where the users were asked to select all the reasons that they agreed with. To populate the choices, we collected some common reasons for using or not using CI, as mentioned in websites [1,7], blogs [3,8,19], and Stack Overflow [14].
Optionally, the survey participants could also write their own reason(s) that we did not already list. The third question asked if the participant plans on using CI for future projects.

To incentivize participation, we raffled off a 50 USD gift card among the survey respondents. 442 participants (a 9.8% response rate) responded to our survey. Of those responses, 407 (92.1%) indicated that they do use CI, and 35 (7.9%) indicated that they do not use CI.

4. RESULTS

In this section, we present the results for our research questions. Section 4.1 presents the results about the usage of CI. Section 4.2 discusses the costs of CI. Finally, Section 4.3 presents the benefits of CI. Rather than presenting implications after each research question, we draw from several research questions to triangulate implications that we present in Section 5.

4.1 Usage of CI

To determine the extent to which CI is used, we study what percentage of projects actively use CI, and we also ask developers if they plan to use CI in the future. Furthermore, we study whether the project popularity and programming language correlate with the usage of CI.

RQ1: What percentage of open-source projects use CI?

We found that 40% of all the projects in our breadth corpus use CI. Table 1 shows the breakdown of the usage. Thus, CI is indeed used widely and warrants further investigation.

(^1)We are grateful to the Travis CI developers for promptly resolving a bug report that we submitted; prior to them resolving this bug report, one could not query the full build history of all projects.

Additionally, we know that our scripts do not find all CI usage (e.g., projects that run privately hosted CI systems, as discussed further in Section 6.2). We can reliably detect the use of (public) CI services only if their API makes it possible to query the CI service based on knowing the GitHub organization and project name.
Therefore, the results we present are a lower bound on the total number of projects that use CI.

Table 2: CI usage by service. The first data row shows the percentage of all CI projects using each service; the second row shows the total number of projects for each service. Percentages add up to more than 100 because some projects use multiple CI services.

| | Travis | CircleCI | AppVeyor | CloudBees | Wercker |
|---------------------|--------|----------|----------|-----------|--------|
| % of CI projects | 90.1% | 19.1% | 3.5% | 1.6% | 0.4% |
| # of projects | 12528 | 2657 | 484 | 223 | 59 |

RQ2: What is the breakdown of usage of different CI services?

Next, we investigate which CI services are the most widely used in our breadth corpus. Table 2 shows that Travis CI is by far the most widely used CI service. Because of this result, we feel confident that our further analysis can focus on the projects that use Travis CI, and that analyzing such projects gives representative results for the usage of CI services in open-source projects.

We also found that some projects use more than one CI service. In our breadth corpus, of all the projects that use CI, 14% use more than one. We think this is an interesting result which deserves future attention.

RQ3: Do certain types of projects use CI more than others?

To better understand which projects use CI, we look for characteristics of projects that are more likely to use CI.

CI usage by project popularity: We want to determine whether more popular projects are more likely to use CI. Our intuition is that if CI leads to better outcomes, then we would expect to see higher usage of CI among the most popular projects (or, alternatively, that projects using CI get better and thus become more popular). Figure 1 shows that the most popular projects (as measured by the number of stars) are also the most likely to use CI (Kendall’s $\tau$, $p < 0.00001$).
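One way to reproduce this kind of popularity analysis is to bin projects by star count and compute the percentage using CI in each bin; a minimal sketch with toy data (the field layout and group handling are our assumptions):

```python
# Sort projects by stars, split them into roughly even groups, and
# compute the percentage of CI users per group.
from typing import List, Tuple

def ci_usage_by_popularity(projects: List[Tuple[int, bool]],
                           n_groups: int = 64) -> List[float]:
    """projects: (stars, uses_ci) pairs; returns per-group CI
    percentages, ordered from least to most starred."""
    ranked = sorted(projects)                      # ascending by stars
    size = max(1, len(ranked) // n_groups)
    groups = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    return [100.0 * sum(uses for _, uses in g) / len(g) for g in groups]

# Toy data: CI adoption rising with star count.
toy = [(s, s >= 50) for s in range(100)]
rates = ci_usage_by_popularity(toy, n_groups=4)  # → [0.0, 0.0, 100.0, 100.0]
```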
We group the projects from our breadth corpus into 64 even groups, ordered by number of stars; each group has around 540 projects. We then calculate the percentage of projects in each group that use CI. In the most popular (starred) group, 70% of projects use CI. As the projects become less popular, the percentage of projects using CI declines to 23%.

Observation: Popular projects are more likely to use CI.

CI usage by language: We now examine CI usage by programming language. Are there certain languages for which the projects written primarily in those languages use CI more than others? Table 3 shows projects sorted by the percentage of projects that use CI for each language, from our breadth corpus. The data shows that there are indeed certain languages whose projects use CI more than others. Notice that the usage of CI does not correlate with the popularity of a language (comparing the rank of each language by number of projects with its rank by percentage of CI use, Kendall’s $\tau$, $p > 0.68$). In other words, the languages that use CI the most include both popular languages like Ruby and emerging languages like Scala. Similarly, among the languages that use CI less, we notice both popular languages such as Objective-C and Java, as well as less popular languages such as VimL.

However, we did observe that many of the languages with the highest CI usage are dynamically typed (e.g., Ruby, PHP, CoffeeScript, Clojure, Python, and JavaScript). One possible explanation may be that, in the absence of a static type system which can catch errors early on, projects in these languages use CI to provide extra safety.

Observation: We observe a wide range of projects that use CI. The popularity of a language does not correlate with the probability that a project uses CI.

RQ4: When did open-source projects adopt CI?

We next study when projects began to adopt CI. Figure 2 shows the number of projects using CI over time.
We answer this question with our depth corpus, because the breadth corpus does not have the date of the first build, which we use to determine when CI was introduced to the project. Notice that we are collecting data from Travis CI, which was founded in 2011 [10]. Figure 2 shows that CI has experienced steady growth over the last 5 years.

Table 3: CI usage by programming language. For each language, the columns tabulate: the number of projects from our corpus that predominantly use that language, how many of these projects use CI, and the percentage of projects that use CI.

| Language | Total Projects | # Using CI | Percent CI |
|------------|----------------|------------|------------|
| Scala | 329 | 221 | 67.17 |
| Ruby | 2721 | 1758 | 64.61 |
| Go | 1159 | 702 | 60.57 |
| PHP | 1806 | 982 | 54.37 |
| CoffeeScript | 343 | 176 | 51.31 |
| Clojure | 323 | 152 | 47.06 |
| Python | 3113 | 1438 | 46.19 |
| Emacs Lisp | 150 | 67 | 44.67 |
| JavaScript | 8495 | 3692 | 43.46 |
| Other | 1710 | 714 | 41.75 |
| C++ | 1233 | 483 | 39.17 |
| Swift | 723 | 273 | 37.76 |
| Java | 3371 | 1188 | 35.24 |
| C | 1321 | 440 | 33.31 |
| C# | 652 | 188 | 28.83 |
| Perl | 140 | 38 | 27.14 |
| Shell | 709 | 185 | 26.09 |
| HTML | 948 | 241 | 25.42 |
| CSS | 937 | 194 | 20.70 |
| Objective-C| 2745 | 561 | 20.44 |
| VimL | 314 | 59 | 18.79 |

We also analyze the age of each project when developers first introduced CI, and we found that the median time was around 1 year. Based on this data, we conjecture that while many developers introduce CI early in a project’s development lifetime, it is not always seen as something that provides a large amount of value during the very initial development of a project.

Observation: The median time for CI adoption is one year.

RQ5: Do developers plan on continuing to use CI? Is CI a passing “fad” in which developers will lose interest, or will it be a lasting practice?
While only time will tell what the true answer is, to get some sense of what the future could hold, we asked developers in our survey how likely they were to use CI on their next project, using a 5-point Likert scale ranging from “definitely will use” to “definitely will not use”. Figure 3 shows that developers feel very strongly that they will be using CI for their next project. The top two options, ‘Definitely’ and ‘Most Likely’, account for 94% of all our survey respondents, and the average of all the answers was 4.54. While this seems like a resounding endorsement of the continued use of CI, we decided to dig a little deeper. Even among respondents who are not currently using CI, 53% said that they would ‘Definitely’ or ‘Most Likely’ use CI for their next project.

Observation: While CI is widely used in practice nowadays, we predict that CI adoption rates will increase even further in the future.

4.2 Costs of CI

To better understand the costs of CI, we analyze both the survey (where we asked developers why they believe CI is too costly to be worth using) and the data from our depth corpus. We estimate the cost to developers of writing and maintaining the configuration for their CI service. Specifically, we measure how often developers make changes to their configuration files and study why they make those changes. We also analyze the cost in terms of the time to run CI builds. Note that the time that builds take to return a result can be unproductive time if developers do not know how to proceed without knowing that result.

RQ6: Why do open-source projects choose not to use CI?

One way to evaluate the costs of CI is to ask developers why they do not use CI.
In our survey, we asked respondents whether they chose to use or not use CI; if they indicated that they did not, we asked them to tell us why.

Table 4 shows the percentage of the respondents who selected particular reasons for not using CI. As mentioned before, we built the list of possible reasons by collecting information from various popular internet sources.

Table 4: Reasons developers gave for not using CI

| Reason | Percent |
|------------------------------------------------------------------------|---------|
| The developers on my project are not familiar enough with CI | 47.00 |
| Our project doesn’t have automated tests | 44.12 |
| Our project doesn’t commit often enough for CI to be worth it | 35.29 |
| Our project doesn’t currently use CI, but we would like to in the future | 26.47 |
| CI systems have too high maintenance costs (e.g., time, effort, etc.) | 20.59 |
| CI takes too long to set up | 17.65 |
| CI doesn’t bring value because our project already does enough testing | 5.88 |

Figure 4: Number of changes to CI configs (median number of changes is 12)

Interestingly, the primary cost that respondents identified was not a technical cost; instead, the reason for not using CI was that “The developers on my project are not familiar enough with CI.” We do not know if the developers are not familiar enough with the CI tools themselves (e.g., Travis CI), or if they are unfamiliar with all the work it would take to add CI to their project, including perhaps fully automating the build. To completely answer this question, more research is needed.

The second most selected reason was that the project does not have automated tests. This speaks to a real cost of CI, in that much of its value comes from automated tests, and some projects find that developing good automated test suites is a substantial cost.
Even in the cases where developers had automated tests, some questioned the use of CI (in particular) and regression testing (in general); one respondent (P74) even said “In 4 years our tests have yet to catch a single bug.”

Observation: The main reason why open-source projects choose not to use CI is that the developers are not familiar enough with CI.

RQ7: How often do projects evolve their CI configuration?

We ask this question to identify how often developers evolve their CI configurations. Is it a “write-once-and-forget-it” situation, or is it something that evolves constantly? The Travis CI service is configured via a YAML [20] file, named .travis.yml, in the project’s root directory. YAML is a human-friendly data serialization standard. To determine how often a project has changed its configuration, we analyzed the history of every .travis.yml file and counted how many times it has changed. We calculate the number of changes from the commits in our depth corpus. Figure 4 shows the number of changes/commits to the .travis.yml file over the life of the project. We observe that the median number of changes to a project’s CI configuration is 12, but one of the projects changed its CI configuration 266 times. This leads us to conclude that many projects set up CI once and then have minimal involvement (25% of projects have 5 or fewer changes to their CI configuration), but some projects do find themselves changing their CI setup quite often.

Observation: Some projects change their configurations relatively often, so it is worthwhile to study what these changes are.
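A plausible way to reproduce this measurement on a cloned repository is to count the commits that touch .travis.yml and then summarize the per-project counts; the git invocation and the toy distribution below are our sketch, not the authors' script:

```python
# Count commits modifying .travis.yml per repo, then summarize across
# repos (the paper reports a median of 12 changes and 25% of projects
# at 5 or fewer).
import statistics
import subprocess
from typing import List

def count_config_changes(repo_path: str) -> int:
    """Number of commits that modified .travis.yml in a cloned repo."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--follow", "--oneline",
         "--", ".travis.yml"],
        capture_output=True, text=True, check=True).stdout
    return len(out.splitlines())

def summarize(changes: List[int]) -> dict:
    ranked = sorted(changes)
    q1 = ranked[len(ranked) // 4]          # rough first quartile
    return {"median": statistics.median(ranked), "q1": q1,
            "max": max(ranked)}

# Toy distribution; real values would come from the depth corpus.
print(summarize([3, 5, 8, 12, 20, 40, 266]))
```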
Table 5: CI configuration changes, by area of the configuration file

| Config Area | Total Edits | Percentage |
|---------------------------|-------------|------------|
| Build Matrix | 9718 | 14.70 |
| Before Install | 8549 | 12.93 |
| Build Script | 8328 | 12.59 |
| Build Language Config | 7222 | 10.92 |
| Build Env | 6900 | 10.43 |
| Before Build Script | 6387 | 9.66 |
| Install | 4357 | 6.59 |
| Whitespace | 3226 | 4.88 |
| Build Platform Config | 3058 | 4.62 |
| Notifications | 2069 | 3.13 |
| Comments | 2004 | 3.03 |
| Git Configuration | 1275 | 1.93 |
| Deploy Targets | 1079 | 1.63 |
| After Build Success | 1025 | 1.55 |
| After Build Script | 602 | 0.91 |
| Before Deploy | 133 | 0.20 |
| After Deploy | 79 | 0.12 |
| Custom Scripting | 40 | 0.06 |
| After Build Failure | 39 | 0.06 |
| After Install | 14 | 0.02 |
| Before Install | 10 | 0.02 |
| Mysql | 5 | 0.01 |
| After Build Success | 3 | 0.00 |
| Allow Failures | 2 | 0.00 |

RQ8: What are some common reasons projects evolve their CI configuration?

To better understand the changes to the CI configuration files, we analyzed all the changes that were made to the .travis.yml files in our depth corpus. Because YAML is a structured language, we can parse the file and determine which part of the configuration was changed. Table 5 shows the distribution of all the changes. The most common changes were to the build matrix, which in Travis specifies a combination of runtimes, environments, and exclusions/inclusions. For example, a build matrix for a Ruby project could specify the runtimes rvm 2.2, rvm 1.9, and jruby; the build environments rails2 and rails3; and the exclusions/inclusions, e.g., exclude: jruby with rails2. All combinations will be built except those excluded, so in this example there would be 5 different builds. Other common changes included the dependent libraries to install before building the project (what .travis.yml calls before_install) and changes to the build scripts themselves.
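The build-matrix expansion just described can be sketched with `itertools.product`; the runtimes, environments, and exclusion mirror the Ruby example above:

```python
# Expand a Travis-style build matrix: every runtime/environment
# combination is built except those listed in `exclude`.
from itertools import product

runtimes = ["rvm 2.2", "rvm 1.9", "jruby"]
envs = ["rails2", "rails3"]
exclude = {("jruby", "rails2")}

builds = [(r, e) for r, e in product(runtimes, envs)
          if (r, e) not in exclude]
print(len(builds))  # 3 runtimes x 2 envs - 1 exclusion = 5 builds
```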
Also, many other changes were due to version changes of dependencies.

RQ9: How long do CI builds take on average?

Another cost of using CI is the time to build the application and run all the tests. This cost represents both an energy cost(^2) for the computing power to run these builds, and a human cost: developers may have to wait to see if their build passes before they merge in changes, so longer build times mean more wasted developer time.

The average build time is just under 500 seconds. To compute the average build times, we first remove all the canceled (incomplete, manually stopped) builds, and only consider the time for errored, failed, and passed (completed) builds. Errored builds are those where an error occurs before the build begins (e.g., when a dependency cannot be downloaded), and failed builds are those where the build does not complete successfully (e.g., a unit test fails). To further understand the data, we look at each outcome independently. Interestingly, we find that passing builds run faster than either errored or failed builds. The difference between errored and failed is significant (Wilcoxon, $p < 0.0001$), as is the difference between passed and errored (Wilcoxon, $p < 0.0001$) and the difference between passed and failed (Wilcoxon, $p < 0.0001$).

We find this result surprising, as our intuition is that passing builds should take longer: if an error state is encountered early on, the process can abort and return earlier. Perhaps many of the faster-running passing builds are not generating a meaningful result and should not have been run. However, more investigation is needed to determine the exact reasons for this.

(^2)This cost should not be underestimated; our personal correspondence with a Google manager in charge of their CI system TAP reveals that TAP costs millions of dollars just for the computation (not counting the cost of developers who maintain or use TAP).
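The filtering-and-averaging step for build times can be sketched as follows; the field names and toy durations are illustrative, and only the outcome categories (canceled builds dropped; errored, failed, and passed kept) come from the text:

```python
# Drop canceled builds, then compute the mean duration (seconds)
# per remaining build outcome.
from statistics import mean
from typing import Dict, List, Tuple

def mean_duration_by_outcome(
        builds: List[Tuple[str, float]]) -> Dict[str, float]:
    """builds: (state, duration_seconds); canceled builds are excluded."""
    kept = [(s, d) for s, d in builds if s != "canceled"]
    states = {s for s, _ in kept}
    return {s: mean(d for t, d in kept if t == s) for s in states}

# Toy data, not corpus values.
toy = [("passed", 420.0), ("passed", 480.0), ("failed", 600.0),
       ("errored", 550.0), ("canceled", 30.0)]
print(mean_duration_by_outcome(toy))
```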
4.3 Benefits of CI

We first summarize the most commonly touted benefits of CI, as reported by the survey participants. We then analyze empirically whether these benefits are quantifiable in our depth corpus. Thus, we confirm or refute previously held beliefs about the benefits of CI.

RQ10: Why do open-source projects choose to use CI?

Having found that CI is widely used in open-source projects (RQ1), and that CI is most widely used among the most popular projects on GitHub (RQ3), we want to understand why developers choose to use CI. However, why a project uses CI cannot be determined from a code repository. Thus, we answer this question using our survey data.

Table 6 shows the percentage of the respondents who selected particular reasons for using CI. As mentioned before, we built this list of reasons by collecting information from various popular internet sources. The two most popular reasons were “CI makes us less worried about breaking our builds” and “CI helps us catch bugs earlier”. One respondent (P371) added: “Acts like a watchdog. You may not run tests, or be careful with merges, but the CI will. :)”

Martin Fowler [7] is quoted as saying “Continuous Integration doesn’t get rid of bugs, but it does make them dramatically easier to find and remove.” However, in our survey, very few respondents felt that CI actually helped them during the debugging process.

RQ11: Do projects with CI release more often?

One of the more common claims about CI is that it helps projects release more often; e.g., CloudBees’ motto is “Deliver Software Faster” [6]. Over 50% of the respondents from our survey claimed it was a reason why they use CI. We analyze our data to see if we can indeed find evidence that would support this claim.

We found that projects that use CI do indeed release more often than either (1) the same projects before they used CI or (2) the projects that do not use CI.
In order to compare across projects and periods, we calculated the release rate as the number of releases per month. Projects that use CI average .54 releases per month, while projects that do not use CI average .24 releases per month. That is more than double the release rate, and the difference is statistically significant (Wilcoxon, (p < 0.00001)). To isolate the effect of CI, we also compared, for projects that use CI, the release rate both before and after the first CI build. We found that projects that eventually added CI used to release at a rate of .34 releases per month, well below the .54 rate at which they release now with CI. This difference is statistically significant (Wilcoxon, (p < 0.00001)).

RQ12: Do projects that use CI accept more pull requests?

For a project that uses a CI service such as Travis CI, when the CI server builds a pull request, it annotates the pull request on GitHub with a visual cue, such as a green check mark or a red ‘X’, that shows whether the pull request built successfully on the CI server. Our intuition is that this extra information can help developers better decide whether or not to merge a pull request into their code.
To determine if this extra information indeed makes a difference, we compared, in the depth corpus, the pull request acceptance rates between pull requests that have this CI information and pull requests that do not have it. Note that projects can exclude some branches from running on the CI server, so even if a project uses CI on some branches, there is no guarantee that every pull request contains the CI build status information.

Table 6: Reasons for using CI, as reported by survey participants

| Reason | Percent |
|------------------------------------------------------------------------|---------|
| CI makes us less worried about breaking our builds | 87.71 |
| CI helps us catch bugs earlier | 79.61 |
| CI allows running our tests in the cloud, freeing up our personal machines | 54.55 |
| CI helps us deploy more often | 53.32 |
| CI makes integration easier | 53.07 |
| CI runs our tests in a real-world staging environment | 46.00 |
| CI lets us spend less time debugging | 33.66 |

Table 7: Release rate of projects

| Uses Travis | Versions Released per Month |
|-------------|-----------------------------|
| Yes | .54 |
| No | .24 |

Table 8: Comparison of pull requests merged for pull requests that had or did not have CI information

| CI Usage | % Pull Requests Merged |
|----------|------------------------|
| Using CI | 23 |
| Not Using CI | 28 |

Table 8 shows the results for this question. We found that pull requests without CI information were 5pp more likely to be merged than pull requests with CI information. Our interpretation of this result is that those 5pp of pull requests have problems which are identified by the CI. By not merging these pull requests, developers can avoid breaking the build. This difference is statistically significant (Fisher’s Exact Test: ( p < 0.00001 )).
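A one-sided Fisher's exact test on a 2x2 merged/not-merged table can be sketched with only the standard library, via the hypergeometric distribution. This is a minimal illustration; a real analysis would use a library routine (e.g., scipy.stats.fisher_exact), and the counts below are toy values echoing Table 8's 23% vs. 28%, not the actual corpus counts.

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    probability of observing `a` or fewer counts in the top-left cell
    under the hypergeometric null, holding the margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p = 0.0
    for k in range(a + 1):
        # P(X = k) for a hypergeometric draw with these margins
        p += comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    return p

# Toy table: (merged, not merged) for 100 PRs with CI info and 100 without.
p_value = fisher_exact_one_sided(23, 77, 28, 72)
```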
This also fits with our survey result that developers say that using CI makes them less worried about breaking the build. One respondent (P219) added that CI “Prevents contributors from releasing breaking builds”. By not merging in potentially problematic pull requests, developers can avoid breaking their builds.

Observation

CI build status can help developers avoid breaking the build by not merging problematic pull requests into their projects.

RQ13: Do pull requests with CI builds get accepted faster (in terms of calendar time)?

Once a pull request is submitted, the code is not merged until the pull request is accepted. The sooner a pull request is accepted, the sooner the code is merged into the project. In the previous question, we saw that projects using CI accept fewer (i.e., reject or ignore more) pull requests than projects not using CI. In this question, we consider only accepted pull requests, and ask whether there is a difference in the time it takes for projects to accept pull requests with and without CI. One reason developers gave for using CI is that it makes integration easier. One respondent (P183) added “To be more confident when merging PRs”. If integration is easier, does it then translate into pull requests being integrated faster?

Figure 6 shows the distributions of the time to accept pull requests, with and without CI. To compute these results, we select, from our depth corpus, all the pull requests that were accepted, both with and without build information from the CI server. The mean time with CI is 81 hours, but the median is only 5.2 hours. Similarly, the mean time without CI is 140 hours, but the median is 6.8 hours. Comparing median times, we find that pull requests with CI information are merged 1.6 hours faster than those without. This difference is statistically significant (Wilcoxon, ( p < 0.0000001 )).
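The large gap between mean and median reflects how right-skewed time-to-accept distributions are: a few long-lived pull requests drag the mean far above the median, which is why both statistics are reported. A small illustration on hypothetical samples (the numbers are invented, not the corpus data):

```python
from statistics import mean, median

# Hypothetical, heavily right-skewed time-to-accept samples (hours):
# one stale pull request dominates the mean but barely moves the median.
hours_to_accept = [1, 2, 4, 5.2, 6, 9, 540]
print(f"mean={mean(hours_to_accept):.1f}h, median={median(hours_to_accept)}h")
```

Because of this skew, the median is the more robust summary for comparing the two groups, and a rank-based test such as Wilcoxon's is the appropriate significance test.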
Observation

CI build status can make integrating pull requests faster. When using CI, the median pull request is accepted 1.6 hours sooner.

Table 9: Percentage of builds that succeed by pull request target

| Pull Request Target | Percent Passed Builds |
|---------------------|-----------------------|
| Master | 72.03 |
| Other | 65.36 |

RQ14: Do CI builds fail less on master than on other non-master branches?

The most popular reason that participants gave for using CI was that it helps avoid breaking the build. Thus, we analyze this claim in the depth corpus. Does the data show a difference in the way developers use CI with the master branch vs. with the other branches? Is there any difference between how many builds fail on master vs. on the other branches? Perhaps developers take more care when writing a pull request for master than for another branch.

Table 9 shows the percentage of builds that pass in pull requests to the master branch, compared to all other branches. We found that pull requests are indeed more likely to pass when they target master.

Observation

CI builds on the master branch pass more often than on the other branches.

IMPLICATIONS

We offer practical implications of our findings for researchers, developers, and tool builders.

Researchers

RQ1, RQ3, RQ4, RQ5: CI is not a “fad” but is here to stay. Because CI is widely used, more projects are adopting it, and it has not yet received much attention from the research community, it is time for researchers to study its use and improve it, e.g., by automating more tasks (such as setting up CI). We believe that researchers can contribute many improvements to the CI process once they understand the current state of the practice in CI.

RQ2: Similarly to how GitHub has become the main gateway for researchers who study software, we believe Travis CI can become the main gateway for researchers who study CI.
Travis CI offers a wealth of CI data, accessible via a public API. Therefore, researchers can maximize their impact by studying a single system.

RQ7, RQ8: We found evidence of frequent evolution of CI configuration files (similar evolution was found for Makefiles [21]), so researchers can focus on providing support for safe automation of changes in configuration files, e.g., via safe refactoring tools.

RQ6, Table 4: The most common reason why developers do not use CI is unfamiliarity with CI, so there is a tremendous opportunity for providing educational resources. We call upon university educators to enrich their software engineering curricula to cover the basic concepts and tooling of CI.

Developers

RQ3, Table 3: The data shows that CI is more widely embraced by projects that use dynamically typed languages (e.g., 64% of 2721 Ruby projects use CI, compared with only 20% of 2745 Objective-C projects). To mitigate the lack of a static type system, developers who use dynamically typed languages should use CI to run tests and help catch errors early on.

RQ13: Our analysis of the depth corpus shows that the presence of CI makes it easier to accept contributions in open-source projects, and this was also indicated by several survey respondents, e.g., “CI gives external contributors confidence that they are not breaking the project” (P310). Considering other research [43] that reports a lack of diversity in open-source projects, attracting new contributors is desirable. Thus, projects that aim to diversify their pool of contributors should consider using CI.

RQ7, RQ9: Because the average time for a single CI build is fairly short, and CI configurations are maintainable, it appears that the benefits of CI outweigh the costs. Thus, developers should use CI for their projects.
RQ3, RQ11, RQ12, RQ14: The use of CI correlates with positive outcomes, and CI has been adopted by the most successful projects on GitHub, so developers should consider CI a best practice and use it as widely as possible.

Tool Builders

RQ6: CI helps catch bugs, but not locate them. CI build logs often bury an important error message among hundreds of lines of raw output. Thus, tool builders who want to improve CI can focus on new ways to integrate fault-localization techniques into CI.

RQ1, RQ7, RQ8: Despite wide adoption, many projects have yet to use CI. Tool builders could parse build files [56] and then generate the configuration files necessary for CI. By automating this process, tool builders can lower the entry barrier for developers who are unfamiliar with CI.

THREATS TO VALIDITY

6.1 Construct

Are we asking the right questions? We are interested in assessing the usage of CI in open-source projects. To do this we have focused on what, how, and why questions. We think that these questions have high potential to provide unique insight and value for different stakeholders: developers, tool builders, and researchers.

6.2 Internal

Is there something inherent to how we collect and analyze CI usage data that could skew the accuracy of our results?

Once a CI server is configured, it will continue to run until it is turned off. This could result in projects configuring a CI server and then not taking the results into account as they continue development. However, we think this is unlikely because Travis CI and GitHub are so closely integrated: it would be difficult to ignore the presence of CI when there are visual cues throughout GitHub whenever a project uses CI.

Some CI services are run in a way such that they cannot be detected from the information that is publicly available in the GitHub repository. This means that we could have missed some projects.
However, this would mean that we are underestimating the extent to which CI has been adopted.

Despite a 9.8% response rate to our survey, over 90% of our targeted population did not respond. We had no control over who responded to our survey, so it may suffer from self-selection bias. We think this is likely because 92% of our survey participants reported using CI, much higher than the percentage of projects we observed using CI in the data. To mitigate this, we made the survey short and provided a raffle as an incentive to participate, to get as many responses as possible.

6.3 External

Are our results generalizable to general CI usage? While we analyzed a large number of open-source repositories, we cannot guarantee that these results will be the same for proprietary (closed-source) software. In fact, we consider it very likely that closed-source projects would be unwilling to send their source over the internet to a CI service, so our intuition is that they would be much more likely to use a local CI solution. Further work should be done to investigate the usage of CI in closed-source projects.

Because we focused on Travis CI, it could be that other CI services are used differently. As we showed in RQ2, Travis CI was the overwhelming favorite CI service, so by focusing on it we think our results are representative.

Additionally, we only selected projects from GitHub. Perhaps open-source projects that have custom hosting would also be more likely to have custom CI solutions. More work is needed to determine if these results generalize.

RELATED WORK

We group our related work into three different areas: (i) CI usage, (ii) CI technology, and (iii) related technology.

CI Usage

The closest work to ours is by Vasilescu et al. [53], who present two main findings.
They find that projects that use CI are more effective at merging pull requests from core members, and that projects that use CI find significantly more bugs. However, the paper explicitly states that it is a preliminary study on only 246 GitHub projects, and it treats CI usage as simply a boolean value. In contrast, this paper examines 34,544 projects, 1,529,291 builds, and 442 survey responses to provide detailed answers to 14 research questions about CI usage, costs, and benefits.

A tech report by Beller et al. [25] analyzes CI builds on GitHub, specifically focusing on the Java and Ruby languages. They answer several research questions about tests, including “How many tests are executed per build?”, “How often do tests fail?”, and “Does integration in different environments lead to different test results?”. These questions, however, do not serve to comprehensively support or refute the productivity claims of CI.

Two other papers [44], [46] analyzed individual case studies of CI usage. Together they cover just two cases, unlike this paper, which analyzes a broad and diverse corpus.

Leppänen et al. [45] interviewed developers from 15 software companies about what they perceived as the benefits of CI. They found one of the perceived benefits to be more frequent releases. One of their participants said CI reduced release time from six months to two weeks. Our results confirm that projects that use CI release twice as often as projects that do not use CI.

Beller et al. [24] find that developers report testing three times more often than they actually do test. This over-reporting shows that CI is needed to ensure tests are actually run. This confirms what one of our respondents (P287) said: “It forces contributors to run the tests (which they might not otherwise do)”. Kochhar et al. [42] found that larger Java open-source projects had lower test coverage rates, also suggesting that CI can be beneficial.
CI Technology

Some researchers have proposed approaches to improve CI servers by having servers communicate dependency information [31], generating tests during CI [30], or selecting tests based on code churn [41]. Researchers [27] have also found that integrating build information from various sources can help developers. In our survey, we found that developers do not think that CI helps them locate bugs; this problem has also been pointed out by others [28].

One of the features of CI systems is that they report the build status so that it is clear to everyone. Downs et al. [32] developed a hardware-based system with devices shaped like rabbits that light up in different colors depending on the build status. These devices keep developers informed about the status of the build.

Related Technology

A foundational technology for CI is build systems. Researchers have tried to improve their performance through incremental building [35] and optimized dependency retrieval [29].

Performing actions continuously can also bring extra value, so researchers have proposed several activities, such as continuous test generation [54], continuous testing (continuously running regression tests in the background) [50], continuous compliance [36], and continuous data testing [47].

CONCLUSIONS

CI has been rising as a big success story in automated software engineering. In this paper we study the usage, the growth, and the future prospects of CI using data from three complementary sources: (i) 34,544 open-source projects from GitHub, (ii) 1,529,291 builds from the most commonly used CI system, and (iii) 442 survey respondents. Using this rich data, we investigated 14 research questions.

Our results show there are good reasons for the rise of CI.
Compared to projects that do not use CI, projects that use CI: (i) release twice as often, (ii) accept pull requests faster, and (iii) have developers who are less worried about breaking the build. Therefore, it should come as no surprise that 70% of the most popular projects on GitHub heavily use CI.

The trends that we discover point to an expected growth of CI. In the future, CI will have an even greater influence than it has today. We hope that this paper provides a call to action for the research community to engage with this important field of automated software engineering.

ACKNOWLEDGMENTS

We thank CloudBees for sharing with us the list of open-source projects using CloudBees, Travis for fixing a bug in their API to enable us to collect all relevant build history, and Amin Alipour, Denis Bogdanas, Mihai Codoban, Alex Gyori, Kory Kraft, Nicholas Lu, Shane McKee, Nicholas Nelson, Semih Okur, August Shi, Sruti Srinivasa Ragavan, and the anonymous reviewers for their valuable comments and suggestions on an earlier version of this paper.

This work was partially funded through the NSF CCF-1421503, CCF-1439957, and CCF-1553741 grants.

REFERENCES

[1] 7 reasons why you should be using continuous integration. https://about.gitlab.com/2015/02/03/7-reasons-why-you-should-be-using-ci/. Accessed: 2016-04-24.

[2] AppVeyor. https://www.appveyor.com/. Accessed: 2016-04-26.

[3] The benefits of continuous integration. https://blog.codeship.com/benefits-of-continuous-integration/. Accessed: 2016-04-24.

[4] Build in the cloud. http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html. Accessed: 2016-04-24.

[5] CircleCI. https://circleci.com/. Accessed: 2016-04-26.

[6] CloudBees. http://cloudbees.com/. Accessed: 2016-04-26.

[7] Continuous integration. https://www.thoughtworks.com/continuous-integration. Accessed: 2016-04-24.

[8] Continuous integration is dead.
http://www.yegor256.com/2014/10/08/continuous-integration-is-dead.html. Accessed: 2016-04-24.

[9] CruiseControl. http://cruisecontrol.sourceforge.net/. Accessed: 2016-04-21.

[10] CrunchBase. https://www.crunchbase.com/organization/travis-ci#/entity. Accessed: 2016-04-24.

[11] Google Search Trends. https://www.google.com/trends/. Accessed: 2016-04-24.

[12] Jenkins. https://jenkins.io/. Accessed: 2016-04-21.

[13] Restkit. https://github.com/RestKit/RestKit. Accessed: 2016-04-29.

[14] Stackoverflow. http://stackoverflow.com/questions/214695/what-are-some-arguments-against-using-continuous-integration. Accessed: 2016-04-24.

[15] Team Foundation Server. https://www.visualstudio.com/en-us/products/tfs-overview-vs.aspx. Accessed: 2016-04-21.

[16] Tools for software engineers. http://research.microsoft.com/en-us/projects/tse/. Accessed: 2016-04-24.

[17] Travis CI. https://travis-ci.org/. Accessed: 2016-04-21.

[18] Werker. http://wercker.com/. Accessed: 2016-04-26.

[19] Why don’t we use continuous integration? https://blog.inf.ed.ac.uk/sapm/2014/02/14/why-dont-we-use-continuous-integration/. Accessed: 2016-04-24.

[20] Yaml: Yaml ain’t markup language. http://yaml.org/. Accessed: 2016-04-24.

[21] J. M. Al-Kofahi, H. V. Nguyen, A. T. Nguyen, T. T. Nguyen, and T. N. Nguyen. Detecting semantic changes in Makefile build code. In ICSM, 2012.

[22] J. Allspaw and P. Hammond. 10+ deploys per day: Dev and ops cooperation at Flickr. https://www.youtube.com/watch?v=LdOe18KhtT4. Accessed: 2016-04-21.

[23] K. Beck. Embracing change with Extreme Programming. Computer, 32(10):70–77, 1999.

[24] M. Beller, G. Gousios, and A. Zaidman. How (much) do developers test? In ICSE, 2015.

[25] M. Beller, G. Gousios, and A. Zaidman. Oops, my tests broke the build: An analysis of Travis CI builds with GitHub. Technical report, PeerJ Preprints, 2016.

[26] G. Booch. Object Oriented Design with Applications.
Benjamin-Cummings Publishing Co., Inc., 1991.

[27] M. Brandtner, E. Giger, and H. C. Gall. Supporting continuous integration by mashing-up software quality information. In CSMR-WCRE, 2014.

[28] M. Brandtner, S. C. Müller, P. Leitner, and H. C. Gall. SQA-Profiles: Rule-based activity profiles for continuous integration environments. In SANER, 2015.

[29] A. Celik, A. Knaust, A. Milicevic, and M. Gligoric. Build system with lazy retrieval for Java projects. In FSE, 2016.

[30] J. C. M. de Campos, A. Arcuri, G. Fraser, and R. F. L. M. de Abreu. Continuous test generation: Enhancing continuous integration with automated test generation. In ASE, 2014.

[31] S. Dössinger, R. Mordinyi, and S. Biffl. Communicating continuous integration servers for increasing effectiveness of automated testing. In ASE, 2012.

[32] J. Downs, B. Plimmer, and J. G. Hosking. Ambient awareness of build status in collocated software teams. In ICSE, 2012.

[33] S. Elbaum, G. Rothermel, and J. Penix. Techniques for improving regression testing in continuous integration development environments. In FSE, 2014.

[34] J. Engblom. Virtual to the (near) end: Using virtual platforms for continuous integration. In DAC, 2015.

[35] S. Erdweg, M. Lichter, and M. Weiel. A sound and optimal incremental build system with dynamic dependencies. In OOPSLA, 2015.

[36] B. Fitzgerald, K. J. Stol, R. O’Sullivan, and D. O’Brien. Scaling agile methods to regulated environments: An industry case study. In ICSE, 2013.

[37] M. Fowler. Continuous Integration. http://martinfowler.com/articles/originalContinuousIntegration.html. Accessed: 2016-04-21.

[38] M. Gligoric, L. Eloussi, and D. Marinov. Practical regression test selection with dynamic file dependencies. In ISSTA, 2016.

[39] G. Gousios. The GHTorrent dataset and tool suite. In MSR, 2013.

[40] J. Humble. Evidence and case studies. http://continuousdelivery.com/evidence-case-studies/. Accessed: 2016-04-29.
[41] E. Knauss, M. Staron, W. Meding, O. Söder, A. Nilsson, and M. Castell. Supporting continuous integration by code-churn based test selection. In RCoSE, 2015.

[42] P. S. Kochhar, F. Thung, D. Lo, and J. L. Lawall. An empirical study on the adequacy of testing in open source projects. In APSEC, 2014.

[43] V. Kuechler, C. Gilbertson, and C. Jensen. Gender differences in early free and open source software joining process. In IFIP, 2012.

[44] E. Laukkonen, M. Paasivaara, and T. Arvonen. Stakeholder perceptions of the adoption of continuous integration: A case study. In AGILE, 2015.

[45] M. Leppänen, S. Mäkinen, M. Pagels, V. P. Eloranta, J. Itkonen, M. V. Mäntylä, and T. Männistö. The highways and country roads to continuous deployment. IEEE Software, 2015.

[46] A. Miller. A hundred days of continuous integration. In AGILE, 2008.

[47] K. Muşlu, Y. Brun, and A. Meliou. Data debugging with continuous testing. In FSE, 2013.

[48] V. One. 10th annual state of Agile development survey. https://versionone.com/pdf/VersionOne-10th-Annual-State-of-Agile-Report.pdf, 2016.

[49] Puppet and DevOps Research and Assessments (DORA). 2016 state of DevOps report. https://puppet.com/system/files/2016-06/2016%20State%20of%20DevOps%20Report.pdf, 2016.

[50] D. Saff and M. D. Ernst. Continuous testing in Eclipse. In ICSE, 2005.

[51] Testing at the speed and scale of Google, Jun 2011. http://google-engtools.blogspot.com/2011/06/testing-at-speed-and-scale-of-google.html.

[52] Tools for continuous integration at Google scale, October 2011. http://www.youtube.com/watch?v=b52aXZ2yi08.

[53] B. Vasilescu, Y. Yu, H. Wang, P. Devanbu, and V. Filkov. Quality and productivity outcomes relating to continuous integration in GitHub. In FSE, 2015.

[54] Z. Xu, M. B. Cohen, W. Motycka, and G. Rothermel. Continuous test suite augmentation in software product lines. In SPLC, 2013.

[55] S. Yoo and M. Harman.
Regression testing minimization, selection and prioritization: A survey. STVR, 22(2):67–120, 2012.

[56] S. Zhou, J. M. Al-Kofahi, T. N. Nguyen, C. Kästner, and S. Nadi. Extracting configuration knowledge from build files with symbolic analysis. In RELENG, 2015.

Managing Episodic Volunteers in Free/Libre/Open Source Software Communities

Ann Barcomb, Klaas-Jan Stol, Brian Fitzgerald, and Dirk Riehle

Abstract—We draw on the concept of episodic volunteering (EV) from the general volunteering literature to identify practices for managing EV in free/libre/open source software (FLOSS) communities. Infrequent but ongoing participation is widespread, but the practices that community managers are using to manage EV, and their concerns about EV, have not been previously documented. We conducted a policy Delphi study involving 24 FLOSS community managers from 22 different communities. Our panel identified 16 concerns related to managing EV in FLOSS, which we ranked by prevalence. We also describe 65 practices for managing EV in FLOSS. Almost three-quarters of these practices are used by at least three community managers. We report these practices using a systematic presentation that includes context, relationships between practices, and concerns that they address. These findings provide a coherent framework that can help FLOSS community managers to better manage episodic contributors.

Index Terms—Best practices, community management, episodic volunteering, free software, open source software

1 INTRODUCTION

Free/Libre/Open Source Software (FLOSS) research has traditionally divided contributors into core and periphery, where core describes the minority of top developers who contribute 80 percent of the code and the periphery describes all other developers [1], [2], [3].
This focus on the volume of contributions assumes a homogenized periphery, without any further distinction within that group. Further, by its very definition this distinction has an exclusive focus on code contributions, ignoring the many other types of contributions that are made to FLOSS projects. To better understand the periphery of FLOSS communities, several researchers have begun to differentiate participants within the periphery, based on the frequency and duration of their participation [4], [5], [6], [7]. In earlier work, we have drawn upon the concept of episodic volunteering (EV) from the volunteering literature to describe the subset of peripheral contributors whose contributions are short-term or infrequent [8], [9], in contrast to habitual contributors, whose contributions are “continuous or successive” [10]. In so doing, we have also reconsidered the definition of contribution, expanding it from software (or code) contribution to any type of activity within a FLOSS community [6]. By using this alternative lens on FLOSS communities, we found evidence for a wide range of contributions that episodic volunteers have made [6]. Based on a qualitative survey of 13 FLOSS communities, we developed a detailed understanding from the perspectives of both episodic volunteers and community managers. From this, we established an initial set of recommendations to engage episodic volunteers. A key concern in the context of episodic volunteering is whether these volunteers return to make further contributions. Drawing on the general volunteering literature, we evaluated a theoretical model that helps explain retention of episodic volunteers.

In this article we extend this line of research on EV in FLOSS communities. Episodic contributors represent a class of participants that can make a wide range of valuable contributions to FLOSS projects [6].
By their very nature, their participation is incidental and not continuous, and so it is of particular interest to understand how episodic contributors can be “retained,” which in this context refers to them returning to a project to contribute again, rather than converting them into habitual contributors. Retention is appealing because returning contributors require less assistance than newcomers [11], and retention is one of the key factors in FLOSS project sustainability [12], [13], [14], [15], [16]. However, evidence from the general volunteering literature suggests that many organizations do not have clear strategies in place to effectively manage episodic contributors [11], [17]. Organizations may also face internal resistance in implementing these changes, as episodic contributors may be negatively perceived as costing more in resources than they deliver in contributions [18].

Despite these challenges, EV is an increasingly important topic in volunteer management due to the increase in and preference for this kind of work [8], [19], [20], [21], [22]. Adapting to the changing volunteering context is necessary for the sustainability of non-profit organizations [22]. In FLOSS, it has long been observed that many contributors are episodic, for instance in the case of bug reporting [2], [6], [23], [24], [25]. Furthermore, a number of benefits have been attributed to peripheral contributors—increased identification of legal issues such as copyright infringement, and high-quality bug fixes, for example [14], [26]. Hence, given the increased recognition of the importance of episodic volunteers and their contributions, it is imperative to study how to manage episodic volunteers in FLOSS communities.

A major change in FLOSS communities over the last decade has been the increase in firms’ involvement in open source development, although volunteers remain important participants [27], [28], [29].
Many companies in different sectors use software which is developed by external FLOSS projects [30], and consequently many firms now employ developers to contribute to specific open source projects that they identify as critical to their business. Paid development does not negate the need to understand episodic participation. Even in company-dominated FLOSS communities, external developers still contribute a significant proportion of commits [31]. Additionally, from the perspective of the community, paid developers employed by external firms cannot be directed as employees [32], [33]. Although there are differences between paid contributors and other participants [28], paid contributors’ participation is sometimes episodic from the perspective of the community. Our research considers episodic participation from the community perspective, and consequently we adopt the broadest definition of volunteering, to encompass anyone engaging in FLOSS contributions who is not directly sponsored by the FLOSS community [6]. This broad definition allows us to identify practices which can actually be used by communities, without any concern for whether or not contributors are paid or sponsored by a firm. When paid contributors affect community managers’ concerns and practices, this is explicitly noted in our findings.

FLOSS research has been challenged for its reliance on studying forms of participation which can be readily observed through data mining, notably code contributions, bug reports, and mailing lists [34], [35]. Exclusion of non-code contributors limits the applicability of research on larger FLOSS communities, which depend not only on code contributions but also a wide range of other activities, such as planning, advocacy, mentoring, and event organization [35], [36], [37], [38]. Both unpaid and paid contributors can participate in a range of activities within FLOSS communities [39].
+ + +Despite extensive research on community practices, e.g., [3], only two studies have focused specifically on episodic participation, and neither focused on identifying an extensive list of practices [6], [40]. The fact that specific practices have been proposed for other peripheral sub-groups, namely newcomers [41], [42], suggests that FLOSS communities may be using different practices, or adapting existing practices to different ends, in order to manage episodic contributors. Hence, our study had the following objectives: + + +1) Identify the concerns community managers have about episodic volunteers. +2) Identify the practices that community managers are using, or envisage using, to address their concerns about episodic volunteers. + + +To address these objectives, we conducted a Delphi study, which is a structured communication technique involving a panel of experts. We drew on the experience of FLOSS community managers to identify the concerns community managers have with EV, the practices they use—or consider using—to manage EV, and preliminary suggestions for how practices could be combined. This article makes the following contributions toward understanding the management of EV in FLOSS: + + + + +A prioritized list of 16 EV community manager concerns; + + +An extensive collection of practices which might be used to manage EV (74 percent are being used by at least three community managers), which includes connections to the concerns previously identified, as well as relationships between practices; + + +Workflows proposed by community managers which demonstrate how practices can be combined. + + + + +The remainder of the article is organized as follows. Section 2 reviews previous work that investigated open source communities, volunteers, and in particular the role of episodic contributors. Section 3 presents the Delphi research approach that we adopted, including a discussion of participant selection, data collection, and data analysis procedures. 
Section 4 presents the findings of the study as a set of practices and concerns. Section 5 concludes by discussing our findings, the limitations of the study, and an outlook to future work.

2 RELATED WORK

This section reviews prior work on peripheral contributors and episodic volunteering in FLOSS communities.

2.1 Peripheral Contributors in FLOSS Communities

One of the earliest conceptions of the structure in FLOSS communities is the so-called Onion model [1], [43]. The Onion model depicts increasing numbers and decreasing engagement moving from the innermost core to the outermost passive users. The core contains the most prolific developers, often described as the people who create 80 percent of the code [2]. Beyond the core is the periphery, who contribute fewer lines of code.

Although much of the earlier research focused on the core (e.g., [2], [24]), there is now significant understanding of both the importance of the periphery and the motivations of peripheral participants. Peripheral contributors provide a range of benefits:

Bringing new knowledge to the project [26], [44], [45], [46];

Raising awareness of the project [46], [47], [48];

Providing new potential core contributors [26], [45], [49], [50], [51];

Proposing new features [44], [52];

Contributing new code [26], [44], [45], [53];

Finding and reporting bugs [54];

Ensuring members’ behavior abides by community norms [26].

FLOSS developer motivations have been extensively studied. Motives are usually characterized as intrinsic motives inherent to the job, such as altruism and enjoyment; internalized extrinsic motives, such as reputation and reciprocity; and extrinsic motives, such as career and salary [55].
Peripheral contributors tend to have the same set of motivations as core developers [37], but those with extrinsic motives are less likely to continue to participate [45], [56]. In particular, peripheral contributors are more likely to seek out opportunities which afford them greater recognition with stakeholders and the chance to gain reputation; such extrinsic motives are more widespread among peripheral developers than among core developers [45].

Recent work has begun to study the periphery more closely to identify and distinguish different types of contributors. One dimension often used is the frequency of participation, which distinguishes newcomers [41], [57], [58], [59], [60], [61], [62], people who attempt to become contributors [63], and one-time contributors [5], [40], [56]. In earlier work, we have linked the general episodic volunteering literature to the periphery [6]. The disaggregation of the periphery by frequency of contribution could also be viewed as an extension to, rather than a departure from, the Onion model. The outer layers—active users and passive users—are already defined by their own actions irrespective of the contributions of others. Active users engage with the project, for instance by supplying bug reports, while passive users only use the software. Disentangling the homogenized periphery into sub-categories distinguished by frequency of participation refines the Onion model and allows for the identification of distinct attributes of different groups within the periphery.

In the Onion model, the different layers describe how people contribute to the software, whereas FLOSS projects include many other ways to get involved [35], [36], [37].
Carillo and Bernard [64] described code-centricity as a limitation:

“By stereotyping FOSS projects as communities of developers loosely collaborating on a FOSS-licensed software project via an online project platform, we disregard the massive amount of information that is not captured on platforms and also neglect the myriad of non-code related tasks and roles without which a project could not be what it is.”

Emphasis on code contributions within FLOSS communities may not only devalue other types of contributions, but may specifically disadvantage women [65]. Other studies have found that women’s participation in FLOSS remains low in both code and non-code activities, including leadership [66], [67], [68]. Nafus’s [65] participant observation study of FLOSS contributors found that “men monopolize code authorship and simultaneously de-legitimize the kinds of social ties necessary to build mechanisms for women’s inclusion.” Research has also demonstrated that some barriers to entry for newcomers are gendered [60], [69], and that gender may influence retention among episodic contributors [7]. Because code contributors do not represent the entire community in terms of the diversity of work, and may additionally be demographically unrepresentative, we argue for the importance of including non-code contributions in our study. This emphasis makes the EV concept, which originates in the general volunteering literature rather than the software engineering literature, an appropriate lens for the study because it places no particular emphasis on any one type of contribution.

2.2 Episodic Volunteering

Episodic volunteering is a term from the general volunteering literature describing short-term or infrequent participation. Although a particular engagement may be of limited duration, retention of episodic contributors is possible.
In the context of EV, retention does not mean conversion to habitual participation but repeated engagement with the same organization. In a systematic review of the EV literature, Hyde et al. [70] identified retention as a key topic in need of further research. Retention remains a compelling subject because returning volunteers require less training [11] and because retention is one measure of stability in FLOSS [13], [14], [15]. The general volunteering literature on the retention of episodic contributors has largely focused on explaining the factors that lead to retention, such as satisfaction with the previous volunteering experience, intention to return, and availability [10], [71], [72]. In the FLOSS domain, Steinmacher et al. [73] found that higher quality email responses encouraged retention among newcomers. Meanwhile, Labuschagne and Holmes [57] critically examined Mozilla’s onboarding programs and found that they may not result in long-term contributors, despite the fact that mentored newcomers consider the programs valuable. A study evaluating five potential EV retention factors found that satisfaction, community commitment, and social norms correlate with intention to remain [7].

Another important problem in general volunteering is how organizations incorporate EV [17]. Although EV is sometimes viewed as disruptive, it is widespread and a reality that requires organizations to reconsider their strategies [18], [19], [45], [74]. Volunteer agencies can adjust to the expectations of episodic contributors by offering more flexibility in commitment, reducing training requirements, increasing the social element of service, and recognizing volunteers [75]. Volunteer coordinators can also identify tasks that are suitable for episodic contributors, which may include one-off contributions at events and ongoing but non-specialized work [11].
Evaluation of suitable tasks can be done systematically by applying a ‘volunteer scenario’ approach that categorizes volunteer assets, volunteer availability, and potential assignments [76].

While no single work has collected a comprehensive set of practices for managing EV in FLOSS, previous studies have proposed practices for managing FLOSS contributors. Previously, we identified 20 potential practices for EV management by evaluating existing FLOSS practices in light of factors associated with the retention of episodic contributors and prior general volunteering recommendations [6]. Meanwhile, Steinmacher et al. [41] identified nine practices for communities onboarding new contributors and corresponding recommendations for new contributors. We consider practices for newcomers relevant to the study of EV because community managers cannot distinguish the future episodic volunteer from the future habitual volunteer [72] when they make their first contribution.

This study updates this line of work by drawing on the expertise of community managers; at the time of our first study [6], we found very limited evidence of community managers managing EV. Drawing on this expertise increases the scope and number of practices identified. First, we examine both practices which are already being used to manage EV and practices that experts think might be appropriate, and distinguish between speculation and observed practice. Second, we look at most of the volunteer process, from onboarding to retention, excluding only recruitment.

3 STUDY DESIGN

In this section we outline the Delphi research method, explain our participant selection, and describe our data collection and analysis methods.

3.1 Research Method

Our research is concerned with understanding current practices for managing episodic contributors, and also proposes practices that may be helpful for managing EV.
The Delphi method was developed as a way of finding the collected opinions of a group of experts and works on the assumption that multiple experts are better able to arrive at more accurate solutions to problems. Anonymity between participants is used to prevent participants with high status or reputation from having a disproportionate influence [77], [78], [79]. The Delphi approach is suitable for complex problems [80], when solutions do not yet exist and may be best explored through the subjective judgments of an informed group of experts [77], [81]. + + +While not common in software engineering research, the Delphi method has previously been used to study complex topics such as tailoring of agile methods [82] and the adoption of tools by FLOSS developers [83]. Delphi studies typically comprise several rounds of data collection—as participants are exposed to new information in every round, they may develop new insights through iteration and exposure to others’ ideas. The Delphi method can also be conducted asynchronously, which was of particular importance in our context given the geographic distribution of open source experts. + + +The traditional Delphi method focuses on achieving consensus. As it has evolved, a variant known as the policy Delphi has emerged. A policy Delphi study is appropriate when the purpose of the study is not to establish consensus but to identify the main arguments and positions [77]. We decided that a policy Delphi study rather than a traditional Delphi study would be more appropriate in our context, because we recognized that communities may have different goals when managing EV which could be driven by community size, cultural context, or types of contribution being considered. We wanted to articulate these constraints in order to provide context for the practices, rather than assume that one approach would be effective for all communities and activities within communities. 
However, we were also interested in generalizing common practices and concerns, and used the collation of the different rounds of data collection to achieve consensus of opinions.

We codify the results of our research in the form of a collection of practices, in the appendix [84], which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TSE.2020.2985093. This ensures that the results of our research can be used by practitioners, a key goal of our work.

EV management includes all phases of the volunteer management process. We explicitly excluded recruitment practices from consideration in our study because many of those are not specific to episodic volunteering. This focus was necessary to limit the scope of the study, which otherwise could have overwhelmed the participants and diffused their focus. Although onboarding is another area where we expect overlap between habitual and episodic management, we decided to retain this part of the process in order to compare our results to a recent study summarizing onboarding practices for newcomers [41].

3.2 Participant Selection

Participant selection is a key aspect of a successful Delphi study [85]. Participants must be selected with care, and not chosen simply on the basis of availability [86].

We sought to select a panel of 20 to 25 participants, to ensure sufficient diversity even if some participants dropped out of the study. This is within the recommended range of 15–30 participants [87]. Potential participants were identified in one of three ways. First, some approached us directly following presentations at practitioner conferences. Second, we identified people among our contacts, and people who were recommended to us by contacts. From these two groups we approached a subset which met our selection requirements, which we describe below.
Third, we evaluated gaps in our coverage and sent cold emails to people we identified through online searches. The selection of participants was based not only on their enthusiasm for participation or connection to us, but also on the degree of diversity along the three selection dimensions (discussed below), as well as our expectation that the participants would be able to provide relevant input. Additionally, although gender has not, to our knowledge, been directly linked to community management, our awareness that gender can affect FLOSS participation experiences [60] inspired us to deliberately recruit female participants. In total, one-third of our participants were female. Table 1 summarizes the participants by community and their participation in the different rounds of our study.

To gain the full benefit of multiple perspectives, participants of a Delphi study should be diverse rather than homogeneous [88]. We identified three dimensions relevant to our study along which we expected differences of opinion to arise: size of community, contribution type, and country. We discuss each in detail below.

3.2.1 Size of Community

A previous study investigating the current state of EV in FLOSS discovered that the tasks considered appropriate for episodic contributors vary by community size [6]. For example, in smaller communities, translation is an ad-hoc task well-suited to EV. Larger communities have more complicated rules when translating, and a full cognizance of those rules requires more habitual participation. Organization size is also a factor commonly considered in studies identifying best practices. For example, in their case study of best practices for volunteer organizations, Carvalho and Sampaio [89] considered the size of volunteer organizations in terms of the numbers of beneficiaries, paid employees, and volunteers.
Because there are many different ways to operationalize community size—number of users, number of developers, size of core—and because size is more continuous than categorical, we did not categorize communities by size, but instead sought to include a number of communities of different sizes.

All communities represented by our panel experts have more than a handful of contributors. This is justified because extremely small communities tend not to be concerned with developing a volunteer management process or workflow. The communities represented are shown in Table 1. In total, 22 communities were represented, and four of these communities (Debian, Ubuntu, KDE, OpenStack) were represented twice. Detailed descriptions of each community are provided in the appendix [84], available in the online supplemental material.

3.2.2 Contributor Activities

Much of FLOSS research has been code-centric, but in large communities people work on a number of activities, such as translation and maintaining web services [35]. Our earlier study on EV in FLOSS found that while episodic contributors can engage in all activities, some areas are considered more suitable than others, depending on the community [6]. We expect that the perspective of community managers might be influenced by the activities they engage in. We used the classification system introduced by Rozas [38] to describe the Drupal community, because it contains the most comprehensive categorization of FLOSS activities.

3.2.3 Country

FLOSS communities are international, although North American and European countries are disproportionately over-represented [90]. Geographic boundaries can be eliminated, but cultural barriers may remain. For example, in 2002, Nakakoji et al.
[1] explained that Japanese programmers were reluctant to directly communicate with GNU GCC core developers because they saw them as superior programmers and wanted to keep a “respectful distance.” One difficulty with identifying cultural diversity is increasing globalization, which has led to intercultural identities and identification not only with country of birth, but also with country of residence [91], [92]. We therefore considered both the country of origin and the country of residence.

Our participants represented 23 countries, spanning all populated continents: Argentina, Australia, Brazil, Cyprus, Czech Republic, France, Germany, Hungary, India, Ireland, Italy, Japan, Kenya, Peru, Romania, Singapore, Spain, South Korea, Tunisia, Uganda, Ukraine, the United Kingdom, and the United States. The appendix provides details about participants’ countries of residence and origin [84], available in the online supplemental material.

3.3 Data Collection and Analysis

Data collection was initiated in January 2018 and concluded in October 2018. The study comprised three rounds, as shown in Fig. 1.

In the first round, participants were asked to think of any concerns they had about EV, and how they might address them. All participants were engaged in community management, which was a precondition for participating in the study. Our participants had experience with close to six categories on average, and all were involved in multiple types of contributions. Table 2 shows a paraphrased list of contribution types along with a count of how many participants were engaged in each activity. The appendix provides a detailed list of each participant’s contribution types [84], available in the online supplemental material.
Table 1: Study Participants by Community and Study Participation

| ID | Community | Rounds participated |
|-----|--------------------|---------------------|
| CM1 | (Anonymous) | ✓ |
| CM2 | Apache, RDO | ✓ |
| CM3 | ChakraLinux | ✓ |
| CM4 | CHAOSS | ✓ |
| CM5 | Debian | ✓ |
| CM6 | Drupal | ✓ |
| CM7 | Fedora | ✓ |
| CM8 | Fedora | ✓ |
| CM9 | Joomla! | ✓ |
| CM10| KDE, NextCloud | ✓ |
| CM11| KDE, Kubuntu | ✓ |
| CM12| Linux Mint, Debian | ✓ |
| CM13| Mozilla | ✓ |
| CM14| Mozilla | ✓ |
| CM15| OpenChain | ✓ |
| CM16| OpenStack, Debian | ✓ |
| CM17| OpenStack | ✓ |
| CM18| OSGeo-Live | ✓ |
| CM19| Perl | ✓ |
| CM20| PostgreSQL | ✓ |
| CM21| Python | ✓ |
| CM22| Ubuntu | ✓ |
| CM23| Ubuntu | ✓ |
| CM24| Women who Code | ✓ |

Table 2: Number of Participants Engaged by Contribution Type Based on [38]

| Name | Description | No. |
|-----------------------|--------------------------------------------------|-----|
| Source code | Write code, review code, report bugs | 14 |
| Documentation | Write, report issues | 14 |
| Translation | Translate and review translation | 9 |
| Design | User experience design, visual design, style guide creation | 6 |
| Support | Participate in support fora, create cookbooks | 11 |
| Evangelizing | Blog posts, speaking at unrelated events, marketing | 19 |
| Mentoring | Creation of training materials, mentoring contributors | 15 |
| Community management | Participation in working and local groups, conflict resolution, governance | 24 |
| Events | Organization of events, speaking at events | 18 |
| Economic | Make donations and seek sponsors | 12 |

The purpose of this round was to generate a broad overview of the concerns and problems affecting communities.
+ + +Collating this round involved identifying all the unique concerns by name and description, and creating a list of all the unique practices by name, description, and associated concerns. + + +In the second round, we sought to refine our understanding of both concerns and practices. For the concerns, this entailed collecting information on the prevalence and ranking of concerns, while for the practices we elicited relationships between practices, specifically the preceding/subsequent and complementary relationships, and possible workflows. The collation for this round focused on more elaborate descriptions of practices, and reported on the ranking of concerns. Workflows were also shown. + + +The third round involved refining the information we had gathered on practices. Participants were asked to verify if they had used or only proposed a practice, and were asked to specify any relationships, context, or limitations which our earlier analyses had missed. The collation consisted of the most extended description of practices. + + +In each round, questions were posted and participants were given several weeks to respond. At the end of the period, reminders were sent to participants who had not yet responded, and the response time was extended. + + +After all responses were received, they were analyzed by the lead author using the QDAcity tool for qualitative data analysis. Contextual codes representing the dimensions of interest (community name, participant’s contribution types, and participant’s country) were applied first. Next, the lead author performed theoretical thematic analysis based on the theme of each round [93]. From Round II, the collation was presented to all authors and participants as a collection of practices, also known as a handbook [94]. The collation was sent to participants after each round as a form of member checking [95]. 
Additionally, after Round III, participants were supplied with a list of practices attributed to them, giving them the opportunity to challenge our interpretation. Participants were given one week to suggest modifications to the collation, then sent the revised document. In the first two rounds we received minor requests for changes, while in the final round we received only acknowledgements of receipt.

Responses to each round were anonymized and then sent to the respondents to confirm that the modifications did not obscure the message. Analysis was conducted on the original responses, but the anonymized responses were used to provide quotations for the collations. Quotations were attributed to individual study participants by means of an assigned two-letter code. Each participant was able to identify their own contributions, and could also build up an impression of other study participants as individuals, without knowing their identities.

4 RESULTS

This section presents the results of our study. Section 4.1 discusses concerns associated with managing episodic contributors. Section 4.2 focuses on the practices for managing episodic contributors, and Section 4.3 extends relationships between practices into workflows.

4.1 Concerns With Episodic Volunteering

Broadly, community managers have a number of concerns about knowledge transmission between the community and episodic participants, the suitability of episodic contributors for tasks, how effectively community processes support EV, and how episodic contributors are included in the community. In total, we identified sixteen concerns that community managers have regarding episodic volunteering in their communities.
Table 3 specifies all sixteen concerns by category, how frequently they were observed, and how many participants ranked these concerns among their top three most pressing concerns.

Space limitations preclude us from discussing all concerns. We illustrate the most common concerns in more detail below. The complete set of concerns is described in the appendix [84], available in the online supplemental material.

Concern 2.C Episodic contributor lacks awareness of opportunities to contribute was deemed most important, observed by 20 community managers and ranked as the most pressing concern by eight study participants. One community manager expressed this urgency as follows:

“Keeping volunteers interested by openly sharing opportunities where they can contribute (technical or non-technical) should be given priority.” —CM

Concern: 2.C Episodic contributor lacks awareness of opportunities to contribute

Communicating opportunities to get involved in a way that reaches episodic contributors is a concern for communities, especially when the people who are aware of tasks which could be done episodically do not enjoy outreach activities.

TABLE 3
Concerns by Category, Number of Community Managers Observing Concern, Number of Times Ranked as Most Important Concern, Second Most Important Concern, and Third Most Important Concern

| Concern | Obs. No. | No. #1 | No. #2 | No. #3 |
|------------------------------------------------------------------------|----------|--------|--------|--------|
| Knowledge exchange | | | | |
| 1.C Episodic contributor lacks knowledge of developments during absences | 10 | 1 | 1 | 1 |
| 2.C Episodic contributor lacks awareness of opportunities to contribute | 20 | 8 | 1 | 4 |
| 3.C Community lacks knowledge of availability of episodic contributors | 15 | 2 | 1 | 2 |
| 4.C Episodic contributor lacks understanding of project vision | 11 | 1 | 2 | 1 |
| 5.C Episodic contributor and community have mismatched expectations | 13 | 1 | 1 | 1 |
| Suitability of episodic contributors for the work | | | | |
| 6.C Episodic contributor quality of work is insufficient | 9 | 2 | 0 | 0 |
| 7.C Episodic contributor’s timeliness and completion of work is poor | 14 | 1 | 1 | 1 |
| 8.C Community’s cost of supervision exceeds benefit of episodic contribution | 8 | 1 | 1 | 1 |
| Community processes do not support EV | | | | |
| 9.C Community cannot retain episodic contributors for sporadic requirements | 8 | 0 | 1 | 2 |
| 10.C Community has difficulty identifying appropriate tasks for episodic contributors | 15 | 1 | 4 | 2 |
| 11.C Community lacks an episodic strategy | 14 | 2 | 6 | 1 |
| 12.C Community insufficiently supports episodic contributors | 4 | 0 | 0 | 0 |
| Marginalization of episodic contributors | | | | |
| 13.C Community restricts episodic contributors from leadership roles | 12 | 1 | 1 | 1 |
| 14.C Community excludes episodic contributors from discussions and decisions | 10 | 2 | 0 | 3 |
| 15.C Community gives episodic contributors reduced access to opportunities and rewards | 5 | 0 | 0 | 0 |
| 16.C Community lacks appreciation for and recognition of episodic contributors | 9 | 0 | 1 | 1 |

A key characteristic of episodic volunteers is that they contribute irregularly and the nature of their participation tends to be of short duration.
This lack of day-to-day engagement with a project means that episodic volunteers may simply not be aware of the opportunities to contribute. + + +Fifteen community managers observed 3.C Community lacks knowledge of availability of episodic contributors, and two considered it their primary concern. One community manager described the issue for in-person events such as conferences: + + +“This [lack of knowledge] is a big problem when working with online communities, but it can grow exponentially when you are working a live event. You may do a call for volunteers, and you may end up short-handed, and doing three things at once.” —CM23 + + +This concern directly links to one of the defining characteristics that sets episodic volunteers apart from habitual volunteers. The scenario outlined in the quote above clearly identifies a key issue with episodic volunteers, namely that their availability tends to be much more restricted. In fact, between episodes of activity, these volunteers may be quite removed from what is happening in a community on a day-to-day basis. + + +Concern 7.C Episodic contributor’s timeliness and completion of work is poor was mentioned by 14 community managers, with one ranking it as the biggest concern. CM24 summarized the concern: + + +“The main problem of using this kind of help is that sometimes you don’t know whether a person that has started a task is able to finish it all or finish it with a decent quality.” —CM24 + + +Concern: 7.C Episodic contributor’s timeliness and completion of work is poor + + +Episodic contributors may have less investment in ensuring that their work is completed in a timely manner, or is completed at all. This can be especially problematic if the work is important and others are relying on it. In a situation such as an event, it may be unavoidable to put responsibility on episodic participants. 
This concern alludes to the asymmetry of information possessed by community managers and episodic contributors concerning the contributors’ intentions. While contributors are generally aware of their progress and the extent of their dedication to the task, this information is often not conveyed to community managers. For community managers, it becomes difficult to rely on work being completed, or completed to a sufficient standard. With an episodic contributor the problem can be more pronounced, because the community manager may be unable to form an expectation of the quality of future work based on previous experience with the contributor’s work.

CM6 explained why 10.C Community has difficulty identifying appropriate tasks for episodic contributors is a concern. Fifteen community managers had experience with this issue, and one thought it was the most important concern.

“You need to know the context and background for each task to be effective and not get lost. The problem is that to prepare this information usually requires more time than doing the task itself, so normally the person with the knowledge is the one that will do it. It ends up with few people doing a lot of work and possible contributors without knowledge of how to help.” —CM6

Concern: 10.C Community has difficulty identifying appropriate tasks for episodic contributors

Community managers find it difficult to identify and maintain a list of suitable tasks. It can be time-consuming to describe tasks so that they can be picked up by episodic contributors.

It is recommended that episodic contributors be given stand-alone tasks, which can be accomplished without a deep understanding of the project.

TABLE 4

| Conf. Code | Name | Description |
|------------|------|-------------|
| Community Governance | | |
| ✓ G.1 | Manage the delivery triangle | Adjust scope (quality or features) or schedule when project releases cannot be completed on schedule at the desired level of quality with the expected features. |
| ✓ G.2 | Use longer delivery cycles | Make release cycles longer in order to give episodic contributors the opportunity to contribute without intense time pressure. People who have multiple responsibilities will be able to participate in the project. |
| ✓ G.3 | Host in-person meetings | Host in-person meetings for creative or organizational work involving multiple volunteers. The frequency of meetings may vary by project: it could be yearly, quarterly, monthly, or even more frequent. |
| ✓ G.4 | Make decisions in public | Ensure that decisions are made in a process which is both public and open to suggestions from contributors. Even if the decision is ultimately made by an authoritative body, the transparency of the process can make participants feel a part of it. |
| ✓ G.5 | Create a community definition of quality | Create a community definition of quality so that episodic contributors will know what quality is expected. |
| ✓ G.6 | Craft a community vision | Craft an inclusive community vision and a code of conduct. A clear vision statement helps people determine if they want to participate in the community. |
| ✓ G.7 | Define measuring and success | Define what successful engagement of episodic contributors looks like. Describe how you will measure the impact. |
| G.8 | Centralize budgeting of sponsorships | Centralize the processing of sponsorships and reimbursements so that all claims will be processed in the same manner, and processing will be timely. |
| G.9 | Use an external provider for sponsorships | Hire an external service provider to serve as an intermediary in providing sponsorships. |
| G.10 | Make your leadership diverse | Try to have a diverse board or coordination group to review processes and ensure that they are welcoming and accessible. |
| G.11 | Seek sponsorship | Look for a stable sponsor to ensure continuity of events. |
| Community Preparation | | |
| ✓ P.1 | Identify appropriate tasks | Episodic participants can more easily join if tasks are available. Identify the types of tasks which are suited for episodic contributors. |
| ✓ P.2 | Define one-off tasks | Create stand-alone, one-off tasks. |
| ✓ P.3 | Crowdsource identifying appropriate tasks | Engage experienced contributors in a short-term initiative to identify outstanding issues which could be handled by episodic contributors. Encourage them to continue to identify new tasks, once the backlog has been addressed. |
| ✓ P.4 | Document general working practices | Document the community’s working practices, placing particular emphasis on those areas which are most likely to be relevant to new and episodic contributors, and where contributions will be most appreciated. |
| ✓ P.5 | Detail how to complete a task | Do not just summarize tasks, but detail the steps that need to be taken, and consider providing a time estimate for the task. |
| ✓ P.6 | List current areas of activity | Prioritize tasks and tag them as entry level where appropriate. Group similar tasks together. |
| ✓ P.7 | Hold open progress meetings | Hold regular open meetings where previous work is summarized, and new tasks are assigned. |
| ✓ P.8 | Create working groups with a narrow focus | Create specialized working groups that people can identify with. |
| ✓ P.9 | Create written records of activity | Maintain a summary, for instance in the form of a newsletter, which describes the key discussions and resolutions which took place during a given period. Alternately, rely on written communications (mailing lists, chats) or provide meeting minutes. |
| ✓ P.10 | Keep communication channels active | Ensure that communication channels both online and offline are monitored, and that queries are directed to appropriate people. Make sure that people receive responses. |
| ✓ P.11 | Send ambassadors to small events | Send ambassadors to attend smaller events, to enable personal interactions with potential participants. |
| ✓ P.12 | Respond to all submissions | Respond to every submission in a timely manner. |
| ✓ P.13 | Have a social media team | Recruit people who enjoy social media specifically for the task of communicating with potential and episodic contributors. |
| ✓ P.14 | Set expiration dates | Set distinct deadlines for initiatives. |
| ✓ P.15 | Create continual points of entry | Create ongoing ways for people to join the project and contribute, rather than providing only specific times or points in the process when people can join. |
| P.16 | Share success stories | Share stories about outstanding or long-serving community members and the challenges they faced and benefits they received. |
| P.17 | Provide templates for presentations | Create one or more standard slide decks which your contributors can use with or without modification. |
| P.18 | Write modular software | Ensure that software is modular. |
| P.19 | Educate sponsoring organizations | Educate sponsoring organizations about participation in open source projects, including topics such as the necessity of maintenance and the open model of production. |
| P.20 | Offer a consistent development environment | Document the workflow, architecture of the module, and use a container to build your project in order to allow people to easily build a local system. Decide upon one recommended way to set up a development environment and focus on this in the documentation. |
| Onboarding Contributors | | |
| O.1 | Learn about the experience, preferences, and time constraints of participants | Ask new and infrequent contributors about their expectations, availability, preferences and experience. |
| O.2 | Screen potential contributors | Screen potential contributors to determine if they are a good match for the role. This may include having availability at the appropriate time, or being able to commit to a certain amount of time. |
| O.3 | Guide people to junior jobs | Guide people to junior jobs when they do not know where to start. |
| O.4 | Give a choice of tasks | Give participants a choice of the task, from a small number offered to them. |
| O.5 | Manage task assignments with an application | Use an application, such as a wiki or bug tracking system, to handle the assignment process. |
| O.6 | Explain the need for maintenance | Educate contributors about what happens to a contribution after it is included in the project. Explain the benefits to the project if they remain available to maintain their contribution. |
| O.7 | Offer guided introductory events | At events, offer walk-through tutorials on getting started as a contributor, culminating in a hackathon working on a specific beginner problem. |
| Working with contributors | | |
| W.1 | Have a key contributor responsible | For every important project, make sure that one key contributor is responsible for managing it and responding to inquiries. |
| W.2 | Issue reminders | Send a reminder as the deadline approaches. Be persistent in following up on deliverables. |
| W.3 | Give permission to quit a task | Give people permission to skip a period or task, without recrimination. |
| W.4 | Encourage people to quit | Encourage people who no longer wish to fulfill a role or complete tasks to step down. |
| W.5 | Automate checking the quality of work | Utilize advances in continuous integration/continuous delivery to automate routine evaluation. |
| W.6 | Set expectations | Set expectations for deliverables and communication, even if these are minimal. |
| W.7 | Reject contributions of insufficient quality | Decline contributions which are inappropriate or not of sufficient quality. |
| W.8 | Mentor to quality | Provide mentoring when contributions are rejected due to insufficient quality. This might include access to tools to help people meet quality requirements. Ensure that contributors can always reach out to mentors to get up to speed. |
| W.9 | Require documentation as part of the submission | Require people to sufficiently document their submissions before they are accepted. |
| W.10 | Encourage learners to mentor | Engage episodic contributors in leading other episodic contributors. Let them review episodic contributions and mentor episodic contributors. |
| W.11 | Explain the context of the contribution | Understanding the larger context requires time that not all episodic contributors are able or willing to give. |
| W.12 | Sever ties | Publicly sever the group’s connection to the individual and explain the reasoning. |
| W.13 | Automate process assistance | Consider automation to help people work through the early processes, such as a chat bot or step-by-step interactive site. |
| Contributor Retention | | |
| R.1 | Publicize your release schedule | Publish your development and release schedule and notify contributors of upcoming milestones, to allow them to plan their engagement. |
| R.2 | Encourage social connections | Encourage people to work together in a small group to accomplish a task. This might also include groups within a company, who can use a... |
| R.3 | Follow up on contributors | Keep in touch with contributors, even if just by sending an email. |
| R.4 | Instill a sense of community | Help people to understand the cooperative values that underlie free and open source software. This is best done by leading through example. |
| ✓ R.5 | Acknowledge all contributions | Have someone responsible for recognizing returning episodic contributors. This person could thank episodic contributors for returning, or alternately, explicitly welcome new contributors. |
| ✓ R.6 | Reward participation | Offer a tangible reward for participation, such as an organizer’s dinner or swag. Alternatively, offer recommendation letters, certificates, or online recommendations. |
| ✓ R.7 | Recognize everyone | Make use of systems such as badges to recognize the variety of different contributions people can make. At the conclusion of a cycle, thank and identify contributors. |
| ✓ R.8 | Praise publicly | Praise volunteers publicly. |
| ✓ R.9 | Provide evaluations and a promotion path | Provide assessment and opportunities to episodic contributors. Examples of assessment are skill exploration and personal evaluation. Examples of opportunities are travel, employment consideration, succession planning, and skill building. |
| R.10 | Promote episodic contributors | Give sustained episodic participants access to rotating leadership positions which depend on experience rather than continuous contributions. |
| ✓ R.11 | Announce milestones and celebrate meeting goals | Announce when milestones have been met, and celebrate success. |
| ✓ R.12 | Listen to suggestions | Allow anyone who participates to propose what they want to implement, even if the decisions are ultimately made by a steering committee. If concepts don’t fit in with the primary project goals, allow people to create unofficial initiatives, provided these don’t damage the project. |
| ✓ R.13 | Incorporate unofficial successes | Invite creators of unofficial initiatives to incorporate them in the main project if they are successful and of high quality. Alternatively, if the project is stand-alone, recognize these successes within the project. |
| ✓ R.14 | Rotate focus areas on schedule | Rotate between different focus areas with a consistent schedule. |

It is only in recent years that many FLOSS communities have sought to create strategies for particular aims, such as retaining newcomers or recognizing non-code contributions. Managing episodic contributors also benefits from a recognition of the problem, identification of the desired outcome, and an evaluation of practices which might be used to achieve the goal. In our previous study, community managers did not report making use of any practices for managing EV [6]. This study shows that FLOSS communities are adopting or adapting practices for managing EV. The fact that managing EV effectively remains a prominent concern demonstrates the need for a study such as ours, which collects and codifies the experience of multiple community managers to create a larger body of knowledge.

4.2 Practices for Managing Episodic Volunteering

We organized the identified practices into a number of categories based on the “lifecycle” of episodic contributors’ engagement.
In practice, a community will not address these categories sequentially, but will move between them, iterate through them, or use practices in parallel. However, organizing the practices in categories can help to communicate them to FLOSS community managers. Each practice is aimed at ameliorating one or more of the concerns described in the previous section.

In total, we identified 65 practices in our study across the five categories. Table 4 provides a complete list of practices, along with a brief description of each practice. Of the 65 practices, 48 were confirmed (indicated by a checkmark) to be in use by at least three community managers for the specific purpose of managing EV. The remaining 17 practices were proposed by our panel experts for EV management; they were used by zero, one, or two community managers.

The full description of each practice is more detailed than the brief summary in Table 4. In the following subsections, we include as exemplars the full descriptions of one confirmed practice from each category that was not previously described in the literature (see Table 5). The full descriptions of all practices can be found in the appendix [84], available in the online supplemental material.

The full description of a practice includes the context which may limit the generalizability of the practice, a list of the concerns involved, and a solution. It can optionally include challenges which may arise with implementing the solution, a list of community managers participating in the study who have used the practice, and a list of community managers who suggested but have not used the practice. Additionally, each practice can include a list of related practices. For the most part, practices are not meant to be used in isolation, but to be combined with related practices. Section 4.3 provides examples of how practices can be combined.
Relationships between practices can take the following forms, all of which are shown in at least one of the exemplar practices chosen to demonstrate them:

- General/Specific describes a relationship where the specific practice is a more restricted and specialized practice, compared to the general practice. It is demonstrated in R.9 Provide evaluations and a promotion path (a general practice) and O.2 Screen potential contributors (a specific practice).
- Alternative describes two or more practices which address the same concerns with largely incompatible solutions. An example of this relationship is shown in P.8 Create working groups with a narrow focus.
- Preceding/Succeeding is a relationship where practices are best applied in sequential order. An example of this relationship is found in G.5 Create a community definition of quality, which shows both preceding and succeeding practices.
- Complementary describes the situation where practices work well when combined with other practices. W.10 Encourage learners to mentor demonstrates this relationship.

4.2.1 Community Governance

The category Community Governance contains practices that address broad questions about how the community operates. These are practices that will affect a potential episodic contributor’s first impressions of what kind of community it is. One example of a practice in this category is G.5 Create a community definition of quality.
CM24 stated they were able to make more extensive use of episodic contributors once the community began “documenting our standards of quality.” Another community manager, CM16, explained that new contributors and episodic contributors typically are expected to know what the project considers “quality work,” but that “we never really explain it in a way that’s easy to learn, so it ends up being a barrier to entry.”

Practice G.5: Create a community definition of quality

Context: Episodic contributors do not necessarily know what level of quality is expected. The community is large and mature enough that lack of a common perspective causes problems, and contributors cannot be expected to tacitly acquire the knowledge.

Concerns:
- 4.C Episodic contributor lacks understanding of project vision
- 6.C Episodic contributor quality of work is insufficient
- 7.C Episodic contributor’s timeliness and completion of work is poor
- 11.C Community lacks an episodic strategy

Solution: Create a community definition of quality so that episodic contributors will know what quality is expected. It will become significantly easier to follow many of the subsequent practices if quality is defined within the community.

Related practices:
- P.4 Document general working practices is a COMPLEMENTARY practice.
- G.6 Craft a community vision is a possible PRECEDING step.
- P.10 Keep communication channels active is a possible PRECEDING step.
- P.13 Have a social media team is a possible PRECEDING step.
- G.7 Define measuring and success is a possible SUCCEEDING step.
- P.5 Detail how to complete a task is a possible SUCCEEDING step.
- P.6 List current areas of activity is a possible SUCCEEDING step.
- W.5 Automate checking the quality of work is a possible SUCCEEDING step.
- W.6 Set expectations is a possible SUCCEEDING step.
- W.7 Reject contributions of insufficient quality is a possible SUCCEEDING step.
- W.8 Mentor to quality is a possible SUCCEEDING step.

Challenges: It can be difficult to retroactively apply a definition of quality to an existing project, when not all participants are in agreement.

Used by: CM15, CM13, CM14, CM18, CM24

Proposed by: CM16, CM19

4.2.2 Community Preparation

The category Community Preparation contains practices associated with preparing the community to engage episodic contributors. Identifying appropriate tasks and lowering barriers to entry are part of this group. CM4 explained the reasoning behind practice P.8 Create working groups with a narrow focus to prepare the community for accepting episodic contributors:

“By focusing the working group on a topic that people can identify with, we hope that episodic contributors have an easier time identifying what is useful to them and then have a place to contribute.” —CM4

4.2.3 Onboarding Contributors

The category Onboarding Contributors contains practices that can be applied when a new episodic contributor joins the community. O.2 Screen potential contributors is part of the collection of practices for incorporating episodic contributors. A community manager explained why screening can be beneficial:

Practice P.8: Create working groups with a narrow focus

Context: The project is too complex for participants to easily comprehend it in its entirety. It is not possible to readily identify stand-alone tasks in the project.

Concerns:
- 2.C Episodic contributor lacks awareness of opportunities to contribute

Solution: Create specialized working groups that people can identify with.
With a narrow focus and defined outcomes, episodic contributors will be able to find tasks more readily.

Related practices:
- P.6 List current areas of activity is a possible ALTERNATIVE step.
- P.18 Write modular software is a possible ALTERNATIVE step.
- P.18 Write modular software is a COMPLEMENTARY practice.
- P.18 Write modular software is a possible PRECEDING step.
- O.1 Learn about the experience, preferences, and time constraints of participants is a possible PRECEDING step.

Challenges: Contributions within the working groups will need to be reported back to the larger group.

Used by: CM2, CM3, CM4, CM5, CM6, CM16

“The first criteria of contribution should be the availability/commitment of participants to donate their time (specifically mentioned as a time frame). This will help reviewers and community leaders to estimate the impact of the contributions.” —CM14

4.2.4 Working With Contributors

The category Working with contributors contains practices applied during the period that the episodic contributor is working on an assignment. These practices ensure that episodic contributors’ contributions can be used by the community. A study participant expressed an interest in applying the practice W.10 Encourage learners to mentor when working with contributors:

“It should be possible for the people reviewing episodic contributions to be a different group than the most active developers, so reviews of episodic contributions don’t eat away the time available for other larger contributions.
I almost think of this like a mentorship, and the pool of reviewers might even be episodic contributors themselves, who have learned enough to spend part of their limited time on the project reviewing episodic contributions by others.” —CM16

Practice O.2: Screen potential contributors

Context: In order for a contributor to properly perform a role, a certain minimum commitment is required. The project has repeated problems with people insufficiently committing to roles.

Concerns:
- 3.C Community lacks knowledge of availability of episodic contributors
- 4.C Episodic contributor lacks understanding of project vision
- 5.C Episodic contributor and community have mismatched expectations
- 10.C Community has difficulty identifying appropriate tasks for episodic contributors

Solution: Screen potential contributors to determine if they are a good match for the role. This may include having availability at the appropriate time, or being able to commit to a certain amount of time. This makes it less likely that the commitment will go unmet.

Related practices:
- O.1 Learn about the experience, preferences, and time constraints of participants is a more GENERAL practice.

Challenges: Some people will be prevented from pursuing the role, but if there are other forms of contribution it does not prevent them from participating altogether. Assessing potential contributors requires effort.

Used by: CM3, CM8, CM10, CM13, CM14

Another community manager explained how the process can also benefit the mentor:

“Encouraging someone to answer questions on IRC, for example, communicates that you think that they grasp the concepts.” —CM2

4.2.5 Contributor Retention

The category Contributor Retention contains practices that encourage contributors to return.
CM13 explained why R.9 Provide evaluations and a promotion path is a useful retention practice:

“It is also important to provide episodic volunteers with metric achievement in the community for their time dedicated and tasks completed. They can grow from basic volunteers to representatives, mentors, influential leaders and even employees, motivating results and retention.” —CM13

Another community manager described an additional benefit for the community:

“[Skills exploration and skill building sessions] can prove helpful as the main goal would be to know what skills episodic volunteers have and what skills they can develop to contribute to more projects (long term or short term).” —CM14

Practice W.10: Encourage learners to mentor

Context: Highly active contributors have limited time to mentor episodic contributors.

Concerns:
- 2.C Episodic contributor lacks awareness of opportunities to contribute
- 4.C Episodic contributor lacks understanding of project vision
- 8.C Community’s cost of supervision exceeds benefit of episodic contribution
- 11.C Community lacks an episodic strategy

Solution: Engage episodic contributors in leading other episodic contributors. Let them review episodic contributions and mentor episodic contributors. Episodic contributors are likely to understand the concerns and limitations of other episodic contributors. Using returning episodic contributors to lead episodic contributors lets core contributors focus on other areas, and recognizes the competency of returning episodic contributors.

Related practices:
- P.16 Share success stories is a COMPLEMENTARY practice.
- W.1 Have a key contributor responsible is a COMPLEMENTARY practice.
- W.8 Mentor to quality is a COMPLEMENTARY practice.
- R.2 Encourage social connections is a COMPLEMENTARY practice.
Used by: CM2, CM5, CM12, CM13

Proposed by: CM11, CM16

Practice R.9: Provide evaluations and a promotion path

Context: Episodic contributors are unable to develop as contributors. There is sustained episodic participation, and absences do not affect the completion of duties.

Concerns:
- 15.C Community gives episodic contributors reduced access to opportunities and rewards

Solution: Provide assessment and opportunities to episodic contributors. Examples of assessment are skill exploration and personal evaluation. Examples of opportunities are travel, employment consideration, succession planning, and skill building. Sustained episodic participants are encouraged to continue contributing and are more beneficial to the community.

Related practices:
- R.10 Promote episodic contributors is a more SPECIFIC practice.

Used by: CM13, CM14, CM22

Proposed by: CM1

4.3 Workflows

Many practices are of limited effectiveness if implemented alone. For instance, it would be impossible to implement O.3 Guide people to junior jobs without first implementing P.1 Identify appropriate tasks, but it would also be ineffective to initiate P.1 without planning to advertise it. However, with a wide range of practices, some tuned to specific contexts, there is no single correct way for a community manager to combine practices to achieve a particular goal.

We asked participants how they might combine practices into a workflow in order to address an important concern. The responses to this question can be seen as examples of how community managers approached the task. They are illustrative for other practitioners who wish to understand how to leverage the extensive list of practices that resulted from this study.
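The notion of a workflow can be made concrete by treating practices as nodes and PRECEDING/SUCCEEDING relationships as directed edges; COMPLEMENTARY practices simply share a stage and can be carried out in parallel. The sketch below is our own illustration, not an artifact of the study: it stages a small workflow in which P.1 Identify appropriate tasks and W.1 Have a key contributor responsible both precede P.10 Keep communication channels active and P.13 Have a social media team.

```python
from collections import defaultdict

def stage_order(practices, precedes):
    """Group practices into stages: practices within a stage have no
    ordering between them (complementary), and every practice in a
    stage precedes the practices in later stages (Kahn's algorithm)."""
    indegree = {p: 0 for p in practices}
    successors = defaultdict(list)
    for before, after in precedes:
        successors[before].append(after)
        indegree[after] += 1
    stage = [p for p in practices if indegree[p] == 0]
    stages = []
    while stage:
        stages.append(sorted(stage))  # sort for a stable, readable order
        nxt = []
        for p in stage:
            for s in successors[p]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    nxt.append(s)
        stage = nxt
    return stages

# Illustrative workflow: P.1 and W.1 are complementary and both
# precede P.10 and P.13.
practices = {"P.1", "W.1", "P.10", "P.13"}
precedes = [("P.1", "P.10"), ("W.1", "P.10"),
            ("P.1", "P.13"), ("W.1", "P.13")]
print(stage_order(practices, precedes))
# → [['P.1', 'W.1'], ['P.10', 'P.13']]
```

Each inner list is one stage of the workflow; a community manager would apply the stages in sequence, choosing freely within a stage.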
While it is beyond the scope of this article to identify specific workflows of practices that could be applied to any community—largely due to the fact that communities are only beginning to address EV—the links to related practices within each practice description provide guidance on how community managers have envisioned combining practices.

Each workflow consists of a number of practices, to be implemented sequentially or simultaneously, which together form one possible solution to a specific concern. All workflow diagrams are provided in the appendix [84], available in the online supplemental material.

Fig. 2 depicts an example workflow proposed by CM6 to address concern 11.C Community lacks an episodic strategy. The diagram shows the practices P.1 Identify appropriate tasks and W.1 Have a key contributor responsible as COMPLEMENTARY practices because they are not directly connected to each other, but both PRECEDE practice P.10 Keep communication channels active. P.13 Have a social media team also SUCCEEDS P.1 and W.1.

Another workflow is shown in Fig. 3. It was devised by CM19, and depicts an alternative approach to addressing the same concern. This shows the very individual way in which community managers might combine practices to address a concern, based on their own experience and idiosyncratic understanding of their communities.

5 DISCUSSION AND CONCLUSION

5.1 Discussion

5.1.1 Diversity of Practices

In this study we sought to identify the concerns community managers have about episodic volunteers, and to identify the practices that they are using—or envisage using—to address these concerns. To do this we conducted a policy Delphi study of community managers.

We looked for study participants engaged in different communities, from different countries, and representing communities of different sizes.
In order to identify any relationship between responses based on these dimensions, responses were coded with the community name, countries involved, and activities the community manager had experience with. Observed variations in practices based upon any of the dimensions identified are described in the Context field of the full description of practices.

Community size was an important factor in how episodic contributors are informed about developments. Smaller communities favored a less formal approach such as P.7 Hold open progress meetings, while larger communities recommended O.5 Manage task assignments with an application. Mature communities were more concerned with governance and automation practices such as G.5 Create a community definition of quality, W.5 Automate checking the quality of work, O.5 Manage task assignments with an application, and W.13 Automate process assistance.

Country was associated with only one difference. Specifically, reimbursement solutions such as G.8 Centralize budgeting of sponsorships and G.9 Use an external provider for sponsorships were mentioned more frequently in less developed countries. However, it is important to note that the context for these practices is participants who need sponsorship, and this situation can arise in any country. FLOSS communities had rather consistent concerns and practices around the world, and we were unable to observe any cultural differences. Future work might revisit the earlier studies which suggested culture is a factor in FLOSS participation, to determine if this still holds true.

Contribution type produced the greatest amount of diversity in practices. In particular, event organization supplied a number of practices primarily applicable to this context. Software development was another area that stood out as influencing practices.
For example, G.3 Host in-person meetings is primarily an event-planning practice, while P.18 Write modular software is clearly specific to software development. Practices specific to one type of work within the FLOSS community were of course less likely to be confirmed than general practices applicable to multiple types of contributions. This may be the reason that some practices, such as P.20 Offer a consistent development environment and P.17 Provide templates for presentations, were not confirmed. Future research could focus on confirming practices for specific aspects of FLOSS work, and on determining the prevalence of their use.

Gender was not directly included in our study design, although participants could introduce gender as context to a problem or solution if they considered it relevant. One participant did mention gender, but as a general statement, noting that women are more responsive to recruitment:

“...in my experience women are more active in volunteering if they find the community responsive. I clearly see the difference in managing gender-related communities and regular communities, that more clearly represent the state of the industry.” —CM24

FLOSS literature suggests that responsive communities are more welcoming to all participants [73], [96], which aligns with the participant’s subsequent statement:

“Making the community friendly for women means making it friendly for everyone who is a kind person, because everyone would feel included and involved. [It’s easy to see if this is succeeding, because women are] literally half of the population.” —CM24

Other ways of increasing female participation include appreciation for diverse teams, tracking of female participation, and improved mentoring [59], [67].

Workflows show another aspect of variation, one less easy to quantify.
The work of a community manager is “people-centric and versatile” [97], and it is their implicit and tacit knowledge of their communities which undoubtedly plays a role in determining how a workflow is constructed. Future research could try to elicit the factors which go into such decisions.
----------------------------------------
-------------------------------
Section 135:
5.1.2 Comparison to Previous Studies

We identified 65 practices, but we note that this list of practices may not be exhaustive. We compared our findings to an earlier study of onboarding guidelines, which were based on interviews with community managers, diaries of newcomers, and literature [41]. Although their study focused on newcomers, we expected to find overlap because episodic contributors can often only be identified in retrospect [72], not when they join. We also compared our results with our earlier study, where potential practices for managing EV were proposed based on interviews with community managers and the EV literature [6]. Table 5 includes the complete list of practices proposed by the two previous studies, in addition to an overlapping subset of practices from this study.

In total, nine practices appeared in the other studies which were not found in our study. Two practices were identified from the onboarding study [41], and eight from the earlier EV study [6] (one practice was found in both other studies but not our study). Some of this difference can be explained by variable levels of granularity. For instance, Consider time-based releases could be seen as a specific implementation of R.1 Publicize your release schedule. The different research approaches also explain some of the difference. While the previous EV study provided suggestions based on the EV literature, some of these recommendations, such as Evaluate assets, availability and assignments, may not be widely known or systematically applied in FLOSS communities.
Still other practices may have been considered so mainstream that participants did not need to mention them, such as Good documentation. In the end, our study identified 52 practices which were not described in the previous studies, in addition to 13 which were previously described (see Table 5). Our emphasis on identifying practices explains why so many new practices relevant to EV were found. Many of these practices are familiar in the FLOSS domain because community managers are adapting existing practices to the EV context.
----------------------------------------
-------------------------------
Section 136:
5.2 Limitations of the Study

The Delphi method is a qualitative method, and so the traditional criteria used for quantitative studies (such as internal validity, external validity, and reliability) are not appropriate due to epistemological differences. Instead, qualitative research is best evaluated by an alternative set of criteria for naturalistic inquiries proposed by Guba [95]. Guba’s criteria are credibility, transferability, dependability, and confirmability.

Credibility. Credibility concerns how plausible, or true, the findings are. Our confidence in the results is strengthened by the fact that the practices were identified iteratively, over a ten-month period. This meant that there were many opportunities for participants to reflect on the information which was presented and to amend it. By design, a Delphi study involves member checking during the theory development phase. Preliminary results were also shared with a community manager not involved in the study as an additional form of member checking.

Transferability. Guba recommends purposive sampling as a means of ensuring the transferability of the results [95]. We identified three dimensions which the literature suggested might affect our results and created a diverse Delphi study panel.
We were able to observe situations where the dimensions limited the applicability of practices, but were also able to identify broadly applicable practices, and to differentiate between novel suggestions and practices which are already in use.

Dependability. Dependability is strengthened by maintaining an audit trail. We maintained anonymized as well as original copies of all responses, including feedback on the collation. We retained a copy of the collation in the state it appeared after each round, as well as after feedback was received on the collation. Any supplemental documents developed in creating the collation were also retained in a project repository.

Confirmability. There were multiple opportunities for study participants to correct researcher bias. The multiple phases of a Delphi study allow participants to respond to the developing theory; this is a form of member checking. In addition, we reflected our understanding back to participants with a personalized report of the practices we understood them to have tried or advocated, and requested corrections.
----------------------------------------
-------------------------------
Section 137:
5.3 Conclusion

The identification of 65 practices, 52 of which had not been previously described in the context of managing EV in FLOSS, demonstrates that many community managers are actively thinking about how to incorporate EV. Our study confirms that 74 percent of the practices we identified are being actively used. This is in contrast to our earlier qualitative survey on the state of EV in FLOSS communities, where we found that community managers were aware of EV but were not taking any specific steps to manage it [6]. Given the nascent state of the literature on EV in FLOSS communities, this study fills a significant gap. We also described the relationships between practices and gave some examples of how practices can be combined to form a workflow.
The findings of this study can be readily adopted by FLOSS community managers.

We further identified 16 concerns that community managers have about EV in their communities, and identified how frequently they were observed by our participants. These concerns were ranked by the expert panel members of this study. The ranked list provides a roadmap for future research, offering clues as to where researchers and practitioners might direct their energy. Concerns are linked with practices for addressing them, opening the possibility of future studies investigating the effectiveness of different approaches.

With the collection of practices [84] we have created an extensive guide for managing EV in FLOSS, readily understood by researchers and practitioners, which draws upon the experiences of seasoned community managers from a number of different communities, geographic regions, and areas of expertise. To the best of our knowledge, this study is the first to gather practices for managing episodic contributors in FLOSS communities. Given the increasing attention to episodic contributors as a phenomenon within the open source literature, we believe this study provides a timely foundation for future work in this area.
----------------------------------------
-------------------------------
Section 138:
ACKNOWLEDGMENTS

The authors would like to thank the community mentors who contributed significant time to participate in this study: R. Bowen, N. Bowers, A.-I. Chiuta, S. M. Coughlan, A. El Achêche, B. “bex” Exelbierd, L. Kisuuki, N. Kolokotronis, G. Lelarge, G. Link, S. Park, Pkpacheco, A. Pinheiro, A. Randal, J. A. Rey, C. Shorter, H. Tabunshchyk, L. Vancsa, H. Woo, S. Zacchiroli, V. Zimmerman, and the participants who preferred to remain anonymous. Additionally, we would like to thank the reviewers for their constructive feedback. Finally, S. B. Segletes provided helpful formatting advice.
This work was supported, in part, by Science Foundation Ireland grants 13/RC/2094 and 15/SIRG/3293. +---------------------------------------- +------------------------------- +Section 139: +REFERENCES + + +[1] K. Nakakoji, Y. Yamamoto, Y. Nishinaka, K. Kishida, and Y. Ye, “Evolution patterns of open-source software systems and communities,” in +Proc. Int. Workshop Princ. Softw. Evol. +, 2002, pp. 76–85. + + +[2] A. Mockus, R. T. Fielding, and J. D. Herbsleb, “Two case studies of open source software development: Apache and Mozilla,” +ACM Trans. Softw. Eng. Methodology +, vol. 11, no. 3, pp. 309–346, 2002. + + +[3] K. Crowston, H. Annabi, J. Howison, and C. Masango, “Effective work practices for software engineering: Free/libre open source software development,” in +Proc. Workshop Interdisciplinary Softw. Eng. Res. +, 2004, pp. 18–26. + + +[4] G. Pinto, I. Steinmacher, and M. A. Gerosa, “More common than you think: An in-depth study of casual contributors,” in +Proc. 23rd Int. Conf. Softw. Anal. Evol. Reengineering +, 2016, vol. 1, pp. 112–123. + + +[5] A. Lee and J. C. Carver, “Are one-time contributors different? A comparison to core and periphery developers in FLOSS repositories,” in +Proc. Int. Symp. Empir. Softw. Eng. Mes. +, 2017, pp. 1–10. + + +[6] A. Barcomb, A. Kaufmann, D. Riehle, K.-J. Stol, and B. Fitzgerald, “Uncovering the periphery: A qualitative survey of episodic volunteering in free/libre and open source software communities,” +IEEE Trans. Softw. Eng. +, 2018. [Online]. Available: http://dx.doi.org/10.1109/TSE.2018.2872713 + + +[7] A. Barcomb, K.-J. Stol, D. Riehle, and B. Fitzgerald, “Why do episodic volunteers stay in FLOSS communities?” in +Proc. Int. Conf. Softw. Eng. +, 2019, pp. 948–959. [Online]. Available: https://cora.uc.ie/handle/10468/7248 + + +[8] N. Macduff, “Societal changes and the rise of the episodic volunteer,” +Emerg. Areas Volunteering +, vol. 1, no. 2, pp. 49–61, 2005. + + +[9] F. Tang, N. Morrow-Howell, and E. 
Choi, “Why do older adult volunteers stop volunteering?” +Ageing Soc. +, vol. 30, no. 5, pp. 859–878, 2010. + + +[10] D. A. Harrison, “Volunteer motivation and attendance decisions: Competitive theory testing in multiple samples from a homeless shelter,” +J. Appl. Psychol. +, vol. 80, no. 3, pp. 371–385, 1995. + + +[11] R. A. Cnaan and F. Handy, “Towards understanding episodic volunteering,” +Vrijwillige Inzet Onderzocht +, vol. 2, no. 1, pp. 29–35, 2005. + + +[12] L. Bao, X. Xia, D. Lo, and G. C. Murphy, “A large scale study of long-time contributor prediction for GitHub projects,” +IEEE Trans. Softw. Eng. +, to be published, doi: 10.1109/TSE.2019.2918536. + + +[13] J. Gamalielsson and B. Lundell, “Sustainability of open source software communities beyond a fork: How and why has the Libreoffice project evolved?” +J. Syst. Softw. +, vol. 89, pp. 128–145, 2014. + + +[14] M. Foucault, M. Palyart, X. Blanc, G. C. Murphy, and J.-R. Falleri, “Impact of developer turnover on quality in open-source software,” in +Proc. 10th Joint Meeting Found. Softw. Eng. +, 2015, pp. 829–841. + + +[15] D. Izquierdo-Cortazar, G. Robles, F. Ortega, and J. M. González-Barahona, “Using software archaeology to measure knowledge loss in software projects due to developer turnover,” in +Proc. 42nd Hawaii Int. Conf. Syst. Sci. +, 2009, pp. 1–10. + + +[16] M. Zhou and A. Mockus, “Who will stay in the FLOSS community? Modeling participant’s initial behavior,” +IEEE Trans. Softw. Eng. +, vol. 41, no. 1, pp. 82–99, Jan. 2015. + + +[17] M. A. Hager, “Toward emergent strategy in volunteer administration,” +Int. J. Volunt. Adm. +, vol. 29, no. 3, pp. 13–22, 2013. + + +[18] N. Macduff, “Episodic volunteers: Reality for the future,” +Voluntary Action Leadership +, vol. Spring, pp. 15–17, 1990. + + +[19] K. Culp III and M. Nolan, “Trends impacting volunteer administrators in the next ten years,” +J. Volunt. Adm. +, vol. 19, no. 1, pp. 10–19, 2000. + + +[20] L. Hustinx and F. 
Lammertyn, “Collective and reflexive styles of volunteering: A sociological modernization perspective,” +Voluntas: Int. J. Voluntary Nonprofit Organizations +, vol. 14, no. 2, pp. 167–187, 2003. +[21] K. A. Smith, K. Holmes, D. Haski-Leventhal, R. A. Cnaan, F. Handy, and J. L. Brudney, “Motivations and benefits of student volunteering: Comparing regular, occasional, and non-volunteers in five countries,” Can. J. Nonprofit Soc. Econ. Res., vol. 1, no. 1, 2010, Art. no. 65. + + +[22] R. A. Cnaan, H. Daniel Heist, and M. H. Storti, “Episodic volunteering at a religious megaevent,” Nonprofit Manage. Leadership, vol. 1, no. 1, pp. 1–14, 2017. + + +[23] S. Koch and G. Schneider, “Effort, co-operation and co-ordination in an open source software project: GNOME,” Inf. Syst. J., vol. 12, no. 1, pp. 27–42, 2002. + + +[24] T. T. Dinh-Trong and J. M. Bieman, “The FreeBSD project: A replication case study of open source development,” IEEE Trans. Softw. Eng., vol. 31, no. 6, pp. 481–494, Jun. 2005. + + +[25] J. J. Davies, H. V. K. S. Nussbaum, and D. M. German, “Perspectives on bugs in the Debian bug tracking system,” in Proc. 7th Work. Conf. Mining Softw. Repositories, 2010, pp. 86–89. + + +[26] F. Rullani and S. Haefliger, “The periphery on stage: The intra-organizational dynamics in online communities of creation,” Res. Policy, vol. 42, no. 4, pp. 941–953, 2013. + + +[27] D. Riehle, P. Riemer, C. Kolassa, and M. Schmidt, “Paid vs. volunteer work in open source,” in Proc. 47th Hawaii Int. Conf. Syst. Sci., 2014, pp. 3286–3295. + + +[28] G. Pinto, L. F. Dias, and I. Steinmacher, “Who gets a patch accepted first? Comparing the contributions of employees and volunteers,” in Proc. 11th IEEE/ACM Int. Workshop Cooperative Hum. Aspects Softw. Eng., 2018, pp. 110–113. + + +[29] A. Capiluppi, K.-J. Stol, and C. Boldyreff, “Exploring the role of community stakeholders in open source software evolution,” in Proc. IFIP Int. Conf. Open Source Syst., 2012, pp. 178–200. + + +[30] B. 
Lundell et al., “Addressing lock-in, interoperability, and long-term maintenance challenges through open source: How can companies strategically use open source?” in Proc. IFIP Int. Conf. Open Source Syst., 2017, pp. 80–88. + + +[31] L. F. Dias, I. Steinmacher, and G. Pinto, “Who drives company-owned OSS projects: Employees or volunteers?” in Proc. V. Work. Softw. Vis. Evol. Maintenance, 2017, Art. no. 10. + + +[32] L. Dahlander and M. G. Magnusson, “Relationships between open source software companies and communities: Observations from Nordic firms,” Res. Policy, vol. 34, no. 4, pp. 481–493, 2005. + + +[33] P. J. Ägerfalk and B. Fitzgerald, “Outsourcing to an unknown workforce: Exploring opensourcing as a global sourcing strategy,” MIS Quart., vol. 32, no. 2, pp. 385–409, 2008. + + +[34] G. Von Krogh and S. Spaeth, “The open source software phenomenon: Characteristics that promote research,” The J. Strategic Inf. Syst., vol. 16, no. 3, pp. 236–253, 2007. + + +[35] K. Carillo, S. Huff, and B. Chawner, “What makes a good contributor? Understanding contributor behavior within large free/open source software projects—A socialization perspective,” The J. Strategic Inf. Syst., vol. 26, no. 4, pp. 322–359, 2017. + + +[36] C. Jensen and C. Boldyreff, “Role migration and advancement processes in OSSD projects: A comparative case study,” in Proc. 29th Int. Conf. Softw. Eng., 2007, pp. 364–374. + + +[37] Y. Fang and D. Neufeld, “Understanding sustained participation in open source software projects,” J. Manage. Inf. Syst., vol. 25, no. 4, pp. 9–50, 2009. + + +[38] D. Rozas, “Self-organisation in commons-based peer production, Drupal: ‘The drop is always moving’,” Ph.D. dissertation, University of Surrey, Guildford, U.K., 2017. [Online]. Available: https://davidrozas.cc/phd + + +[39] M. Osterloh and S. Rota, “Open source software development—just another case of collective invention?” Res. Policy, vol. 36, no. 2, pp. 157–171, 2007. + + +[40] R. Pham, L. Singer, and K. 
Schneider, “Building test suites in social coding sites by leveraging drive-by commits,” in Proc. Int. Conf. Softw. Eng., 2013, pp. 1209–1212. + + +[41] I. Steinmacher, C. Treude, and M. A. Gerosa, “Let me in: Guidelines for the successful onboarding of newcomers to open source projects,” IEEE Softw., vol. 36, no. 4, pp. 41–49, Jul./Aug. 2019. + + +[42] D. Sholler, I. Steinmacher, D. Ford, M. Averick, M. Hoye, and G. Wilson, “Ten simple rules for helping newcomers become contributors to open source projects,” PLoS Comput. Biol., vol. 15, no. 9, 2019, Art. no. e1007296. + + +[43] K. Crowston and J. Howison, “The social structure of free and open source software development,” First Monday, vol. 10, no. 2, 2005. + + +[44] K. R. Lakhani, “The core and the periphery in distributed and self-organizing innovation systems,” Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA, 2006. + + +[45] R. Krishnamurthy, V. Jacob, S. Radhakrishnan, and K. Dogan, “Peripheral developer participation in open source projects: An empirical analysis,” ACM Trans. Manage. Inf. Syst., vol. 6, no. 4, pp. 14–45, 2016. + + +[46] P. Setia, B. Rajagopalan, V. Sambamurthy, and R. Calantone, “How peripheral developers contribute to open source software development,” Inf. Syst. Res., vol. 23, no. 1, pp. 144–163, 2012. + + +[47] J. Wang, “Survival factors for free open source software projects: A multi-stage perspective,” Eur. Manage. J., vol. 30, no. 4, pp. 352–371, 2012. + + +[48] B. Vasilescu, A. Serebrenik, M. Goeminne, and T. Mens, “On the variation and specialisation of workload - A case study of the Gnome ecosystem community,” Empir. Softw. Eng., vol. 19, no. 4, pp. 585–1008, 2014. + + +[49] G. Von Krogh, S. Spaeth, and K. R. Lakhani, “Community, joining, and specialization in open source software innovation: A case study,” Res. Policy, vol. 32, no. 7, pp. 1217–1241, 2003. + + +[50] L. Dahlander and S. 
O’Mahony, “Progressing to the center: Coordinating project work,” Organization Sci., vol. 22, no. 4, pp. 961–979, 2011. + + +[51] C. Amrit and J. van Hillegersberg, “Exploring the impact of sociotechnical core-periphery structures in open source software development,” J. Inf. Technol., vol. 25, no. 2, pp. 216–229, 2010. + + +[52] K. Neuling, A. Hannemann, R. Klamma, and M. Jarke, “A longitudinal study of community-oriented open source software development,” in Proc. Int. Conf. Adv. Inf. Syst. Eng., 2016, pp. 509–523. + + +[53] A. Capiluppi and M. Michlmayr, “From the cathedral to the bazaar: An empirical study of the lifecycle of volunteer community projects,” in Proc. IFIP Int. Conf. Open Source Syst., 2007, pp. 31–44. + + +[54] H. Masmoudi, M. den Besten, C. de Loupy, and J.-M. Dalle, “Peeling the onion,” in Proc. IFIP Int. Conf. Open Source Syst., 2009, pp. 284–297. + + +[55] G. Von Krogh, S. Haefliger, S. Spaeth, and M. W. Wallin, “Carrots and rainbows: Motivation and social practice in open source software development,” MIS Quart., vol. 36, no. 2, pp. 649–676, 2012. + + +[56] A. Lee, J. C. Carver, and A. Bosu, “Understanding the impressions, motivations, and barriers of one time code contributors to FLOSS projects: a survey,” in Proc. 39th Int. Conf. Softw. Eng., 2017, pp. 187–197. + + +[57] A. Labuschagne and R. Holmes, “Do onboarding programs work?” in Proc. 12th Work. Conf. Mining Softw. Repositories, 2015, pp. 381–385. + + +[58] I. Steinmacher, M. A. G. Silva, M. A. Gerosa, and D. F. Redmiles, “A systematic literature review on the barriers faced by newcomers to open source software projects,” Inf. Softw. Technol., vol. 59, pp. 67–85, 2015. + + +[59] S. Balalí, I. Steinmacher, U. Annamalai, A. Sarma, and M. A. Gerosa, “Newcomers’ barriers... is that all? An analysis of mentors’ and newcomers’ barriers in OSS projects,” Comput. Supported Cooperative Work, vol. 27, pp. 679–714, 2018. + + +[60] C. 
Mendez et al., “Open source barriers to entry, revisited: A sociotechnical perspective,” in Proc. Int. Conf. Softw. Eng., 2018, pp. 1004–1015. + + +[61] S. Bayati, “Understanding newcomers success in open source community,” in Proc. 40th Int. Conf. Softw. Eng. Companion Proc., 2018, pp. 224–225. + + +[62] I. Steinmacher, M. A. Gerosa, T. U. Conte, and D. F. Redmiles, “Overcoming social barriers when contributing to open source software projects,” Comput. Supported Cooperative Work, vol. 28, no. 1/2, pp. 247–290, 2019. + + +[63] I. Steinmacher, G. Pinto, I. Wiese, and M. A. Gerosa, “Almost there: A study on quasi-contributors in open-source software projects,” in Proc. 40th Int. Conf. Softw. Eng. Companion Proc., 2018, pp. 985–1000. + + +[64] D. Nafus, “‘Patches don’t have gender’: What is not open in open source software projects,” New Media Soc., vol. 14, no. 4, pp. 256–266, 2012. + + +[65] K. Carillo and J.-G. Bernard, “How many hawks can hide under an umbrella? An examination of how lay conceptions conceal the contexts of free/open source software,” in Proc. Int. Conf. Inf. Syst., 2015. [Online]. Available: https://dblp.org/rec/conf/icis/CarilloB15 + + +[66] D. Nafus, “‘Patches don’t have gender’: What is not open in open source software projects,” New Media Soc., vol. 14, no. 4, pp. 669–683, 2012. + + +[67] A. Bosu and K. Z. Sultana, “Diversity and inclusion in open source software (OSS) projects: Where do we stand?” in Proc. ACM/IEEE Int. Symp. Empir. Softw. Eng. Mes., 2019, pp. 1–11. + + +[68] D. Izquierdo, N. Huesman, A. Serebrenik, and G. Robles, “OpenStack gender diversity report,” IEEE Softw., vol. 36, no. 1, pp. 28–33, Jan./Feb., 2019. +[68] M. Storey, A. Zagalsky, F. F. Filho, L. Singer, and D. M. German, “How social and communication channels shape and challenge a participatory culture in software development,” IEEE Trans. Softw. Eng., vol. 43, no. 2, pp. 185–204, Feb. 2017. + + +[69] M. Burnett, A. Peters, C. Hill, and N. 
Elarief, “Finding gender-inclusiveness software issues with GenderMag: A field investigation,” in Proc. CHI Conf. Hum. Factors Comput. Syst., 2016, pp. 2586–2598. + + +[70] M. K. Hyde, J. Dunn, P. A. Scuffham, and S. K. Chambers, “A systematic review of episodic volunteering in public health and other contexts,” BMC Public Health, vol. 14, no. 1, pp. 992–1008, 2014. + + +[71] M. K. Hyde, J. Dunn, C. Bax, and S. K. Chambers, “Episodic volunteering and retention: An integrated theoretical approach,” Nurse Educ. Voluntary Sector Quart., vol. 45, no. 1, pp. 45–63, 2016. + + +[72] L. M. Bryen and K. M. Madden, “Bounce-back of episodic volunteers: What makes episodic volunteers return?” Queensland University of Technology, Brisbane, Australia, Rep. no. CPNS32, 2006. + + +[73] I. Steinmacher, I. Wiese, A. P. Chaves, and M. A. Gerosa, “Why do newcomers abandon open source software projects?” in Proc. 6th Int. Workshop Cooperative Hum. Aspects Softw. Eng., 2013, pp. 25–32. + + +[74] R. D. Safrit and M. V. Merrill, “Management implications of contemporary trends in volunteerism in the United States and Canada,” J. Volunt. Adm., vol. 20, no. 2, pp. 12–23, 2002. + + +[75] M. Nunn, “Building the bridge from episodic volunteerism to social capital,” Fletcher World Aff., vol. 24, pp. 115–127, 2000. + + +[76] L. C. P. M. Meijs and J. L. Brudney, “Winning volunteer scenarios: The soul of a new machine,” Int. J. Volunt. Adm., vol. 24, no. 6, pp. 789–799, 2007. + + +[77] M. Turoff, “The design of a policy Delphi,” Technological Forecasting Soc. Change, vol. 2, no. 2, pp. 149–171, 1970. + + +[78] N. Dalkey and O. Helmer, “An experimental application of the Delphi method to the use of experts,” Manage. Sci., vol. 9, no. 3, pp. 458–467, 1963. + + +[79] W. T. Weaver, “The Delphi forecasting method,” The Phi Delta Kappan, vol. 52, no. 5, pp. 267–271, 1971. + + +[80] H. A. Linstone and M. Turoff, Eds., The Delphi Method: Techniques and Applications, vol. 18. 
Boston, MA, USA: Addison-Wesley Publishing Company, 2002. + + +[81] L. E. Miller, “Determining what could/should be: The Delphi technique and its application,” 2006. + + +[82] K. Conboy and B. Fitzgerald, “Method and developer characteristics for effective agile method tailoring: A study of XP expert opinion,” ACM Trans. Softw. Eng. Methodol., vol. 20, no. 1, 2010, Art. no. 2. + + +[83] M. F. Krafft, K.-J. Stol, and B. Fitzgerald, “How do free/open source developers pick their tools?: A Delphi study of the Debian project,” in Proc. 38th Int. Conf. Softw. Eng. Companion, 2016, pp. 232–241. + + +[84] A. Barcomb, K.-J. Stol, B. Fitzgerald, and D. Riehle, “Appendix to: Managing episodic contributors in free/ libre/ open source software communities,” IEEE Trans. Softw. Eng., to be published, doi: 10.1109/TSE.2020.2985093. + + +[85] C. Okoli and S. D. Pawlowski, “The Delphi method as a research tool: An example, design considerations and applications,” Inf. Manage., vol. 42, no. 1, pp. 15–29, 2004. + + +[86] K. Q. Hill and J. Fowles, “The methodological worth of the Delphi forecasting technique,” Technological Forecasting Soc. Change, vol. 7, no. 2, pp. 179–192, 1975. + + +[87] R. Loo, “The Delphi method: A powerful tool for strategic management,” Policing: An Int. J. Police Strategies Manage., vol. 25, no. 4, pp. 762–769, 2002. + + +[88] A. L. Delbecq, A. H. van de Ven, and D. H. Gustafson, Group Techniques for Program Planning: A Guide to Nominal Group and Delphi Processes. Glenview, IL, USA: Scott Foresman and Company, 1975. + + +[89] A. Carvalho and M. Sampaio, “Volunteer management beyond prescribed best practice: A case study of Portuguese non-profits,” Personnel Rev., vol. 46, no. 2, pp. 410–428, 2017. + + +[90] Y. Takhteyev and A. Hilts, “Investigating the geography of open source software through GitHub,” University of Toronto, Toronto, Canada, 2010. [Online]. Available: http://www.takhteyev.org/papers/Takhteyev-Hilts-2010.pdf + + +[91] J. C. Crotts and S. W. 
Litvin, “Cross-cultural research: Are researchers better served by knowing respondents’ country of birth, residence, or citizenship?” J. Travel Res., vol. 42, no. 2, pp. 186–190, 2003.

[92] Y. Y. Kim, “Intercultural personhood: Globalization and a way of being,” Int. J. Intercultural Relations, vol. 32, no. 4, pp. 359–368, 2008.

[93] V. Braun and V. Clarke, “Using thematic analysis in psychology,” Qualitative Res. Psychol., vol. 3, no. 2, pp. 77–101, 2006.

[94] D. Riehle, N. Harutyunyan, and A. Barcomb, “Pattern discovery and validation using scientific research methods,” Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany, Tech. Rep. CS-2020-01, Mar. 2020. [Online]. Available: https://dirkriehle.com/wp-content/uploads/2020/03/cs-fau-tr-2020-01.pdf

[95] E. G. Guba, “Criteria for assessing the trustworthiness of naturalistic inquiries,” Educ. Technol. Res. Develop., vol. 29, no. 2, pp. 75–91, 1981.

[96] V. Singh and W. Brandon, “Open source software community inclusion initiatives to support women participation,” in Proc. IFIP Int. Conf. Open Source Syst., 2019, pp. 68–79.

[97] H. Mäenpää, M. Munezero, F. Fagerholm, and T. Mikkonen, “The many hats and the broken binoculars: State of the practice in developer community management,” in Proc. 13th Int. Symp. Open Collaboration, 2017, Art. no. 1.

Ann Barcomb received the PhD degree from the University of Limerick, Limerick, Ireland. She is a member of the Open Source Research Group, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany and Lero–the Irish Software Research Centre. Throughout her career, she has been active in free/libre/open source software, in particular the Perl community. For more information, please contact ann@barcomb.org.
Klaas-Jan Stol is a lecturer with the School of Computer Science and Information Technology, University College Cork, Cork, Ireland, an SFI principal investigator and a funded investigator with Lero—the Irish Software Research Centre. His research interests include research methodology and contemporary software development approaches. For more information, please contact k.stol@ucc.ie.

Brian Fitzgerald is director of Lero—the Irish Software Research Centre. He holds an endowed chair, the Frederick Krehbiel II chair in Innovation in Business and Technology, University of Limerick, Limerick, Ireland. His research interests include open source software, inner source, crowdsourcing, and agile methods. For more information, please contact bf@lero.ie.

Dirk Riehle received the PhD degree in computer science from ETH Zürich, Zürich, Switzerland. He is a professor of computer science at Friedrich-Alexander University, Erlangen, Germany. He once led the Open Source Research Group, SAP Labs, Silicon Valley, and founded the Open Symposium (OpenSym). He was the lead architect of the first UML virtual machine. He blogs at http://dirkriehle.com and can be reached at dirk@riehle.org.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/csdl.
----------------------------------------
-------------------------------
Section 140:
When and Why Developers Adopt and Change Software Licenses

Christopher Vendome\textsuperscript{1}, Mario Linares-Vásquez\textsuperscript{1}, Gabriele Bavota\textsuperscript{2}, Massimiliano Di Penta\textsuperscript{3}, Daniel M.
German\textsuperscript{4}, Denys Poshyvanyk\textsuperscript{1} + + +\textsuperscript{1}The College of William and Mary, VA, USA — \textsuperscript{2}Free University of Bolzano, Italy — \textsuperscript{3}University of Sannio, Italy — \textsuperscript{4}University of Victoria, BC, Canada + + +Abstract—Software licenses legally govern the way in which developers can use, modify, and redistribute a particular system. While previous studies either investigated licensing through mining software repositories or studied licensing through FOSS reuse, we aim at understanding the rationale behind developers’ decisions for choosing or changing software licensing by surveying open source developers. In this paper, we analyze when developers consider licensing, the reasons why developers pick a license for their project, and the factors that influence licensing changes. Additionally, we explore the licensing-related problems that developers experienced and expectations they have for licensing support from forges (e.g., GitHub). + + +Our investigation involves, on one hand, the analysis of the commit history of 16,221 Java open source projects to identify the commits where licenses were added or changed. On the other hand, it consisted of a survey—in which 138 developers informed their involvement in licensing-related decisions and 52 provided deeper insights about the rationale behind the actions that they had undertaken. The results indicate that developers adopt licenses early in the project’s development and change licensing after some period of development (if at all). We also found that developers have inherent biases with respect to software licensing. Additionally, reuse—whether by a non-contributor or for commercial purposes—is a dominant reason why developers change licenses of their systems. 
Finally, we discuss potential areas of research that could ameliorate the difficulties that software developers are facing with regard to licensing issues of their software systems.
+ + +
Index Terms—Software Licenses, Mining Software Repositories, Empirical Studies
+ + +
I. INTRODUCTION
+ + +
Software licenses are the legal mechanism used to determine how a system can be copied, modified, or redistributed. Software licenses allow a third party to utilize code as long as they adhere to the conditions of the license. In particular, open source licenses are those that comply with the Open Source Definition [4]. Specifically, the goal of these licenses is to facilitate further copying, modifying, and distributing of software as long as a set of ten conditions is met (such as free redistribution and availability of source code).
+ + +
For software to be open source, its creators must choose an open source license. However, there is a large number of open source licenses in use today. They range from highly restrictive (such as the General Public License—GPL—family of licenses) to ones with very few restrictions (such as the MIT license). The choice of a license determines if, and how, a given open source system can be reused. This is especially true for libraries that are expected to be integrated and distributed with the software that uses them. Furthermore, the choice of a license might also be affected by the dependencies used (e.g., software that uses a library under the GPL must itself be licensed under the GPL, while software that uses a library under the MIT license can be under any license, including a commercial one).
+ + +
At some point, the creators of open source software must choose a license that: 1) expresses the developers’ philosophy; 2) meets their deployment goals; and 3) is consistent with the licenses of the components reused by that software. However, choosing a license is not an easy process.
Developers do not necessarily have a clear idea of the exact consequences of licensing (or not licensing) their code under a specific license; for instance, developers ask questions on Question & Answer (Q&A) websites looking, among other issues, for advice on how to redistribute code licensed under a dual license (e.g., question 2758409 on Stack Overflow [19] and question 139663 on the StackExchange site for programmers [28]). Also, the problem of license incompatibility between components is not trivial (see [15] for a detailed description of this problem).
+ + +
During the evolution of a software system, its license might change. In our previous work [30], we empirically showed—for software hosted on GitHub—that license changes are a common phenomenon. Stemming from the results that we previously captured by analyzing licensing and license changes in software repositories [30], the goal of this work is to understand when and why changes in licensing happen. Specifically, this paper reports the results of a survey of 138 developers with the aim of understanding (i) when developers consider adding a license to their project, (ii) why they choose a specific license for their projects, and (iii) the factors influencing license changes. The 138 participants are the respondents from a set of 2,398 invitees, i.e., 5.75% of the invitees. We identified these developers by sampling 16,221 Java projects on GitHub and then selecting the 1,833 projects where the license changed over time. Of these 138 developers, 52 offered insights into the aforementioned questions, while the remaining developers reinforced that licensing decisions are not necessarily made by all contributors, but by the subset of contributors who are the copyright holders.
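As an aside for the reader, the reported response rate follows directly from the figures above:

```python
# Response-rate arithmetic for the survey figures reported in the text.
invited = 2398    # developers contacted via e-mail
responded = 138   # survey responses received
rate = 100 * responded / invited
print(f"{rate:.2f}%")  # 5.75%
```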
The main findings of this study are the following:
+ + +
1) Developers frequently license their code early; when licensing is delayed, the main rationale is usually to wait until the first release;
+ + +
2) Developers have strong intrinsic beliefs that affect their choice of licenses. Also, open source foundations, such as the Apache Software Foundation, the Free Software Foundation, and the Eclipse Software Foundation, exert a powerful influence on the choice of a license;
+ + +
3) The change of a system’s license(s) is predominantly influenced by the need to facilitate reuse (mostly in commercial systems);
+ + +
4) Developers experience difficulties in understanding licensing terms and dealing with incompatible licenses.
+ + +
II. RELATED WORK
+ + +
Our work is mainly related to (i) the automatic identification and classification of licensing in software artifacts, (ii) empirical studies investigating license adoption and license evolution, and (iii) qualitative studies on software licensing. Table I presents prior work in licensing by reporting the main purpose of each study and the corresponding dataset used.
+ + +
A. Identifying and Classifying Software Licensing
+ + +
Automatic identification of software licensing has been widely explored before. To the best of our knowledge, the FOSSology project [17] was the first one aimed at solving the problem of license identification by extracting the licensing information of projects and using machine learning for classification. Another representative project is the ASLA tool by Tuunanen et al. [29], which showed an 89% accuracy with respect to classifying the licenses of files in FOSS systems.
+ + +
The current state-of-the-art automated tool for license identification, Ninka, was proposed by German et al. [16]. Ninka relies on pattern matching in order to identify licensing statements and return the license name and version (e.g., Apache-2.0). The evaluation of Ninka indicated a precision of 95%.
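To give the reader a flavor of pattern-matching license identification, the toy sketch below matches a few common license headers. The patterns and names are deliberate simplifications of ours for illustration only, not Ninka's actual rule base:

```python
import re

# Toy license identifier in the spirit of pattern-matching tools such as
# Ninka [16]. Each entry maps a simplified header pattern to a license name.
LICENSE_PATTERNS = [
    (r"Apache License,?\s+Version 2\.0", "Apache-2.0"),
    (r"GNU General Public License.*version 3", "GPL-3.0"),
    (r"GNU General Public License.*version 2", "GPL-2.0"),
    (r"Permission is hereby granted, free of charge", "MIT"),
]

def identify_license(header_text):
    """Return the first matching license name, or None if nothing matches."""
    for pattern, name in LICENSE_PATTERNS:
        if re.search(pattern, header_text, re.IGNORECASE | re.DOTALL):
            return name
    return None

print(identify_license("Licensed under the Apache License, Version 2.0"))  # Apache-2.0
```

Real tools must additionally cope with reworded headers, multi-license files, and version exceptions, which is precisely what makes the identification problem non-trivial.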
+ + +
Since software is not always distributed with or as source code, the traditional approaches for license identification that are based on parsing licensing statements are not always applicable (byte-code or binaries do not inherently contain licensing information). To ameliorate this problem, Di Penta et al. [9] proposed an approach that uses code search and textual analysis to automatically identify the licensing of jar files. The approach automatically queried Google Code Search by extracting information from decompiled code. Additionally, German et al. investigated the ability to identify FOSS licensing in conjunction with proprietary licensing by analyzing 523,930 archives [12].
+ + +
In this paper, we rely on Ninka [16] for license identification, since it is the current state-of-the-art technique. However, our work does not aim to improve upon license identification or classification, but, rather, to understand the rationale behind licensing decisions.
+ + +
B. Empirical Studies on License Adoption and Evolution
+ + +
Di Penta et al. [10] investigated license migration during the evolution and maintenance of six FOSS projects. While the authors were unable to find a generalizable pattern among the projects, the results suggested that both the version and the type of license were modified during the systems’ life cycles.
+ + +
German et al. [15] investigated the way in which developers handle license incompatibilities by analyzing 124 FOSS packages; from this investigation, they constructed a model that outlines the advantages and disadvantages of certain licenses as well as their applicability. Additionally, German et al. [13] conducted an empirical study to (i) understand the extent to which package licensing and source code files were consistent and (ii) evaluate the presence of licensing issues due to the dependencies among the packages.
The authors investigated 3,874 packages of the Fedora-12 Linux distribution and they confirmed a subset of licensing issues with the developers at Fedora. Manabe et al. [21] analyzed FreeBSD, OpenBSD, Eclipse, and ArgoUML in order to identify changes in licensing. The authors found that each of the four projects exhibited different patterns of changes in licensing.
+ + +
German et al. analyzed fragments of cloned code between the Linux Kernel and both OpenBSD and FreeBSD [14]. They investigated the extent to which the terms of the licenses were adhered to during the cloning of these code fragments. Similarly, Wu et al. [31] found that cloned files can be inconsistent in terms of licenses (e.g., one has a license, while the other does not). The paper describes the types of inconsistencies and illustrates the problem, and the difficulty of resolving it, through an empirical study of Debian 7.5.
+ + +
The most closely related empirical study to this work is our previous work [30], which analyzed license usage and license changes over 16,221 projects and sought to extract rationale from commit messages and issue tracker discussions. The results indicated a lack of documentation of licensing in both sources. While sharing the same motivation, this work is novel as it investigates when and why developers choose to license a project or change licensing (as opposed to the extent to which these changes occur) and presents rationale from a survey conducted with actual developers of the projects from our dataset instead of relying just on the rationale from the issue tracker discussions or from commit messages.
+ + +
TABLE I: Prior work in licensing, its main purpose, and the dataset used.

| Study | Purpose | Dataset |
|----------------|---------------------------------------------------------------------------|-----------------|
| German et al. | Investigate the presence of license incompatibilities | 3,874 packages |
| Di Penta et al.| Investigate license evolution during a system’s maintenance and evolution | 6 systems |
| German et al. | Investigate the way in which developers address incompatible licensing | 124 systems |
| German et al. | Investigate licensing between copied code fragments in Linux and two BSD distributions | 3 systems |
| Manabe et al. | Investigate license change patterns within FOSS systems | 4 systems |
| Singh et al. | Investigate the reasons for the adoption of a particular FOSS license | 5,307 projects |
| Sojer et al. | Investigate reuse and legal implications of Internet code | 686 developers |
| Sojer et al. | Investigate FOSS code reuse | 869 developers |
| Vendome et al. | Investigate license usage and changes in FOSS systems and the rationale in the revision history and issue tracker | 16,221 systems |
+ + +
C. Qualitative Studies on Software Licensing
+ + +
Singh and Phelps [25] studied the reasons behind the adoption of a specific license in a FOSS project. Their results suggest that such a choice is mainly driven by social factors—the adoption of a license in a new project is based on the licenses adopted by socially close existing projects (e.g., projects from the same ecosystem). Their work considered license adoption from a social-networking perspective to see how the “licensor” may be influenced toward a particular license(s) based on social proximity. Our work does not investigate latent social connections between developers or the projects to which they contributed. Instead, we directly surveyed the developers to understand their reasoning for adopting a particular license.
+ + +
Sojer et al. conducted a survey with 869 developers regarding reuse of open source code and the legal implications of the resulting code [26]. One key finding was that industry and academic institutions did not prioritize knowledge regarding licensing and reuse. The authors compared a self-assessment to a questionnaire on licensing and found a discrepancy between perceived knowledge and actual understanding of licensing. Additionally, Sojer et al.
conducted a survey of 686 practitioners regarding reuse of FOSS code and found that the licensing of FOSS code was the second largest impediment to reuse [27]. While the authors point to possible reasons for this observation, our study specifically aims to understand the reasons for choosing and changing licenses as well as the types of problems that practitioners face due to licensing.
+ + +
III. DESIGN OF THE STUDY
+ + +
The goal of our study is to investigate when developers consider licensing issues and the reasons why developers pick or change licensing in FOSS projects. The context consists of software projects, i.e., the change history of 16,221 Java FOSS projects mined from GitHub, and subjects, i.e., 138 practitioners contributing to a subset of the mined projects.
+ + +
A. Research Questions
+ + +
We aim at answering the following research questions:
+ + +
RQ1. When and why do developers first assert a license in their project? This research question first examines when developers commit a license to at least one file in FOSS projects hosted on GitHub (i.e., the project goes from no licensing to at least one license). We complement this analysis with questions for developers to understand the actual rationale behind the empirical observations.
+ + +
RQ2. When and why do developers change the licensing of their project? This research question relies on a similar analysis as the previous question, but it specifically investigates licensing changes (i.e., the change from license $A$ to license $B$).
+ + +
RQ3. What are the problems that developers face with licensing and what support do they expect from a forge? This question aims at understanding the problems that developers experience with licensing in order to better support them. Additionally, we are interested in understanding the expectations that developers may have for licensing support incorporated by forges.
+ + +
In order to answer our research questions, we consider two perspectives: (i) evidence collected by analyzing projects’ change history; and (ii) evidence collected by surveying developers. Both perspectives are explained in the following.
+ + +
B. Analysis of the Projects’ Change History
+ + +
To investigate when developers pick or change licensing, we mined the entire commit history of 16,221 public Java projects on GitHub. We first queried GitHub, using the public API [2], to generate project information for all of the publicly available projects. We extracted a comprehensive list of 381,161 Java projects by mining the project information of over twelve million projects and locally cloned all of the Java repositories, which consumed a total of 6.3 TB of storage space. Due to the computation time required by the underlying infrastructure, we randomly sampled 16,221 projects and analyzed the licensing of all file revisions at commit-level granularity (1,731,828 commits that spanned 4,665,611 files). Table II reports statistics about size attributes of the analyzed dataset and the overall number of different licenses considered in our study.
+ + +
We relied upon the MARKOS code analyzer [7] to extract the licensing throughout each project’s revision history. The code analyzer incorporates the Ninka license classifier [16] in order to identify the licensing statements and classify the license by family and version (when applicable) for each file. The code analyzer mined the change log of the 16,221 projects and extracted the commit hash, date, author, file, commit message, type of change to the file (Addition, Modification, or Deletion), license change (a Boolean value), and license name and version (reported as a list when multiple licenses are detected).
+ + +
The data extraction step for the 16,221 projects took almost 40 days; in total, 1,731,828 commits spanning 4,665,611 files were analyzed.
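Given per-file license labels of the kind such a classifier produces, the license additions and changes studied in this paper can be detected with a simple scan over each file's history. The sketch below is a hypothetical illustration; the data layout is ours, not the MARKOS analyzer's:

```python
# Scan a file's chronological (commit, license) history for
# "No License -> Some License" additions and
# "Some License -> Some Other License" changes.
def classify_transitions(history):
    """history: list of (commit_id, license_or_None) in chronological order."""
    events = []
    for (_, prev_lic), (commit, cur_lic) in zip(history, history[1:]):
        if prev_lic is None and cur_lic is not None:
            events.append((commit, "license added"))
        elif prev_lic is not None and cur_lic is not None and prev_lic != cur_lic:
            events.append((commit, "license changed"))
    return events

history = [("c1", None), ("c2", "GPL-2.0"), ("c3", "GPL-2.0"), ("c4", "Apache-2.0")]
print(classify_transitions(history))  # [('c2', 'license added'), ('c4', 'license changed')]
```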
In the case of the BSD and CMU licenses, we only reported a variant of either license, since Ninka was unable to identify the particular version. In the case of GPL and LGPL, it is possible for the license to have an exception that allows developers to pick future versions of that license; we annotate such licenses with a “+” (e.g., GPL-2.0+ signifies that the terms of GPL-3.0 can also be used).
+ + +
To identify licensing changes, we followed the same procedure used in our previous work [30]. In particular, we identify a commit $c_i$ as responsible for introducing a license in a code file $F$ if Ninka did not identify any license in $F$ before $c_i$, while it retrieves a license in $F$ after $c_i$ (i.e., a No License $\rightarrow$ Some License transition on $F$). Instead, we consider $c_i$ as a licensing change if the license type and/or version detected by Ninka on $F$ before $c_i$ is different from the one detected after $c_i$ (i.e., a Some License $\rightarrow$ Some Other License transition).
+ + +
C. Analysis of the Developers’ Survey
+ + +
To investigate the reasons why developers add/change the license(s) of their systems, we surveyed the developers who made licensing changes in the systems to which they contributed. To find potential participants for our survey, we utilized the results of our quantitative analysis. From the 16,221 projects that we analyzed, we found 1,833 projects that had experienced either a delayed initial license addition (i.e., the No License $\rightarrow$ Some License transition happened after the first project commit) or a licensing change (i.e., Some License $\rightarrow$ Some Other License) over their change history. We included both scenarios to understand the rationale behind both RQ$_1$ and RQ$_2$, which required a change in licensing. For each of these projects, we used the version control history to extract the set of all its contributors.
From the 1,833 projects with licensing changes, we identified a total of 2,398 valid developer e-mail addresses, whose owners we targeted as potential participants for our study. By valid, we mean that we filtered out contributor e-mail addresses matching the following two patterns—“[user]@localhost.+” or “[user]@none.+”—since they pointed to clearly invalid domains. We also removed developers of the Android framework, since its code has always been licensed under the Apache license. The 2,398 developers were invited via e-mail to fill in an online survey hosted on Qualtrics [5] (the survey answers were all anonymous). This e-mail invitation included (i) a link to the survey, and (ii) a description of the specific licensing addition/change(s) we observed in their project’s history. After being contacted, some developers offered further insights regarding these changes by directly responding to our e-mail. In total, we e-mailed 2,398 individuals and received 138 responses to the survey and 15 follow-up e-mails in which developers volunteered additional information. Overall, we had a response rate of 5.75% of the developers we contacted.
+ + +
The survey consisted of seven questions (Q1-Q7); Q7 was optional (only 12 participants answered it). Tables III and IV list the survey questions and the responses of the developers. Q1 and Q2 were dichotomous questions, used to ensure that the respondents were involved in determining the project’s licensing. If a respondent did not answer “yes” to Q2, the survey ended for that participant. Out of 138 participants, 62 responded “no” to Q2 and so were ineligible for the remaining questions (Q3-Q7). Questions Q3 to Q6 were multiple-choice questions and included an “Other” option. If respondents chose “Other”, they could further elaborate in an open-ended field. Question Q7 was optional and open-ended.
We chose to make it optional because some developers may not agree that the forge should be responsible for features supporting licensing. Out of 138 respondents, 76 developers were eligible for the entire survey (Q1-Q7) as per their response to Q2, but only 52 of those individuals completed the survey.
+ + +
Since questions Q3-Q7 also included open-ended responses, we relied on a formal grounded-theory [8] coding of the open-ended responses. Three authors read all the responses and categorized each response representing the developer’s rationale. The categories from the three authors were analyzed and merged during a second round to obtain a final taxonomy of categories. The tables in Section IV present the final results of the grounded-theory process.
+ + +
IV. RESULTS
+ + +
This section discusses the results, answering the three research questions formulated in Section III-A.
+ + +
A. When are licenses added to FOSS projects?
+ + +
Fig. 1 shows the distribution of the commit number at which licenses were introduced into the projects within our dataset (e.g., a license introduced in the tenth commit is represented by the number 10). We present the raw commit numbers on a log scale due to outliers from large commit histories. At least 25% (first quartile) of the projects were licensed in the first commit (Fig. 1). The median was at two commits and the third quartile at five commits. This observation indicates that FOSS projects are licensed very early in the change history, with over 75% of the projects having a license by the fifth commit. Assuming (though this might not always be the case) that the observed history corresponds to the entire project history, this result suggests that licensing is important to developers. It is interesting to note that the mean commit number for adding a license is 21 and the maximum value is 8,623 commits.
These two values are indicators of a long tail with a small number of projects that consider licensing late in the change history.
+ + +
Summary for RQ$_1$ (Project History Results): we observed that developers consider licensing early in the change histories of FOSS projects. While there are projects that assert a license after a larger number of commits, 75% of our dataset had a license asserted within the first five commits. Thus, the data suggests that most of the projects adopt licenses among the very first commit activities.
+ + +
B. Why are licenses added to FOSS projects?
+ + +
Table III reports the responses to Question 3 (Q3) of our survey, in which we tried to ascertain the rationale behind the initial project licensing. 30.8% of developers indicated that the community influences the initial licensing. One explanation for the high prevalence of this response is that certain FOSS communities stipulate and enforce that a particular license must be used. For example, the Apache Software Foundation requires that its projects, or the code contributed to its projects, be licensed under the Apache-2.0 license. In turn, the Free Software Foundation promotes the use of the GPL and LGPL family of licenses.
+ + +
19.2% of developers chose the license with the goal of making their project reusable in commercial applications. These responses also indicate a bias toward more permissive licenses that facilitate such usage, while restrictive licenses can discourage it, since they require that a derived system be licensed under the same terms. This finding provides a partial explanation for the trend toward more permissive licenses we observed in our previous work [30].
+ + +
The results of our survey also show that licensing-related decisions are impacted by inherent developer bias. 15.4% of developers supplied answers that we categorized as moral-ethical-beliefs.
An example of this category was the response by one developer indicating, “I always use GPL-3.0 for philosophical reasons.” Similarly, a different developer echoed this comment, stating “I always licence GPL, moral reasons.”
+ + +
Satisfying a dependency constraint (i.e., the need to use a license based on the license of dependencies) was a relevant reason (9.6%: 7.7% picked the explicit option and 1.9% gave an “Other” response categorized as dependency constraint). This result is important, since little work has been done to analyze licensing across software dependencies. This problem also poses challenges in identifying both all of the necessary dependencies and the license(s) of those dependencies. Some automated build frameworks like Maven [6] or Gradle [3] attempt to ameliorate this difficulty by listing dependencies in a file that drives the building process (e.g., the Project Object Model file in Maven). However, licensing is not a required field in those files.
+ + +
The remaining answers to this question described situations in which the license was inherited from the initial founders and persisted over time. Also, some companies have policies that specifically dictate a licensing convention. In the latter case, the respondent indicated that “company (...) policy is Apache-2.0” (company name omitted for privacy). It was also interesting to see that nobody chose a license based on requests by outsiders.
+ + +
Lastly, we identified a category in licensing changes that related to license adoption and not changes: 7.7% of developers responded to our question on licensing changes by indicating that the license was missing and was added in a later commit. For this case, we added (License Addition) to the category for Q4 in Table III. The developers noted that “Setting the license was just forgotten in the first place” and “Accidentally didn’t include explicit licence in initial commit”.
These cases are also important, since they can create inconsistencies within the system or mislead non-contributors into believing that the project is unlicensed or licensed under incompatible terms. This result further reinforces that developers view early license adoption as important, and that the lack of a license may simply be a mistake.
+ + +
Summary for RQ1 (Survey Results): the initial licensing is predominantly influenced by the community to which a developer is contributing. Subsequently, commercial reuse is a common factor, which may reinforce the prevalence of permissive license usage. While reuse is a consideration, non-contributors do not seem to impact the initial licensing choice. We also found that the inclusion of a particular dependency can impact the initial licensing of a project.
+ + +
C. When are licenses changed in FOSS projects?
+ + +
Fig. 2 shows the distribution of when licenses were changed in the projects within our dataset (i.e., Some License $\rightarrow$ Some Other License). As in the previous section, we present the raw commit number in which the changes occurred (on a log scale due to outliers from large commit histories). Interestingly, the minimum value was the second commit (i.e., a license changed right after its addition in the first commit). More generally, 25% of license changes occur in the first 100 commits. The median value is 559 commits, while the mean is 3,993 commits. The third quartile (2,086 commits), considerably smaller than the mean, suggests a long tail of license changes occurring late in the projects’ change histories. The maximum commit number with a license change was commit 56,746. Values at this extreme explain the larger mean relative to the median. Overall, the data suggests that certain projects change licenses early in the change history; however, license changes are much more prevalent in later commits.
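The gap between the median (559) and the mean (3,993) is the classic signature of a right-skewed distribution. A synthetic example (not the paper's data) illustrates how a few very late changes drag the mean far above the median:

```python
import statistics

# Synthetic commit numbers at which a license change occurred; the single
# extreme value (56,746, the observed maximum) dominates the mean while
# leaving the median untouched.
changes = [2, 50, 100, 300, 559, 900, 2086, 5000, 56746]
print(statistics.median(changes))  # 559
print(round(statistics.mean(changes), 1))
```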
+ + +
Summary for RQ2 (Project History Results): we observed that developers change licensing later in the change history of FOSS projects. While there are projects that change licensing early, our first quartile was 100 commits and our third quartile was 2,086 commits, demonstrating that more substantial development occurred before licensing was changed.
+ + +
D. Why are licenses changed in FOSS projects?
+ + +
Table III shows the responses to Question 4 (Q4) of our survey, in which we investigated the rationale behind license changes. Allowing reuse in commercial software was the most common reason behind licensing changes (32.7%). This option was also the second most prevalent for choosing the initial license (19.2% of developers). Combining these two results, it is clear that the current license of a project is heavily affected by its need to be reused commercially. As previously stated, this result qualitatively supports the observation from our previous work [30], where we observed that projects tend to migrate toward less restrictive licenses.
+ + +
7.7% of developers changed licensing due to community influence. This response was a more significant factor for the initial choice of licensing, but it further emphasizes the impact that a community can exert. One developer commented, “community influence (contributing to Apache’s projects)”. Similarly, two developers commented about the influence the Eclipse Foundation exercised over license changes in their projects. Interestingly, one developer reported: “I wanted to use the most common one for OSS Java projects”. This response suggests that a particular license may pick up more momentum and spread within a particular language community. Interestingly, we observed that 7.7% of the developers were willing to change the licensing due to requests from non-contributors.
The fact that this response was more prevalent for changing licensing than for choosing the initial license may be explained by outsiders waiting until a project is stable or mature before inquiring about particular licensing.
+ + +
We also observed that both a change in the license(s) of a dependency and the use of a new dependency prompted developers to change licenses (5.8% of developers for both cases). This observation further demonstrates the difficulty or impact that dependencies can have with respect to licensing. It also suggests that there could be inconsistencies between the licensing of a system and its dependencies.
+ + +
Moral-ethical-beliefs were also a reason for 5.8% of developers. Interestingly, we observed both the beliefs of developers and the beliefs of a philanthropist funding a project’s development. While one developer acknowledged, “I simply wanted to pick a ‘free’ license and chose Apache without much consideration,” another developer indicated that “Philanthropic funders encouraged us to move to GPL3, as well as our own internal reflection on this as we came to understand GPL3 better.” In the former example, it is notable that the developer’s concern was not the impact of the Apache license in particular, but rather that the primary motivator was any free (i.e., FOSS) license. The latter indicates that the individuals funding a project can influence its licensing. While the developers were not coerced into changing to the GPL-3.0, they were still influenced by the beliefs of the individuals funding the system’s development.
+ + +
Summary for RQ2 (Survey Results): developers seem to change licensing to support reuse in commercial systems. While community influence still impacts changing licensing, it appears to be a less significant factor than it is for license adoption.
Based on our survey results, the reasons behind changing licensing are more diverse and more evenly distributed among the topics than we observed for the selection of the initial license.
+ + +
E. What are the problems that developers face with licensing and what support do they expect from a forge?
+ + +
Table IV shows the results for Questions 5-7 (Q5-Q7), which investigate both the problems that developers experience with licensing and the licensing support expected from the forge.
+ + +
In Q5, we investigated the problems related to licensing that developers have experienced. 23 out of 52 developers (44.2%) explicitly mentioned “No problem” in the “Other” field. For those who recognized problems, the main one was the inability of others to use the project due to its license (17.3%). Since developers consider this a problem, it suggests that developers are interested in allowing broad access to their work. However, they may be constrained by desired protections (e.g., patent protection from Apache-2.0 or GPL-3.0) or by external factors, like the licensing of dependencies (external since the developers cannot change those licenses).
+ + +
Additionally, developers indicated that choosing the correct license was difficult for them (13.5%). The legalistic nature of these licenses can lead to misinterpretations by developers. For example, the Apache Foundation states on its webpage that “The Apache Software Foundation is still trying to determine if this version of the Apache License is compatible with the GPL” [1]. Additionally, 5.8% of developers indicated that they experienced misunderstandings with respect to license compatibility. To make matters worse, 9.6% of the developers experienced compatibility problems with dependencies.
Therefore, developers not only faced difficulty while determining the appropriate license, but they also misunderstood the compatibility among licenses and experienced incompatibility between their project’s licensing and a desired dependency’s licensing.

Developers also experienced difficulties with their users misinterpreting or not understanding the terms of their license. One developer stated that “Users do not read/understand the license, even though it is a most simple one.” This result points to two possible problems: either users (i.e., developers looking to reuse the code) ignore the actual licensing text, or they struggle to interpret even the simpler licenses. The former would demonstrate the bigger problem, in that users do not take licensing seriously, while the latter demonstrates that the difficulty in understanding licensing extends beyond the very litigious licenses. Reinforcing the second scenario, another developer noted the problem was “Just the usual challenges of talking with potential commercial partners who do not understand the GPL at all”. By phrasing the comment with “the usual challenges”, the developer suggests repeated experience with partners unable to understand licensing. This is not necessarily an isolated case, but rather a potentially widespread experience shared by other developers.

Regarding the support provided by the forge, in this case GitHub, we investigated the impact of a feature added to help document the license of a project (see Q6 in Table IV). This feature was added in response to criticism from some practitioners [24]. While 36.5% of developers did not have access to the feature at the time they created their project, the interesting result is that more than half (51.9%) of developers were not influenced by the availability of such a tool.
Additionally, the “Other” responses indicated that the feature would not have had an impact on their choice (3.8%), and a single developer specifically chose not to license her project, leading to a combined 58% of developers who were unaffected by this feature. Thus, our data suggests that this GitHub feature did not influence developers when licensing (or not licensing) software hosted on GitHub.

Finally, we received 11 responses to our optional question (Q7) concerning whether forges should provide features that assist in the licensing of their software. Since GitHub has been criticized by practitioners [24] for a lack of licensing consideration, this question seeks to understand the features that practitioners expect from a forge to this end. 10 out of 11 participants answered “None”. Of those 10 developers, only one explained that a third-party tool should handle license compatibility analysis. The respondent indicated that the ideal tool would draw on the various forges and build frameworks to form a dependency graph of license compatibility, stating the following:

“This is the job of a 3rd party tool IMO since neither github nor forge do or should own all open source deps. A 3rd party tool ideally would know about github, bitbucket, etc + poms and pom license fields, etc and form a comprehensive dep-graph license compat view given a node.”

Another developer noted, “None. From our perspective it really isn’t that hard to put copyright and licence notices in our source files.” This comment is interesting since it conflicts with results from Q4, where developers indicated that licenses were sometimes missing or an incorrect license was used.

The only developer wishing for support from the forge indicated a desire for a license compatibility checker and a license selection wizard.
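The “dep-graph license compat view” the respondent describes can be sketched as a small graph traversal over a dependency graph annotated with licenses. The following Python sketch is purely illustrative: the compatibility table, the license names it covers, and the package names are assumptions for demonstration only; real one-way license compatibility rules are far more nuanced, and this is not legal guidance or the respondent's actual design.

```python
# Illustrative sketch of a dependency-graph license-compatibility check.
# NOTE: the COMPAT table below is a toy assumption, not legal advice --
# real compatibility rules between licenses are considerably more subtle.

# COMPAT[dep_license] = set of project licenses that may incorporate
# code distributed under dep_license (simplified, one-way relation).
COMPAT = {
    "MIT":        {"MIT", "Apache-2.0", "GPL-3.0"},
    "Apache-2.0": {"Apache-2.0", "GPL-3.0"},
    "GPL-3.0":    {"GPL-3.0"},
}

def find_conflicts(project_license, direct_deps, dep_graph):
    """Walk the transitive dependency graph and collect license conflicts.

    direct_deps: list of (name, license) tuples for direct dependencies.
    dep_graph:   maps a dependency name to its own (name, license) deps.
    """
    conflicts, seen, stack = [], set(), list(direct_deps)
    while stack:
        name, dep_license = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Flag the dependency if the project's license may not incorporate it.
        if project_license not in COMPAT.get(dep_license, set()):
            conflicts.append((name, dep_license))
        # Continue into this dependency's own (transitive) dependencies.
        stack.extend(dep_graph.get(name, []))
    return conflicts

# An MIT-licensed project pulling in a hypothetical Apache-2.0 library
# that itself depends on a GPL-3.0 library is flagged twice under this
# toy table, since neither may be incorporated into MIT-licensed code here.
graph = {"libA": [("libB", "GPL-3.0")]}
print(find_conflicts("MIT", [("libA", "Apache-2.0")], graph))
```

A real tool along these lines would populate `dep_graph` from forge APIs and build metadata (e.g., the `licenses` fields of Maven POMs mentioned in the quote) rather than a hand-written dictionary.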
This developer described the desire for two particular features, stating the following:

“1) License compatibility checker - verify the license of your project with the license of included support software (gems, libraries, includes) and alert user to potential conflicts. This could also be used for the use case that you want to adopt a piece of software to add to an existing project - is it compatible? 2) License selection wizard - when you begin a project, the wizard can ask you a series of questions (do you want to allow commercial use, do you require mods to be licensed the same as original, etc) and then suggest a license for the project.”

While only one developer wanted support from the forge, this single developer’s comments seem to address many of the problems and difficulties with respect to licensing for which we found evidence in Q6 of the survey.

| Question/Answer | #D | % |
|-----------------|----|---|
| Q1. Were you involved in changes occurring to parts of the system that underwent license changes? | 138 | |
| Yes | 75 | 54.3% |
| No | 63 | 45.7% |
| Q2. Were you involved in determining the license or change in license of the project or some of its files? | 138 | |
| Yes | 76 | 53.7% |
| No | 62 | 46.3% |
| Q3. How did you determine/pick the initial license for your project or files in your project? | 52 | |
| Dependency constraint | 4 | 7.7% |
| Community influence (e.g., contributing to Apache projects) | 16 | 30.8% |
| Requests by non-contributors to reuse your code | 0 | 0% |
| Interest of reuse for commercial purposes | 10 | 19.2% |
| Other (please specify) | 22 | 42.3% |
| — Closed-source | 1 | 1.9% |
| — Company-policy | 2 | 3.8% |
| — Dependency-constraint | 1 | 1.9% |
| — Inherit-license | 3 | 5.8% |
| — Moral-ethical-belief | 8 | 15.4% |
| — Project-Specific | 2 | 3.8% |
| — Social-trend | 2 | 3.8% |
| — None | 3 | 5.8% |
| Q4. What motivated or caused the change in license? | 52 | |
| License of dependencies changed | 3 | 5.8% |
| Using a new library imposing specific licensing constraints | 3 | 5.8% |
| Allow reuse in commercial software | 17 | 32.7% |
| Requests by non-contributors to reuse your code | 4 | 7.7% |
| Other (please specify) | 25 | 48.1% |
| — Change-to-license-text | 2 | 3.8% |
| — Community-influence | 4 | 7.7% |
| — Fix-incorrect-licenses | 1 | 1.9% |
| — Improve-clarity | 1 | 1.9% |
| — Missing-license (License Adoption) | 4 | 7.7% |
| — Moral-Ethical-belief | 3 | 5.8% |
| — More-permissive-license | 1 | 1.9% |
| — New-license-version | 2 | 3.8% |
| — Personal-Preference/Project-specific | 1 | 1.9% |
| — Private-to-public-project | 1 | 1.9% |
| — Promote-Reuse | 1 | 1.9% |
| — Unclear | 1 | 1.9% |
| — None | 3 | 5.8% |

| Question/Answer | #D | % |
|-----------------|----|---|
| Q5. What problems (if any) have you experienced due to license selection in terms of code reuse? | 52 | |
| My license was not compatible with desired dependencies | 5 | 9.6% |
| Others were unable to use my project unless I re-licensed it | 9 | 17.3% |
| A dependency changed licenses and was no longer compatible | 1 | 1.9% |
| There was a misunderstanding of compatibility between licensing terms of two licenses | 3 | 5.8% |
| Choosing the correct license was difficult/confusing | 7 | 13.5% |
| Other (please specify) | 27 | 51.9% |
| — Code-unavailability | 1 | 1.9% |
| — Lack-of-understanding-by-Users | 2 | 3.8% |
| — Unique-New-License | 1 | 1.9% |
| — No problems | 23 | 44.2% |
| Q6. Did GitHub’s mechanism for licensing impact your decision on licensing your project? | 52 | |
| Yes, it caused me to license my project | 3 | 5.8% |
| No, I already planned on licensing | 27 | 51.9% |
| No, I did not want to license at project creation | 1 | 1.9% |
| Such a mechanism was not yet available when I created my project | 19 | 36.5% |
| Other (please specify) | 2 | 3.8% |
| — No impact | 2 | 3.8% |
| Q7. What kind of support would you expect from the forge/GitHub to help you manage licenses and licensing compatibility issues in your software? | 11 | |
| None | 10 | 90.9% |
| License Checker and License Selection Wizard | 1 | 9.1% |

Summary for RQ3 (Survey Results): although 44.2% of the developers surveyed indicated that they have not experienced problems with licensing, the remaining respondents provided a diverse set of answers. These were primarily related to license incompatibility or to difficulty understanding the licensing. The survey also indicated that GitHub’s mechanism to encourage or aid in licensing was either unnecessary or unavailable to the surveyed developers. We also found that most developers did not expect support from the forge, although one did indicate the desire for a third-party tool. The one developer who did express interest in the forge’s support made comments that aligned with our results regarding the problems that developers actually faced.

V. LESSONS AND IMPLICATIONS

Intrinsic beliefs of the developers.
The first important observation is that the participants have a bias toward FOSS licensing from an ethical perspective. 52% of the respondents indicated (Q6) that they had planned on licensing the project prior to its creation; only 6% of the respondents (Q6) were influenced to license their project by GitHub’s licensing feature (i.e., a combo list of license names). Similarly, the “Other” responses regarding the reason for a project’s initial licensing (Q3) indicated a sense of obligation. For example, one developer said: “It was the only moral and ethical choice”.

Delayed licensing. Developers do not necessarily decide to open source a project from the beginning, and may delay doing so. While we empirically observed early license adoption in general, one developer wrote in an email that they waited to choose a license: “this project just didn’t have a license on day 1 and it was added at first release.” Similarly, one developer responded to the survey that licensing changed due to “change private to public project”. This observation suggests that licensing is still important to these developers, but it may not be considered relevant until the project reaches a certain level of maturity. Thus, there is a need for tools to add and verify the licensing information of a system at any given point in time.

Community and organizational influence. Our results indicate that communities, and in particular FOSS foundations (such as the Apache Software, Eclipse, and Free Software foundations), exert a powerful influence on the choice of a license by their developers. About 31% of the participants responded that initial licensing is done by following a community’s specific licensing guidelines. Improving or developing on top of existing software from a foundation mostly requires using the same license, aligning with the foundation’s philosophy.

License misunderstanding. The survey stresses the need for aid in explaining licenses and the implications of their use.
About 20% of the respondents highlighted that licensing is confusing and/or hard to understand (Q5): 13.5% of respondents indicated that developers—both the authors and the users—find licensing confusing or difficult (Q5), and 6% of developers also noted that there were misunderstandings about license compatibility. Additionally, one “Other” respondent stated, “Users do not read/understand the license, even though it is a most simple one,” which suggests that developers experienced misunderstanding, whether on their own part or on the part of users.

Reuse for commercial distribution. The results regarding licensing changes indicated that commercial usage of code is a concern in the open source community. We found that practitioners used permissive licenses to facilitate commercial distribution, and in some cases they changed to a more permissive license for this purpose.

Dependency influence. A software system must choose its dependencies so as to avoid conflicts due to incompatibilities between the system’s license(s) and the license(s) of the components it depends on. Similarly, others will choose to use a particular software system based on its license. Thus, the change of a license in a system has the potential of creating a chain reaction: those that use it might need to change their license, or drop it as a dependency; for the system changing license, the potential pool of reusable components will change accordingly—it might need to drop a dependency, or it might be able to add a dependency with a previously incompatible license.

Forge’s support. Most of our respondents do not expect any licensing support from the forge. It is likely that the individuals who benefit the most from licensing support in the forge are those who are looking to reuse software.
This is supported by our results, which indicate that the license(s) of dependencies is an important consideration, since it might impact the ability to reuse the dependency or require a change in the license(s) of the software that uses it. Thus, compliance-oriented features may aid developers in ensuring that they can legally reuse software.

Finally, our results demonstrate that external factors like community, license prevalence, and the licenses of dependencies have an important impact on licensing.

A feature provided by the forge to suggest licenses based on a project’s domain could benefit practitioners. Since developers indicated that licensing is difficult, a more informative feature could help practitioners determine the appropriate licensing. For instance, the current licensing support feature provided by GitHub is not particularly informative for developers: it essentially provides a link to choosealicense.com, but does not offer further guidance to the developer. It also does not cover compatibility issues at all. Moreover, applications within the same domain may utilize some of the same dependencies or require similar grants for redistribution and reuse. To better support developers, a forge could include a domain analysis feature to detect similar applications [22] and suggest to the developer/maintainer the license used by similar systems (if no other criteria, such as community or dependencies, have been considered).

VI. THREATS TO VALIDITY

Threats to construct validity relate to the relationship between theory and observation, and here they are mainly due to imprecision in extracting licensing information and in interpreting the results of the developer survey. To identify the licenses, we relied on Ninka [16], which has been empirically evaluated and shown to have a precision of 95% when it is able to identify a license (which, in the same study, it was able to do 85% of the time).
To classify the free responses, we conducted a formal Grounded Theory analysis requiring two-author agreement. In particular, all of the responses were read and categorized by three authors, and the agreement of two of them was considered necessary. Another threat concerns the fact that GitHub may have mirrored only a fraction of the projects’ change history; hence, it is possible that the first commits in GitHub do not correspond to the first commits in the projects’ history. Finally, the response rate of our study is 5.75%, below the 10% response rate often achieved in survey studies [18]. However, explicitly targeting original developers is usually challenging, because many of them may no longer be active, their email addresses may be invalid, or they may no longer use the email addresses we collected.

Threats to internal validity relate to internal, confounding factors that could bias the results of our study. In analyzing both license introductions and licensing changes, we treated the commit in which we observed the phenomenon as a single instance to ensure that we did not introduce duplicates. We only excluded developers of projects from the Android framework, since that project has always been Apache licensed. Therefore, we did not introduce bias while selecting developers. To address possible gaps in the coverage of the predefined options in our survey, we added a free-form “Other” option to each question. In addition, we only presented the full survey to developers who indicated that they were involved in the licensing decision(s). Another possible threat to internal validity concerns the fact that the 138 respondents may have decided to participate in the survey because they had a greater interest in licensing problems than others.
However, the results shown in Section IV suggest that this is not the case; e.g., the respondents comprise people who were directly involved in the licensing but did not necessarily experience any licensing problems.

Threats to external validity relate to the ability to generalize the results of the study, and we do not assert that these observations are representative of the entire FOSS community. While we randomly sampled the projects from GitHub, we did so only for Java projects. Thus, other languages and forges may demonstrate different behavior, and the developers of those projects may hold different beliefs. However, GitHub is the most popular forge, with a large number of public repositories. A larger evaluation on multiple forges and on projects in other languages is necessary to understand when licenses are adopted and changed in the general case. Additionally, we surveyed actual developers of these projects. While we do not claim that the rationale is complete, the conclusions represent explicit feedback as opposed to inferred understanding. Therefore, the rationale is a definitive subset. We do not claim that these results apply in the context of closed source systems, since we required source code to identify licensing.

Finally, to limit this threat to external validity, we examined the diversity of our data set using the metrics proposed by Nagappan et al. [23]. To understand the diversity, we matched the projects in our dataset against the projects mined by Boa [11], finding 1,556 project names that matched between the two datasets. We used these 1,556 projects to calculate our diversity score across six dimensions. The results were 0.45 for programming language, 0.99 for developers, 1.00 for project age, 0.99 for number of committers, 0.96 for number of revisions, and 0.99 for number of programming languages, suggesting that our dataset is diverse except for the programming language score (which is impacted by our selection of Java projects).
Overall, our score was 0.35, which suggests that we cover over a third of FOSS projects with 9.5% of our dataset.

VII. CONCLUSIONS

We investigated when and why developers adopt and change licenses during the evolution of FOSS Java projects on GitHub. To this aim, we conducted a survey with developers who contributed changes to projects that included licensing changes. We observed that developers typically adopt a license within the first few commits, suggesting that developers consider licensing an important task. Similarly, we observed that most licensing changes appear after a non-negligible period of development, as visible from the observed history. We then explored the reasons for the initial licensing, the license changes, and the problems experienced by developers with respect to software licensing. We observed that developers view licensing as an important yet non-trivial concern for their projects. License implications and compatibility are not always clear, and so they can lead to changes. Additionally, there are external factors influencing projects’ licensing, such as the community, the purpose of usage (i.e., commercial systems), and the use of third-party libraries. While developers did not strongly indicate an expectation of licensing support from the forge, it is evident that third-party tools or features within the forge would aid developers in dealing with licensing decisions and changes.

ACKNOWLEDGEMENTS

We would like to thank all the open source developers who took time to participate in our survey. Specifically, we would like to acknowledge the developers who provided in-depth answers and responded to follow-up questions. This work is supported in part by NSF CAREER grant CCF-1253837. Massimiliano Di Penta is partially supported by the Markos project, funded by the European Commission under Contract Number FP7-317743.
Any opinions, findings, and conclusions expressed herein are the authors’ and do not necessarily reflect those of the sponsors.

REFERENCES

[1] Apache License, Version 2.0 (current). https://www.apache.org/licenses/. Last accessed: 2015/03/23.

[2] GitHub API. https://developer.github.com/v3/. Last accessed: 2015/01/15.

[3] Gradle. https://gradle.org/.

[4] Open Source Definition. http://opensource.org/osd.

[5] Qualtrics. http://www.qualtrics.com/.

[6] Apache. Apache Maven project. https://maven.apache.org/.

[7] G. Bavota, A. Ciemniewska, I. Chulani, A. De Nigro, M. Di Penta, D. Galletti, R. Galoppini, T. F. Gordon, P. Kedziora, I. Lener, F. Torelli, R. Pratola, J. Pukacki, Y. Rebahi, and S. G. Villalonga. The market for open source: An intelligent virtual open source marketplace. In 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering, CSMR-WCRE 2014, Antwerp, Belgium, February 3-6, 2014, pages 399–402, 2014.

[8] J. Corbin and A. Strauss. Grounded theory research: Procedures, canons, and evaluative criteria. Qualitative Sociology, 13(1):3–21, 1990.

[9] M. Di Penta, D. M. Germán, and G. Antoniol. Identifying licensing of jar archives using a code-search approach. In Proceedings of the 7th International Working Conference on Mining Software Repositories, MSR 2010 (Co-located with ICSE), Cape Town, South Africa, May 2-3, 2010, pages 151–160, 2010.

[10] M. Di Penta, D. M. Germán, Y. Guéhéneuc, and G. Antoniol. An exploratory study of the evolution of software licensing. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE 2010, Cape Town, South Africa, 1-8 May 2010, pages 145–154, 2010.

[11] R. Dyer, H. A. Nguyen, H. Rajan, and T. N. Nguyen. Boa: a language and infrastructure for analyzing ultra-large-scale software repositories.
In 35th International Conference on Software Engineering, ICSE ’13, San Francisco, CA, USA, May 18-26, 2013, pages 422–431, 2013.

[12] D. M. Germán and M. Di Penta. A method for open source license compliance of java applications. IEEE Software, 29(3):58–63, 2012.

[13] D. M. Germán, M. Di Penta, and J. Davies. Understanding and auditing the licensing of open source software distributions. In The 18th IEEE International Conference on Program Comprehension, ICPC 2010, Braga, Minho, Portugal, June 30-July 2, 2010, pages 84–93, 2010.

[14] D. M. Germán, M. Di Penta, Y. Guéhéneuc, and G. Antoniol. Code siblings: Technical and legal implications of copying code between applications. In Proceedings of the 6th International Working Conference on Mining Software Repositories, MSR 2009 (Co-located with ICSE), Vancouver, BC, Canada, May 16-17, 2009, pages 81–90, 2009.

[15] D. M. Germán and A. E. Hassan. License integration patterns: Addressing license mismatches in component-based development. In 31st International Conference on Software Engineering, ICSE 2009, May 16-24, 2009, Vancouver, Canada, pages 188–198, 2009.

[16] D. M. Germán, Y. Manabe, and K. Inoue. A sentence-matching method for automatic license identification of source code files. In ASE 2010, 25th IEEE/ACM International Conference on Automated Software Engineering, Antwerp, Belgium, September 20-24, 2010, pages 437–446, 2010.

[17] R. Gobeille. The FOSSology project. In Proceedings of the 2008 International Working Conference on Mining Software Repositories, MSR 2008 (Co-located with ICSE), Leipzig, Germany, May 10-11, 2008, pages 47–50, 2008.

[18] R. M. Groves. Survey Methodology, 2nd edition. Wiley, 2009.

[19] J. Hartsock. jquery, jquery ui, and dual licensed plugins (dual licensing) [closed]. http://stackoverflow.com/questions/2758409/jquery-jquery-ui-and-dual-licensed-plugins-dual-licensing. Last accessed: 2015/02/15.

[20] Y.
Manabe, Y. Hayase, and K. Inoue. Evolutional analysis of licenses in FOSS. In Proceedings of the Joint ERCIM Workshop on Software Evolution (EVOL) and International Workshop on Principles of Software Evolution (IWPSE), Antwerp, Belgium, September 20-21, 2010, pages 83–87. ACM, 2010.

[21] Y. Manabe, Y. Hayase, and K. Inoue. Evolutional analysis of licenses in FOSS. In Proceedings of the Joint ERCIM Workshop on Software Evolution (EVOL) and International Workshop on Principles of Software Evolution (IWPSE), Antwerp, Belgium, September 20-21, 2010, pages 83–87, 2010.

[22] C. McMillan, M. Grechanik, and D. Poshyvanyk. Detecting similar software applications. In Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pages 364–374, Piscataway, NJ, USA, 2012. IEEE Press.

[23] M. Nagappan, T. Zimmermann, and C. Bird. Diversity in software engineering research. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013, pages 466–476, 2013.

[24] S. Phipps. Github needs to take open source seriously. http://www.infoworld.com/d/open-source-software-github-needs-take-open-source-seriously-208046.

[25] P. Singh and C. Phelps. Networks, social influence, and the choice among competing innovations: Insights from open source software licenses. Information Systems Research, 24(3):539–560, 2009.

[26] M. Sojer, O. Alexy, S. Kleinknecht, and J. Henkel. Understanding the drivers of unethical programming behavior: The inappropriate reuse of internet-accessible code. J. of Management Information Systems, 31(3):287–325, 2014.

[27] M. Sojer and J. Henkel. Code reuse in open source software development: Quantitative evidence, drivers, and impediments. Journal of the Association for Information Systems, 11(12):868–901, 2010.

[28] J. T.
Confusion about dual license (mit/gpl) javascript for use on my website. http://programmers.stackexchange.com/questions/139663/confusion-about-dual-license-mit-gpl-javascript-for-use-on-my-website. Last accessed: 2015/02/15.

[29] T. Tuunanen, J. Koskinen, and T. Kärkkäinen. Automated software license analysis. Autom. Softw. Eng., 16(3-4):455–490, 2009.

[30] C. Vendome, M. Linares-Vásquez, G. Bavota, M. Di Penta, D. M. Germán, and D. Poshyvanyk. License usage and changes: A large-scale study of Java projects on GitHub. In The 23rd IEEE International Conference on Program Comprehension, ICPC 2015, Florence, Italy, May 18-19, 2015. IEEE, 2015.

[31] Y. Wu, Y. Manabe, T. Kanda, D. M. Germán, and K. Inoue. A method to detect license inconsistencies in large-scale open source projects. In The 12th Working Conference on Mining Software Repositories, MSR 2015, Florence, Italy, May 16-17, 2015. IEEE, 2015.

----------------------------------------

Sustainability of Open Source software communities beyond a fork: How and why has the LibreOffice project evolved?

Jonas Gamalielsson*, Björn Lundell
University of Skövde, P.O. Box 408, SE-541 28 Skövde, Sweden

ARTICLE INFO
Article history:
Received 19 October 2012
Received in revised form 7 November 2013
Accepted 8 November 2013
Available online 21 November 2013

Keywords:
Open Source software
Fork
Community evolution

ABSTRACT
Many organisations are dependent upon long-term sustainable software systems and associated communities. In this paper we consider long-term sustainability of Open Source software communities in Open Source software projects involving a fork. There is currently a lack of studies in the literature that address how specific Open Source software communities are affected by a fork.
We report from a study aiming to investigate the developer community around the LibreOffice project, which is a fork of the OpenOffice.org project. In so doing, our analysis also covers the OpenOffice.org project and the related Apache OpenOffice project. The results strongly suggest a long-term sustainable LibreOffice community, with no signs of stagnation in the LibreOffice project 33 months after the fork. Our analysis provides details on the developer communities of the LibreOffice and Apache OpenOffice projects, specifically concerning how they have evolved from the OpenOffice.org community with respect to project activity, developer commitment, and retention of committers over time. Further, we present results from an analysis of first hand experiences from contributors in the LibreOffice community. Findings from our analysis show that Open Source software communities can outlive Open Source software projects and that LibreOffice is perceived by its community as supportive, diversified, and independent. The study contributes new insights concerning challenges related to long-term sustainability of Open Source software communities.

© 2013 The Authors. Published by Elsevier Inc. Open access under CC BY license.

1. Introduction

Many organisations have requirements for long-term sustainable software systems and associated digital assets. Open Source software (OSS) has been identified as a strategy for implementing long-term sustainable software systems (Blondelle et al., 2012a; Lundell et al., 2011; Müller, 2008). For any OSS project, the sustainability of its communities is fundamental to its long-term success. In this study we consider long-term sustainability of communities in OSS projects involving a fork. Our overarching goal was to establish rich insights concerning how and why the LibreOffice project and associated communities have evolved.
More specifically, we report on commitment to the LibreOffice project, retention of committers, and insights and experiences from participants in the LibreOffice community. Overall, the study has revealed several key findings. First, the LibreOffice project, which was forked from the OpenOffice.org project, shows no sign of long-term decline. Second, the LibreOffice project has attracted the long-term and most active committers of the OpenOffice.org project. Third, our analysis shows that Open Source software communities can outlive Open Source software projects. Fourth, LibreOffice is perceived by its community as supportive, diversified, and independent.

The issue of forking OSS projects has been an ongoing subject of debate amongst practitioners and researchers. It has been claimed that “Indeed, the cardinal sin of OSS, that of project forking (whereby a project is divided in two or more streams, each evolving the product in a different direction), is a strong community norm that acts against developer turnover on projects” (Agerfalk and Fitzgerald, 2008). Further, it has been claimed that few forks are successful (Ven and Mannaert, 2008). Therefore, it is perhaps not surprising to see claims that “there must be a strong reason for developers to consider switching to a competing project” (Wheeler, 2007). However, it has also been argued that “forking has the capability of serving as an invisible hand of sustainability that helps open source projects to survive extreme events such as commercial acquisitions, as well as ensures that users and developers have the necessary tools to enable change rather than decay” (Nyman et al., 2012). Similarly, Brian Behlendorf, co-founder of the Apache Software Foundation, states that the “right to fork means that you don’t have to have any tolerance for dictators, you don’t have to deal with
people who make bad technical decisions – you can put the future into your own hands, and if you find a group of other people who agree with you, you can create a new project around it” (Severance, 2012). Another argument is that code forking can positively impact both the governance and the sustainability of OSS projects at the levels of the software, its community, and its business ecosystem (Nyman and Lindman, 2013). From this, there is clearly a need for increased knowledge about how OSS communities are affected by a fork.

There are two specific objectives. For the first objective, we characterise community evolution over time for the LibreOffice project and the related OpenOffice.org and Apache OpenOffice projects. For the second objective, we report on insights and experiences from participants in a community of the branched project LibreOffice in order to explain how and why the project has evolved after the fork from its base project OpenOffice.org.

The paper makes four novel contributions. First, we establish a characterisation of the LibreOffice project and the related OpenOffice.org and Apache OpenOffice projects with respect to history, governance, and activity. Second, we present findings regarding developer commitment to the projects under different governance regimes. Third, we present findings regarding retention of committers in the projects under different governance regimes. Fourth, we report on rich insights and experiences from participants in the LibreOffice project with a view to characterising its community and its way of working. In addition, we demonstrate approaches involving metrics for analysing long-term sustainability of communities (with or without forks) in OSS projects, and illustrate their use on different OSS projects.

There are five reasons which motivate a study on the LibreOffice project.
Firstly, LibreOffice is one of the few OSS projects which have had an active community for more than 10 years (when including the development in OpenOffice.org), with significant commercial interest. Secondly, there have been tensions within the OpenOffice.org project which finally led to the creation of the Document Foundation and the LibreOffice project (Byfield, 2010; Documentfoundation, 2013a). Thirdly, the project has reached a certain quality in that it has been adopted for professional use in a variety of private and public sector organisations (Lundell, 2011; Lundell and Gamalielsson, 2011). Therefore, its community is likely to attract a certain level of attention from organisations and individuals. Fourthly, previous studies of the base project OpenOffice.org (Ven et al., 2007) and more recent studies of LibreOffice (Gamalielsson and Lundell, 2011) show that there is widespread deployment in many organisations in a number of countries. This in turn imposes significant challenges for a geographically distributed user community. Fifthly, previous results (Gamalielsson and Lundell, 2011, 2012) and anecdotal evidence from an official spokesperson for the LibreOffice project (Nouws, 2011) suggest significant activity in the LibreOffice community. This motivates a more in-depth investigation of how and why the LibreOffice project evolved.

Hence, there is a need to extend previous studies on the LibreOffice project and, in so doing, include investigation of the project which LibreOffice was forked from (the OpenOffice.org project) and also alternative branches (the Apache OpenOffice project). An investigation of the OpenOffice.org project is interesting since it has been widely deployed. Further, the project is a natural source for recruitment to the LibreOffice project. Similarly, Apache OpenOffice is also interesting to investigate since it is the project that succeeded the OpenOffice.org project after Oracle abandoned it.
Further, the investigation of Apache OpenOffice enables a more comprehensive study of community dynamics since the OpenOffice.org project is a potential source for recruitment to the Apache OpenOffice project as well.

For the rest of this paper we position our exploration of sustainability of OSS communities in the broader context of previous research on OSS communities (Section 2). We then clarify our research approach (Section 3), and report on our results (Sections 4 and 5). Thereafter, we analyse our results (Section 6) followed by discussion and conclusions (Section 7).

2. On sustainable Open Source software communities

Many companies need to preserve their systems and associated digital assets for more than 30 years (Lundell et al., 2011), and in some industrial sectors (such as avionics) even more than 70 years (Blondelle et al., 2012b; Robert, 2006). In such usage scenarios “there will be problems if the commercial vendor of adopted proprietary software leaves the market”, with increased risks for long-term availability of both software and digital assets (Lundell et al., 2011). Similarly, for organisations in the public sector, many systems and digital assets need to be maintained for several decades. This exposes organisations to different types of lock-in and to the risk of being unable to provide long-term maintenance of critical systems and digital assets (Lundell, 2011). For this reason, sustainability of communities has been identified as essential for long-term sustainability of OSS.

There are many different aspects of an OSS project that can affect community sustainability. Good project management practice includes considering different incentives for contributing to OSS communities. This in turn may affect the future sustainability of communities (Bonaccorsi and Rossi, 2006).
Previous research has shown that there are a number of different kinds of motivations for individuals and firms that have impact on any decision concerning participation in OSS projects. Such motivations are sometimes categorised into economic, social, and technological types of incentives (Bonaccorsi and Rossi, 2006). Earlier research also suggests that an effective structure of governance is a basis for healthy and sustainable OSS communities (de Laat, 2007). In particular, aspects such as clear leadership, congruence in terms of project goals, and good team spirit are of fundamental importance. Moreover, the community manager in an OSS project plays a key role in achieving an effective structure of governance (Michlmayr, 2009). Further, the licensing of OSS may affect the community. It has been claimed that “fair licensing of all contributions adds a strong sense of confidence to the security of the community” (Bacon, 2009). It has also been claimed that the choice of OSS license type “can positively or negatively influence the growth of your community” (Engelfriet, 2010). Successfully mastering the art of establishing a long-term sustainable OSS community is a huge challenge. As in all organisations, there are “times in every community when repetition, housekeeping, and conflict play a role in an otherwise enjoyable merry-go-round. When the community begins to see more bureaucracy and repetition than useful and enjoyable contributions, something is wrong.” (Bacon, 2009)

A fork is often a consequence of inadequate OSS project governance. It has been claimed that forks “are generally started when a number of developers do not agree with the general direction in which the project is heading” (Ven and Mannaert, 2008). In particular, conflicts within communities can arise due to inadequate working processes, lack of congruence concerning project goals, and unclear (or in other ways inadequate) leadership.
There are different views on what is considered an OSS project fork. It has been claimed that in order to be considered a fork, a project should (Robles and Gonzalez-Barahona, 2012): (1) have a new project name, (2) be a branch of the original OSS project, (3) have an infrastructure that is separated from the infrastructure of the original project, e.g. web site, mailing lists/forums, and SCM (Software Configuration Management system), (4) have a new developer community that is disjoint from the community of the original project, and (5) have a different structure of governance. There are also related concepts that are similar to OSS project forking, such as (Robles and Gonzalez-Barahona, 2012): cloning (which involves the design of a software system that mimics another system), branching (where source code is duplicated within an SCM, creating parallel threads of development), derivation (which involves the creation of a new software system that is based on an existing system and which is compatible with the existing system), and modding (where existing software is enhanced, typically by enthusiasts, by providing patches and extensions to the existing software). There are different possible outcomes of a fork attempt. Four different categories have been identified by Wheeler (2007): (1) the forked project dies (e.g. libc/glibc), (2) the forked project re-merges with the original project (e.g. gcc/egcs), (3) the original project dies (e.g. XFree86/X.org), and (4) successful branching, where both the original and forked projects succeed and typically have separate communities. A possible fifth outcome is that both the original and forked projects die (Robles and Gonzalez-Barahona, 2012).

Governance is of fundamental importance for sustainability and evolution of an OSS project and its associated communities.
Three different phases of governance have been identified by de Laat (2007): (1) “spontaneous” governance, (2) internal governance, and (3) governance towards outside parties. The first phase of governance concerns the situation where the community (including both volunteer and potentially commercial actors) is self-directing without any formal and explicit control or coordination. Given the licensing framework, control and coordination that emerge stem from the degree of contribution by individual members. High performing members of a community may become informal leaders. The second phase is often adopted in larger projects that have existed for a longer time, and involves formal and explicit control and coordination in order to support more effective governance. Different tools are used for this including modularisation of software, assignment of roles to contributors, delegation of decision making, training and indoctrination, formalised infrastructure to support contributors, and leadership style (autocracy/democracy). A third phase of governance became necessary due to an increased external interest in OSS projects from national and international organisations in both the private and public sector. This increased institutionalisation of OSS led to an increased risk of litigation due to software patent infringements. As a solution, initiatives were taken to create legal shells around OSS projects to protect against lawsuits. One way of implementing this is by establishing non-profit foundations (such as the Linux Foundation and the Mozilla Foundation) for the governance of OSS projects.

In the context of OSS projects, it has been shown that “little research has been conducted on social processes related to conflict management and team maintenance” (Crowston et al., 2012). There are several open questions related to this, such as “How is team maintenance created and sustained over time?” (Crowston et al., 2012).
Our study is also motivated by the fact that there is a lack of research presenting rich insights from large and widely deployed OSS projects. In particular, there is a need for increased knowledge related to community involvement in projects involving a fork. We also note that there are different, and seemingly conflicting, views amongst practitioners concerning the effect of a fork on involved projects and associated communities. This further motivates our study. For the remainder of this section we position our study with respect to earlier research.

There are a few studies focusing on forks in an OSS context. However, none of these studies focuses on community involvement over time, nor do they investigate specific OSS projects in depth. One of these studies focused on motivations for forking SourceForge.net hosted OSS projects (Nyman and Mikkonen, 2011). Another study surveyed a large number of OSS project forks with a specific focus on the temporal evolution of forks, reasons for forking, and outcomes of forks (Robles and Gonzalez-Barahona, 2012). A similar but more limited study focused on the motivations and impact of the fork mechanism in OSS projects (Visser, 2012). Another study has a focus on code maintenance issues in forked projects in the BSD family of operating systems (Ray and Kim, 2012).

Further, there are studies on the evolution of OSS projects over time, but such studies do not always have a community focus and are not always targeted at specific projects. Examples include a study on the total growth rate of OSS projects (Deshpande and Riehle, 2008), and work on the evolution of social interactions for a large number of projects on SourceForge.net over time (Madey et al., 2004). Another example is a study on survival analysis of OSS projects involving the application of different metrics based on the duration of thousands of projects in the FLOSSMETRICS database (Samoladas et al., 2010).
There are also studies which focus on the evolution of software over time for specific OSS projects but which do not consider the community aspect. An example is a study on the Linux kernel based on Lehman’s laws of software evolution, which involved the application of code oriented metrics over time (Israeli and Feitelson, 2010). A similar approach was used in a case study on the evolution of Eclipse (Mens et al., 2008). Further, the growth of FreeBSD and Linux was studied and compared to earlier results on code evolution (Izurieta and Bieman, 2006). Another study on the topic of software evolution proposes a model of the Linux kernel life cycle (Feitelson, 2012).

A somewhat different strand of research involves development and application of different kinds of statistical measures for estimation and prediction of the survivability (Raja and Tretter, 2012; Wang, 2012), success (Crowston et al., 2003, 2006; Lee et al., 2009; Midha and Palvia, 2012; Sen et al., 2012; Subramaniam et al., 2009; Wiggins et al., 2009; Wiggins and Crowston, 2010) and attractiveness (Santos et al., 2013) of OSS projects. Such measures may consider factors related to (Wang, 2012): developer characteristics (e.g. user and developer effort, service quality, leadership and adherence to OSS ideology), software characteristics (e.g. license terms, targeted users, software modularity and quality), and community attributes (e.g. organisational sponsorship, financial support, trust and social network ties). However, forks are usually not explicitly addressed in such research and the focus is more on the overall survivability or success of OSS projects rather than focusing on the behaviour of communities associated with the projects.
Further, such research typically uses a large selection of projects from different OSS forges for statistical validation of the measures, whereas our study provides an in-depth analysis of a few inter-related OSS projects employing both a quantitative and a qualitative approach.

There are other studies which do have a focus on the evolution of communities for specific OSS projects, but do not address the effects of a fork. For example, case studies have been conducted on the Debian project involving quantitative investigations of the evolution of maintainership and volunteer contributions over time (Robles et al., 2005; Michlmayr et al., 2007). Another study involved an investigation of developer community interaction over time for the Apache web server, Gnome and KDE using social network analysis (Lopez-Fernandez et al., 2006). A similar study involved the projects Evolution and Mono (Martinez-Romo et al., 2008). Case studies on the Nagios project (Gamalielsson et al., 2010), and the Top-Cased and Papyrus projects (Gamalielsson et al., 2011) addressed community sustainability and evolution over time with a special focus on organisational influence. Other research partially focusing on community evolution includes early case studies on large and well-known OSS projects such as the Linux kernel (Moon and Sproul, 2000), Gnome (German, 2003), Apache web server (Mockus et al., 2002), Mozilla (Mockus et al., 2002), and FreeBSD (Dinh-Trong and Bieman, 2005). Further, there are no earlier reported in-depth studies on any of the three projects (LibreOffice, OpenOffice.org, and Apache OpenOffice) with a focus on the evolution of OSS project communities over time, except for our own earlier studies on LibreOffice (Gamalielsson and Lundell, 2011, 2012). In a study on the process of participation in OSS communities, Shibuya and Tamai (2009) compare the communities for the Writer tool in the OpenOffice.org project, the MySQL server in the MySQL project, and GTK+ in the GNOME project.
This was done using different kinds of project documentation and quantitative data from bug tracking systems and source code repositories. However, this is a very limited study which only partially covers the OpenOffice.org project. There is another study that also has a community focus, but from an open user experience design perspective rather than a community evolution perspective (Bach and Carroll, 2010). Further, there are studies on OpenOffice.org without a community focus. One such study focused on code evolution (Rossi et al., 2009). Specifically, the study explored the relation between code activities, bug fixing activities, and software release dates for five projects including OpenOffice.org. In another study the maintenance process of the OpenOffice.org project was analysed using its defect management and version management systems (Koponen et al., 2006). There are also studies focusing on issues related to migration, adoption, and deployment of OpenOffice.org (Huysmans et al., 2008; Rossi et al., 2006; Ven et al., 2010; Seydel, 2009).

3. Research approach

To address our first objective (to characterise community evolution over time for the LibreOffice project and the related OpenOffice.org and Apache OpenOffice projects) we undertook an analysis of the LibreOffice project and the related OpenOffice.org and Apache OpenOffice projects. This was done through a review of documented project information and a quantitative analysis of project repository data in order to investigate the sustainability of OSS communities. This included analysis of different project phases under different governance regimes. For the OpenOffice.org project this encompassed both the time period with governance by Sun Microsystems and that with governance by Oracle. For the rest of this paper we refer to the three projects as OO (OpenOffice.org), LO (LibreOffice), and AOO (Apache OpenOffice).
OO with governance by Sun Microsystems is hereafter referred to as SOO, and OO with governance by Oracle is hereafter referred to as OOO.

To contextualise insights from the LibreOffice project, we undertook an analysis of data from a number of different sources. First, we established a characterisation of the three projects (LO, OO and AOO) by undertaking an analysis of: the history and governance of the projects, the release history, and commits to the SCM and contributing committers over time. Second, to investigate developer commitment with the projects we used different metrics that consider to what extent committers have been involved in and contributed to the different projects under different governance regimes. Third, to investigate retention of committers in the projects under different governance regimes we used different metrics that consider: the recruitment of committers over time, the retirement of committers over time, the distribution of commits for committers contributing to different combinations of projects, and the temporal commitment patterns between projects for committers.

In our quantitative analysis we adopt and extend approaches from earlier studies (Gamalielsson et al., 2011; Gamalielsson and Lundell, 2011, 2012). This is done in order to analyse the contributions in terms of committed SCM artefacts of the OSS projects over time. SCM data was collected from the official repositories for LO and AOO, and for OO from a website recommended at the AOO website which keeps the legacy source code. The data for the LO project was collected from the LO website,1 where the Git sub-repositories “core”, “binfilter”, “dictionaries”, “translations” and “help” were used in the analysis. The choice of sub-repositories was made after a personal dialogue with key LO contributors. For the OO project, data was collected from an archive website,2 where the Mercurial repository was used in the analysis.
Data for the AOO project was collected from the AOO website,3 where the SVN repository was used in the analysis. Data until 31 May 2013 were used for LO and AOO, and data until the end of the OO project (April 2011) were used for OO. Logs for all projects were extracted from the repositories and these were thereafter analysed using custom-made scripts. Further, a semi-automated approach involving manual inspection was used to associate commit id aliases with the same actual committer.

To address our second objective (to report on insights and experiences from participants in a community of the branched project LibreOffice in order to explain how and why the project has evolved after the fork from its base project OpenOffice.org), we undertook a case study on the LO project in order to investigate experiences from participants in the project with a view to gaining insights into the effects of the fork that led to the establishment of the LO project.

In order to analyse insights and experiences concerning participation in the LO project, the two researchers conducted interviews with active participants in the LO community. As our goal was to specifically identify incentives and motivations for creation of the LO project, our strategy for identifying potential interviewees was to include key informants in key roles and interviewees with long experience from the project. In addition, we also sought to include interviewees with less experience who joined the project after the fork, as a strategy to include additional perspectives. Interviewees were selected on the basis of being actively involved in the LO project.

Data collection was based on the results of face-to-face interviews conducted in English. Interviews were recorded, transcribed, and vetted by each interviewee. Questions were prepared in advance, and shown to the interviewee before the interview was conducted.
Each interview was conducted in an informal setting and allowed each interviewee to extensively elaborate on all issues covered during the interview. A total of 12 interviews were conducted, ranging in time from 8 to 43 min and resulting in 67 pages of transcribed and vetted interview data.4 In this process each interviewee was allowed to further elaborate and clarify their responses.

Analysis of the transcribed interview data took place over an extended time-period to allow time for reflection. Individual analysis was supplemented by group sessions in which researchers discussed and reflected on the interpretations from each researcher.

The coding of interview data was conducted in a manner which follows Glaser’s ideas on open coding (Lings and Lundell, 2005). The unit of coding was sentences or paragraphs within interview notes. The focus was on constant comparison: indicator to indicator, indicator to emerging concepts and categories (Lings and Lundell, 2005). The goal of the analysis was to develop and refine abstract concepts, which are grounded in data from the field (as interpreted via collected data in the transcriptions). The coding process resulted in a set of categories, each presented as a subsection in Section 5 of this paper.

1 http://www.libreoffice.org/developers-2/, accessed 18 June 2013.
2 http://hg.services.openoffice.org/DEV300, accessed 18 June 2013.
3 http://incubator.apache.org/openofficeorg/source.html, accessed 18 June 2013.
4 All interviews were conducted during February 2012.

4. Community evolution over time

In this section we report on results related to the first objective. Table 1 presents the main results from our observations concerning community evolution over time as reported in the following sections.

4.1. Characterisation of projects

In this section we present an overarching characterisation of the three projects.
For each project we provide a historical overview, describe its governance, and report on project activity.

4.1.1. Organisations and overview of projects

The OO project was established as an OSS project on 13 October 2000 (Openoffice, 2004). Its initial release was on 1 October 2001 and the first stable version (v1.0) was released on 30 April 2002 (Openoffice, 2002). Initial development began within StarDivision, a German-based company that was acquired by Sun Microsystems in mid-1999 (Crn, 1999). Before OO was established, development and provision of the code base was closed source. OO was governed by its community council, which comprised OO community members who also created a charter for the establishment of the council (Openoffice, 2013). The Sun contributor agreement needed to be signed by developers wishing to contribute, whereby contributions are jointly owned by the developer and the Sun corporation. The Oracle corporation acquired Sun (and thereby also the OO project) on 27 January 2010 (Oracle, 2010). Oracle also used a contributor agreement (almost identical to the Sun contributor agreement) that needed to be signed by developers wishing to contribute to the project. Oracle stopped support for commercial OpenOffice.org on 15 April 2011 (Marketwire, 2011a).

LO is an LGPL-licensed Open Source office productivity tool for creation and editing of digital artefacts in the Open Document Format (ODF), which is its native file format. The Document Foundation (TDF) was established on 28 September 2010 (Linuxuser, 2010) under German jurisdiction. The first beta release of LO was provided on the same date (Pclosmag, 2011). TDF has as its mission to facilitate the evolution of the LO project, which has been a fork from the OO project since the date of establishing TDF (Documentfoundation, 2013a). TDF is an independent, meritocratic, self-governing, not-for-profit foundation that evolved from the OO community.
It was formally established by members from the OO community in September 2010 and is supported by a large number of small (and some larger) organisations. It has a steering committee currently consisting of eight members (excluding six deputy members), and there are also four other founding members. Further, there are four official spokespersons. TDF is open to individuals who can and are willing to contribute to its activities and who also agree with the core values of the foundation. Organisational participation is also encouraged, for example by supporting individuals financially to work and contribute in the community. TDF commits itself to giving “everyone access to office productivity tools free of charge to enable them to participate as full citizens in the 21st century” (Documentfoundation, 2013b). Further, TDF supports the preservation of mother tongues by encouraging the translation, documentation and promotion of TDF facilitated office productivity tools in the languages of individual contributors. Moreover, TDF commits to allowing users to create and maintain their digital artefacts in open document formats based on open standards. In addition, TDF openly seeks voluntary financial contributions (donations) via the project web site from individuals and organisations that want to support the further evolution of the LO project and TDF. Besides having strong support from volunteer contributors, LO also receives support from commercial companies including RedHat, Novell and Canonical (Documentfoundation, 2013c).

Oracle donated the OO project to the Apache Software Foundation (ASF) on 1 June 2011 (Marketwire, 2011b). The project was thereafter established as an (incubating) ASF project on 13 June 2011 after undergoing a proposal and voting process (Apache, 2013a). In connection with this, the new project was given the name Apache OpenOffice. AOO is licensed under APL v2 and comprises six office productivity applications.
The first stable release of AOO (v3.4) was provided on 8 May 2012 (Openoffice, 2012). Apache OpenOffice became a top-level Apache project on 17 October 2012 (Apache, 2013a). ASF was established on 1 June 1999 under U.S. jurisdiction (Apache, 1999). The mission of ASF is to establish projects delivering freely available and enterprise-grade products that are of interest for large user communities (Apache, 2013b). Apart from AOO, ASF maintains other well-known projects such as HTTP Server, Struts, Subversion, and Tomcat. Like TDF, ASF is an independent, meritocratic, self-governing, and not-for-profit organisation, and it has been governed by the community members that collaborate within ASF projects since 1999. ASF has a board of directors that is annually elected by members of ASF, and which manages the internal organisational affairs of the foundation according to the ASF bylaws. The board consists of nine individuals, and in turn appoints a set of officers whose task is to take care of the daily operation of the foundation. The decision making in individual ASF projects regarding content and direction is delegated by the board of directors to so-called project management committees. Each of these committees can govern one or several project communities. Individuals (unaffiliated and working with companies) that are willing and capable of contributing to ASF projects are welcome to participate. Further, ASF accepts donations and has a sponsorship program for individuals and organisations willing to contribute financially. We also note that IBM is an active supporter of and contributor to the AOO project (IBM, 2011). Finally, we note that long before the establishment of the AOO project, researchers indicated that leadership and control in the OO project under Sun governance “is remarkably similar to that of Apache” (Conlon, 2007).

Fig. 1 summarises the evolution of the projects (OO, LO, and AOO) over time, and includes selected major events related to each project.
Moreover, it illustrates how OO (black upper bar), LO (dark grey middle bar), and AOO (light grey lower bar) are interrelated and overlap in time.

4.1.2. Project activity

The version history of OO, LO and AOO is shown in Table 2. It can be observed that there has been a continuous flow of new OO releases for more than 10 years. On 25 January 2011 the Document Foundation (TDF) announced the first stable version of LO, which constitutes a fork from OO (Documentfoundation, 2013a). TDF has thereafter regularly provided new releases of LO. Further, the first stable version of AOO was announced on 8 May 2012, and this project replaced the discontinued OO project.

The developer activity in OO, LO and AOO is presented in Fig. 2, which shows the number of commits for each month from September 2000 to May 2013. We note that activity in the OO project varies, with distinct peaks in connection with the OO 2.0 (September 2005) and OO 2.4 (March 2008) releases. It can also be observed that the activity level decreased dramatically around August 2008, which is just before the release of OO version 3.0. A contributing reason for this significant drop in activity may be that major changes in terms of features had been implemented for version 3 and that subsequent activity was more focused on bug fixing. We can also observe that the activity in LO and AOO varies over time, but with peaks less distinct than those observed for OO.

Fig. 3 illustrates the number of active committers during each month of the projects. It can be observed that there are a large number of committers active early in the OO project, and that the activity decreases considerably shortly after the release of the first stable version of OO (version 1.0) in May 2002. The number of committers increases to a higher level after the release of OO 3.1 in May 2009.
We note that there is a discrepancy between the number of monthly commits and committers in OO in the interval between January 2003 and January 2009, in that relatively few monthly committers contribute a large number of monthly commits. This may be explained by the fact that there are a number of both first and second level releases in the interval, which often co-occur with an elevated level of commits. Further, a few committers often provide the majority of commits in OSS projects (see Section 4.2 for more details concerning commitment with the projects). For LO, it can be noted that committer participation peaks significantly in October 2010 and during the subsequent months in connection with the fork from OO. LO participation also peaks in connection with the release of version 4.0 in February 2013. It can also be observed that there was a rise in committer participation in AOO until September 2012.

4.2. Commitment with the projects

In this section we report on the commitment with the projects in terms of SCM contributions. Fig. 4 provides an overview of the commitment with the projects. The figure illustrates the number of committers that have contributed to the seven possible (mutually exclusive) combinations of the three projects. The area of a combination reflects the number of committers, and the colour of a combination represents the average number of commits per committer in all projects for the combination. In total, there have been 795 unique code contributors who have been active in at least one of the three projects (the sum of committers in all areas). The main observation in Fig. 4 is that the 67 contributors who have committed to both OO and LO have provided the overwhelming majority of the commits (4339 commits per committer). Those committers constitute the backbone of the developer communities of both OO and LO.
Further, the 8 contributors to all three projects have provided a substantial number of commits (1329 commits per committer). Contributors in all other project combinations have had a very limited impact with respect to number of commits (127 commits per committer or less).

Table 3 provides a more detailed picture of commitment to the separate projects for the combinations illustrated in Fig. 4. The table shows the proportion of committers that have contributed to the seven possible combinations of the three projects. The table also shows (in brackets) the number of commits that the committers in the different project combinations contribute in the different projects. It can be observed that the 67 contributors who have committed to both OO and LO have provided the majority of the commits in both OO (92%) and LO (56.4%). Further, the 133

Table 2
Version history of OpenOffice.org (OO), LibreOffice (LO), and Apache OpenOffice (AOO).

| Version | Date (YYYY-MM-DD) |
|------------|-------------------|
| OO initial | 2001-10-01 |
| OO 1.0 | 2002-04-30 |
| OO 1.1 | 2003-09-02 |
| OO 2.0 | 2005-10-20 |
| OO 2.1 | 2006-12-12 |
| OO 2.2 | 2007-03-28 |
| OO 2.3 | 2007-09-17 |
| OO 2.4 | 2008-03-27 |
| OO 3.0 | 2008-10-13 |
| OO 3.1 | 2009-05-07 |
| OO 3.2 | 2010-02-11 |
| LO 3.3 B1 | 2010-09-28 |
| LO 3.3 | 2011-01-25 |
| LO 3.4 | 2011-04-12 |
| LO 3.5 | 2011-06-03 |
| LO 3.6 | 2012-02-14 |
| LO 3.7 | 2012-05-08 |
| LO 3.8 | 2012-08-12 |
| LO 4.0 | 2013-02-14 |
| AOO 3.4 | 2013-07-23 |
| AOO 4.0 | 2013-07-25 |

Fig. 2. Number of monthly commits for the OpenOffice.org (black), LibreOffice (dark grey) and Apache OpenOffice (light grey) projects.

Fig. 3. Number of monthly committers for the OpenOffice.org (black), LibreOffice (dark grey) and Apache OpenOffice (light grey) projects.
Table 3
Proportion of commits for committers contributing to different combinations of projects (number of commits in brackets).

| Project combination | LO prop. [%] | AOO prop. [%] | OO prop. [%] |
|---------------------|---------------|---------------|----------------|
| LO | 37.2 (23,846) | – | – |
| AOO | – | 26.4 (939) | – |
| OO | – | – | 6.1 (16,867) |
| LO & AOO | 0.3 (170) | 32.1 (1140) | – |
| LO & OO | 56.4 (36,152) | – | 92.0 (254,745) |
| AOO & OO | – | 0.2 (8) | <0.1 (1) |
| LO, AOO & OO | 6.1 (3914) | 41.3 (1466) | 1.9 (5121) |

committers only participating in OO have provided only 6.1% of all commits. This is in contrast with the situation where committers have contributed either only to LO or only to AOO: in these two cases the contributions constitute 37.2% and 26.4% of all commits, respectively. Further, we note that the 17 committers who have contributed to both LO and AOO (but not OO) have contributed significantly to AOO (32.1%) but very little to LO (0.3%). It may also be considered surprising that only one of the AOO committers has participated in both OO and AOO (but not LO). It is perhaps also unexpected that committers contributing to all three projects are behind 41.3% of all commits in AOO.

Further, as mentioned earlier, commits have been contributed to the projects under different governance regimes of different lengths (SOO, i.e. OO under Sun governance: 112 months; OOO, i.e. OO under Oracle governance: 16 months; LO: 33 months; and AOO: 24 months). Of the 209 committers in OO, 197 have been active during the Sun governance of the OO project and contributed 267,011 commits. Further, 81 committers have contributed 9723 commits during the Oracle governance of the OO project.

Fig. 5 illustrates the proportion of all commits as a function of the proportion of committers for SOO (solid black trace), OOO (dashed black trace), LO (dark grey trace), and AOO (light grey trace).
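A concentration measure of the kind plotted in Fig. 5 (and tabulated in Tables 4 and 5) can be sketched as follows; the commit counts below are invented for illustration and are not the study's data:

```python
def commit_share(counts, top_fraction):
    """Share of all commits made by the top `top_fraction` of committers,
    with committers ranked by their commit counts in descending order."""
    ordered = sorted(counts, reverse=True)
    k = max(1, round(top_fraction * len(ordered)))  # size of the top group
    return sum(ordered[:k]) / sum(ordered)

# Invented example: a few heavy committers plus a long tail of
# one-commit contributors (not the study's actual data).
counts = [900, 50, 20, 10] + [1] * 16
share = commit_share(counts, 0.05)  # top 5% of 20 committers = top 1 committer
```

Sweeping `top_fraction` from 0 to 1 yields the whole trace for one project, which is how the curves for SOO, OOO, LO, and AOO in Fig. 5 can be compared.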
It can, for example, be noted that for SOO and LO, 10% of the committers (19 and 64, respectively) contribute 90.5% (241,645) and 88.8% (56,905) of the commits. Further, the same proportion of committers in OOO and AOO (8 and 4 committers, respectively) contribute 41.6% (4045) and 54.1% (1922) of the commits, respectively. Hence, for SOO and LO, a relatively small proportion of committers contribute the majority of the commits, whereas a larger proportion of the committers in OOO and AOO contribute the majority of the commits. It should also be mentioned that a large proportion of all committers contribute only a few commits (5 commits or less are made by 21.3% of the SOO committers, 12.3% of the OOO committers, 54.3% of the LO committers, and 34.9% of the AOO committers).

Table 4, which is based on the data illustrated in Fig. 5, shows the proportion of commits for different proportions of committers for SOO, OOO, LO, and AOO. Similarly, Table 5 shows the proportion of commits for the top N committers in the projects for different values of N. For example, 5% of the most active LO committers contribute 78% of all LO commits. It can be observed that the proportion of commits for LO in Table 5 is significantly smaller than the proportion of commits for LO in Table 4. This is because there are many committers (645) in LO, so the top 5 committers are much fewer than the top 5% of committers. For AOO it is the other way around: the top 5 committers constitute a much greater proportion of committers than the top 5%, and therefore the proportion of commits for AOO is greater in Table 5.

4.3. Retention of committers

In this section we report on the retention of committers for the different projects. Fig.
6 shows the recruitment of committers, the retirement of committers, and the current number of active committers in the projects for each project month. Recruitment is represented by the accumulated number of committers who have made their first commit (solid black trace). Retirement is represented by the accumulated number of committers who have made their last commit (dashed black trace). The current number of active committers is represented by the difference between the number of recruited and retired committers (grey trace). It can be observed that LO has by far the highest recruitment rate, with approximately 20 new committers each month on average. At the same time, LO suffers from a high retirement rate. This is perhaps not surprising since, as mentioned earlier, half of all LO committers have provided only 5 commits or less. However, we cannot observe any long-term trend towards a decreased number of active committers: there have been roughly between 100 and 150 currently active committers since the start of the LO project. SOO had a high recruitment rate during the first two years of the project, but a considerably lower recruitment rate during the rest of the project except for the last few months. From approximately 75 currently active committers on average during the first two years, SOO stabilised at around 50 currently active committers during the second half of the project. Noticeable about OOO is that recruitment was slow except for the first few months. Further, the retirement rate in OOO was comparably high, especially during the later part of the project. This led to a dramatic drop in currently active committers from the 10th project month onwards. AOO had a positive trend in terms of number of active committers during the first 16 project months, due to a high recruitment rate and a low retirement rate. However, AOO has lately experienced a stagnation in recruitment and an increasing rate of retirement. This has resulted in a halving of the number of active committers in AOO during the second project year. We acknowledge that the total number of project months differs between projects (SOO: 112 months, OOO: 16 months, LO: 33 months, and AOO: 24 months).

Table 4
Proportion of commits for different proportions of committers in SOO, OOO, LO, and AOO.

| Prop. of committers | SOO | OOO | LO | AOO |
|---------------------|-----|-----|-----|-----|
| Top 5% | 86% | 27% | 78% | 33% |
| Top 15% | 93% | 52% | 93% | 69% |
| Top 20% | 95% | 62% | 95% | 80% |

Table 5
Proportion of commits for different numbers of committers in SOO, OOO, LO, and AOO.

| Number of committers | SOO | OOO | LO | AOO |
|----------------------|-----|-----|-----|-----|
| Top 5 | 79% | 31% | 33% | 60% |
| Top 15 | 89% | 59% | 58% | 93% |
| Top 20 | 91% | 69% | 66% | 96% |

The distribution of commits among committers is explored further in the following, in order to better explain commitment with the different projects at committer level. Fig. 7 provides details regarding the distribution of commits in LO (dark grey bar colour) and OO (black bar colour) for the 67 committers contributing only to LO and OO. Committers are sorted on the sum of commits in the two projects (in descending order). As stated earlier in connection with Table 3, the black area represents 92% of all commits in OO and the dark grey area represents 56.4% of all commits in LO. However, the LO commits comprise only 12.4% of all commits in Fig. 7, and the OO commits comprise 87.6%. At the level of individual committers, it can be observed that one of the projects often hugely dominates. For example, the top committer in Fig. 7 contributes 89,931 commits to OO, but only two commits to LO. In fact, the top six committers contribute only 0.4% of all their commits in LO.

Similarly, Fig.
8 provides details regarding the distribution of commits for the 17 committers contributing only to LO (dark grey bar colour) and AOO (light grey bar colour). The light grey area represents 32.1% of all commits in AOO, and the dark grey area represents 0.3% of all commits in LO. Given these proportions, it is not surprising that the contribution to the different projects is unbalanced. The LO commits comprise only 13% of all commits in Fig. 8, and the AOO commits comprise 87%. The imbalance is clearly visible at the level of individual committers in Fig. 8. For example, committers 3, 4 and 8 contribute a very small proportion of commits to LO. Only committer 10 contributes a larger proportion of commits to LO.

Fig. 9 provides details regarding the distribution of commits in LO (dark grey bar colour), AOO (light grey bar colour), and OO (black bar colour) for the 8 committers contributing to all three projects. The black area represents 1.9% of all commits in OO, the light grey area represents 41.3% of all commits in AOO, and the dark grey area represents 6.1% of all commits in LO. As in Figs. 7 and 8, the contribution to the different projects is somewhat unbalanced. The AOO commits comprise 14% of all commits in Fig. 9, and the LO and OO commits comprise 37.3% and 48.8%, respectively. As an example of imbalance for individual committers, the top committer contributes 2261 commits to LO, but only 69 to AOO. One aspect that can contribute to the imbalance in Figs. 7–9 is the fact that the projects have different life spans and have accumulated different total amounts of commits. For example, there have been 77 times more commits in OO than in AOO.

Table 6
Major commitment patterns for committers who have contributed to LO.
| Pattern ID | Commitment pattern | Commits | Committers |
|------------|--------------------|----------------|--------------|
| LP1 | – | 33,642 (52.5%) | 58 (9.0%) |
| LP2 | – | 23,846 (37.2%) | 553 (85.7%) |
| LP3 | – | 3052 (4.8%) | 2 (0.3%) |
| LP4 | – | 2385 (3.7%) | 5 (0.8%) |

Tables 6 and 7 illustrate the major temporal commitment patterns between projects (OO in black, LO in dark grey, and AOO in light grey) for committers who have contributed to LO (Table 6) and AOO (Table 7). In total, 13 commitment patterns were identified for the 645 LO committers. The four most significant of these patterns (LP1 through LP4) are shown in Table 6. These four patterns account for 98.2% of all LO commits and for 95.8% of all LO committers. Similarly, Table 7 shows the four most significant patterns (AP1 through AP4) out of a total of 10 identified patterns for the 43 AOO committers. These four patterns account for 93.4% of all AOO commits and for 74.4% of all AOO committers. Each committer is assigned to one distinct pattern by comparing the dates of the first and latest commit in the projects the committer has been active in. For example, a committer is assigned to LP1 if commitment in OO and LO has been sequential and the committer has not contributed to AOO; that is, for LP1 the latest commit in OO precedes the first commit in LO. Another example is LP4, where the involvement in OO overlaps with the involvement in LO; hence, for LP4 the latest commit in OO is after the first commit in LO and the committer has not been active in AOO. In connection with each commitment pattern, Tables 6 and 7 show the number and proportion of commits and committers. The tables are sorted on the number of commits assigned to a specific pattern, in descending order.

In Table 6 it is evident that the pattern accounting for the largest proportion of LO commits (52.5%) is LP1, where committers have contributed to OO and LO in sequence but not to AOO.
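The assignment rule just described (comparing each committer's first and latest commit dates per project) can be sketched as follows. Only the patterns explicitly defined in the text (LP1, LP2, LP4) are encoded, the remaining patterns are lumped together, and the example dates are invented for illustration:

```python
from datetime import date

def lo_pattern(first_last):
    """Classify an LO committer from first/latest commit dates per project.
    `first_last` maps project name -> (first commit date, latest commit date).
    Only the patterns spelled out in the text (LP1, LP2, LP4) are encoded;
    everything else is lumped together as "other"."""
    projects = set(first_last)
    if projects == {"LO"}:
        return "LP2"                # has only ever committed to LO
    if projects == {"OO", "LO"}:
        if first_last["OO"][1] < first_last["LO"][0]:
            return "LP1"            # latest OO commit precedes first LO commit
        return "LP4"                # OO involvement overlaps LO involvement
    return "other"

# Invented example dates: OO involvement ends before LO involvement begins,
# so this committer is assigned to LP1.
p = {"OO": (date(2005, 1, 1), date(2010, 9, 1)),
     "LO": (date(2010, 10, 1), date(2013, 5, 1))}
```

Here `lo_pattern(p)` returns "LP1", since the latest OO commit (September 2010) precedes the first LO commit (October 2010); shifting the first LO commit before September 2010 would yield "LP4" instead.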
There are also other commitment patterns for committers involved in only OO and LO (LP4 and two other patterns not among the four most significant), which together account for 3.9% of the commits. The second most significant pattern in terms of commits (37.2%) is LP2, where committers have contributed only to LO. This pattern applies to the clear majority (85.7%) of the committers. The patterns LP1 and LP2 are clearly dominant and together involve 89.7% of all commits and 94.7% of all committers. It should also (once again) be pointed out that committers who were involved in OO before their involvement in LO (LP1) contribute a greater proportion of the commits than those who have contributed only to LO (LP2).

In Table 7 it can be observed that the pattern accounting for the largest proportion of AOO commits (29.2%) is AP1, where committers have contributed to LO within the period during which they have contributed to AOO. The second most significant pattern in terms of commits (26.4%) is AP2, where committers have contributed only to AOO. When comparing with the LO patterns, we find that a more diversified set of commitment patterns accounts for significant proportions of commits in AOO. Further, we note that a significant proportion of the AOO commits (41.3%) stems from committers who have previous, and in some cases current, experience in both OO and LO (AP3, AP4 and another pattern not shown in Table 7).

To sum up concerning recruitment to LO: 553 of the 645 committers in LO (85.7%) have not been active in OO or AOO, and have therefore been directly recruited to LO. Further, 75 of the 645 committers in LO have also contributed to OO. Of these 75 committers, 66 contributed to OO before they started to contribute to LO and have not contributed to OO thereafter, and can therefore be claimed to have been recruited from OO to LO.
These 66 committers are influential in that together they have provided the majority of the LO commits (58.7%). The remaining 9 of the 75 committers have been active in LO and OO in parallel. Further, 25 of the 645 committers in LO have also contributed to AOO, but these committers have contributed only 2.2% of all LO commits.

For AOO, 17 of the 43 committers (39.5%) have not been active in OO or LO, and have therefore been directly recruited to AOO. Further, 8 of the 43 committers in AOO have also contributed to OO before they started to contribute to AOO, and have contributed to LO before or during their AOO involvement; they can therefore be claimed to have been recruited from OO and LO. These 8 committers together contribute a significant proportion (41.3%) of all AOO commits. We also note that the 17 committers who have contributed only to AOO and LO have mostly contributed to the two projects in parallel and have contributed a considerable proportion of the AOO commits (32.1%).

5. Insights and experiences from the LibreOffice community

This section reports on results related to the second objective. Table 8 shows the main themes for investigation, with associated main results from our observations concerning insights and experiences from the LibreOffice community.

All interviewees are active participants in the LO project, and several of them expressed that they have been active from the start of the project. Our interviewees include participants who were active in the formation of TDF, and several have central roles in the LO project, even though our interviewees also include some contributors with less experience of participation in the project. Accordingly, it is appropriate to characterise our sample of interviewees as dominated by experts, and thereby to consider our research interviews as dominated by elite interviews.
Six broad categories emerged from our coding and analysis of the interview transcriptions. Each is presented as a separate section below, with a subheading aimed at characterising the category.

5.1. Considerations for creation of the LibreOffice project

Over time, members of the OO community started to feel frustration and discontent due to a number of circumstances in the OO project. Concerns amongst community members included perceptions of vendor dominance, copyright assignment, lack of influence, lack of fun, and bureaucracy in the project. For example, as expressed by a community member: “I started in OpenOffice, and it was fun in the beginning, but over the year you were able to see behind, and I didn’t like what I saw.” Similarly, another community member expressed a view that “it stopped being fun. It stopped being an Open Source project under Oracle”. A different respondent particularly emphasised bureaucracy in the OO project as an inhibitor to contributing: “In the past, I tried once to get involved in OpenOffice by submitting patches, but that was hell of a job to do, because of all the bureaucracy in the project, so that’s why I didn’t follow up on that and just quit with it.” Overall, the essence of these circumstances seems to originate from a lack of trust.

From this, the idea of starting a new branch of the OO project evolved amongst community members. This course of events brought many thoughts among members of the community, as illustrated in a comment by one person involved in the creation of LO: “When this whole story with Oracle started to look a bit fishy, you just meet people and you start talking, and you start thinking, and then you start planning.” Further, it is clear that a number of issues were considered before taking action, as illustrated by another person involved: “Before we started we had a lot of discussions. Shall we start? When do we start? How do we start?
Which people do we get involved as soon as possible, or a bit later, or whatever?”. Once different issues had been considered it was time to take action, as expressed by a different respondent: “we founded the LibreOffice project, we got people together to agree and, you know, got the initial structure set up”.

Further, the choice of a copyleft license7 was mentioned as an important prerequisite by several contributors to the LO project. Hence, there seems to be consensus amongst contributors in the LO project that permissive licenses should not be used for the project.

7 OSS licenses are often broadly categorised as either copyleft licenses (e.g. GPL, which is a strong copyleft license, and LGPL, which is a weak copyleft license) or permissive licenses (e.g. BSD and MIT). The main difference between these two license categories is that copyleft licenses ensure that derivative work remains open source, whereas permissive licenses do not (Brock, 2013).

As expressed by one respondent: “to me licensing is key and a copyleft license, a weak copyleft license, is pretty much mandatory for me to be interested in the project, because otherwise I know where it’s gonna go, pretty soon we will be writing proprietary software”. The importance of avoiding permissive licensing was further emphasised by another respondent: “the permissive license would lose half of our volunteer developers, because they are real volunteers. They are in the project for fun.
They don’t want to give away their work to a corporation.” The same respondent also acknowledged that there are contributing companies that understand and act in accordance with the fundamental values of the Open Source movement, and that contributors accept this: “They easily give away their work to companies like Suse, Redhat, Canonical, that contribute to the project, that are transparent in the way that they behave in the project.” Further, one respondent pointed out that, apart from upsetting the community, switching from a copyleft to a permissive license would require a time-consuming IP-clearance process. This process would require rewriting of code just because of the license, and could stall the actual development of new features in a project.

In essence, interviewees involved in establishing the LO project seem to have considered the establishment of the LO project, with its independent foundation (TDF) and use of a weak copyleft license, as an inevitable action given the perceived dissatisfaction amongst community members in the OO project.

5.2. Perception of LibreOffice

An immediate reaction was requested, as we were seeking what respondents associated with LO rather than probing for a description or definition of it. On some occasions this caused respondents to hesitate before replying. Perhaps not surprisingly, some contributors with extensive experience of the project were hesitant in responding to this question, and one even commented: “It’s a hard question, because it’s not a factual question. I cannot use my mind.”

Overall, contributors gave a variety of ideological and emotional responses, such as: “freedom”, “something I believe in”, “It’s my project”, “a group of friends”. As put by one contributor: “[LibreOffice] is a project I have contributed to shape, and so there is also a lot of emotional participation”.
Similarly, other respondents expressed “It has a very deep meaning for me, I guess, having done a lot of work there”, and “It’s my project, the project that I am working on, so, yeah.”

The concept also triggered a number of expressions of excitement, as illustrated in the following comments: “Exciting project that is fun to hack on”, “It’s very positive to hear people talk about LibreOffice”, and “It’s cool, it’s home, it’s something exciting”. Further, respondents also associated the concept with personal commitment. For example, as expressed by one interviewee: “It’s a group of friends and people who we work with, I would say.”

In addition, the concept also gave rise to a number of more rational associations. Some of those expressions relate to the quality of the software system, such as: “The best office suite in the world” and “[LibreOffice is an] interesting, exciting project with a huge amount of work, but very good, how do people work, how we work and what we manage to do”. Others relate to the development model used in the project: “Community developed office suite”, whereas yet others relate to the developed system: “Open Source office package”. Finally, some respondents seemed flattered when probed for their association with the concept LO, responding jokingly “I recognise the name”.

5.3. Participation in the LibreOffice project

The extent to which contributions from participants in the LO project are related to their professional activities varies amongst respondents. We note that contributions stem from both volunteer and paid-for activities, and responses revealed that contributors are employed by several different organisations, including self-employed specialists.
Several respondents expressed that working on the LO project is part of their professional activities, as illustrated by the following responses: “I am working for LibreOffice in my professional activities”, “I am paid for working on LibreOffice”, and “It’s my full time job”. Further, some respondents also expressed that their incentives for participation in the project were motivated by a technical need from their professional activities, as illustrated by one respondent: “I wanted to use it to replace Microsoft Access, at what is now my day job”.

For several contributors there is significant congruence between their professional activities and their contributions to the project as volunteers. For example, one of the respondents expressed that “there is a huge overlap of my professional activities”, and another that “it is my professional activity … it’s not all of my job, I have other parts to my job I have to do … I do stuff in my free time as well”.

There were also those expressing that working on the LO project is in symbiosis with a professional job even though not directly part of it: “it is not related, but it is in harmony basically”. Yet others expressed that their incentives for participation were motivated by business opportunities: “I have a small company in [country X], and doing all kind of services, support for LibreOffice and the old OpenOffice, so that makes it logic to contribute in the project too, for me it’s a logic combination”.

Further, there are also contributors participating in the project primarily through volunteer activities. In the words of one respondent: “we do use LibreOffice in the company I work, but mostly the activities I do for LibreOffice is mostly as my hobby”.
Amongst respondents we also identified those for whom professional and volunteer activities seem to merge: “For me it’s like a hobby that turned into some occupation, and it’s very hard to draw a line between what I am doing privately and what I am doing as an employee, and it mostly matches the interest from both company and what I would personally do”.

5.4. Motivations for contributing to the LibreOffice project

Several interviewees found it difficult to single out specific issues that motivate them to contribute to the project. For example, as put by one contributor: “That’s a very hard question, isn’t it … I think everyone is just this mixed bundle, all sorts of motivations”. Another respondent expressed that: “There are so many answers to that. It’s kind of hard”.

Respondents expressed a number of different types of motivations for contributing to the LO project. Several comments are of an emotional nature, such as: “because it is fun and very rewarding”, “it’s fun to contribute and while you contribute the project gets ahead so it’s even more fun”, and “I want to do something that seems to be useful for people and significant. I think it’s the joy of relationship, and just working with other people and seeing good things happen”. Further, some emotional comments emphasised motivations for contributing to the project in the future: “in the future if it stays fun and the community stays a nice place to be in and, yeah, it’s … you can continue”.

Closely related to emotions, respondents also emphasised social rewards and social recognition as enablers of their motivation to contribute. For example, respondents expressed: “Cleaning up ugly things is socially rewarding” and “positive feedback is what drives me”.

Similarly, there are also ideological motivations expressed amongst respondents: “I believe in free software because I think that is the proper alternative to proprietary software” and “I care about software freedom”.
There are also intellectual motivations that seem to drive contributors. For example, one respondent motivates participation in the LO project with the argument that having a good office package is “one of the biggest tasks that doesn’t have already a good solution for it in Open Source”. Similarly, another respondent considered the establishment of a high quality LO project as “a professional challenge. Because not having any money, you have to be smarter than your competitors”.

For some respondents with a long-term commitment to the LO project, their participation has led to a desire to see the project succeed. As stated by one respondent: “I’ve invested plenty of time in this branch of software, so I really really have a personal desire to see it succeed”. Others expressed a motivation for improving the way of working in the LO project as follows: “It may not be readily visible but we still need to add more structure and more processes, and I think I want to continue to do that”.

Visionary, goal-driven motivations for the future of the LO project were also expressed: “it’s fun. I am convinced it’s the right thing to do. I think it’s the right project, at the right time, with the right people, and the right mind sets”. Similarly, in the words of another respondent: “I think we can change a lot with this project by running it differently and pushing borders and thinking outside the box there”. Further, motivations also seem to stem from frustration concerning a perceived lack of influence in the old OO project, as commented by one respondent: “I was active in OpenOffice.org project in the past, and there were lots of things that I loved of that product, but a lot of things that made me feel frustrated about influence, about things that were not picked up, and on the development side. And, so I am really motivated to work on LibreOffice, to make it better, to see it improve compared to the old OpenOffice, and that is a strong motivation”.
Finally, amongst respondents we observe strong commitment to the project. As expressed by one respondent: “it’s fun, it’s something that I like to do, and it’s not the first free software project that I contribute to. It’s something that I have been doing for good chunks of my life now”. Similarly, for some, such strong commitment and motivation for participation is also related to strong emotions: “It’s purely the love”.

5.5. Future outlook for the LibreOffice project

An overwhelming impression from the responses is that contributors perceive a positive future for the LO project. Several respondents gave a number of emotional expressions, and we observe an expectation amongst respondents of a more diverse developer community in the future. For example, as stated by one respondent: “I believe that we will stay diversified and that we will be able to embrace more and more, not only individuals, but companies as well”.

Respondents raised the budget for the project as an issue and stressed that there is a “need to strengthen the project”. Other comments concern the way of working and how to organise work in the project, as illustrated by one respondent: “we still need to consolidate the organisation. We still need to increase the number of members”.

Several respondents envisaged a bright future for the LO project, as illustrated by the following comments: “Hereto, it has all the attributes of a very successful project. So, I believe that we will execute on the plan on releases as we have until recently, until now, because we have the time based schedule, that we always deliver on time”, “Whatever happens, it will continue in some way or another, some shape or another. I think that code base, it’s just too many users for it to disappear. It’s there to stay”, and “I think it is a bright future, and it grows, but it takes time”.
Further, one respondent also expressed a view on the project in relation to an existing proprietary alternative: “We are going to grow. We are going to take over the market, and we will have a follower called Microsoft behind us”.

However, there were also those predicting a somewhat more modest future for the LO project. For example, in the view of one respondent: “we keep running as we are, I think”.

Further, a number of comments also revealed that the evolution of the LO project seemed to have exceeded the expectations of the respondents, as illustrated by the following comments: “while this is a young project, I am surprised by how diverse it is and how healthy it is”, “I think we’re doing very well, and we had a major breakthrough, milestone, when we finally got these German authorities to prove our idea of a foundation. And we’re past this quite important milestone. Yeah, I am very positive about the future”, and “I think we are not yet aware of what is possible with it, and I am beginning to realise how much bigger this thing can get”.

The importance of the community and its role for the LO project was emphasised by a number of respondents as an enabler of its future success. Several comments signalled a strong identity for members of the LO community, as illustrated by one respondent who stressed the importance of community values as follows: “This is a community, not a company, so we don’t have titles”, and commented that there is consequently no need for business cards when working within the community. However, the same respondent suggested that there actually is a need for a business card in certain situations, such as when community members need to communicate with external organisations. Further, the importance of a vibrant community was also stressed by one respondent: “the more rich and diverse and compelling we make our ecosystem, the stronger it is”.
+ + +Similarly, another comment stressed the importance of successful governance for a community as follows: “governance is key, if there is no governance at the end there is no project. So, some discipline is necessary, but the discipline can not go to the level of making the others scared to come inside. At the moment they are still a little bit scared. We are trying to make them less scared”. + + +5.6. Lessons learnt from participation in the LibreOffice project + + +From responses it is evident that contributors perceive participation in the LO project as positive and rewarding in a number of different ways. We observed a variety of lessons learnt by participants in the project, and a number of comments touched upon excitement, opportunities with open collaboration, and a positive inclusive atmosphere that seems to promote learning. + + +Several respondents elaborated on their experiences from participation in the LO community, and attached a number of positive characteristics to the community. For example, as commented by one respondent with long experience of participation in the community: “it’s a true fun, diverse, vibrant, community project. . . . I started in OpenOffice, and it was fun in the beginning, but over the year you were able to see behind, and I didn’t like what I saw”. Similarly, another respondent stressed the possibility of having an impact by providing value for individuals, organisations and society more broadly: “the thrilling thing about LibreOffice is that it really makes a difference. You can see people using it and appreciating it”. + + +Further, another respondent stressed the opportunity of open collaboration as an important lesson from participation in the project, as illustrated by the following comment: “I think it really shows that cooperation in an open way is profitable, makes sense, and I think that is a very valuable lesson”.
Similarly, another respondent perceived benefits of open collaboration as follows: “It’s things like this [name of a practitioner conference], meeting with people and collaborating with different people with different mentalities, and tolerating each others and each other’s ideas; and yeah, even with completely different approaches to the project and expectations, to get something big out of it, yeah”. Another respondent stressed that collaborating in a community inherently involves sharing experiences, both providing and gaining valuable lessons, as follows: “I think that I’ve, at the end I have really got as much as I have given, because in term of human experiences, just incredible”. + + +Several respondents stressed the importance of the welcoming environment in the LO project, with particular emphasis on skills development. For example, as expressed by one respondent “I think it’s good for my writing skills and coding skills”. + + +Similarly, several respondents stressed the welcoming nature and an established practice for mentoring new contributors as something highly appreciated, as illustrated by the following comment: “I am pleased that we have much more welcoming environment for new developers to participate with us, and I am very pleased that a lot of these people have now very quickly become senior advisers in their own right. And that they, themselves, can feel free to mentor other people and bootstrap other new developers up to the same situation.
To repeat the process on others that was done to them to make them valuable and respected developers with commit access.” Further, as indicated by another respondent, the mentoring process seems to be founded in each individual’s ability, with careful consideration in the LO project given to acknowledging and appreciating contributions from all contributors: “I think it’s the exceptionally welcoming nature of the LibreOffice community and the speed at which I was recognised for my contributions and my skills and my abilities. It’s not like that in every project, you know. . . . With LibreOffice it happens very fast”. + + +Finally, another lesson learnt expressed by one respondent clearly stressed the perception of feeling rewarded from contributing to the LO project: “The most important experience was the weeks before we actually switched the upstream, and all the preparation, and then going out public and seeing how in matter of few minutes the IRC channels that we created filled with people who started to download, use and actually build LibreOffice, and those tireless moments we spent on the IRC trying to fix the possible breakages they had and it was just a magic moment to see that the things were actually moving ahead. It’s emotional”. + + +6. Analysis + + +6.1. Analysis of community evolution over time + + +From our results we make a number of observations related to project activity. Firstly, there have been regular and frequent releases of stable versions of the software (LO including the former development in OO) for a time period of more than ten years. Other examples of well-known OSS projects with release histories extending over many years are the Apache web server(^{8}) and the Linux kernel(^{9}), which have had frequent releases since 1995 and 1991, respectively. We note that, as for LO (and AOO), both these projects are governed by a foundation(^{10}) (i.e. third phase governance according to the categorisation proposed by de Laat (2007)).
Secondly, there has been substantial activity in LO (including the former development in OO) for more than ten years. Despite some variation between stable releases, our findings suggest a long-term trend towards a sustainable community, as we have not observed any signs of a lasting decline in community activity. As a comparison, there has been stable community activity over many years in the aforementioned Apache web server and Linux kernel projects. + + +Based on results concerning commitment to the projects, we find that a large proportion of the most influential committers in LO have been involved for long periods of time both before and after the fork from OO, which indicates that the developer community has a strong commitment to the LO branch. Strong commitment of contributors over long time periods has been observed earlier in a study of the Debian project, where maintainers “tend to commit to the project for long periods of time” and “the mean life of volunteers in the project is probably larger than in many software companies, which would have a clear impact on the maintenance of the software” (Michlmayr et al., 2007). Further, our results show that a relatively small proportion (5%) of the most active LO committers contribute the majority of commits (78%) and that the five most active committers contribute 33% of all commits in the LO project. In comparison, a relatively small proportion (5%) of the most active AOO committers contribute a smaller proportion of commits (33%), whereas the five most active committers contribute 60% of all commits in the AOO project. In acknowledging that our analysis of the AOO project is based on a significantly shorter time window than the LO project, we note that both projects have communities of committers larger than “the vast majority of mature OSS programs” (Krishnamurthy, 2002).
Results concerning commitment to each project support findings from previous research which show that for OSS projects “the bulk activity, especially for new features, is quite highly centralised” (Crowston et al., 2012). + + +Results on retention of committers show that SOO and LO have been more successful in recruiting and retaining committers over time compared to OOO and AOO. Results also show that there is no sign of any long-term decline in LO in terms of the number of currently active committers. Further, results concerning contributions to both the LO and AOO projects show that new developers (i.e. those who have not contributed to the OO project) provide limited contributions to the LO project (representing 0.3% of all LO commits) and a significant amount (32.1%) of the AOO commits. When considering long-term contributors (i.e. those who have contributed to all three projects) there are still limited contributions except for AOO (representing 1.9% of all commits in OO, 6.1% of all commits in LO and 41.3% of all commits in AOO). Further, the two most dominant commitment patterns for committers who have contributed to the LO project are that committers only commit to LO, and that committers have made all their contributions in OO before starting to contribute to LO (together involving 94.7% of all LO committers and 89.7% of all LO commits). In comparison, the two most dominant commitment patterns for committers who have contributed to the AOO project are that committers have contributed to LO within the period during which they have contributed to AOO (together comprising 59.9% of all AOO committers and 55.6% of all AOO commits). Moreover, a clear majority (85.7%) of the LO committers have been directly recruited to LO, whereas less than half (39.5%) of the AOO committers have been directly recruited to AOO. + + +8 http://httpd.apache.org/. +9 http://www.kernel.org/. + + +It is not uncommon that
developers are simultaneously involved in more than one project (Lundell et al., 2010). However, our results show that only a limited number of contributors are simultaneously active in the LO and AOO projects. + + +10 The Apache Software Foundation (http://www.apache.org/) and the Linux Foundation (http://www.linuxfoundation.org/). + + +6.2. Analysis of insights and experiences from the LibreOffice community + + +Results from the study indicate a systematic approach in the LO project to mentoring new contributors. The project has adopted supportive work practices for providing guidance to new contributors, for example via mentoring and the provision of “LibreOffice Easy Hacks” that are specifically aimed at inexperienced contributors. Efforts made in this project seem to go beyond what is established practice in many other OSS projects. For any project it is important to promote organisational learning and ease the introduction of new contributors to the project and its work practices, something which has been recognised in previous research (Lundell et al., 2010). Further, our results also show that LO project participants seem keen to encourage and acknowledge contributions from new participants in the community. + + +Our results clearly show that use of a weak copyleft license is seen as appropriate for the LO project for a number of reasons. One reason is a perceived risk that the source code would otherwise not continue to be provided according to core principles of software freedom. This choice of Open Source license for the project has been referred to as a “keep-open” license (Engelfriet, 2010). In acknowledging that a number of factors affect the attractiveness of a project, it seems evident that the choice of a “keep-open” license is considered appropriate amongst new contributors, as the project has managed to attract a significant number of new contributors.
Further, an additional indication of the preference for a “keep-open” license amongst those contributors to the LO project who were also contributing to the OO project stems from the results of our interviews. This in turn reinforces the observation (see above) that the majority of the contributors to the OO project who decided to continue contributing to one of the projects (AOO or LO) have chosen the LO branch. + + +An effect of the fork was that a part of the OO community has evolved into a new form, as founding members of the LO community stem from the OO community. Over time, the new LO project, now managed and governed by TDF, has attracted a significant number of new contributors. This is in contrast with the approach taken by the AOO project, which adopted an already established structure for governance and work practices (ASF). + + +There is a complex inter-relationship between community and company values which impacts on opportunities for long-term maintenance and support for OSS projects. A number of respondents express that besides their involvement in the LO community they are also affiliated with various commercial organisations, and for some respondents there is a symbiosis between their different involvements. Further, our results strongly support several of the motivational factors for individual participation in OSS projects that have been identified in earlier research (Bonaccorsi and Rossi, 2006). In particular, social motivations, such as the fun of contributing and the sense of belonging to a community, are important to LO contributors. Another social motivation observed in the LO community is the opportunity to provide an alternative to proprietary software solutions. Further, we note that technological motivations, such as learning and the opportunity of getting contributions and feedback from the community, are also present amongst LO contributors.
Some respondents, who are also active in small companies, see business opportunities in participating in the LO community. Hence, our study confirms earlier studies concerning individual motivations for participation in OSS projects. + + +6.3. Implications + + +The study has revealed a number of insights concerning governance and community evolution. For long-term contributors active under several governance regimes over more than 10 years, there have been several changes concerning the way of working in the different communities. + + +Contributors who started in the OO project (under governance by Sun followed by Oracle) and were later active in the AOO project have experienced different corporate governance regimes followed by adoption of the Apache way of working. This transition of the project into governance under the existing ASF has involved a significant change for participants in terms of changed governance and changed conditions for contributors, due to the adoption of institutionalised practices and a change from a weak copyleft license to a permissive license. + + +On the other hand, contributors who started in the OO project and were later active in the LO project have also experienced different corporate governance regimes (Sun and Oracle), followed by adoption of a new way of working implied by the establishment of a tailor-made foundation (TDF) as a legal framework for maintenance of the LO project. For these contributors, there has been continued use of a weak copyleft license. In this way, our results show that contributors shaped TDF with a view to supporting their preferred way of working in the LO project. + + +It should be noted that the choice of the same weak copyleft license as the base project when establishing the LO project was possible without prior IPR clearance. Further, this was possible despite the fact that the copyright for the code base in the base project was controlled by a different organisation (Oracle Corporation).
These circumstances allowed the LO project to continue development on the same code base immediately. However, when establishing the AOO project there was a need for IPR clearance in connection with transferring copyright of the code base to ASF and changing to a new Open Source license. This transfer to ASF involved considerable effort and resulted in a significant time window between the AOO project start and the first release of AOO. + + +From the analysis of the three specific projects investigated (LO, OO, and AOO), it is shown that significant development experience – both in terms of contributors and their contributions – has been maintained and transferred from the OO project into the two independent projects (LO and AOO). + + +The importance of establishing a strong sense of community in the context of large global OSS projects is closely related to the importance of establishing a sense of teamness in global software development projects (Lings et al., 2007). In both Open Source and proprietary licensed software projects there is a need for managing collaboration involving developers with different socio-cultural backgrounds. However, a key difference between Open Source based collaboration in large community based projects and large inter-organisational collaborations using proprietary software in global contexts lies in the possibility of successfully forking an OSS project and establishing a new project with separate governance. The importance of face-to-face meetings is recognised both in the context of inter-organisational collaboration in the field of global software engineering (Lings et al., 2007) and in the large globally distributed OSS projects analysed in this study.
Further, from our analysis in this study we note that the importance of establishing a common vision for an OSS community relates to experiences in the context of global software engineering concerning the importance of gaining “executive support from all the sites” in a globally distributed software development project (Paasivaara, 2011). + + +7. Discussion and conclusions + + +7.1. Discussion + + +The transition and formation of the LibreOffice community seems to be successful. However, we acknowledge the short time period after the fork (33 months), and that our early indications of a successful LibreOffice community after the transition from OpenOffice.org need to be confirmed by an analysis over a longer time period at a later stage. As a comparison, a well-known fork with significant uptake and a long-term sustainable community is OpenBSD(^{11}), which was forked from NetBSD in 1995 and still has an active developer community (Gmane, 2013). + + +Further, when considering Open Source software products for potential adoption in long-term maintenance scenarios, it is critical to understand and engage in communities related to the Open Source software project. For the base project analysed (OpenOffice.org), a governance structure had been established and the OpenOffice.org community was governed by its community council (Openoffice, 2013). Similarly, the investigated branch after the fork (LibreOffice) has also established a governance structure, referred to as the Document Foundation (Documentfoundation, 2013a). Despite such explicitly documented governance structures, project participants may decide to fork a project, which happened when the Document Foundation established the LibreOffice project as a fork from OpenOffice.org on 28 September 2010. Our results suggest that this fork may actually be successful.
We note that our observation indicates that the LibreOffice project may be an exception to the norm, since previous research claims that there have been “few successful forks in the past” (Ven and Mannaert, 2008). + + +From our results, it remains to be seen to what extent the LibreOffice and Apache OpenOffice projects may successfully evolve their projects and associated communities in a way that can be sustainable long-term. So far it seems that LibreOffice has been the more successful project in terms of growing associated communities. Our results suggest that the choice of Open Source license significantly impacts on the conditions for attracting contributions to Open Source projects. Amongst contributors to the LibreOffice project there is a clear preference for contributing to an Open Source project which uses the same weak copyleft license as the base project. This use of a keep-open license in the LibreOffice project may significantly impact on developers’ willingness to contribute to an Open Source project for which they do not hold the copyright, both amongst volunteer and company-affiliated developers. Our results show strong indications of congruence between professional roles and contributions to the LibreOffice community for community members. + + +We acknowledge that the LibreOffice project has been established and openly available for external contributions for a longer time period than the Apache OpenOffice project. This can partly be explained by the later start of the Apache OpenOffice project, since there was a void between 15 April 2011, when Oracle abandoned OpenOffice.org, and 13 June 2011, when Apache OpenOffice was established as an Apache Software Foundation project. Further, we note that the first commits in the Apache OpenOffice repository were contributed in August 2011.
Therefore, it is perhaps not surprising that a number of contributors from the OpenOffice.org project became involved in the LibreOffice project, since there was no active OpenOffice.org project to contribute to for several months. However, it should be noted that after August 2011, when the first commits were contributed and Apache OpenOffice became openly available, committers have continued to contribute to the LibreOffice project. + + +The situation analysed in the paper has an inherent complexity in that it involves three projects for which there are complex interactions, influences, and relationships, both with respect to code and community dynamics. Therefore, this study challenges previously established categorisations of fork outcomes and also how the concept of a fork is defined. This is because such categorisations and definitions often consider only the relationship between two projects, often referred to as the base and the forked project (Robles and Gonzalez-Barahona, 2012; Wheeler, 2007). Further, this study has shown that individual contributors in related OSS developer communities can contribute to several projects over a period of time, including both the base and the forked project. + + +The analysis of the sustainability of Open Source software communities and the evolution of two independent Open Source software projects after a fork shows that there is potential for successful branching. Our specific emphasis has been to investigate insights and experiences from community members of the project which was established as an outcome of a fork. From this we find that long-term community members seem to have managed to establish a new project, and a tailor-made foundation for its governance, in a way that is appealing to both old and new contributors. + + +In situations such as the one analysed in this study there is no one-to-one correspondence between Open Source software project and Open Source software community.
Consequently, when assessing the sustainability of such communities it is important to recognise that individual contributors are involved in multiple projects. Therefore, any such assessment must take into account that community involvement goes beyond any single project. + + +Irrespective of how the relationships between the projects are perceived in the transition from the base project to the two new projects, our results from the analysis of the three inter-related projects, with associated transitions from the OpenOffice.org project, go beyond previously established categorisations of fork outcomes. Our results thereby provide valuable insights for extending the existing body of knowledge concerning forks. + + +7.2. Conclusions + + +Our study presents findings from the first comprehensive analysis of Open Source software projects involving a fork. The study reveals a number of important findings related to the long-term sustainability of Open Source software communities. + + +Related to the characterisation of community evolution over time for the three inter-related Open Source projects, the study presents several important findings. First, the LibreOffice project shows no sign of long-term decline, and as such the study details circumstances under which a fork can be successful. Second, the majority of contributors to the OpenOffice.org project who continued in one of the succeeding projects chose to continue contributing to the LibreOffice project. Further, LibreOffice has attracted the long-term and most active committers in the OpenOffice.org project, thereby demonstrating that successful transfer and evolution of know-how and work practices can be achieved beyond individual Open Source software projects. Third, OpenOffice.org (under governance of Sun) and LibreOffice have been more successful in recruiting and retaining committers over time compared to OpenOffice.org (under governance of Oracle) and Apache OpenOffice.
This suggests that effective governance and work practices that are appreciated by community members are fundamental for long-term sustainability. Fourth, a minority of the LibreOffice committers have been recruited from OpenOffice.org and have contributed a clear majority of the LibreOffice commits. On the other hand, the vast majority of LibreOffice committers have been directly recruited to the project, but their commits to the project are in the minority. From this we conclude that apart from community efforts to make it easier to contribute to an Open Source software project, it is also important to address challenges related to long-term retention of contributors. + + +(^{11}) http://www.openbsd.org/. + + +The study makes a novel contribution by revealing important insights and experiences from members of the LibreOffice community, and provides explanations for why the LibreOffice project has evolved as it has. There is a clear preference for the use of a copyleft license amongst contributors to the LibreOffice project, both amongst volunteers and those affiliated with companies. The use of such a license in the LibreOffice project is perceived as a prerequisite for entry amongst many volunteer contributors and those affiliated with companies. This suggests that such an Open Source license is preferred amongst contributors in Open Source software projects with a strong community identity. Further, the study shows that it is important that values amongst contributors and other stakeholders are congruent with the effects of the particular Open Source license used. Results from the study elaborate on tensions in a community and detail circumstances under which community members need to adapt in order to avoid an ineffective collaboration climate in an Open Source software project.
Further, the study reveals important motivations for joining and contributing to the LibreOffice project over time, including: a perceived welcoming atmosphere in the community; a sense of supportive and effective work practices; appreciation for independence and control of developed solutions by members of the community; and a strong identity and appraisal of community diversity. Thereby the study has detailed the importance of nurturing Open Source software communities in order to establish long-term sustainable Open Source software projects. From a contributor perspective, the study shows that Open Source software communities can outlive Open Source software projects. In particular, for projects with devoted communities with a strong conviction about future directions for projects and communities, we find strong indications that forking can be used as an effective strategy for overcoming perceived obstacles in the current way of working in a project in order to improve the situation. + + +The findings from our analysis of the LibreOffice project (and the related OpenOffice.org and Apache OpenOffice projects) contribute new insights concerning challenges related to the long-term sustainability of Open Source software communities. For software systems with long life-cycles, the success with which an Open Source software project manages to recruit and retain new contributors to its community is critical for its long-term sustainability. Hence, good practice with respect to governance of Open Source software projects is perceived by community members as fundamental for establishing sustainable communities. + + +References + + +Ågerfalk, P., Fitzgerald, B., 2008. Outsourcing to an unknown workforce: exploring open sourcing as a global sourcing strategy. MIS Quarterly 32 (2), 385–410. + + +Apache, 1999.
The Apache Software Foundation Board of Directors Meeting Minutes, http://www.apache.org/foundation/records/minutes/1999/board_minutes_1999_06_01.txt (accessed June 2013). + + +Apache, 2013a. Apache OpenOffice, http://openoffice.apache.org/ (accessed June 2013). + + +Apache, 2013b. The Apache Software Foundation – Foundation Project, http://www.apache.org/foundation/ (accessed June 2013). + + +Bach, P., Carroll, J., 2010. Characterizing the dynamics of open user experience design: the cases of Firefox and OpenOffice.org. JAIS 11 (special issue), 902–925. + + +Bacon, J., 2009. The Art of Community. O’Reilly Media, Sebastopol. + + +Blondelle, G., Arberet, P., Rossignol, A., Lundell, B., Labeze, P., Berrendonner, R., Gauffret, P., Faudot, R., Langlois, B., Maioncello, L., Moro, P., Rodriguez, J., Puerta Peña, J.M., Bonafous, E., Mueller, R., 2012a. Polarsys towards long-term availability of engineering tools for embedded systems. In: Proceedings of the Sixth European Conference on Embedded Real Time Software and Systems (ERTS 2012), Toulouse, France, 1–2 February. + + +Blondelle, G., Langlois, B., Gauffret, P., 2012b. How Polarsys addresses Long Term Support and develops the ecosystem of Eclipse tools for Critical Embedded Systems. EclipseCon US 2012, Reston, Virginia, 26–28 March, http://www.eclipsecon.org/2012/sessions/how-polarsys-addresses-long-term-support-and-develops-ecosystem-eclipse-tools-critical-embe + + +Bonaccorsi, A., Rossi, C., 2006. Comparing motivations of individual programmers and teams to take part in the open source movement: from community to business. Knowledge, Technology & Policy 18 (4), 60–64. + + +Brock, A., 2013. Understanding commercial agreements with open source projects. In: Coughlan, S. (Ed.), Thoughts on Open Innovation – Essays on Open Innovation from Leading Thinkers in the Field, OpenForum Europe Ltd for OpenForum Academy, Brussels. + + +Byfield, B., 2010. The Cold War Between OpenOffice.org and LibreOffice.
Linux Magazine, http://www.linux-magazine.com/Online/Blogs/Off-the-Beat-Bruce-Byfield-s-Blog/The-Cold-War-Between-OpenOffice.org-and-LibreOffice (accessed June 2013). + + +Conlon, M.P., 2007. An examination of initiation, organization, participation, leadership, and control of success in Open Source software development projects. Information Systems Education Journal 5 (38), 1–13. + + +Crn, 1999. Sun Microsystems Buys Star Division, http://www.crn.com/news/channel-programs/18804525/sun-microsystems-buys-star-division.htm (accessed June 2013). + + +Crowston, K., Annabi, H., Howison, J., 2003. Defining Open Source Software project success. In: Proceedings of the International Conference on Information Systems (ICIS 2003), Seattle, WA, USA, 14–17 December, pp. 327–340. + + +Crowston, K., Howison, J., Annabi, H., 2006. Information systems success in free and Open Source software development: theory and measures. Software Process: Improvement and Practice 11 (2), 123–148. + + +Crowston, K., Wei, K., Howison, J., Wiggins, A., 2012. Free/Libre open-source software development: what we know and what we do not know. ACM Computing Surveys 44 (2) (article 7). + + +de Laat, P., 2007. Governance of open source software: state of the art. Journal of Management & Governance 11 (2), 165–177. + + +Deshpande, A., Riehle, D., 2008. The total growth of Open Source. In: Russo, B., et al. (Eds.), Open Source Development, Communities and Quality. IFIP Advances in Information and Communication Technology, vol. 275. Springer, New York, pp. 197–209. + + +Dinh-Trong, T.T., Bieman, J.M., 2005. The FreeBSD project: a replication case study of open source development. IEEE Transactions on Software Engineering 31 (6), 481–494. + + +Documentfoundation, 2013a. The Document Foundation, http://www.documentfoundation.org/ (accessed June 2013). + + +Documentfoundation, 2013b. The Document Foundation Manifesto, http://www.documentfoundation.org/pdf/manifesto.pdf (accessed June 2013).
+ + +Documentfoundation, 2013c. The Document Foundation – Our Supporters, http://www.documentfoundation.org/supporters/ (accessed June 2013). + + +Engellfriet, A., 2010. Choosing an Open Source license. IEEE Software 27 (1), 48–49. + + +Fetelson, D.G., 2012. Perpetual development: a model of the Linux kernel life cycle. Journal of Systems and Software 85 (4), 859–875. + + +Gamalielsson, J., Lundell, B., 2011. Open Source communities for long-term maintenance of digital assets: what is offered for ODF & OOXML? In: Hammouda, L., Lundell, B. (Eds.), Proceedings of SOS 2011: Towards Sustainable Open Source. Tampere University of Technology, Tampere, pp. 19–24, ISBN 978-952-15-2411-0, ISSN 1737-836X. + + +Gamalielsson, J., Lundell, B., 2012. Long-term sustainability of Open Source software communities beyond a fork: a case study of LibreOffice. In: Hammouda, L., et al. (Eds.), Open Source Systems: Long-Term Sustainability. IFIP Advances in Information and Communication Technology, vol. 378. Springer, Heidelberg, pp. 29–47. + + +Gamalielsson, J., Lundell, B., Lings, B., 2010. The Nagios community: an extended quantitative analysis. In: Agerfalk, P., et al. (Eds.), Open Source Software: New Horizons. IFIP Advances in Information and Communication Technology, vol. 319. Springer, Berlin, pp. 85–96. + + +Gamalielsson, J., Lundell, B., Mattsson, A., 2011. Open Source software for model driven development: a case study. In: Hissam, S. (Ed.), Open Source Systems: Grounding Research. IFIP Advances in Information and Communication Technology, vol. 365. Springer, Heidelberg, pp. 348–367. + + +German, D., 2003. The GNOME project: a case study of open source global software development. Journal of Software Process: Improvement and Practice 8 (4), 201–215. + + +Gmane, D., 2013. Information about gmane.os.openbsd.cvs, http://dir.gmane.org/gmane.os.openbsd.cvs (accessed June 2013). + + +Huysmans, F., Ven, K., Verelst, J., 2008. 
Reasons for the non-adoption of OpenOffice.org in a data-intensive administration. First Monday 13 (10). + + +IBM, 2011. IBM to Contribute to New, Proposed OpenOffice.org Project, http://www-03.ibm.com/press/us/en/pressrelease/34638.wss (accessed June 2013). + + +Isaëla, A., Fettelson, D.G., 2010. The Linux kernel as a case study in software evolution. Journal of Systems and Software 83 (3), 485–501. + + +Izurieta, C., Bieman, J., 2006. The evolution of FreeBSD and Linux. In: Proceedings of the 5th ACM-IEEE International Symposium on Empirical Software Engineering (ISESE’06), September 21–22, Rio de Janeiro, Brazil. + + +Koponen, T., Lintula, H., Hotti, V., 2006. Defects reports in Open Source Software maintenance process – OpenOffice.org case study. In: Proceedings of Software Engineering Applications (SEApp’06), Dallas, TX, USA, 13–15 November. + + +Krishnamurthy, S., 2002. Cave or community? An empirical examination of 100 mature Open Source projects. First Monday 7 (6). + + +Lee, S.-Y.T., Kim, H.-W., Gupta, S., 2009. Measuring open source software success. Omega 37 (2), 426–438. + + +Lings, B., Lundell, B., 2005. On the adaptation of Grounded Theory procedures: insights from the evolution of the 2G method. Information Technology & People 18 (3), 196–211. +Lings, B., Lundell, B., Ågerfalk, P.J., Fitzgerald, B., 2007. A reference model for suc- +cessful distributed Development of Software Systems. In: Proceedings of the Second International Conference on Global Software Engineering (ICGSE 2007). IEEE Computer Society, pp. 130–139. + + +Linususer, 2010. OpenOffice.org Community Announces. The Document Foun- +dation. http://www.openoffice.org/press-release/announces-the-document-foundation (accessed June 2013). + + +Lopez-Fernandez, L., Robles, G., Gonzalez-Barahona, J.M., Herraz, I., 2006. Apply- +ing social network analysis techniques to community-driven Libre software projects. International Journal of Information Technology and Web Engineering 1 (3), 27–48. 
+ + +Lundell, B., 2011. e-Governance in public sector ICT-procurement: what is shaping practice in Sweden? European Journal of ePractice 12 (6), http://www.epractice.eu/files/European%20Journal%20of%20ePractice%20Volume%2012%266.pdf. + + +Lundell, B., Gamalielsson, J., 2011. Towards a Sustainable Swedish e-Government Practice: Observations from unlocking digital assets. In: Proceedings of the IFIP 11th Government Conference 2011 (EGOV 2011), Delft, The Netherlands, 28 August–2 September 2011. + + +Lundell, B., Lings, B., Lindqvist, E., 2010. Open Source in Swedish companies: where are we? Information Systems Journal 20 (6), 519–535. + + +Lundell, B., Lings, B., Syberfeldt, A., 2011. Practitioner perceptions of Open Source software in the embedded systems area. Journal of Systems and Software 84 (9), 1540–1549. + + +Madye, G., Freeh, V., Tynan, R., 2004. Modeling the F/OSS community: a quantitative investigation. In: Koch, S. (Ed.), Free/Open Source Software Development. Idea Group Publishing, Hershey, pp. 203–221. + + +Marketwire, 2011a. Oracle Announces Its Intention to Move OpenOffice.org to a Community-Based Project, http://www.marketwire.com/press-release/oracle-announces-its-intention-to-move-openofficeorg-to-a-community-based-project-nasdaq-orcl-1503027.htm (accessed June 2013). + + +Marketwire, 2011b. Oracle to Contribute to Apache, http://www.marketwire.com/press-release/statements-on-openofficeorg-contribution-to-apache-nasdaq-orcl-1521400.htm (accessed June 2013). + + +Meyers-Romero, J., Robles, G., Ortuño-Pérez, M., Gonzalez-Barahona, J.M., 2008. Using social network analysis techniques to study collaboration between a FLOSS community and a company. In: Russo, B., et al. (Eds.), Open Source Development, Communities and Quality. IFIP Advances in Information and Communication Technology, vol. 275, Springer, New York, pp. 171–186. + + +Mens, T., Fernández-Ramírez, J., Degrandts, S., 2008. The evolution of Eclipse. 
In: Proceedings of the 24th IEEE International Conference on Software Maintenance (ICSM 2008), pp. 386–395. + + +Michlmayr, M., 2009. Community management in Open Source projects. The Euro- +pean Journal for the Informatics Professional X (3), 22–26. + + +Michlmayr, M., Robles, G., Gonzalez-Barahona, J.M., 2007. Volunteers in large Libre software projects: a quantitative analysis. In: Sowe, S.K., et al. (Eds.), Emerging Free and Open Source Software Practices. IGI Publishing, Hershey, pp. 1–24. + + +Midha, V., Palvia, P., 2012. Factors affecting the success of Open Source software. Journal of Systems and Software 85 (4), 895–905. + + +Mockus, A., Fielding, R.T., Herbsleb, J.D., 2002. Two case studies of Open Source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology 11 (3), 309–346. + + +Moon, Y.J., Sproull, L., 2000. Essence of distributed work: the case of the Linux kernel. First Monday 5 (12), 1–7. + + +Müller, R., 2008. Open Source – Value Creation and Consumption. In: Open Expo, Zürich, 24–25 September. + + +Nouws, L., 2011. LibreOffice – the first year and looking forward! In: Presented at ODF Plugfest, Gouda, Netherlands, 2011-11-18, http://plugfest. + + +Nyman, L., Mikkonen, T., Lindman, J., Fougère, M., 2012. Perspectives on code forking and sustainability in Open Source software. In: Hammouda, L., et al. (Eds.), Open Source Systems: Long-Term Sustainability. IFIP Advances in Information and Communication Technology, vol. 378, Springer, Heidelberg, pp. 274–279. + + +Openoffice, 2002. OpenOffice.org Community Announces, OpenOffice.org 1.0. Free Office Productivity Software, http://www.openoffice.org/about_us/oooe_release.html (accessed June 2013). + + +Openoffice, 2004. OpenOffice.org Is Four, http://www.openoffice.org/about_us/birthday4.html (accessed June 2013). + + +Openoffice, 2012. 
The Apache OpenOffice Project Announces Apache OpenOffice™ 3.4, http://www.openoffice.org/news/aoo34.html (accessed June 2013). + + +Openoffice, 2013. Community Council, http://wiki.services.openoffice.org/wiki/Community_Council (accessed June 2013). + + +Oracle, 2010. Oracle Completes Acquisition of Sun, http://www.oracle.com/us/corporate/press/044428 (accessed June 2013). + + +Paasivaara, M., 2011. Coaching global software development projects. In: Proceed- +ings of the 30th International Conference on Global Software Engineering (ICGSE 2011). IEEE Computer Society, pp. 84–93. + + +Pclomsag, 2011. Free At Last! LibreOffice 3.3 Released, http://pclomsag.com/html/Issues/201103/page14.html (accessed June 2013). + + +Raja, U., Tretter, M.J., 2012. Defining and evaluating a measure of Open Source project survivability. IEEE Transactions on Software Engineering 38 (1), 163–174. + + +Ray, B., Kim, M., 2012. A case study of cross-system porting in forked projects. In: Pro- +ceedings of the 20th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, 11–16 November 2012, Cary, NC. + + +Robert, S., 2006. On-board development – the open-source way. In: IST/ARTEMIS Workshop, Helsinki, 22 November. + + +Robles, G., Gonzalez-Barahona, J.M., 2012. A comprehensive study of software forks: dates, reasons and outcomes. In: Hammouda, L., et al. (Eds.), Open Source Systems: Long-Term Sustainability. IFIP Advances in Information and Commu- +nication Technology, vol. 378. Springer, Heidelberg, pp. 1–14. + + +Robles, G., Gonzalez-Barahona, J.M., Michlmayr, M., 2005. Evolution of volunteer participation in Libre software projects: evidence from Debian. In: Proceedings of the First International Conference on Open Source Systems (OSS 2005), pp. 100–107. + + +Rossi, B., Scotto, M., Sillitti, A., Succi, G., 2006. An empirical study on the migration to Open Source software in a public administration. 
International Journal of Information Technology and Web Engineering (IJITWE) 1 (3), 64–80. + + +Rossi, B., Russo, B., Succi, G., 2009. Analysis of Open Source development evolution iterations by means of burst detection techniques. In: Boldyreff, C., et al. (Eds.), Open Source Ecosystems: Diverse Communities Interacting. IFIP Advances in Information and Communication Technology, vol. 299. Springer, Berlin, pp. 83–93. + + +Samoladas, I., Stamos, I., Angelos, L., 2010. Survival analysis on the duration of open source projects. Information and Software Technology 52 (9), 902–922. + + +Santos, C., Kuk, G., Kon, F., Pearson, J., 2013. The attraction of contributors in free and Open Source software projects. Journal of Strategic Information Systems 22 (1), 45–69. + + +Sen, R., Singh, S.S., Borle, S., 2012. Open Source software success: measures and analysis. Decision Support Systems 52 (2), 364–372. + + +Severance, C., 2012. The Apache Software Foundation: Brian Behlendorf. Computer 45 (1), 1–6. + + +Seydel, J., 2009. OpenOffice.org: when will it be ready for prime time? In: Proceed- +ings of the Southwest Decision Sciences Institute Conference (SWDSI), 25–28 Ed. + + +Shibuya, B., Tamai, T., 2009. Understanding the process of participating in open source communities. In: Proceedings of the 2009 ICSE Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development. IEEE Computer Society, Washington, DC, USA, pp. 1–4. + + +Subramaniam, C., Sen, R., Nelson, M.L., 2009. Determinants of Open Source software project success: a longitudinal study. Decision Support Systems 46 (2), 576–585. + + +Ven, K., Mannenat, H., 2008. Challenges and strategies in the use of Open Source Soft- +ware by Independent Software Vendors. Information and Software Technology 50 (9–10), 991–1002. + + +Ven, K., Huysmans, P., Verelst, J., 2007. The adoption of open source desktop software in a large public administration. 
In: Proceedings of the 13th Americas Conference on Information Systems (AMCIS 2007), 9–12 August, Keystone, CO. + + +Ven, K., Van Kerckhoven, G., Verelst, J., 2010. The adoption of open source desktop software: a qualitative study of Belgian organizations. International Journal of IT/Business Alignment and Governance (IJITBAG) 1 (4), 1–17. + + +Viseur, R., 2012. Forks impacts and motivations in free and open source projects. International Journal of Advanced Computer Science and Applications (IJACSA) 3 (2), 117–122. + + +Wang, J., 2012. Survival factors for Free Open Source software projects: a multi-stage perspective. European Management Journal 30 (4), 352–371. + + +Wheeler, D.A., 2007. Why Open Source Software/Free Software (OSS/FS, FLOSS, or OSS) is important. In: Wheeler, D.A. (Ed.), Open Source Software: New Horizons. IFIP Advances in Information and Communication Technology, vol. 319. Springer, Berlin, pp. 294–307. + + +Wiggins, A., Howison, J., Crowston, K., 2009. Heartbeat: measuring active user base and potential user interest in FLOSS projects. In: Boldyreff, C., et al. (Eds.), Open Source Ecosystems: Diverse Communities Interacting. IFIP Advances in Information and Communication Technology, vol. 299. Springer, Berlin, pp. 94–104. + + +Jonas Gamalielsson is a researcher at the University of Skövde’s Informatics Research Centre. He has conducted research on open source and open standards in several projects. He has been involved in the Open Source Action (OSA) project (2008–2010), the Nordic (NordForsk) OSS Researchers Network (2009–2012), and the ITEA2-project OPEES (Open Platform for the Engineering of Embedded Systems). Further, he is participating in the ORIOS (Open Source based Reference implementations for Open Standards) project. He has also been involved in the Fifth and Eighth International Conference on Open Source Systems (OSS 2009 and OSS 2012). 
Björn Lundell is a senior researcher at the University of Skövde’s Informatics Research Centre. He has been researching the Open Source phenomenon for several years and has participated in a number of research projects in different leading roles, including: co-lead for a work package in the EU FP6 CALIBRE project (2004–2006), project manager in the Swedish National Research Project OSS (2005–2008), and currently project leader for the ORIOS project (2012–2015). He is a founding member of IFIP WG 2.13 on Open Source Software, and was program co-chair for the Eighth International Conference on Open Source Systems (OSS 2012).

Code Reuse in Stack Overflow and Popular Open Source Java Projects

Adriaan Lotter
Department of Information Science
University of Otago
Dunedin, New Zealand
adriaan.lotter@otago.ac.nz

Sherlock A. Licorish
Department of Information Science
University of Otago
Dunedin, New Zealand
sherlock.licorish@otago.ac.nz

Sarah Meldrum
Department of Information Science
University of Otago
Dunedin, New Zealand
sarah-meldrum@outlook.com

Bastin Tony Roy Savarimuthu
Department of Information Science
University of Otago
Dunedin, New Zealand
tony.savarimuthu@otago.ac.nz

Abstract—Solutions provided in Question and Answer (Q&A) websites such as Stack Overflow are regularly used in Open Source Software (OSS). However, many developers are unaware that both Stack Overflow and OSS are governed by licenses. Hence, developers reusing code from Stack Overflow in their OSS projects may violate licensing agreements if their attributions are not correct. Additionally, if code migrates from one OSS project through Stack Overflow to another, then complex licensing issues are likely to exist.
Such forms of software reuse also have implications for future software maintenance, particularly where developers have a poor understanding of copied code. This paper investigates code reuse between these two platforms (i.e., Stack Overflow and OSS), with the aim of providing insights into this issue. This study mined 151,946 Java code snippets from Stack Overflow, 16,617 Java files from 12 of the top weekly listed projects on SourceForge and GitHub, and 39,616 Java files from the top 20 most popular Java projects on SourceForge. Our analyses were aimed at finding the number of clones (indicating reuse) (a) within Stack Overflow posts, (b) between Stack Overflow and popular Java OSS projects, and (c) between the projects. Outcomes reveal that there was up to 3.3% code reuse within Stack Overflow, while 1.8% of Stack Overflow code was reused in recent popular Java projects and 2.3% in those projects that were more established. Reuse across projects was much higher, accounting for as much as 77.2%. Our outcomes have implications for strategies aimed at introducing strict quality assurance measures to ensure the appropriateness of code reuse, and at raising awareness of licensing requirements.

Keywords—Code reuse, Stack Overflow, Java projects, OSS, Q&A, Quality

I. INTRODUCTION

Quality plays a fundamental role in software success [30]. Thus, quality standards have been developed to provide guidance for software developers, covering the requirements for producing high quality, defect-free software [30, 31]. Under the ISO-9126 quality model, for example, it is stated that the quality requirements for software should cover efficiency, functionality, reliability, usability, reusability, and maintainability [9]. Such standards have also been the subject of previous academic studies (e.g., Singh et al. [22]).
With quality as an underlying motivator for instilling good software development practices while creating software, developers should be particularly conscious when reusing code from external sources (e.g., from open source (OS) portals) [29], as this may impact software efficiency, functionality, reliability, usability, and maintainability. While code reuse allows previously tested and quality-assured code to be implemented in a system, reusing code from untrusted sources may lead to system harm [16]. The implications of code reuse could be particularly significant for software maintainability, as poor knowledge of reused code at the time of software development will likely create challenges for future corrective and perfective actions. As discussed in Roy et al. [40], understanding the levels of reuse and cloning could be valuable for developers in terms of assisting with issues related to plagiarism, software evolution, debugging, code compaction, and security. Furthermore, Kashima et al. [36] noted that several OSS licenses require software derived from original solutions to be published under the same license. This demands that developers be aware of the legal implications of the licenses under which OSS and code posted on other portals (such as Stack Overflow) are published. Additionally, businesses also need to be aware of the reuse occurring within outsourced development [20], as under these conditions they may face future legal challenges.

Code reuse is formally defined as “the use of existing software or software knowledge to construct new software” [15]. It is prevalent in many software systems, including those produced by top-tier software development companies such as Google [34]. Beyond such industry leaders, code reuse has been found to be exceptionally common in Mobile Apps, with some of these products consisting entirely of reused elements [13].
This high level of reuse in developers’ practice stems from the benefits it provides in terms of easily adding and enhancing system features [25]. The accessibility of readily available solutions to coding problems is highly attractive to both novice and experienced programmers [25]. In fact, in a study by Sojer et al. [21], the responses from 869 developers confirmed that they consider ad hoc reuse of code from the internet to be important for their work. Similarly, Heinemann et al. [18] found that 90% of the OS projects they analyzed contained reused code, reiterating the point that code reuse is found extensively in many software systems.

The ease and attractiveness of code reuse has been particularly aided by readily accessible code fragments on Q&A websites, such as Stack Overflow (http://www.stackoverflow.com). Stack Overflow is a very popular Q&A website which allows members of the public to post development-related questions and/or answers, with the answers often containing code fragments. Recent evidence shows that the majority of the questions asked on Stack Overflow receive one or more answers [6], and this forum is often a substitute for official programming languages’ tutorials and guides [24].

With implications for both software maintainability and licensing when reusing Stack Overflow code fragments, of interest to us are the potential effects reusing code from this portal could have on the effort for future changes, and on the correct use of licenses to avoid future legal issues. The aim of this paper is thus to investigate the levels of code reuse within Stack Overflow, and between Stack Overflow and OSS projects. We focus on the Java programming language, given its popularity, and the need to understand reuse beyond Python (Yang et al. [8]).
With a strong body of knowledge around the scale of developers’ reuse practices, team leaders may begin to introduce stricter quality assurance measures to ensure the appropriateness of reused code fragments. We thus answer five research questions in our portfolio of work. Firstly, we explore: what is the extent of Java code reuse within Stack Overflow? (RQ1), to understand how the community operates as an ecosystem in the provision of self-support. Related to this question, we next explore: what is the extent of code reuse between answers published under the same question in Stack Overflow? (RQ2), to understand the degree of innovation (or lack thereof) that is prevalent on this platform. Answers to these two questions are particularly useful for the software engineering community, as within-source code migration is likely to increase the risk of incorrect author attribution, due to (a) having more copies in existence, and (b) increasing the number of ‘steps’ a piece of code could have taken from its origin to where it was found. This could in turn lead to unsuspecting license violations for those implementing these code snippets in OSS.

Our third research question, what is the extent of code reuse between Stack Overflow and the current most popular Open Source Java projects? (RQ3), helps us to understand recent code reuse trends. Related to this research question, we examine: what is the extent of code reuse between Stack Overflow and the all-time most popular Open Source Java projects? (RQ4), to understand software practitioners’ behavior towards code reuse over time. Additionally, we answer: are there differences in the nature of reuse found between the different contexts in terms of scale and size? (RQ5), to provide deeper evidence for the nature and ranges of code reuse between Stack Overflow and OSS projects.
Beyond understanding the extent of code reuse (or clones) existing between OSS and Stack Overflow, it is important to understand how practitioners’ attitude towards this practice has changed over time. Our investigation, led by the latter three questions, will provide initial evidence on the extent of code reuse between projects developed more recently and those that have existed for longer.

The remaining sections of this paper are organized as follows. We provide our study background in Section 2. We next describe our research setting in Section 3, before presenting our results in Section 4. We then discuss our findings and their implications in Section 5, prior to considering threats to the study in Section 6. Finally, we provide concluding remarks and point to future research in Section 7.

II. BACKGROUND

Software practitioners would benefit from developing maintainable software systems that are free of code license violations, and thus, code reuse should be given serious consideration during development. Both of these topics (i.e., software maintenance and licensing) have been investigated to various extents, and their importance has been widely noted in the literature. Firstly, the maintainability of a software system is highly significant to all its stakeholders, especially when considering project lead-times and costs [9]. Maintainability refers to the likelihood of performing software improvements in a given period, and is said to become more difficult with the prevalence of code reuse [32]. Kamiya et al. [32] established that code reuse could introduce multiple points of failure if code fragments are ‘buggy’, and in fact, it has been noted that approximately half of the changes made to code clone groups are inconsistent [15].

The issue of code reuse and maintainability becomes more complex when the reused code comes from external sources (e.g., Stack Overflow).
This is due to potential code incompatibility issues and sub-optimal solutions, which are often tied to a lack of developer understanding. Also, code fragments provided on Stack Overflow are largely written to accompany some textual explanation, and not for immediate use as such. In fact, for many developers, online sources such as Stack Overflow are of utility when they are faced with issues that require knowledge they do not possess. This brings into question their likely understanding of such code, which in turn brings into question the software’s quality. Furthermore, security complications may arise, as evidence has shown that the Stack Overflow portal includes insecure code [10].

An example of how catastrophic code reuse could be is illustrated by Bi [11]. This author shows that a piece of Stack Overflow code was used in the NissanConnect EV mobile app, which accidentally displayed a piece of text reading “App explanation: the spirit of stack overflow is coders helping coders”. This example illustrates that code reused from Stack Overflow and other similar portals is not always examined thoroughly. Although this example illustrates a non-threatening issue, many similar cases could introduce security and functionality-related problems if not inspected properly. Thus, it is important to investigate and understand the extent of code reuse occurring between software systems and online code resources such as Stack Overflow.

Recently, several research studies have been conducted on the topic of code reuse and Stack Overflow. For instance, Abdalkareem et al. [25] investigated code reused from Stack Overflow in Mobile Apps and found that 1.3% of the Apps they sampled were constructed from Stack Overflow posts. They also discovered that mid-aged and older Apps contained Stack Overflow code introduced later in their lifetime. An et al.
[19] also investigated Android Apps and found that 62 out of 399 (15.5%) Apps contained exact code clones; and of the 62 Apps, 60 had potential license violations. In terms of Stack Overflow, they discovered that 1,226 posts contained code found in 68 Apps. Furthermore, 126 snippets were involved in code migration, where 12 cases of migration involved Apps published under different licenses. Yang et al. [8] noted that, in terms of Python projects, over 1% of code blocks in their token form exist in both GitHub and Stack Overflow. At an 80% similarity threshold, over 1.1% of code blocks in GitHub were similar to those in Stack Overflow, and 2% of Stack Overflow code blocks were similar to those in GitHub.

In terms of attribution, in ensuring conformance to license requirements, Baltes et al. [27] found that 7.3% of popular repositories on GitHub contained a reference to Stack Overflow. In the context of Java projects, at least two thirds of those containing copied code did not include a reference to Stack Overflow. Additionally, only 32% of surveyed developers were aware of the attribution requirements of Stack Overflow. This could result in complicated legal issues for developers. In fact, the study of licensing violations is also the subject of previous research [4, 23, 21]. It has been noted that license violations occur frequently in OS projects [4], as well as in Q&A websites such as Stack Overflow, where the community itself has inquired about the issue [3]. As stated by German et al. [7], it is illegal for code fragments from one system to be implemented in another if their licenses are incompatible. As such, developers are required to be cautious with their work and should be aware of the legal consequences involved with code reuse from internet sources. Although license violations do not have direct implications for quality, they do pose potential legal problems, which could result in the removal of software and court costs.
Additionally, from a software development perspective, licensing issues could result in further costs to resolve complications, implement system changes, and repair reputation damage.

Stack Overflow is covered under the CC BY-SA 3.0 (Creative Commons Attribution-ShareAlike 3.0) license [2], and as such, developers have the right to transform and build upon the content on Stack Overflow. However, new software using Stack Overflow code must be distributed under the same license as the original. Furthermore, credit must be given to the specific answer on Stack Overflow, a link must be provided to the license, and the developer should specify if they introduced changes. Noticeably, code reuse from Stack Overflow has been shown to exist in various OSS projects, with varying levels of reuse. The reused code, however, is not often acknowledged, and the lack of attribution results in license violations in many projects [3, 25]. As such, additional research is required to both validate and extend the current literature. We pursue this line of work in this study, in answering our five research questions (RQ1-RQ5 stated earlier).

III. RESEARCH SETTING

A. Data Collection and Processing

To address the research questions posed, three sets of data were extracted, including Stack Overflow code snippets and two sets of OSS projects’ source code. For the purpose of this study, each dataset was required to contain only Java files. To collect the necessary data, we utilized the Stack Overflow data dump, SourceForge, and GitHub. A key motivator for selecting these sources was their popularity in the programming community and their open access to data.

The projects selected from SourceForge and GitHub were all chosen based on popularity (both weekly and all-time), resulting in the selection of projects that were widely used and actively contributed to.
As such, we believe that the effects of code reuse would be more significant for these projects than for less popular ones.

Stack Overflow Java Snippets: The Java ‘snippets’ from Stack Overflow were extracted using the data explorer function to create the first dataset. Answer posts were then selected based on having at least one “<code>” tag, and were filtered on the language Java. Of these answers, only those which were selected as accepted answers were kept, on the premise that such snippets will be trusted and thus reused. As a final filter, only answers from 2014 to 2017 were selected to ensure relevancy. This resulted in 117,526 answers. These answers were then separated into individual code snippets, based on each being within “<code>…</code>” tags. This resulted in 404,799 individual code snippets. Of these snippets, only those with more than one line of code were selected. Ultimately, 151,954 code snippets were extracted and saved as Java files, and 151,946 were analyzed, since eight returned errors when they were processed.

Top Weekly OSS Projects: The second dataset of files extracted came from projects with the greatest weekly popularity, with the specific week of sourcing starting on December 18, 2017. We extracted the top 10 weekly Java projects on SourceForge and GitHub. This resulted in a preliminary sample of 20 projects, in line with previous research done by Heinemann et al. [18] on Open Source Java projects. Each of these projects was investigated, and those containing at least one Java file were selected. Ultimately, 12 suitable projects were finally selected for the analysis, which contained a total of 16,617 Java files. Five files returned errors during processing, as reported in Table III.

All Time Most Popular OSS Projects: The final dataset covered projects that had the highest all-time popularity on SourceForge. As above, the top 20 projects were selected, and 16 were appropriate for the analysis (i.e., contained at least one Java source file).
We did not extract projects from GitHub in this round given the richness of the projects that were extracted from SourceForge. The projects were filtered on popularity, as well as on containing Java code. However, four projects in the subset did not contain Java files, leaving 39,616 files. The final list of projects and their summaries can be found in Table IV, with 39,558 Java files being used in our analyses after processing.

B. Tools and Techniques

To answer our research questions, an appropriate clone (reuse) detection tool was required. We conducted a review of several tools, including NiCad [14], SourcererCC [12] and CCFinderX [32]. We selected CCFinderX given its performance and popularity among researchers [5, 25, 32]. Its token-based technique for clone detection is computationally more efficient than alternative methods; it also has a high recall rate and is able to detect all hidden clones [5]. As discussed by Kamiya et al. [32], the software works by employing a lexical analyzer to create token sequences, after which it applies rule-based transformations to these sequences (based on the specific programming language). The lexical analyzer transforms sequences of characters into sequences of tokens, which are word-like entities [33]. These entities can be identifiers, keywords, numbers, literals, operators, separators, or comments [33, 1]. The matching of clones is then computed using a suffix-tree algorithm, “in which the clone information is represented as a tree with sharing nodes for leading identical subsequences and the clone detection is performed by searching the leading nodes on the tree” [33].

When utilizing CCFinderX for the analyses, several parameters were configured. We followed previous recommendations and used CCFinderX's default settings [25]. The minimum clone length, representing the absolute count of tokens, was set at its default value of 50.
As such, code blocks were only considered if they contained at least 50 tokens. Additionally, the minimum unique token set value was left at its default of 12. Hence, code blocks were only considered if they contained at least 12 unique tokens in addition to the absolute minimum count of 50 tokens. The shaper level was also set at its default of 2. The shaper prevents a code block from being considered a candidate clone if an outer block ‘}’ splits the token sequence. The final two parameters were the ‘P-match application’ and the ‘Pre-screening application’. The P-match application parameter is ticked by default, and denotes that variables and function names are not replaced with special characters. The Pre-screening application was, by default, not ticked, as we wanted to retain all clone instances; pre-screening is ticked to filter outcomes where there are visually too many code clones. The output from CCFinderX includes both file metrics and clone metrics. The file metrics provide file-level insights into the data, whereas the clone metrics provide information regarding clone sets. One set exists for each unique group of clones; as such, a clone set will contain a minimum of two code blocks. Additionally, we were able to identify the number of files containing clones and the clone-sets present in different files in the data (refer to Figure 1 for an example).

In order to determine the extent of code reuse occurring within files, between files, and between projects/datasets, the Radius metric (RAD) of CCFinderX was utilized.

After performing the analysis, clone-sets were selected based on their specific RAD values, and in turn these were used to select the individual files involved. The RAD metric, as defined by Kamiya et al. [32], gives an indication of the maximum distance to a common directory between the files involved in a clone-set.
As such, clones found within the same file will have a Radius of 0, clones found between two files in the same directory will have a Radius of 1, and so on.

C. Measures for Answering RQs

To answer the first four research questions posed (RQ1-RQ4), five analyses were performed. These analyses involved calculating the following metrics. Firstly, the number of files containing at least one clone was computed. Secondly, using the previous measure, we derived the percentage of files containing clones. This allows us to compare our results with those from similar studies, such as that of Yang et al. [8]. Thirdly, by summing the population variable (pop) of each clone-set, we identified the total number of clones present in the files. Fourthly, the total number of clone-sets reveals all unique clones. Fifthly, among these clone-sets we identified which clones involved more than one file.

To answer RQ1. What is the extent of Java code reuse within Stack Overflow?, all Stack Overflow files were stored in the same directory when CCFinderX was executed, and as such a Radius of 1 was used to identify between-file clone-sets. Answering the second research question (RQ2. What is the extent of code reuse between answers published under the same question in Stack Overflow?) required Stack Overflow files to be stored in separate directories based on the questions under which they were posted. As such, a Radius of 1 would indicate that clones exist between answers for the same question, and a Radius of 2 would indicate that clones exist between answers for separate questions. Having a Radius of 2, however, does not imply that intra-question clones (i.e., clones under the same question) do not exist; it simply implies that a clone is also found between questions.
This can hide intra-question clones, and as such a manual inspection was performed on the clone-sets with a Radius of 2 to identify intra-question clones hidden by the maximum Radius value. Figures 2 and 3 demonstrate the situation, where both cases have a Radius of 2; however, only one (Figure 2) has an intra-question clone. The code piece, denoted by ‘A’, is found under the same question (Question 1).

To answer the third (RQ3. What is the extent of code reuse between Stack Overflow and the current most popular Open Source Java projects?) and fourth (RQ4. What is the extent of code reuse between Stack Overflow and the all-time most popular Open Source Java projects?) questions, each project’s files were extracted and saved under the same directory. Furthermore, the Stack Overflow files were saved two directories away, which allowed us to identify clone-sets with clones found between Stack Overflow and a project(s) using a Radius value of 2. The primary measurements required to answer the research questions include the total number of files containing at least one clone, the total number of clones present in these files, and the number of unique clones. RQ5. Are there differences in the nature of reuse found between the different contexts in terms of scale and size? was answered through follow-up statistical analyses involving the outcomes above.

D. Reliability Checks

To ensure that the results obtained from our analyses were reliable, we conducted a manual investigation of 60 clone pairs detected by CCFinderX. Initially, author AL (first author) performed the checks, which were then discussed with author SAL (second author), who triangulated the outcomes and provided confirmation.
Within the sample of 60 clone pairs, 20 were randomly obtained from the Stack Overflow analysis in Section IV (A), 20 from Section IV (C – a), and 20 from Section IV (D – a). For each selected clone pair, it was determined to what extent the two pieces of code were similar, and the nature of the code was also recorded (i.e., whether a class, a method, or a piece of code within a method was detected as a clone). The extent to which clones were similar was rated ‘Exact’, ‘High’, or ‘Medium’. For those rated ‘Exact’, the code in question would be identical copies, including all identifiers, the structure, and the functionality. For those rated ‘High’, the primary difference between the two pieces of code would be the identifiers. Finally, those rated ‘Medium’ were considered to still be similar in structure, although identifiers, minor pieces of data structures, and minor pieces of functionality may differ. The results from the analyses are given in Tables I and II, where Table I reflects the number of clone pairs considered similar to a given extent, and Table II displays the nature of code elements detected in the sample.

TABLE I. MANUAL CHECK OF DETECTED CLONE SIMILARITY

| Similarity | SO | SO & All Time Most Popular | SO & Current Most Popular | Total |
|------------|----|----------------------------|---------------------------|-------|
| Exact | 10 | 6 | 3 | 19 |
| High | 10 | 12 | 13 | 35 |
| Medium | 0 | 2 | 4 | 6 |

TABLE II.
CODE CLONE ELEMENTS

| Nature of Code Element | SO | SO & All Time Most Popular | SO & Current Most Popular | Total |
|------------------------|----|----------------------------|---------------------------|-------|
| Class | 5 | 0 | 1 | 6 |
| Method | 5 | 6 | 8 | 19 |
| Part of Method | 10 | 14 | 11 | 35 |

Our results show that it is highly plausible that these pieces of code were copied directly, or at least adapted to fit the software in question (refer to Table I for details). Furthermore, Table II shows that the majority of clones were code found within methods. Thus, it appears that if a developer copies a piece of code from Stack Overflow, this code is likely to provide some additional functionality to a method.

IV. RESULTS

A. Java Code Reuse within Stack Overflow (RQ1)

Our analysis of the Stack Overflow files revealed that, overall, 5,041 files (out of 151,946) contained at least one clone (or were reused). Thus, 3.3% of Stack Overflow Java code snippets have a duplicate found elsewhere on Stack Overflow. Furthermore, it was observed that within the 5,041 files, a total of 8,786 clones were present, indicating that some files contained multiple clones. In terms of clone sets, 3,530 unique code snippets were observed to have clones. However, when focusing on clones found in at least two files, this number reduced to 2,338. As a result, we determined that potentially 2,338 unique license violations existed within the Stack Overflow files extracted (refer to Section II for Stack Overflow licensing requirements), and that these cumulatively appear in 5,863 places.
The additional 1,192 (i.e., 3,530 minus 2,338) unique clones were found within the same files, and as such do not present potential license violations, as they are contained within the same answers by the same authors.

B. Java Code Reuse between Answers on Stack Overflow (RQ2)

To further investigate code reuse within Stack Overflow, we also looked at the amount of reuse occurring within answers given to the same questions. Our analyses reveal that, of the 151,946 Stack Overflow files, 2,666 contained clones found under the same question. This equates to 1.8% of the total files, and implies that these snippets had at least one clone (code duplication) published under the same question. Within these 2,666 files, a total of 3,559 clones were found, again indicating that some answers contained multiple clones. Of the 3,559 clones discovered, the number of unique clones was found to be 1,763. Additionally, from the 2,666 Stack Overflow files containing clones, we were able to identify that they were present in answers to 1,207 unique questions (out of 46,082 in total). Hence, 2.6% of Java-related questions on Stack Overflow can be expected to contain two or more answers with the same code.

C. Code Reuse between Stack Overflow and Current Popular Projects (RQ3)

a) Stack Overflow and Project Reuse Analysis: The analysis of the Stack Overflow and top weekly OSS project files revealed that 12,763 files (out of 168,558; five project files were removed by CCFinderX due to errors) contained at least one clone. Based on this result, we observed that 7.6% of the files under consideration contain at least one clone.
Of the 12,763 files, a total of 5,447 were Stack Overflow files (out of 151,946 files), and 7,316 were top weekly OSS project files (out of 16,612 files). This indicates that when introducing the project files, 406 additional Stack Overflow files contain clones (refer to Section IV (A)). This implies that these 406 Stack Overflow files contain code that is not found anywhere else on Stack Overflow, with the clones being solely between Stack Overflow and at least one project. Additionally, the project files with clones account for 44% of the total project files, which, as a proportion, is much greater than that of the Stack Overflow files (just 3.3%). This is primarily believed to be a result of the size of the project files, with their average token size being 617, compared to a much smaller 48 for the Stack Overflow files. We performed further probing of the data, observing that in the 12,763 files containing at least one clone, 21,893 clone sets existed. In other words, there were 21,893 unique code snippets which have at least one clone. Of these, a smaller number of clone sets contained clones found in both Stack Overflow and top weekly OSS project files. This figure is 223, indicating that 223 unique code snippets are found between the Stack Overflow and project files. These clones cumulatively appear in 1,627 files (1.0% of 168,558), with each appearing in an average of 7.3 files. In total, these 223 unique code snippets appear 1,995 times.

b) Inter-Project Reuse Analysis: Within the 12,763 files containing clones, a total of 75,959 clones were discovered. When the project files were analyzed independently, it was found that 7,287 (i.e., 57%) of the project files contained clones among themselves, giving an average of 2,979.1 clones per project (as depicted in Table III). Additionally, when investigating clone-sets, we observe 212 clones in at least two projects, with these appearing 1,995 times.
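The Radius-based attribution that underlies these counts can be sketched as follows. This is an illustrative re-creation of the RAD idea described in Section III (Kamiya et al. [32]), with hypothetical file paths; it is not CCFinderX's implementation.

```python
import os

# RAD sketch: the maximum number of path steps from a clone-set's deepest
# common directory down to any file involved in the clone-set.
def radius(paths):
    if len(set(paths)) == 1:
        return 0                      # all clone instances in the same file
    common = os.path.commonpath(paths)
    return max(len(os.path.relpath(p, common).split(os.sep))
               for p in set(paths))

# Two answer files stored in the same question directory (RQ2 layout):
print(radius(["so/q1/a1.java", "so/q1/a2.java"]))  # 1
# Answer files under different question directories:
print(radius(["so/q1/a1.java", "so/q2/a2.java"]))  # 2
# Clones within a single file:
print(radius(["so/q1/a1.java", "so/q1/a1.java"]))  # 0
```

Under the RQ3/RQ4 layout, where Stack Overflow files are stored two directories away from the project directories, a value of 2 flags clone-sets spanning Stack Overflow and a project.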
Further probing also revealed that 29 project files (out of 7,316) contained clones that are only found in Stack Overflow files, and not in any other project. In other words, these 29 clones are found in a one-to-one fashion between one project and Stack Overflow, and as such they are most likely to have migrated directly between Stack Overflow and a project, since there is no evidence of these originating within the project. The direction of this migration, however, is not known, although, independent of these situations, our reliability checks above show that there were no attributions, and thus licensing issues could arise.

D. Code Reuse between Stack Overflow and All-Time Most Popular Projects (RQ4)

a) Stack Overflow and Project Reuse Analysis: The analysis of the Stack Overflow and all-time most popular Java project files revealed that overall 24,537 files (out of 191,504; 58 project files were removed by CCFinderX due to errors) contained at least one clone. Based on this result, we observe that approximately 12.8% of the files under question contain at least one clone. However, only 5,554 Stack Overflow files contained a clone, which is 513 more than when Stack Overflow files were considered on their own. On the other hand, 18,983 project files (out of 39,558 files) contained at least one clone, which is approximately 48% of the total project files. Again, it should be noted that the average length of a project file was 652 tokens. Furthermore, of the 24,537 files containing at least one clone, 51,282 clone sets existed. In other words, there were 51,282 unique code snippets which had at least one clone. Of these, a smaller number of clone sets contain clones found in both Stack Overflow and the projects. This figure is 450, indicating that 450 unique code snippets were found between the Stack Overflow and project files. These clones cumulatively appear 4,334 times (2.3% of 191,504), or in 6.4 files on average.
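The headline proportions in this subsection follow directly from the raw counts; a quick sketch re-deriving them (the counts are copied from the text above, rounded to the reported precision):

```python
# Re-derivation of the RQ4 proportions from the reported counts.
total_files = 191_504              # SO files + all-time popular project files
files_with_clone = 24_537
project_files = 39_558
project_files_with_clone = 18_983
so_project_clone_appearances = 4_334

print(round(100 * files_with_clone / total_files, 1))              # 12.8
print(round(100 * project_files_with_clone / project_files))       # 48
print(round(100 * so_project_clone_appearances / total_files, 1))  # 2.3
```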
b) Inter-Project Reuse Analysis: Within the 24,537 files, a total of 245,750 clones were discovered. Additionally, when analyzed independently, it was found that 18,935 of the project files contained clones among themselves (i.e., 77.2%), giving an average of 9,186.9 clones per project (as depicted in Table IV). Additionally, when investigating clone-sets, it was found that 726 clones were found in at least two projects, with these appearing 6,377 times. We noticed that 48 project files (out of 18,983) contained clones that are only found in Stack Overflow files, and not in any other project. As above, these 48 files are found directly between one project and Stack Overflow, and as such are highly likely to have migrated directly between Stack Overflow and a project.

TABLE III. SUMMARY OF THE TOP WEEKLY JAVA PROJECTS (INTER-PROJECT)

| Project | Number of Java files | Average number of tokens/file | Number of clones | Number of files with clone(s) |
|------------------|----------------------|-------------------------------|-----------------|-------------------------------|
| Awesome Java | 57 | 220 | 18 | 15 |
| Leetcode | 1327 | 498.4 | 1996 | 486 |
| Dubbo | 5576 | 936.7 | 13823 | 2869 |
| Elastic-Search | 1018 | 120.5 | 320 | 189 |
| Java Design | 3966 | 673 | 13674 | 2327 |
| Patterns | 239 | 713.4 | 2906 | 140 |
| Apache OpenOffice| 17 | 252.3 | 11 | 9 |
| Proxeye | 3799 | 277.7 | 2356 | 1037 |
| Qmui Android | 164 | 673.2 | 230 | 71 |
| Sap NetWeaver | 252 | 286.6 | 187 | 89 |
| Server Adapter | 1612 | 3836.8 | 35749 | 7216 |
| for Eclipse | | | | |
| Sefin | | | | |
| Total | 13813 | 406.4 | 2979.1 | 609.7 |

E. Contextual Differences in Scale and Size of Reuse (RQ5)

In addition to the findings above, the results displayed in Table V and Figure 4 show that the sizes of clones found within the various contexts differ. Of primary interest are the larger mean sizes of the clones within Stack Overflow (refer to boxplots in Figure 4-A, B).
These larger sizes suggest that the clones detected are likely to be true positives, i.e., they are indeed evidence of reuse where entire snippets are copied. Additionally, the median and upper quartile of the top weekly Java projects' clone sizes are greater than those of the other four contexts where project files were included. This is displayed in Figure 4, graphs C, D, E, and F, where D can be seen to have a greater median and upper quartile value. This indicates that newer projects are constructed to a greater extent from reused elements.

In Table V the average and maximum sizes of clones found within the various contexts are presented. Interestingly, the maximum clone sizes are smaller for the two analyses looking at Stack Overflow and OSS projects together (277 and 324, respectively). As such, we can see that the code clones found between Stack Overflow and OSS projects are at most 324 tokens in length. However, when looking at inter-project clones, we notice that the maximum values are much higher, with the biggest clone consisting of 1,369 tokens. This suggests that code reuse between projects involves the copying of larger pieces of code, including entire components. In contrast, Stack Overflow code usually provides smaller code snippets as answers to specific coding questions, and the evidence here may be linked to this reality.

To test for statistically significant differences between the six groups of measures (refer to Table V), in terms of clone sizes, a Kruskal-Wallis test was performed. This test was selected as it is non-parametric in nature (i.e., it does not assume that the data follow a Normal distribution), and it does not require sample sizes to be equivalent [28].

TABLE IV.
SUMMARY OF THE ALL-TIME MOST POPULAR JAVA PROJECTS (INTER-PROJECT)

| Project | Number of Java files | Average number of tokens/file | Number of clones | Number of files with clone(s) |
|--------------------------|----------------------|-------------------------------|------------------|-------------------------------|
| Angry IP Scanner | 219 | 397 | 102 | 48 |
| Catacombae | 91 | 758.6 | 223 | 33 |
| Cyclops Group | 2609 | 151.9 | 2545 | 1291 |
| Eclipse Checkstyle Plug-in | 1708 | 319 | 3115 | 782 |
| Freemind | 529 | 772 | 495 | 192 |
| Hibernate | 2392 | 285.6 | 2148 | 627 |
| Hitachi Vantara - Pentaho | 24494 | 673.2 | 112415 | 12008 |
| Libjpeg-turbo | 12 | 2061.3 | 44 | 7 |
| OpenCV | 148 | 1003.9 | 508 | 94 |
| Sap NetWeaver Server Adapter for Eclipse | 239 | 713.4 | 2921 | 144 |
| Sweet Home 3D | 233 | 2408.3 | 1476 | 142 |
| TurboVNC | 245 | 886.5 | 495 | 114 |
| Vuze – Azureus | 3639 | 750 | 5784 | 1461 |
| Weka | 42 | 1505.1 | 66 | 21 |
| Xtreme Download Manager | 155 | 806.4 | 468 | 71 |
| Total | 39558 | 14592.2 | 146990 | 18983 |
| Average/mean | 2472.4 | 912 | 9186.9 | 1186.4 |

TABLE V. CLONE SIZE STATISTICS

| Data Group | Median | Mean | Max | Mean Rank |
|--------------------------|--------|-------|-------|-----------|
| A. Stack Overflow | 66 | 85.7 | 938 | 14869.7 |
| B. Stack Overflow Intra-Answers | 69 | 87.2 | 938 | 15480.4 |
| C. Stack Overflow and Top Weekly | 57 | 67.9 | 277 | 11014.3 |
| D. Top Weekly | 60 | 84.3 | 774 | 13478.7 |
| E. Stack Overflow and Top All Time | 58 | 71.2 | 324 | 11646.4 |
| F. Top All Time | 58 | 69.2 | 1369 | 11392.1 |

Our results reveal a statistically significant outcome (significance level = 0.05), providing evidence that the groups' distributions differ (H(5) = 1409, p < 0.01).
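A test of this form can be sketched with SciPy. The clone-size samples below are synthetic stand-ins for groups A-F of Table V (with A, B, and D shifted upwards, echoing the reported pattern), not the study's measurements.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
# Six synthetic, right-skewed clone-size samples standing in for groups A-F.
groups = [rng.lognormal(mean=4.5 if i in (0, 1, 3) else 4.0, sigma=0.4, size=200)
          for i in range(6)]

# Kruskal-Wallis H test for k independent samples (here k = 6, so 5 df).
h, p = kruskal(*groups)
print(p < 0.05)  # True: the six distributions differ significantly
```

`scipy.stats.kruskal` compares ranks across the groups, so it matches the rank-based "Mean Rank" column reported in Table V.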
Given this finding, we further examined the distributions for A, B, and D in Table V against the others (C, E, F) with post hoc Kruskal-Wallis tests. The outcomes confirm that there were significantly bigger clones (p < 0.05) for Stack Overflow, Stack Overflow Intra-Answers, and Top Weekly projects when compared to the other distributions. This, alongside the results in Table V and the boxplots in Figure 4, provides preliminary evidence that the nature of clones, in terms of their sizes, differs across data sets. We thus plan further analyses to investigate why these differences exist.

V. DISCUSSION AND IMPLICATIONS

Discussion: Quality is an important element in all software development projects. In particular, the quality of freely available software should be a key consideration for its users. However, the migration of code between OSS projects and online Q&A platforms complicates such assessments. Stack Overflow as a platform, for instance, often acts as a medium through which code migrates between many projects, and as such, the quality of the code in many projects is influenced by factors that are beyond the control of their programmers. Furthermore, OSS projects are often published under specific licenses, which adds an additional level of complexity in terms of understanding their availability for reuse. In fact, users of the code published on Q&A platforms often lack the required understanding of the code, which can have direct implications for quality management if such code is reused in software projects. In order to investigate the extent of code reuse in these situations, we focused on Java code from Stack Overflow and popular OSS projects. Here we revisit our outcomes to answer our five research questions (RQ1-RQ5).

RQ1. What is the extent of Java code reuse within Stack Overflow?
Our results indicate that within Stack Overflow, approximately 3.3% of the Java code sampled has at least one clone elsewhere on the website. Additionally, we found that up to 2,338 unique license violations could be present within these answers. This mirrors the evidence for Python code, which also revealed 3.3% duplication [8]. It should be noted, however, that the Python code examined in Yang et al.'s [8] study was processed to remove the effects of white space and comments, which increases the performance of clone detection tools and leads to better comparisons. To this end, our outcome is at best conservative, and Java code reuse could actually be higher than 3.3% on Stack Overflow.

The results from our study, along with those of Yang et al. [8], indicate that code reuse is prevalent on Stack Overflow in both Java and Python contexts. The near-identical results obtained by these two studies suggest that users and developers of the Stack Overflow platform should expect just over 3% of code on Stack Overflow to be duplicated. When considering the parameter settings for these code blocks to be considered candidate clones, it should be emphasized that these clones are of significant size (at least 50 tokens). Unlike many small snippets found on Stack Overflow, these clones meet the specified requirements set before the analysis, and as such, it is more likely that these code blocks are not clones by coincidence; rather, they are reused. Hence, developers need to be cautious with reusing larger code blocks from Stack Overflow, and be prepared to rigorously evaluate such code before its usage. In addition, instances of reuse demand proper attribution so that the community is aware of how Stack Overflow knowledge is recycled.
We believe that a software tool could be of utility in aiding developers wanting to evaluate the appropriateness of code for reuse, and also in detecting exactly where such code originated from, to help with correct attribution.

RQ2. What is the extent of code reuse between answers published under the same question in Stack Overflow? We observed that 1.8% of all Java snippets (i.e., code in answers) have at least one clone within other answers provided for the same question. Our evidence also revealed that 2.6% of the questions sampled contain at least one clone pair between their answers. Furthermore, there were 1,763 potential unique license violations in our sample data. As with the insights provided in response to RQ1, this outcome has implications for developers using Stack Overflow code in terms of the need to be aware of the rate of code duplication within Stack Overflow. With an overall duplication rate of 3.3%, we notice that a significant proportion of this duplication refers to clones between answers in different questions. As a result, developers may not give attribution to the original authors. Furthermore, in cases where these code blocks have migrated from external sources, having duplicates within Stack Overflow may make it more difficult to find the original sources. Without complete knowledge of the origin of reused code, developers may publish their OSS under different licenses, which will result in license violations. In fact, given the conservative settings used for our analyses, we anticipate that the reuse rate for smaller code snippets may be much higher. As such, if duplicated code can be identified by Stack Overflow, then the process of identifying the most appropriate solution (code) may be expedited, since users will be able to avoid duplicated answers. Having repeated duplicate answers may also result in convoluted pages, which could lead to slower problem solving for developers.

RQ3.
What is the extent of code reuse between Stack Overflow and the current most popular Open Source Java projects? Our evidence showed that, between Stack Overflow and the top weekly Java projects, approximately 223 unique code snippets appeared in both sets of files. Between the Stack Overflow and project files, these snippets appeared in a total of 1,627 files. This evidence shows that, overall, 1.0% of the files analyzed contain one of these clones. However, it should be noted that the percentage of project files containing clones is higher when compared to the percentage of Stack Overflow files that contained clones. This outcome suggests that the current most popular Open Source Java projects tend to use code copied from Stack Overflow. In fact, within the projects, we discovered that approximately 57% of the files contained a clone. These clones were found either within a single project or between projects. Koschke et al. [26] discovered that approximately 7.2% of all lines of code in Open Source Java projects were exact clones. These findings indicate that there are high levels of code reuse and duplication within Open Source Java projects. Our findings suggest that an opportunity exists for developers to reduce their intra-project reuse, which could result in fewer maintainability issues. Furthermore, developers should also consider that code reuse is occurring between these projects, and as such, they should become acquainted with licensing requirements (refer to Section II).

RQ4. What is the extent of code reuse between Stack Overflow and the all-time most popular Open Source Java projects? When we compared Stack Overflow Java code against the all-time most popular OSS projects on SourceForge, we observed that 450 unique code fragments were evident in both datasets, and that these appear in 4,334 files in total.
This evidence shows that approximately 2.3% of the files sampled contained at least one clone, and that there is one unique clone for every 54.5 project files. In fact, the proportion of project files containing clones was quite high, with approximately 77.2% containing clones when excluding the Stack Overflow files.

Considering our outcomes against those of previous work [26], where 7.2% of code reuse was found, we believe that code reuse is high in popular Open Source Java projects. Interestingly, the percentage of files containing clones is higher for the all-time most popular projects when compared to the newer, top weekly projects. It is thus more likely that code copied from these projects could have originally come from a different source, hence creating a nested code reuse situation. Furthermore, the developers of these systems may potentially benefit from reducing the amount of reused code, thus improving the maintainability of their projects.

RQ5. Are there differences in the nature of reuse found between the different contexts in terms of scale and size? Our results show that there are differences in the sizes of clones found across our datasets. Our evidence shows that when reuse occurred on Stack Overflow, most of the snippets were copied in their entirety. We also observed that the current popular Java projects had a greater extent of reused elements from other projects. We believe that newer projects may be constructed more commonly from whole elements of other projects, i.e., the mean clone length is greater than that of the ‘Top All-Time’ group in Table V, possibly due to the availability of these elements, or perhaps because developers are more willing to reuse in recent times. Similar outcomes were reported for Android mobile apps [25], which tend to dominate recent application development environments. The evidence here indicates that developers' behaviors are potentially changing, as we are seeing them incorporate larger pieces of copied code into their work.
As such, the effects, both negative and positive, resulting from copying code will be amplified for these projects. In situations where the copied code is well explained in the respective sections on websites such as Stack Overflow, it could lead to better quality software, since the functionality is well understood, tested, and documented by developers. However, if larger pieces of code are copied and pasted without sufficient accompanying documentation (e.g., comments), then it is likely that the software in question will contain code that is not understood by its developers, thus bringing into question its functionality, reliability, debuggability, and overall quality.


Our results also show a great degree of code duplication between all-time popular OSS projects, and, in fact, the scale and size of reuse was generally higher between OSS projects. This evidence is understandable given that Stack Overflow is generally known for shorter code snippets aimed at answering specific questions. Code duplication between projects was possibly driven by the use of common third-party libraries, but could also stem from intentional duplication of similar functionalities. The fact that Stack Overflow snippets were also copied suggests that reuse may be a part of practitioners’ culture. Thus, there are implications for making sure the correct license is used and that developers are aware of the strengths and weaknesses of the code that is copied. Furthermore, against the backdrop of the community’s need to develop high-quality, maintainable, and secure code, developers should carefully evaluate code that is reused.


Implications:

Our investigation has shown that code clones do exist across Java-based projects and Stack Overflow. Having clones or duplicates within a system is unavoidable, since many software elements often rely on the same functionalities.
However, in cases where many code clones exist, it is possible that developers may experience negative side effects. Firstly, it is important to understand that high levels of code cloning can have negative effects on software quality, in terms of inconsistencies in code. Studies have found that around half of the software projects investigated had clones which contained inconsistencies, i.e., clones that are changed inconsistently, with many of these changes being unintentional [15, 37]. Furthermore, these works also found that between 3% and 23% of code clones represented a fault. Thus, it is important for developers to be aware of the levels of code clones that exist within their software. To this end, we believe that tracking clones could improve the overall quality of software. This notion of tracking clones, and thus being more aware of them, has been shown to improve software debugging [38, 39]. Another implication of our findings relates to probable licensing violations. Copying code from other projects or websites such as Stack Overflow without adhering to licensing requirements may result in complicated legal issues, and thus developers should take caution when doing so.


VI. THREATS TO VALIDITY


Our analyses were conducted with CCFinderX, which uses a token-based approach to identify clones. This technique has some limitations, including a lower precision rate compared to some alternative techniques, primarily Abstract Syntax Tree (AST) based techniques [5]. Additionally, CCFinderX had preset parameter settings for its analyses. These parameters were given specific values, which were used to filter all texts in order to identify candidate clones. As such, the detection of clones was based on code meeting the set requirements given by CCFinderX, possibly leading to some clones being missed by the software. This is particularly important when considering that we worked with Stack Overflow data, for which the average file token size was 48.
Thus, we can assume that some smaller snippets from Stack Overflow reused in our Open Source projects were not detected, and thus our results could be conservative.


In fact, our reliability checks show that many clones were of smaller sizes (refer to Table II). However, as code chunks get smaller, the ability to trace them back to their original source becomes challenging. Smaller code fragments may also be labelled as clones accidentally. That said, our contextual analyses performed for reliability evaluation ascertained that code was duplicated and that there was no attribution. This evidence thus confirms the potential for future maintenance and quality issues, and possible licensing complications.


Additionally, we did not introduce a time element to determine the direction of reuse. As such, we cannot make conclusive statements regarding the temporal copying of code, i.e., whether code was copied from Stack Overflow to OSS projects, or from OSS projects to Stack Overflow. Lastly, our sample of projects may not be representative of all software projects, and as such a larger-scale study may produce more generalizable insights. The total number of projects on SourceForge containing Java code alone is over 40,000, and GitHub has over 3.5 million available Java-based repositories. Thus, a larger study may help validate the results obtained from this study. However, the initial study completed here reflects the findings from highly used projects, making code reuse an important element to consider.


VII. CONCLUSION AND FUTURE RESEARCH


There is an imperative that the software engineering community develop and deliver high-quality software. However, improper code reuse as a practice may create barriers to the delivery of high-quality software, particularly in terms of software maintainability and conforming to legal requirements.
With code reuse being a popular practice in the software engineering community, and Q&A forums such as Stack Overflow fueling this practice, it is pertinent to understand how this practice could affect future software maintenance and the correct use of licenses to avoid legal issues. Towards this goal, we investigated the levels of code reuse within Stack Overflow, and between Stack Overflow and popular OSS projects.


Our findings have indicated that clones (reuse) do exist in all of the examined contexts (within Stack Overflow, between Stack Overflow and OSS, and between OSS projects), with numerous cases of code duplication detected in each setting. Outcomes of the work show that projects are all highly likely to contain code that has been copied from sources external to their own code base. Additionally, our findings are similar to the research conducted on mobile apps and Python projects. As such, the levels of code reuse in these studies indicate that Java developers need to be made aware of licensing issues and the problems that could arise from ad-hoc copying. In particular, the quality assurance activities in software projects can be more comprehensive and could place greater emphasis on code reused from platforms such as Stack Overflow. This stands in agreement with [40], which discussed the benefits that code clone analysis can provide for software analysis. We further believe that, due to the increased amount of external code being integrated into projects, an even greater need exists for utilizing clone analysis software. If licensing knowledge and correct attribution are improved, then code fragments implemented from external sources will be less likely to cause licensing violations.


Our inter-project analyses showed that the top weekly Java projects had a greater average token size when compared to the all-time most popular Java projects.
To further analyze this phenomenon, a time-based comparison of code reuse in OSS projects could be beneficial in identifying changes in reuse behavior over time. From our preliminary results it appears that newer projects have larger pieces of reused code, which could indicate that inter-project reuse of whole components is occurring. The work completed here can be replicated for a larger sample of projects in order to validate our results and assess the scale of reuse more generally. Additionally, research may look beyond the scope of OSS projects to contrast our findings with closed source projects. Our research may also be expanded to provide insights into the direction of migration of clones. An et al. [19] published results on code migration for Android mobile apps, and Inoue et al. [17] have developed a tool for tracking code in open source repositories; however, dedicated work is required to investigate the direction of code migration from Stack Overflow (and other such portals) to OSS projects.


REFERENCES


[1] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools. Harlow, Essex: Pearson, 2014.


[2] Anon. Creative Commons License Deed. Available: https://creativecommons.org/licenses/by-sa/3.0/, Feb. 2018.


[3] Anon. Do I have to worry about copyright issues for code posted on Stack Overflow? Available: http://meta.stackexchange.com/questions/12527/do-i-haveto-worry-about-copyright-issues-for-code-posted-on-stack-overflow, Feb. 2018.


[4] A. Mathur, H. Choudhary, P. Vashist, W. Thies, and S. Thilagam, “An Empirical Study of License Violations in Open Source Projects,” presented at the 35th Annual IEEE Software Engineering Workshop, 2012. DOI: http://dx.doi.org/10.1109/sew.2012.24.


[5] C. K. Roy, J. R. Cordy, and R. Koschke, “Comparison and evaluation of code clone detection techniques and tools: A qualitative approach,” Science of Computer Programming, vol. 74, pp. 470–495, 2009.
[6] J. Cordeiro, B. Antunes, and P. Gomes, “Context-based recommendation to support problem solving in software development,” In Proc. of 3rd International Workshop on Recommendation Systems for Software Engineering (RSSE), 2012.


[7] D. M. German, M. Di Penta, Y.-G. Guéhéneuc, and G. Antoniol, “Code siblings: Technical and legal implications of copying code between applications,” In Proc. of 6th Working Conference on Mining Software Repositories (MSR), 2009.


[8] D. Yang, P. Martins, V. Saini, and C. Lopes, “Stack Overflow in GitHub: Any Snippets There?” In Proc. of 14th International Conference on Mining Software Repositories (MSR), 2017. DOI: http://dx.doi.org/10.1109/msr.2017.13.


[9] E. Johansson, A. Wesslen, L. Bratthall, and M. Host, “The importance of quality requirements in software platform development - a survey,” In Proc. of 34th Annual Hawaii International Conference on System Sciences, 2001.


[10] F. Fischer et al., “Stack Overflow Considered Harmful? The Impact of Copy&Paste on Android Application Security,” IEEE Symposium on Security and Privacy (SP), 2017.


[11] F. Bi, “Nissan app developer busted for copying code from Stack Overflow,” May 2016. Available: https://www.theverge.com/2016/5/11195308/dont-get-busted-copying-code-from-stack-overflow


[12] H. Sajnani, V. Saini, J. Svajlenko, C. K. Roy, and C. V. Lopes, “SourcererCC: Scaling Code Clone Detection to Big-Code,” In Proc. of 38th International Conference on Software Engineering (ICSE), 2016.


[13] I. J. Mojica, B. Adams, M. Nagappan, S. Dienst, T. Berger, and A. Hassan, “A Large-Scale Empirical Study on Software Reuse in Mobile Apps,” IEEE Software, vol. 31, no. 2, pp. 78–86, 2014. DOI: http://dx.doi.org/10.1109/ms.2013.142.


[14] J. R. Cordy and C. K. Roy, “The NiCad Clone Detector,” Presented at the IEEE 19th International Conference on Program Comprehension, 2011.


[15] J.
Krinke, “A Study of Consistent and Inconsistent Changes to Code Clones,” Presented at the 14th Working Conference on Reverse Engineering (WCRE), 2007.


[16] J. C. Knight and M. F. Dunn, “Software quality through domain-driven certification,” Ann. Softw. Eng., vol. 5, pp. 293–315, 1998.


[17] K. Inoue, Y. Sasaki, P. Xia, and Y. Manabe, “Where does this code come from and where does it go? Integrated code history tracker for open source systems,” In Proc. of 34th International Conference on Software Engineering, 2012.


[18] L. Heinemann, F. Deissenboeck, M. Gleirscher, B. Hummel, and M. Irlbeck, “On the Extent and Nature of Software Reuse in Open Source Java Projects,” Top Productivity through Software Reuse, Lecture Notes in Computer Science, pp. 207–222, 2011.


[19] L. An, O. Mlouki, F. Khomh, and G. Antoniol, “Stack Overflow: A code laundering platform?” In Proc. of IEEE 24th SANER, 2017.


[20] M. Sojer and J. Henkel, “Code Reuse in Open Source Software Development: Quantitative Evidence, Drivers, and Impediments,” Journal of the Association for Information Systems, vol. 11, pp. 868–901, 2010.


[21] M. Sojer and J. Henkel, “License risks from ad hoc reuse of code from the internet,” Communications of the ACM, vol. 54, p. 74, 2011.


[22] M. Singh, A. Mittal, and S. Kumar, “Survey on Impact of Software Metrics on Software Quality,” International Journal of Advanced Computer Science and Applications, vol. 3, 2012.


[23] O. Mlouki, F. Khomh, and G. Antoniol, “On the Detection of Licenses Violations in the Android Ecosystem,” In Proc. of IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2016.


[24] L. Ponzanelli, A. Bacchelli, and M. Lanza, “Leveraging crowd knowledge for software comprehension and development,” CSMR, IEEE Computer Society, 2013, pp. 57–66.


[25] R. Abdalkareem, E. Shihab, and J.
Rilling, “On code reuse from StackOverflow: An exploratory study on Android apps,” Information and Software Technology, vol. 88, pp. 148–158, 2017.


[26] R. Koschke and S. Bazrafshan, “Software-Clone Rates in Open-Source Programs Written in C or C++,” In Proc. of IEEE 23rd SANER, 2016.


[27] S. Baltes, R. Kiefer, and S. Diehl, “Attribution Required: Stack Overflow Code Snippets in GitHub Projects,” In Proc. of IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), 2017.


[28] S. Sawilowsky and G. Fahoome, Kruskal-Wallis Test: Basic, Wiley StatsRef: Statistics Reference Online, 2014.


[29] S. Haefliger, G. Von Krogh, and S. Speth, “Code Reuse in Open Source Software,” Management Science, vol. 54, pp. 180–193, 2008.


[30] S. H. Kan, Metrics and Models in Software Quality Engineering (2nd ed.), Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.


[31] V. Suma and T. R. Gopalakrishnan Nair, “Effective Defect Prevention Approach in Software Process: Achieving Better Quality Levels,” World Academy of Science, Engineering and Technology, vol. 42, pp. 258–262, 2008.


[32] T. Kamiya, S. Kusumoto, and K. Inoue, “CCFinder: a multilingual token-based code clone detection system for large scale source code,” IEEE TSE, vol. 28, pp. 654–670, 2002.


[33] T. Ægidius Mogensen, “Lexical Analysis,” Introduction to Compiler Design, Undergraduate Topics in Computer Science, pp. 1–37, 2011.


[34] V. Bauer, J. Eckhardt, B. Hauptmann, and M. Klimek, “An exploratory study on reuse at Google,” In Proc. of 1st International Workshop on Software Engineering Research and Industrial Practices (SER&IPs), 2014.


[35] W. B. Frakes and K. Kang, “Software reuse research: status and future,” IEEE Transactions on Software Engineering, vol. 31, pp. 529–536, 2005. DOI: http://dx.doi.org/10.1109/tse.2005.85.


[36] Y. Kashima, Y. Hayase, N. Yoshida, Y. Manabe, and K.
Inoue, “An Investigation into the Impact of Software Licenses on Copy-and-paste Reuse among OSS Projects,” In Proc. of 18th Working Conference on Reverse Engineering, 2011.


[37] E. Juergens, F. Deissenboeck, B. Hummel, and S. Wagner, “Do code clones matter?” In Proc. of IEEE 31st International Conference on Software Engineering, 2009.


[38] Z. Li, S. Lu, S. Myagmar, and Y. Zhou, “CP-Miner: Finding copy-paste and related bugs in large-scale software code,” IEEE Trans. Softw. Eng., vol. 32, pp. 176–192, 2006.


[39] L. Jiang, Z. Su, and E. Chiu, “Context-based detection of clone-related bugs,” In Proc. ESEC/FSE, ACM, 2007.


[40] C. K. Roy, J. Cordy, and R. Koschke, “Comparison and evaluation of code clone detection techniques and tools: A qualitative approach,” Science of Computer Programming, vol. 74, no. 7, pp. 470–495, 2009.
----------------------------------------
-------------------------------
Section 155:
Reuse and maintenance practices among divergent forks in three software ecosystems


John Businge\textsuperscript{1,2} · Moses Openja\textsuperscript{3} · Sarah Nadi\textsuperscript{4} · Thorsten Berger\textsuperscript{5,6}


Accepted: 25 October 2021 / Published online: 4 March 2022
© The Author(s) 2022


Abstract
With the rise of social coding platforms that rely on distributed version control systems, software reuse is also on the rise. Many software developers leverage this reuse by creating variants through forking, to account for different customer needs, markets, or environments. Forked variants then form a so-called software family; they share a common code base and are maintained in parallel by the same or different developers. As such, software families can easily arise within software ecosystems, which are large collections of interdependent software components maintained by communities of collaborating contributors.
However, little is known about the existence and characteristics of such families within ecosystems, especially about their maintenance practices. Improving our empirical understanding of such families will help build better tools for maintaining and evolving them. We empirically explore maintenance practices in such fork-based software families within ecosystems of open-source software. Our focus is on three of the largest software ecosystems in existence today: Android, .NET, and JavaScript. We identify and analyze software families that are maintained together and that exist both on the official distribution platform (Google Play, nuget, and npm) as well as on GitHub, allowing us to analyze reuse practices in depth. We mine and identify 38, 526, and 8,837 software families from the Android, .NET, and JavaScript ecosystems, respectively, to study their characteristics and code-propagation practices. We provide scripts for analyzing code integration within our families. Interestingly, our results show that there is little code integration across the studied software families from the three ecosystems. Our studied families also show that direct integration using git outside of GitHub is more commonly used than GitHub pull requests. Overall, we hope to raise awareness about the existence of software families within larger ecosystems of software, calling for further research and better tool support to effectively maintain and evolve them.


Keywords Clone-and-own · Change propagation · Variant synchronisation · Empirical study · Variant developers · Version control systems · Pull requests · Cherry-picking changes · Rebasing changes · Squashing changes · Software product lines · Variants


Communicated by: Federica Sarro


\textsuperscript{1} John Businge
johnxu21@gmail.com


Extended author information available on the last page of the article.
1 Introduction


The increased popularity of social-coding platforms such as GitHub has made forking a powerful mechanism to easily clone software repositories for creating new software. A developer may fork a mainline repository into a new forked repository, often transferring governance over the latter to a new developer, while preserving the full revision history and establishing traceability information. While forking allows isolated development and independent evolution of repositories, the traceability allows comparing the revision histories, for instance, to determine whether one repository is ahead of the other (i.e., contains changes not yet integrated into the other). It also allows easier commit propagation across the repositories.


Many studies on forking exist, often focusing on the reasons and outcomes (Nyman et al. 2012; Robles and González-Barahona 2012; Viseur 2012; Nyman and Lindman 2013; Nyman and Mikkonen 2011; Zhou et al. 2018; Zhou et al. 2019; 2020) or on the community dynamics as influenced by forking (Gamalielsson and Lundell 2014). The community typically distinguishes between two kinds of forks (Zhou et al. 2020): social forks, which are created for isolated development with the goal of contributing back to the mainline, and divergent forks, which are created for splitting off a new development branch, often to steer the development into another direction without intending to contribute back, while leveraging the mainline project that defines or adheres to some standards (Sung et al. 2020). Divergent forks are more relevant for supporting large-scale software reuse, the focus of this paper.


Studies on divergent forks usually rely on general heuristics to identify as many forks as possible, without systematically verifying that these are indeed divergent forks.
Additionally, when studying code propagation techniques, existing studies do not consider the intricacies of git when identifying the possible types of code propagation (e.g., offline git rebasing without using GitHub at all), but focus only on pull requests. To address the first challenge, identifying divergent forks, we use the insight that particular ecosystems have a systematic way of publishing “members” of the ecosystem. For example, most Android apps are published on the Google Play store. Similarly, most Eclipse plug-ins are distributed on the Eclipse marketplace. The advantage of such ecosystems is that each member has a unique ID that identifies it. Thus, given an open-source GitHub repository and its fork, we can verify whether the fork is actually an independent version of the original mainline (which is a core criterion of a divergent fork) by checking that both the mainline and the fork are listed as separate entries in the corresponding distribution platform. To address the second challenge, considering the git intricacies, we design a technique that identifies the majority of code propagation techniques on Git and GitHub by leveraging all commit metadata. Inspired by the notion of software families, a.k.a. program families (Parnas 1976; Czarnecki 2005; Dubinsky et al. 2013; Apel et al. 2013; Krueger and Berger 2020b; Stanculescu et al. 2015; Berger et al. 2020), i.e., portfolios of managed and similar software systems in an application domain, we use the term software family, or family for short, to refer to a mainline repository and its corresponding divergent forks. We refer to each family member as a variant.


We present a large-scale empirical study on reuse and maintenance practices via code propagation among software families in software ecosystems. We take the above considerations into account and study three large-scale ecosystems in different technological spaces: Android, JavaScript, and .NET.
Android is one of the largest and most successful software ecosystems, with substantial software reuse (Mojica et al. 2014; Li et al. 2016; Sattler et al. 2018; Berger et al. 2014). The JavaScript ecosystem distributes its packages through npm, which is by far the largest package manager, with over 1.82M package distributions.\footnote{As seen on Libraries.io by June 2021} The .NET ecosystem has a package management system, nuget, that is moderately large with over 261K packages.\footnote{As seen on Libraries.io by June 2021} As such, our three selected ecosystems vary in their nature (apps versus packages), their programming languages (Java, JavaScript, and C#), and their sizes (in terms of their distribution platforms).


Our study addresses two main research questions:


RQ1 What are the characteristics of software families in our ecosystems?


We investigate general characteristics of the families and their variants, including the number of variants per family and the divergence of application domains, developer ownership, and variant popularities within the families. We also determine the frequencies of variant maintenance, looking at release numbers. This allows putting the studied maintenance and co-evolution practices into context.


RQ2 How are software families maintained and co-evolved in our ecosystems?


To determine management practices, we investigate how code is propagated between the mainline and its divergent forks in the family. For example, are pull requests used as the main propagation technique? Is code propagated only from the mainline to the forks, or is there propagation in the other direction, too? We study the code propagation mechanisms used as well as the kinds of changes being propagated.


To the best of our knowledge, our work is the first to provide a large-scale in-depth study of code-propagation practices in divergent forks.
Understanding these code-propagation strategies exercised by developers can help in building better tool support for software customization and code reuse. We analyze pairs of mainline and fork open source projects whose package releases are available in the package distribution platforms of the three ecosystems: Android comprising 38 software families, .NET comprising 526 software families, and JavaScript comprising 8,837 software families.


Our results show that the majority (82\%) of forks we study are owned by developers different from those of the mainline within a family. Such distinction of ownership gives us confidence that we are studying real divergent forks. Interestingly though, we find little code propagation across all the mainline–fork pairs in the three ecosystems we studied. The most used code propagation technique is git merge/rebase, which is used in 33\% of Android mainline–fork pairs, 11\% of JavaScript pairs, and 18\% of .NET pairs. We find that cherry picking is less frequently used, with only 9\%, 0.9\%, and 2.5\% of Android, JavaScript, and .NET pairs using it, respectively. Among the three pull request integration mechanisms we studied (merge, rebase, and squash), the most used is the merge option in the direction fork $\rightarrow$ mainline, where 2.4\%, 7\%, and 11\% of the pairs in Android, JavaScript, and .NET use this strategy. We find that integrating commits using squashed or rebased pull requests is rare in all three ecosystems. Overall, we find that when code propagation occurs, fork developers seem to perform this propagation directly through git and outside of GitHub’s built-in pull request mechanism. This observation implies that simply relying on pull requests to understand code propagation practices in divergent forks is not enough.
In summary, this work makes the following contributions:

- We propose leveraging the main distribution platforms of three ecosystems to precisely identify divergent forks. We devise a technique for identifying families in these ecosystems by using data both from GitHub and the respective distribution platform.

- In contrast to previous studies on code propagation strategies that either focused only on pull requests or on directly comparing commit IDs, we are the first to study code propagation while considering pull requests with the options of squash/rebase as well as git rebased and cherry-picked commits.

- We analyze the prevalence of code propagation within software families as well as the types of propagation strategies used.

- We synthesize implications of our results for code reuse tools.

- We provide an online appendix (2020) containing our datasets, intermediate results, and the scripts to trace code propagation between any mainline–fork pair.


An earlier version of this work appeared as a conference paper (Businge et al. 2018). It focused on analyzing code propagation at the commit level within only the Android ecosystem. It also provided preliminary insights on the reasons why different app variants exist. This article extends the conference paper as follows. First, we extend our analysis with two more ecosystems of moderate to large scale. Second, we substantially improve our identification of code integration methods by not focusing solely on pull requests or direct comparison of commit IDs. Instead, we are the first to consider most types of code propagation techniques, including rebasing, squashing, and cherry-picking commits. Third, we contribute a toolchain for analyzing code propagation between any mainline–fork pair. Fourth, we provide more discussion of the implications of our results.


Parts of RQ1 for the JavaScript ecosystem have been previously presented as a workshop paper (Businge et al.
2020). In this article, our additional contributions for RQ1 for the JavaScript ecosystem are the following. First, we refine the JavaScript dataset by ensuring that the mainline–fork pairs exist both on GitHub and the npm package manager. To this end, we eliminate a total of 2,456 mainline–fork pairs where either the mainline or the fork was deleted from GitHub, but their package releases still existed on the npm package manager. Second, we provide a more detailed description of how the dataset was collected and provide the full refined dataset in the replication package. Third, we create an additional dataset of new families from the .NET ecosystem. Fourth, in addition to the new characteristic of variant ownership and more illustrative graph comparisons, we discuss the characteristics of the mainline–fork pairs across all three ecosystems.
----------------------------------------
-------------------------------
Section 156:
2 Background on Code Propagation Strategies


We now discuss the mechanisms offered by GitHub and similar social-coding platforms to propagate code among different repositories. We describe characteristics of these mechanisms and the kind of metadata they generate, which an automated identification technique can potentially rely on.


While a mainline and a forked repository are under no obligation to synchronize any changes, developers commonly propagate their code changes (e.g., new features or bug fixes) among repositories via commit integration (Jiang et al. 2017; Openja et al. 2020). For tracing such propagation, however, the metadata provided by GitHub is not always reliable. For instance, Kalliamvakou et al. (2014) and Kononenko et al. (2018) found a large number of pull requests appearing as not merged while they were actually merged. The authors found that it is not uncommon for destination repositories to resolve pull requests outside GitHub.
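This kind of "merged outside GitHub" situation can only be recognized from the commit metadata itself, since the commit IDs change when commits are replayed with git. As a minimal illustrative sketch (the field and function names here are hypothetical, not the paper's actual toolchain), one can search the destination history for each pull-request commit by author name, author date, and message, which survive a rebase even though the commit ID does not:

```python
# Sketch: find pull-request commits that reappear in a destination branch
# under a new commit ID (e.g., after an offline git rebase). Field names
# such as "author_name" are illustrative, not the paper's schema.

def integrated_outside_github(pr_commits, destination_commits):
    """Return the PR commits whose (author name, author date, message)
    also occur in the destination history, regardless of commit ID."""
    seen = {(c["author_name"], c["author_date"], c["message"])
            for c in destination_commits}
    return [c for c in pr_commits
            if (c["author_name"], c["author_date"], c["message"]) in seen]

# Toy example: the PR commit was rebased into the mainline, so its ID
# changed, but its author metadata and message were preserved.
pr = [{"commit_id": "aaa1", "author_name": "alice",
       "author_date": "2020-01-01T10:00", "message": "fix crash"}]
mainline = [{"commit_id": "bbb2", "author_name": "alice",
             "author_date": "2020-01-01T10:00", "message": "fix crash"}]
print(len(integrated_outside_github(pr, mainline)))  # prints 1
```

A pull request whose commits all match in this way was likely resolved directly with git even if GitHub reports it as unmerged; squashed integrations, whose messages and dates change, would escape this simple check.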
Table 1 Changes of commit metadata during code propagation for the different kinds of code propagation with GitHub or Git facilities

| Metadata changed | PR: Merge | PR: Squash | PR: Rebase | Git: Cherry-pick | Git: Merge | Git: Rebase |
|------------------|-----------|------------|------------|------------------|------------|-------------|
| Commit ID        | No        | Yes        | Yes        | Yes              | No         | No          |
| Author Name      | No        | Yes        | No         | No               | No         | No          |
| Author Date      | No        | Yes        | No         | No               | No         | No          |
| Committer Name   | No        | Yes/No     | Yes/No     | Yes/No           | No         | No          |
| Committer Date   | No        | Yes        | Yes        | Yes              | No         | No          |
| Commit Message   | No        | Yes        | No         | No               | No         | No          |
| File details     | No        | No         | No         | No               | No         | No          |

Yes: metadata change; No: no change of metadata. The “PR” columns are the GitHub pull request options, and the “Git” columns are direct git commands.


This is why our work considers both commit integration through GitHub and commit integration directly using git, but outside GitHub.


In the following, we describe code propagation using GitHub and git facilities. Table 1 provides details on the relationship between commits across forked repositories based on the respective code propagation technique used. To collect the information in this table, we read the official references (Vandehey 2019)(^2)(^3) and online resources(^4), and created toy repositories to mimic the various integration scenarios in order to verify this information. We use these insights for creating our code propagation traceability technique described in Section 3.3.


2.1 Propagation with GitHub Facilities


A pull request has a head ref, which is the reference for the source repository and the branch a developer wants to pull commits from; we refer to it as the source branch. A pull request also has a base ref, which is the reference for the destination repository into which the pulled commits are integrated; we refer to it as the destination branch for clarity. The source and destination branches may belong to the same repository or to different repositories.
When studying code propagation in a software family, we are mainly interested in pull requests from one source repository in the family to another destination repository in the same family.

Once a pull request is submitted on GitHub, a developer can use its user interface to integrate the commits in the pull request into the destination branch using one of these three options: (i) merge the pull request commits, (ii) rebase the pull request commits, and (iii) squash the pull request commits.

**Merge pull request commits** is the default. When the developer chooses this option, the commit history in the destination branch will be retained exactly as it is. As can be seen from Table 1, the metadata of the integrated commits from the source branch remain unchanged in the destination branch. However, a new merge commit will be created in the destination branch to “tie together” the histories of both branches (GitHub 2020).

**Rebase and merge pull request commits**: When the integrator selects the Rebase and merge option on a pull request on GitHub, all commits from the source branch are replayed onto the destination branch and integrated without a merge commit. From Table 1, we can see that using this integration technique, the commit metadata between source and destination preserves the author name, author date, and commit message but alters the commit ID, committer name, and committer date. The committer name becomes the name of the developer from the destination repository who rebased and merged the pull request.

(^2) https://www.atlassian.com/git/tutorials/merging-vs-rebasing

(^3) https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-request-merges

(^4) https://cloudfour.com/thinks/squashing-your-pull-requests/
Note that if the developer who submitted the pull request is coincidentally the same as the developer who integrates it (e.g., because the developer works on both repositories), then the committer name will remain the same (GitHub 2020).

**Squash and merge pull request commits**: When the integrator selects the Squash and merge option on a pull request on GitHub, the pull request’s commits are squashed into a single commit. Instead of seeing all of a contributor’s commits from the source branch, the commits are squashed into one commit and included in the commit history of the destination branch. Apart from the file details, all other commit metadata changes. The committer name changes unless, similar to above, the original committer and the developer merging the pull request are the same (GitHub 2020).

2.2 Propagation with Git Facilities (Cherry-Pick, Merge, and Rebase Commits)

A developer may also choose not to rely on the GitHub user interface and instead integrate commits from a source branch into a destination branch outside GitHub using one of the git integration commands. The integrator first locally fetches the source branch (for example, the mainline) that contains the commits they wish to integrate into their branch. They then perform the integration locally using one of the four options outlined below ((i) git merge, (ii) git rebase, (iii) git cherry-pick, and (iv) other Git commands that rewrite commit history) and afterwards push the changes to their corresponding GitHub repository.(^5)

**Git cherry-pick commits**: Cherry-picking is the act of picking a commit from one branch and integrating it into another branch. Commit cherry-picking can, for example, be useful if a mainline developer creates a commit to patch a pre-existing bug.
If the fork developer cares only about this bug patch and not other changes in the mainline, then they can cherry-pick this single commit and integrate it into their fork. As shown in Table 1, the author name, author date, commit message, and file details of the cherry-picked commit remain the same in the destination branch. The commit ID, committer name, and committer date, however, do change. Note that the committer name may remain the same if the integrator is the same developer who performed the original commit in the source branch.

**Git merge commits**: Like the pull request merge, git merge also preserves all the commit metadata and creates an extraneous new merge commit in the destination branch that ties together the histories of both branches.

**Git rebase commits**: Rebasing is the act of moving commits from their current location (following an older commit) to the new head (newest commit) of their branch (Chacon and Straub 2014b). Git rebase deviates slightly from rebasing pull requests on GitHub as it does not change the committer information. To better understand git rebase, let us explain it with an illustration based on the experiments we carried out. On the left-hand side of Fig. 1, we have a mainline repository and a fork repository where each repository made updates to the code through commits C3 and C4 in the mainline and commits F1 and F2 in the fork. The fork developer observes that the new updates in the mainline are interesting and decides to integrate them using rebasing. After rebasing, the commit history will look like the right side of Fig. 1. Notice that the IDs and the order of the integrated commits C3 and C4 in the fork branch are unchanged. However, the IDs of commits F1 and F2 change to F1’ and F2’.

(^5) https://www.atlassian.com/git/tutorials/merging-vs-rebasing
In this case, Git rebase is like the fork developer saying “Hey, I know I started this branch last week, but other people made changes in the meantime. I don’t want to deal with their changes coming after mine and maybe conflicting, so can you pretend that I made [my changes] today?” (Vandehey 2019).

**Other Git commands that rewrite commit history**: Git has a number of other tools that rewrite commit history, including changing commit messages, changing commit order, or splitting commits (Chacon and Straub 2014a). These commands include git commit --amend, git rebase -i HEAD~N, and git merge --squash. Most of these commands significantly change the history and the metadata of commits. If the integrator uses any of these commands in the destination repository, then there is no straightforward way to match the integrated commits across the two repositories (Chacon and Straub 2014a).

3 Methodology

Our goal is to improve the empirical understanding of maintenance practices, specifically code propagation in software families. We identify and analyze software families by using data from both GitHub and the distribution platforms of the three ecosystems.

3.1 Identifying Software Families

Given the different nature of our studied ecosystems in terms of what information each distribution platform stores and how this information is accessed, we employ different techniques to identify Android families versus JavaScript and .NET families. Figure 2 shows an overview of this process. We extract families in the Android ecosystem from GitHub and Google Play, while the families in .NET and JavaScript are extracted from Libraries.io.(^6)

3.1.1 Identifying Android Families

We are interested in identifying families of real Android apps that are evidently used by end users.
Taking all GitHub repositories with Android apps into account would also include toy apps or course assignments. To this end, we identify source repositories of apps that also exist on Google Play. We mainly match GitHub repositories and Google Play apps via their unique identifier: the package name contained in the app manifest file (AndroidManifest.xml). Such manifest files also declare the app’s components, necessary permissions, and required hardware and Android version. As such, each Android app in a software family must have a unique package name, which excludes any forked repositories where the package name was not modified. More specifically, we identify Android families using a relatively conservative filtering approach as follows.

Using GitHub’s REST API v3, we identify 79,338 mainline repositories matching the following criteria: (1) the repository is not a fork; (2) it contains the word “Android” in its name/description/readme; (3) it has been forked at least twice ($\geq 2$ forks), which reduces the chance of finding student assignments (Munaiah et al. 2017); (4) it was created before 01/07/2019 (we mined on 14/12/2019, so we used this cutoff date to obtain repositories that have some history); (5) it has an AndroidManifest.xml file; and (6) it has a description or readme.md file.

To ensure that we are collecting real-world apps, we check if the identified mainline repositories exist on Google Play. From each repository’s AndroidManifest.xml file, we extract the app’s package name and check its existence on Google Play. In total, we find 7,423 mainline repositories representing an actual Google Play app (Businge et al. 2017).

We filter out duplicate mainline repositories containing AndroidManifest.xml files with the same package name. Such duplicates easily arise when an app’s source code is copied without forking.

(^6) https://libraries.io/
Since package names are unique on Google Play, only one of these duplicate repositories can actually correspond to the Google Play app. We manually select one repository from these duplicates by considering repository popularity (number of forks and stars on GitHub), repository and app descriptions on both GitHub and Google Play, as well as the developer name on GitHub and Google Play. In some cases, the Google Play app description conveniently linked to the GitHub repository. As a result of this step, we discard 1,232 repositories and are left with 6,191 mainline repositories.

To ensure that we study repositories with enough development history, we filter out mainlines with fewer than six commits in their lifetime, according to the median number of commits in GitHub projects found by prior work (Kalliamvakou et al. 2014). This leaves us with 4,337 mainline repositories.

We filter out mainline repositories without any active forks, i.e., mainlines whose forks all have no commits after the forking date and were probably abandoned. This leaves us with 1,166 mainline repositories, which have a total of 12,025 active forks altogether.

We remove forks that have the same package name as their mainline. If no forks remain for a given mainline, we also remove this mainline. For the forks with package names that differ from their corresponding mainline, we check the existence of the fork’s package name on Google Play in order to ensure that the fork is also a real (and different) Android app. This leaves us with 69 app families comprising 95 forks.

Finally, by manual inspection, we filter out forked repositories whose app package name points to a Google Play app that is not the correct app. This analysis is based on the observation that, sometimes, fork developers copy code including the AndroidManifest.xml from another app without changing the package name.
This practice results in the forked app’s package name pointing to an app that exists on Google Play, but that is not the one hosted in the GitHub repository. We inspect the Readme.md and unique commit messages in the GitHub repository and the respective Google Play description page. Eliminating all mismatched apps leaves a total of 38 app families comprising 54 forked apps, which form our final dataset to answer the research questions.

3.1.2 Identifying JavaScript and .NET Families

A family in the JavaScript and .NET ecosystems comprises packages of libraries or applications written in the respective language. Similar to the Android ecosystem, we only consider packages that exist as source-code repositories on GitHub and on the ecosystem’s main distribution channels: npm and nuget. The metadata of a package release on the npm and nuget package managers is similar. On both package managers, a package’s metadata includes: the source repository of the package (GitHub, GitLab, BitBucket), the number of dependent projects/packages, the number of dependencies, the number of package releases, and the package contributors. Fortunately, most of the data of 37 package managers for different ecosystems can be found in one central location, Libraries.io, a platform that periodically collects all data from different package managers. In addition to the metadata for a specific package on a given package manager, Libraries.io also extends the package metadata with more information from GitHub. For example, it stores a boolean Fork field, which indicates whether the corresponding repository of a package is a fork. This field helps us identify forked repositories that have published their packages. Note that this is different from the Android ecosystem, where such explicit traceability does not exist, which is why we first mine repositories from GitHub and then filter out those that are published on Google Play.
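The manifest-based matching used for the Android families can be sketched as follows. This is a minimal illustration of our own, not the study’s actual scripts; it relies only on the standard Android manifest schema, where the `package` attribute of the root `<manifest>` element is the app’s unique identifier:

```python
import xml.etree.ElementTree as ET

def package_name(manifest_xml: str) -> str:
    """Extract the app's unique identifier from an AndroidManifest.xml,
    which we use to match a GitHub repository to a Google Play listing."""
    root = ET.fromstring(manifest_xml)  # root element is <manifest>
    return root.attrib["package"]

# Illustrative manifest snippet (hypothetical package name).
manifest = (
    '<manifest xmlns:android="http://schemas.android.com/apk/res/android" '
    'package="com.example.app"/>'
)
```

Two repositories whose manifests yield the same package name are treated as duplicates, since only one of them can correspond to the Google Play app.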
In contrast, with .NET and JavaScript, we mine the families directly from Libraries.io. We extract the families from the latest Libraries.io data dump, release 1.6.0, published on January 12, 2020. The meta-model for the data in the Libraries.io data dump can be found online.(^7) We extract .NET and JavaScript families from Libraries.io with the following steps:

1. Using the package’s Platform field, we select the packages that are distributed on the nuget and npm package managers.

2. Next, we use the boolean Fork field to identify repositories that are forks, and use the field Fork Source Name with Owner to identify the fork repository name as well as its parent repository (mainline). We extract all fork repositories that map to published packages on nuget and npm.

3. Next, we intersect the sets of packages from Step 1 and Step 2 to identify only packages that form mainline–fork pairs (i.e., where the fork repository and its corresponding mainline in the set from Step 2 both have their packages present in the set from Step 1). Using the GitHub API, we then verify that the mainline is indeed the parent of the divergent fork and that both still exist on GitHub, so as to eliminate wrong pairs (e.g., those that have been deleted from GitHub). From the .NET ecosystem, we identify a total of 526 software families having a total of 590 mainline–fork pairs. From the JavaScript ecosystem, we identify a total of 8,837 software families having a total of 10,357 mainline–fork pairs. Similar to Android families, a family in .NET and JavaScript contains at least one mainline and one or more variant forks.

3.2 Identifying Family Characteristics (RQ1)

We now describe how we identify characteristics of the identified families and their variants (i.e., mainlines and forks) for our three ecosystems.

We define and calculate various metrics as follows.
Note that, given the different nature of these ecosystems and the type of information available for each, some metrics are specific to only some of the ecosystems. For example, FamilySize is a metric we can calculate for all variants in all three ecosystems. On the other hand, given the different nature of Android variants and JavaScript/.NET packages, we need to calculate variant popularity differently across the ecosystems (downloads and reviews versus dependents and dependencies).

In the following, we discuss the goal of each metric and how we calculate it. Overall, we look at metrics that fall into general characteristics of variants, variant maintenance activity, variant ownership, and variant popularity. For repositories in the Android ecosystem, we extract the metrics from GitHub and the Google Play store. For repositories in the .NET and JavaScript ecosystems, we extract the metrics from GitHub and Libraries.io.

Table 3 in Section 4 summarizes all metrics (and provides their values).

(^7) https://libraries.io/data

3.2.1 General Characteristics

**Family Size**: We record the number of variants (metric FamilySize in Table 3) for all families in the three ecosystems. Note that a family with FamilySize = 2 has one mainline and one fork, while a family with FamilySize = 3 has one mainline and two forks.

**Variant Package Dependencies**: Ecosystems provide a huge bazaar of software that can be reused through explicit package dependencies (Decan et al. 2019). Since a divergent fork inherits functionality from the mainline and may also continuously synchronize with the mainline to acquire new changes, one would expect the number of package dependencies for a mainline and its fork to be the same. However, it is interesting to see cases where they are not. For example, if the fork has more dependencies, it could mean that the fork is implementing new features that are not in the mainline.
We extract the number of dependencies from Libraries.io. For Android, we extracted the dependencies from the apps’ Gradle files on GitHub.

**Android variant categories**: Using the variant’s metadata available on Google Play, we also determine its category (e.g., Business, Finance, Productivity) and extract its description. We also record whether the variants are listed under the same category on Google Play, which helps us understand the nature of the variants in a family.

3.2.2 Identifying Maintenance Activities (JavaScript & .NET only)

A repository with many releases shows that it is being actively maintained, since each release indicates either bug fixes or new features being introduced. To this end, we are interested in the relationship between the mainline and the fork in terms of the number of package releases on the package distribution platforms. We collect the number of package releases for variants in the .NET and JavaScript ecosystems from Libraries.io. The metrics related to variant maintenance activity are PackageReleasesMLV for the mainline variants and PackageReleasesFV for the fork variants. Unfortunately, the package manager for variants in the Android ecosystem (the Google Play store) does not keep a history for the applications, and therefore we cannot extract variant releases from there. An alternative would be to collect the variant releases in the Android ecosystem from the repositories themselves using the GitHub API. Unfortunately, we found that using the GitHub API to collect the list of releases of a repository returns an empty list for most of the repositories, even when a repository has releases. For example, we can see that the Android divergent fork imaeses / k-9(^8) has releases. However, when we access the fork using the GitHub API for a list of releases,(^9) it returns an empty list. For this reason, we decided not to collect package releases for the variants in the Android ecosystem.
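The releases check described above queries GitHub’s documented `GET /repos/{owner}/{repo}/releases` endpoint; the small parsing helper below is our own illustration (the network call itself is omitted), and the empty-array case mirrors the behavior we observed for imaeses / k-9:

```python
import json

def release_tags(api_response_text: str) -> list:
    """Parse the JSON body returned by GitHub's
    GET /repos/{owner}/{repo}/releases into a list of release tag names."""
    return [release["tag_name"] for release in json.loads(api_response_text)]

# An empty JSON array -- what the API returned for the example fork even
# though releases are visible in the web UI -- yields no releases:
release_tags("[]")  # -> []
```

A non-empty response would yield the tag names, e.g. `release_tags('[{"tag_name": "v1.0"}]')` returns `["v1.0"]`.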
3.2.3 Identifying Variant Ownership Characteristics

We would like to identify whether the mainline and fork variant have common owners. This is interesting to study since it lets us determine whether variant forks are started by the owners of the mainlines or by different developers not involved in the mainline. We define the owner of a repository as a contributor who has the access rights to integrate changes into the repository (i.e., a repository committer). As we explained in Section 2, based on the different kinds of commit integration techniques, it might be difficult to identify the original repository of a given commit (especially in cases where a mainline has many forks). To this end, we identify a repository committer (owner) as one who has merged at least one pull request, since we are certain that only contributors who have access rights to a repository can integrate changes. We consider that the mainline and a fork variant have common owners if there exists at least one common owner between them. With this criterion, both the mainline and the fork variant should have at least one common developer (not a bot) who merged a pull request in both repositories. This means that our ownership criterion relies on each variant having merged at least one pull request. Since we have very few variant pairs in the Android ecosystem, this requirement would further reduce an already very small dataset of variant pairs. To this end, we apply the described method only to the variants of the .NET and JavaScript ecosystems, which have moderately large to very large datasets of variant pairs, and use a different criterion, explained later, to identify the owners of Android variants. Since all the Android variants are published on Google Play, each variant has an owner there.

(^8) https://github.com/imaeses/k-9/releases

(^9) https://api.github.com/repos/imaeses/k-9/releases
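The common-owner criterion boils down to a set intersection over the developers who merged at least one pull request in each repository. A minimal sketch of our own (the name-based bot filter is a naive placeholder, not the study’s actual heuristic):

```python
def have_common_owner(mainline_mergers: set, fork_mergers: set) -> bool:
    """True if at least one real (non-bot) developer merged a pull
    request in both the mainline and the fork repository."""
    def is_bot(login: str) -> bool:
        # Naive placeholder: GitHub app accounts end in "[bot]".
        return login.endswith("[bot]")
    common = {dev for dev in mainline_mergers & fork_mergers
              if not is_bot(dev)}
    return bool(common)
```

For example, a pair where only a dependency-update bot merged pull requests in both repositories would not count as having a common owner.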
We identify only 89 of the 590 mainline–fork pairs in the .NET ecosystem where both the mainline and the fork variant had any merged PR by a real developer. For the JavaScript ecosystem, we identify only 89 of the 10,357 mainline–fork pairs where both the mainline and the fork variant had any merged PR by a real developer.

For the variant pairs in the Android ecosystem, we employ another method to identify ownership that covers the whole dataset: we mine ownership from the Google Play store. On the Google Play store, each variant has an attribute developer id (dev id), which is the name of the developer/company (owner) that uploads the variant and its updates on the marketplace.

3.2.4 Identifying Variant Popularity

We want to understand the popularity of the variants we are studying, in terms of whether they are widely used in their respective ecosystems. We extract the popularity metrics from the distribution platform of each of our studied ecosystems. We use a different popularity measure for variants in the Android ecosystem than for those from .NET and JavaScript.

**Android variants**: For the variants in the Android ecosystem, we define two popularity metrics for the number of downloads on Google Play, DownloadsMLV and DownloadsFV, for the mainline and divergent fork respectively. We also define two popularity metrics for the number of reviews on Google Play, ReviewsMLV and ReviewsFV, for the mainline and divergent fork, respectively.

**JavaScript and .NET variants**: For variants in these two ecosystems, we record the number of other packages in the JavaScript and .NET ecosystems that depend on the mainline and the fork variants (DependentPackagesMLV and DependentPackagesFV, respectively). We also record the number of other projects on GitHub that depend on the mainline and the fork variant (DependentProjectsMLV and DependentProjectsFV, respectively). All the variants’ dependent packages / projects are extracted from Libraries.io.
The package and project dependents are a good way of measuring popularity since they give an indication of which other packages / projects are interested in the functionality provided by the variant.

3.3 Identifying Code Propagation (RQ2)

Answering RQ2 requires determining whether and how any code was propagated among the variants of a software family. To identify code propagation, we rely on categorizing commits in the history of the mainline and the forks based on the possible types of code propagation we discussed in Section 2.

Figure 3 illustrates the relationship between variants in the same family. Specifically, we demonstrate the relationship between the commits in the mainline variant of a family and any of its divergent forks. We identify two broad categories of commits: (1) common commits, which exist in both the mainline variant and the forked variant and represent either the starting commits that existed before the forking date or propagated commits, and (2) unique commits, which exist only in one variant. For each (mainline variant, fork variant) pair in a family, we first identify common commits and then identify unique commits, as follows.

3.3.1 Identifying Common Commits

To ensure we correctly categorize commits, we perform the following steps in this exact order. Once a commit is categorized in one step, we do not need to analyze it again in the following steps. We consider only the default repository branch (master/main) for both the mainline and forks.

**Inherited commits**: The fork date is the point in time at which the fork variant is created. At that point, all commits in the fork are the same as those in the mainline, and we refer to them as InheritedCommits. In Fig. 3, the InheritedCommits are the purple commits 1, 2, and 3. To extract these commits for either variant, we collect all the commits from the first commit in the history until the fork date.
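This first step amounts to filtering each history by the fork date. As a minimal sketch of our own (the commit records and field names below are illustrative assumptions, not the study’s actual data model):

```python
from datetime import datetime

# Hypothetical commit records; only the fields used below are assumed.
history = [
    {"id": "c1", "author_date": datetime(2019, 1, 10)},
    {"id": "c2", "author_date": datetime(2019, 3, 5)},
    {"id": "f1", "author_date": datetime(2020, 2, 1)},
]

def inherited_commits(history: list, fork_date: datetime) -> list:
    """Commits created up to the fork date are, by construction, shared
    between the mainline and the fork (InheritedCommits in Fig. 3)."""
    return [c["id"] for c in history if c["author_date"] <= fork_date]
```

With a fork date at the end of 2019, the sketch classifies `c1` and `c2` as inherited, while `f1` remains to be categorized by the later steps.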
**Pull-Request commits**: We first collect the merged pull requests in each repository and identify the pull requests whose source and destination branches belong to the analyzed repository pair. The GitHub API :owner/:repo/pulls/:pull_number provides all the information of a given pull request. One can identify the source and destination branches using the pull request objects ['head']['repo']['full_name'] and ['base']['repo']['full_name'] from the returned JSON response, respectively. Based on the source and destination information, we can always identify the direction of the pull request as fork → mainline or mainline → fork, as shown in Fig. 3. For each pull request, we collect the pull request commits pr_commits using the GitHub API :owner/:repo/pulls/:pull_number/commits. Regardless of how a pull request gets integrated, the commit information in the source repository is always identical to that in pr_commits. Thus, we can always identify the pull request commits in the source repository by comparing the IDs of the commits in pr_commits to those in the history of the source repository. The tricky part is identifying the integrated commits in the destination repository. Based on the information discussed in Section 2 and summarized in Table 1, we can identify the pull request commits in the destination repository as follows:

**Merged pull request commits**: Based on Table 1, the commit IDs of pull request commits integrated using the default merge option do not change. Thus, to identify these commits, we simply compare the IDs of the pr_commits to those in the commit history of the destination repository.

**Rebased pull request commits**: Recall from Table 1 that integrated commits from a rebased pull request have different commit IDs on the destination branch.
Thus, we identify the rebased commits in the destination branch by comparing the remaining unchanged commit metadata, such as the author name, author date, commit message, and file details.

**Squashed pull request commits**: As part of a squashed pull request’s metadata, GitHub records the ID of the squashed commit on the destination branch in the merge_commit_sha attribute.(^10) Using this ID, we can identify the exact squashed commit in the destination repository. For extra verification, we also compare the changed files of all commits in the pull request with the changed files in the identified squashed commit.

**Git-integrated commits**: After identifying all commits related to pull requests, we now analyze any remaining unmatched commits to identify whether they might have been propagated directly through Git commands. Recall from Section 2 that this includes merged, rebased, and cherry-picked commits.

**Git cherry-picked commits**: We locate cherry-picked commits in the source and destination commit histories by comparing the following commit metadata: author name, author date, commit message, and file names and file changes. We can also identify the source and the destination branches of the cherry-picked commits by looking at the committer dates of the matched commits. We mark the commit with the earlier committer date to be from the source branch and that with the later date to be in the destination branch.

**Git merged and Git rebased commits**: At this point, we have already identified all integrated pull request commits as well as cherry-picked commits. Thus, any remaining commits that have the same ID in the histories of both variants must have been propagated through git merge or git rebase. As shown in Table 1 and Fig. 1, any commits integrated through git rebase have exactly the same ID and metadata in both the source and destination branch.

(^10) https://developer.github.com/v3/pulls/
Similarly, commits integrated through git merge also have the same exact information. While we can differentiate git-merged from git-rebased commits by finding merge commits (those with two parents) and marking any commits between the merge commit and the common ancestor as commits integrated through git merge, this differentiation is not important for our purposes: we are only interested in marking both types of commits as propagated commits. Thus, we identify commits integrated via git rebase or git merge, but do not differentiate between them. Similar to pull requests, both types of commits may be pulled from either of the branches into the other. However, unlike with pull requests, it is not possible to identify which variant a propagated commit originated from. This is because of the nature of distributed version-control systems, where commits can be in multiple repositories, but there is no central record identifying the commits’ origin. Since it is common for commits to be pulled from the mainline and pushed into the fork repository as a result of the fork trying to keep in sync with the new changes in the mainline, we assume that all commits that we identify as integrated through git merge or git rebase are pulled from the mainline variant and pushed into the fork variant.

3.3.2 Identifying Unique Commits

To identify the unique commits between the mainline and fork, we use the compare GitHub API.(^11) The compare API compares the mainline branch with the fork branch and, among other items, returns the diverged commits: the commits by which a given branch (say, the mainline branch) is ahead of the other branch (the fork branch), as well as the commits by which it is behind the other branch.
The commits by which the mainline branch is ahead of the fork branch are the commits unique to the mainline, while the commits by which the mainline is behind the fork are the commits unique to the fork.

3.3.3 Verifying our Commit Categorization Methods

We verify our methods of identifying common commits for the different commit propagation techniques discussed in Section 3.3.1 in two phases. First, we test our scripts on six toy projects we created ourselves, where we intentionally include at least one example of each commit propagation technique and verify that the commits are correctly categorized. Second, we manually analyze some of the results of our scripts on a sample of six real mainline–fork pairs that are part of our data collection from each ecosystem, and for which we provide all details in our online appendix. From the earlier version of this work in the conference paper (Businge et al. 2018), we noticed that integrated pull requests between mainlines and variant forks were very rare. To this end, when testing our scripts, in addition to the variant forks, which have a very limited number of integrated commits, we also use social forks that have many integrated commits with their mainline counterparts. In this section, we will discuss only the following three pairs, which we show in Table 2:

(dashevo / dash-wallet, sambarboza / dash-wallet): The repository sambarboza / dash-wallet is a social fork. The mainline dashevo / dash-wallet has a total of 445 PRs. Our scripts identify that 74 of these 445 pull requests were integrated from the fork repository sambarboza / dash-wallet into the mainline repository dashevo / dash-wallet. We show the details of these 74 PRs in Table 2.

(^11) https://docs.github.com/en/rest/reference/repos#compare-two-commits
Our technique identified that 3 of the 74 PRs were integrated using the PR merge option (together having a total of 13 commits). There were 43 of the 74 PRs that were integrated using the PR squash option (having a total of 194 commits), 2 of the 74 PRs used the PR rebase option (having a total of 6 commits), and the integration option of the remaining 26 PRs was unclassified (having a total of 167 commits). We identified a total of 405 commits that were integrated using the git merge/rebase integration option, and no commit was integrated using the git cherry-pick option.

Table 2:

| Pair | Technique | # PRs | # Commits |
|------|-----------|-------|-----------|
| Android: dashevo / dash-wallet (D), sambarboza / dash-wallet (S) | PR merged | 3 | 13 |
| | PR squashed | 43 | 194 |
| | PR rebased | 2 | 6 |
| | PR unclassified | 26 | 167 |
| | Git merge/rebase | n/a | 405 |
| | Git cherry-pick | n/a | 0 |
| | Total | 74 | 785 |
| .NET: flagbug / YoutubeExtractor (D), Kimmax / SYMMExtractor (S) | PR merged | 2 | 2 |
| | PR squashed | 0 | 0 |
| | PR rebased | 0 | 0 |
| | PR unclassified | 0 | 0 |
| | Git merge/rebase | n/a | 3 |
| | Git cherry-pick | n/a | 1 |
| | Total | 2 | 6 |
| JavaScript: TerriaJS / terriajs (S), bioretics / rer3d-terriajs (D) | PR merged | 9 | 101 |
| | PR squashed | 0 | 0 |
| | PR rebased | 0 | 0 |
| | PR unclassified | 0 | 0 |
| | Git merge/rebase | n/a | 1,825 |
| | Git cherry-pick | n/a | 10 |
| | Total | 9 | 1,936 |

For the first two mainline–fork pairs in the table, S = source (fork) and D = destination (mainline). For the last mainline–fork pair, S = source (mainline) and D = destination (fork).

(flagbug / YoutubeExtractor, Kimmax / SYMMExtractor): The repository Kimmax / SYMMExtractor is a variant fork. The mainline flagbug / YoutubeExtractor has a total of 32 pull requests.
Our scripts identify that 2 of the 32 PRs were integrated from the fork repository Kimmax / SYMMExtractor into the mainline repository flagbug / YoutubeExtractor (see details in Table 2). The two PRs were integrated using the PR merge option, having a total of two integrated commits. We also identified a total of three commits that were integrated using the git merge/rebase integration option, and 1 commit was integrated using the git cherry-pick option.

(TerriaJS / terriajs, bioretics / rer3d-terriajs): The repository bioretics / rer3d-terriajs is a variant fork. The fork bioretics / rer3d-terriajs has a total of 10 pull requests. Our scripts identify that 9 of the 10 pull requests were integrated from the mainline TerriaJS / terriajs into the fork bioretics / rer3d-terriajs. The 9 PRs had a total of 101 commits. There were no commits integrated using the PR squash and PR rebase options. A total of 1,825 commits were integrated using the git merge/rebase integration option, and only 10 commits were integrated using the git cherry-pick option.

Given the above results of our scripts, we select some of the identified code propagation techniques and manually verify them. For each analyzed mainline–fork pair, we randomly sample a pull request from each pull request integration technique returned by our scripts. We manually analyze those sampled pull requests and their commits, including the commit metadata, to verify the correctness of the identified propagation technique. For each of these sampled pull requests, we also randomly select two commits and manually analyze them to make sure they have been correctly classified. For example, in the pair [getodk / collect (D), lognaturel / collect (S)] (lognaturel / collect is a social fork), our script reveals that the commits in the pull requests numbered 3531, 3462 and 3434 were integrated using merging, squashing and rebasing, respectively.
We manually verify that these pull requests have in fact been integrated using these techniques by looking at their commit metadata. Similarly, for the pair [dashevo / dash-wallet (D), sambarboza / dash-wallet (S)] (sambarboza / dash-wallet is a social fork), we verify that the commits in the pull requests numbered 421, 333, and 114 were integrated using merging, squashing, and rebasing, respectively. We also look at the results returned by integration outside GitHub (git merge/rebase and git cherry-pick). For example, our results indicate that the pair [FredJul / Flym (D), Etuldan / spaRSS (S)] (Etuldan / spaRSS is a variant fork) has no commits integrated using pull requests, but had 34 and 5 commits integrated using git merge/rebase and git cherry-picking, respectively. We manually verify these latter five commits and confirm their correctness.

As the pair dashevo / dash-wallet, sambarboza / dash-wallet from Table 2 shows, there were some pull requests that our scripts were not able to classify. As part of our manual verification, we find that the GitHub API indicates that they are integrated into the destination repository, since their merge date is not null. On deeper investigation, we discover that all the unclassified pull request commits were integrated into a branch other than the master branch. For example, pull requests 514 and 512 from the fork sambarboza / dash-wallet were both integrated into the branch evonet-develop on the mainline repository. We also observed that both pull requests had an integration build test failure (Travis CI). This explains why the commits are missing from the history of the master branch and why our scripts could not classify those integrated commits.

One might wonder whether we have a threat to construct validity, since we do not consider commit integration into branches other than the default (main/master).
Consider, for example, the scenario we presented above of unclassified pull requests that were integrated into a development branch ("staging") but are missing from the main branch because they failed the integration build test. If any of the 167 commits are integrated from the staging branch into the master branch using any of the integration techniques that do not completely rewrite the commit history (i.e., PR merge/squash/rebase, git merge/rebase/cherry-pick), then our script would identify them as commits that were integrated between the mainline and the fork using the git merge/rebase option. As such, our script minimizes the threat to validity posed by the unclassified pull requests.

Our manually verified data for both the toy projects and the real projects gives us confidence that our scripts can correctly identify the commits integrated through the different integration mechanisms in any mainline–fork pair of any repository.

3.3.4 Fork Variability Percentage

To quantify how much a fork differs from its mainline, we define a metric, variability percentage, as follows:

\[
\text{VariabilityPercentage} = \frac{\text{uniqueFV}}{\text{uniqueFV} + \text{CommonCommits}} \times 100
\]

where \(\text{CommonCommits} = \text{PullRequestCommits} + \text{GitCommits} + \text{InheritedCommits}\), as shown in Fig. 3. \(\text{VariabilityPercentage}\) measures the percentage of commits unique to a fork, compared to all the commits in that fork. A lower percentage means that most of the changes in the fork are either starting commits (i.e., the fork did not make many changes after the fork date) or merged commits that are propagated from/to the mainline. Both cases indicate that the functionality in the fork does not differ much from that in the mainline. On the other hand, a higher \(\text{VariabilityPercentage}\) indicates more fork-specific customizations.
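The metric can be computed directly from the commit counts. A minimal sketch; the function and parameter names are ours:

```python
def variability_percentage(unique_fv, pr_commits, git_commits, inherited_commits):
    """Percentage of a fork's commits that are unique to the fork.

    ``common`` corresponds to CommonCommits: commits propagated via
    pull requests, commits propagated via git, and commits inherited
    from the mainline at forking time.
    """
    common = pr_commits + git_commits + inherited_commits
    total = unique_fv + common
    if total == 0:
        return 0.0  # a fork with no commits has no variability
    return 100.0 * unique_fv / total
```

For instance, a fork with 25 unique commits out of 100 total commits scores 25.0; a score near zero indicates the fork closely mirrors its mainline.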
4 Variant Family Characteristics (RQ1)

We now present the characteristics of our identified software families within the ecosystems. Table 3 shows all the metrics we defined, along with their values.

4.1 General Variant Characteristics

Variant Family Size (FamilySize). Figure 4 shows the number of variants (i.e., the family size) in each of the variant families of the three ecosystems we studied.

We can see that the distributions of family sizes for all three ecosystems are right-skewed, with most families having two members. Specifically, 28 (73%) of 38 software families, 7,731 (87%) of 8,837 software families, and 475 (90%) of 526 software families have only two variants. The three distributions also show that larger families are rather seldom in all three ecosystems, but that the largest family sizes we observe are part of the JavaScript ecosystem. When identifying variant families from the different ecosystems, we observe that although Android is considered one of the largest known ecosystems (Mojica et al. 2014; Li et al. 2016; Sattler et al. 2018), identifying its variant families is rather difficult compared to the software packaging ecosystems (JavaScript and .NET) we studied. In the Android ecosystem, it is not compulsory to record the source repository of an Android variant on Google Play. To this end, we went through the lengthy process described in Section 3.1.1, applying a number of heuristics to GitHub repositories to identify families.

Table 3:

| Metric | Mean | Min | Median | Max | Description |
|--------------------------------|------|-----|--------|------|-------------------------------------------------|
| FamilySize | | | | | |
| Android apps | 2.4 | 2 | 2 | 7 | Number of variants in an Android family |
| .NET apps | 2.1 | 2 | 2 | 7 | Number of variants in a .NET family |
| JavaScript apps | 2.2 | 2 | 2 | 16 | Number of variants in a JavaScript family |
| App Dependencies | | | | | |
| PackageDependenciesMLV | 40.4 | 0 | 26 | 140 | Number of mainline variant package dependencies on Android |
| | 2.3 | 0 | 1 | 49 | Number of mainline variant package dependencies on .NET |
| | 11.8 | 0 | 7 | 267 | Number of mainline variant package dependencies on JavaScript |
| PackageDependenciesFV | 22 | 0 | 22 | 81 | Number of fork variant package dependencies on Android |
| | 2.0 | 0 | 1 | 25 | Number of fork variant package dependencies on .NET |
| | 9.8 | 0 | 6 | 605 | Number of fork variant package dependencies on JavaScript |
| App Popularity (Android) | | | | | |
| DownloadsMLV | 2,211K | 1 | 50K | 100M | Number of downloads of the mainline variant from Google Play |
| DownloadsFV | 5,479K | 5 | 1K | 100K | Number of downloads of the fork variant from Google Play |
| ReviewsMLV | 27K | 0 | 547 | 631K | Number of reviews of the mainline variant on Google Play |
| ReviewsFV | 2.8K | 0 | 45 | 161K | Number of reviews of the fork variant on Google Play |
| App Popularity (.NET & JavaScript) | | | | | |
| DependentPackagesMLV | 106 | 0 | 0 | 27K | Number of packages that depend on the mainline app on .NET |
| | 80 | 0 | 2 | 26K | Number of packages that depend on the mainline app on JavaScript |
| DependentPackagesFV | 0.4 | 0 | 0 | 19 | Number of .NET packages that depend on the fork app |
| | 1.7 | 0 | 0 | 2K | Number of JavaScript packages that depend on the fork app |
| DependentProjectsMLV | 133 | 0 | 0 | 33K | Number of .NET projects that depend on the mainline app on GitHub |
| | 140 | 0 | 0 | 83K | Number of JavaScript projects that depend on the mainline app on GitHub |
| DependentProjectsFV | 0.5 | 0 | 0 | 82 | Number of .NET projects that depend on the fork app on GitHub |
| | 2 | 0 | 0 | 5K | Number of JavaScript projects that depend on the fork app on GitHub |
| App Maintenance (.NET & JavaScript) | | | | | |
| PackageReleasesMLV | 14.6 | 1 | 2 | 188 | Number of mainline variant package releases on .NET |
| | 15 | 1 | 8 | 1,117 | Number of mainline variant package releases on JavaScript |
| PackageReleasesFV | 3.6 | 1 | 2 | 54 | Number of fork variant package releases on .NET |
| | 4 | 1 | 2 | 341 | Number of fork variant package releases on JavaScript |

MLV = mainline variant, FV = fork variant

Fig. 4 Distribution of family sizes (number of variants in a family) of the three ecosystems. A variant family contains one mainline variant and at least one fork variant. The presented data corresponds to 38 (Android), 8,837 (JavaScript), and 526 (.NET) software families. Note that the y-axes of Figs. 4b and c are presented on logarithmic scales. The axes of the figures are also presented on different scales for visibility purposes.

Variant Package Dependencies: In Fig. 5, we present scatter plots of the mainline dependencies versus the fork dependencies. Figures 5a to c show the scatter plots of the number of fork variant package dependencies (y-axis) versus the number of mainline variant package dependencies (x-axis) for Android, .NET and JavaScript variants, respectively. A point in any of the scatter plots represents the number of package dependencies of a given fork variant (y-axis) and the number of package dependencies of the counterpart mainline variant (x-axis). In all scatter plots, it is not surprising that the number of package dependencies for a fork and its corresponding mainline are correlated. This confirms that fork variants inherit the original dependencies of the mainline. However, we also observe points in all the scatter plots where one variant has more dependencies than the other. This means that the variant with more package dependencies has functionality that is not included in the counterpart variant. Although the observation is more prominent for the mainline variants, since we see many points below the diagonal lines (the forks do not keep in sync with the mainline), it is interesting that we also have some fork variants with more dependencies. Follow-up studies could investigate which new functionalities related to the used dependencies are being introduced in the variants, and why.

Fig. 5 Scatter plots of mainline and fork variant dependencies on other packages in the ecosystems. The datasets comprise 54 mainline–fork pairs for Android, 590 mainline–fork pairs for .NET and 10,357 mainline–fork pairs for JavaScript. Note: The graphs are presented on different scales for visibility purposes.

Android variant categories: Figure 6 shows the distribution of variants in the different categories on Google Play. We can see that 12 of the 54 forks (22%) are listed in a different category from the mainline, which suggests that these variants serve different purposes. However, the majority of pairs include variants in the same category.

4.2 Variant Maintenance Activity (JavaScript & .NET)

Figure 7 shows the release distributions for both the mainline and the fork variants in the JavaScript and .NET ecosystems. Each point on the x-axis represents a pair, and we sort the pairs by the number of mainline package releases. Figure 7a shows that the majority of mainline variants have multiple releases. Specifically, 5,888 of the 8,835 (67%) mainline variants have $\geq 5$ package releases on the JavaScript package manager. The fork variants have fewer, but still multiple, releases.
Specifically, 2,389 of the 10,357 fork variants (23%) have $\geq 5$ package releases on the JavaScript package manager. Interestingly, from the plot we also observe a number of forks having more releases than their mainlines. Looking at Fig. 7b, for .NET variants, we observe a distribution similar to that of the JavaScript variants in Fig. 7a. These results are interesting, since they indicate that developers of forked variants usually do not make a one-off package distribution. They continuously distribute new releases of their packages, further emphasizing that these are indeed variant forks.

Fig. 6 Relationship between the variant categories listed on Google Play for each variant in the Android mainline–fork pairs. Same = the mainline–fork pair shares the same category and Different = the mainline–fork pair belongs to different categories.

Observation 1–RQ1: Families do in fact exist in our three software ecosystems. We collected 38 (Android), 526 (.NET), and 8,837 (JavaScript) different families. While both the mainlines and the forks have multiple releases, the number of mainline releases is significantly higher than that of the forks. Still, the forks' release activity indicates that they are usually not one-shot releases, with some forks even having more releases than their mainlines.

4.3 Variant Ownership Characteristics

Figure 8 shows the percentage of common owners in the mainline–fork variant pairs of our three studied ecosystems. For the Android variants, the analysis is based on all the data we collected (54 mainline–fork variant pairs). However, for the .NET and JavaScript variants we only analysed a subset of the .NET and JavaScript mainline–variant pairs, respectively, due to the criteria we set out to identify variant ownership in Section 3.2. From Fig. 8, we can see roughly the same percentages of common (Yes) and not common (No) developers across the three ecosystems.
Overall, our results imply that the majority of forked variants are started and maintained by developers different from those maintaining the mainline counterparts.

Observation 2–RQ1: The majority of the mainline–fork variant pairs in the three ecosystems we investigated are owned by different developers (91% of Android variants, 95% of JavaScript variants and 92% of .NET variants). This implies that the majority of forked variants in our datasets are started and maintained by developers different from those maintaining the mainline counterparts.

Fig. 8 Variant owners for the mainline–fork variant pairs in the three ecosystems. Yes = the mainline–fork variant pair has common developers and No = the mainline–fork variant pair does not have common developers. The datasets comprise 54 mainline–fork variant pairs from Android, 985 from JavaScript, and 89 from .NET. Note: The graphs are presented on different scales for visibility purposes.

4.4 Variant Popularity Characteristics

Figure 9 shows the variant popularity for the variants in the three ecosystems of Android, JavaScript, and .NET.

Android variants: Figure 9a shows the variant downloads distribution for both the mainline and fork variants, where each point on the x-axis represents a pair and we sort the pairs by the number of mainline downloads. We observe that the majority of the mainline variants are quite popular: 27 of the 38 mainline variants (71%) have $\geq 10K$ downloads. For fork variant popularity in terms of downloads, we observe that 10 of the 54 fork variants (19%) have $\geq 10K$ downloads. We believe it is natural that the mainline variants are more popular than their fork counterparts, since we assume they have been released first on Google Play.^{12} Figure 9b shows the variant reviews distribution for both the mainline and fork variants, where each point on the x-axis represents a pair and we sort the pairs by the number of mainline reviews. We observe a distribution of the number of reviews similar to that of the number of downloads. This is not surprising, since previous studies have found downloads and reviews to be correlated (Businge et al. 2019). Overall, the variant popularity we observe gives us confidence that our data set consists of real variants.

Fig. 9 Distributions of mainline and fork variants' popularity metrics for the variants in the three ecosystems of Android, JavaScript and .NET. The datasets comprise 54 mainline–fork pairs for Android, 10,357 mainline–fork pairs for JavaScript, and 590 mainline–fork pairs for .NET.

^{12} Note that Google Play does not keep the release history of its variants, so it is not possible to obtain the first listing date of each variant.

JavaScript and .NET variants: In Figs. 9c–f we present the popularity graphs for the variants in the two ecosystems of .NET and JavaScript. Figure 9c shows the dependent packages distributions for both the mainline and fork variants, where each point on the x-axis represents a pair and we sort the pairs by the number of mainline dependent packages. We observe that the majority of mainline variants are quite popular: 6,157 of the 10,357 mainline variants (59%) have at least two dependent packages. For fork variants, we observe that 1,624 of the 10,357 fork variants (16%) have at least two dependent packages. Figure 9d shows the dependent projects distributions for both the mainline and fork variants in the JavaScript ecosystem. Each point on the x-axis represents a pair and we sort the pairs by the number of mainline dependent projects. We also observe a distribution of the number of dependent projects similar to that observed for the number of dependent packages. The remaining two graphs, Figs.
9e and f, show the same data for the .NET ecosystem, and both show trends similar to those observed for JavaScript.

Comparing the popularity across all the ecosystems, we observe that the mainline variants are more popular than their fork counterparts. This is not surprising, since the forks are clones of the mainline. However, from Fig. 9, in all three ecosystems it is interesting to observe a few fork variants being more popular than their mainline counterparts. In a follow-up study it would be interesting to investigate possible explanations for why these fork variants are more popular than their mainline counterparts. Comparing the popularity of the variants in the JavaScript and .NET ecosystems, we observe that on average the variants in the JavaScript ecosystem are more popular than the variants in the .NET ecosystem. We also observe that the fork variants in the .NET ecosystem are less popular (have fewer dependent packages/projects) than those in the JavaScript ecosystem. In a follow-up study it would also be interesting to investigate why variants in JavaScript families are more popular than variants in .NET families, and why the fork variants in JavaScript families are more popular than the fork variants in .NET families.

Tables 4 and 5 present a few examples showing the variant popularity (for all three ecosystems) and variant maintenance activities (for .NET and JavaScript only). In the mainline and fork columns of Table 5, we use the package names of the variants, since the repository names on GitHub are too long. In both tables, we present two interesting kinds of variant pairs that we randomly picked: (1) abandoned mainlines: the first variant pair in each of the ecosystems has the fork variant more popular than the mainline.
When we compared the last release dates of the variants in all the ecosystems, we observed that the mainlines seem to have been abandoned while the fork variants continued to evolve. This is the reason the fork variants are more popular. In Table 5 we can also see that these fork variants have more releases than their mainlines. (2) Co-evolution: with the second pair in each of the ecosystems, we present another interesting case in which both the mainline and the fork variant are continuously maintained and both are popular. In these cases, it would be interesting to study the co-evolution of the variants in both technical and social aspects. Technical: for example, investigating whether the variants are complementary or competing. Social: what can we learn about the variant communities?

Table 4 Example of mainline–fork pairs from the Android ecosystem showing popularity statistics from Google Play

| mainline | fork | mainline downloads | fork downloads | mainline reviews | fork reviews |
|-------------------|-----------------------|--------------------|----------------|------------------|-------------|
| TobyRich / app-smartplane-android | TailorToys / app-powerup-android | 10K | 100K | 106 | 1,034 |
| opendatakit / collect | kobotoolbox / collect | 1,000K | 100K | 3,049 | 1,527 |

Table 5 Example of mainline–fork pairs from the .NET and JavaScript ecosystems showing statistics on popularity and maintenance activities

| | mainline | fork | mainline dependent packages | fork dependent packages | mainline package releases | fork package releases |
|------|------------|-------------------------|-----------------------------|-------------------------|---------------------------|-----------------------|
| .NET | Flurl.Signed | Flurl.Http.Signed | 3 | 10 | 6 | 10 |
| | Ninject | Portable.Ninject | 638 | 19 | 75 | 14 |
| JS | selenium | selenium-server | 97 | 2,046 | 2 | 51 |
| | gulp-istanbul | gulp-babel-istanbul | 5,867 | 11 | 24 | 14 |

JS = JavaScript
Observation 3–RQ1: Although the mainline variants are more popular, which is not surprising, quite a number of fork variants are also popular. We also observe a few fork variants being more popular than their mainline counterparts. This again tells us that the forks we are studying are indeed variant forks, being used by communities of other developers (in the case of .NET and JavaScript variants) and, for Android variants, being downloaded and installed on user phones. We have pointed out some interesting research directions that can be investigated in follow-up studies.

5 Code Propagation in the Software Families (RQ2)

So far, we have analyzed the characteristics of the software families across our three ecosystems. Our results from RQ1 give us confidence that the fork variants in our data set are indeed variant forks. In RQ2, we present the results of how variants in the same family co-evolve. Specifically, we are interested in their code propagation practices, to understand whether the variants evolve separately or propagate code between each other after the forking date. We present the results of code propagation between family variants in terms of propagated commits, while differentiating the propagation mechanisms we explained in Sections 2 and 3.3. Recall that these commit types determine the various code propagation strategies (e.g., pull requests versus direct integration through git). Tables 6, 7, 8 and 9 show the metrics we use in this RQ to measure the types of propagated commits in the ecosystems of Android, JavaScript, and .NET. Where applicable, we specify the direction of the propagated code, i.e., mainline→fork or fork→mainline.
Recall from Section 3.3.1 that we do not differentiate between git merge and git rebase commits, and that we assume that all integrated git merge and git rebase commits are in the direction mainline→fork. This is why Tables 7 and 8 show only one metric, gitPullMLV-FV, to represent these two commit integration types. Tables 6–9 show the summary of the descriptive statistics of all the metrics we use to investigate code propagation at the commit level for all three ecosystems of Android, JavaScript, and .NET.

Table 6 Pull-request-based code propagation metrics for the mainline–fork pairs in the Android, .NET, and JavaScript ecosystems

| Metric | Mean | Min | Median | Max | Description |
|-------------------------|------|-----|--------|------|--------------------------------------------------|
| Android variants | | | | | |
| mergedPRsMLV-FV | 0.31 | 0 | 0 | 15 | Number of merged PRs from the mainline to the fork variant |
| mergedPRsFV-MLV | 0.09 | 0 | 0 | 4 | Number of merged PRs from the fork to the mainline variant |
| prMergedCommitsMLV-FV | 8.33 | 0 | 0 | 427 | Number of merged PR commits from the mainline to the fork variant |
| prMergedCommitsFV-MLV | 0.57 | 0 | 0 | 28 | Number of merged PR commits from the fork to the mainline variant |
| prSquashedMLV-FV | 0 | 0 | 0 | 0 | Number of squashed PRs from the mainline to the fork variant |
| prSquashedFV-MLV | 0 | 0 | 0 | 0 | Number of squashed PRs from the fork to the mainline variant |
| prRebasedMLV-FV | 0 | 0 | 0 | 0 | Number of rebased PRs from the mainline to the fork variant |
| prRebasedFV-MLV | 0 | 0 | 0 | 0 | Number of rebased PRs from the fork to the mainline variant |
| .NET variants | | | | | |
| mergedPRsMLV-FV | 0 | 0 | 0 | 3 | Number of merged PRs from the mainline to the fork variant |
| mergedPRsFV-MLV | 0.2 | 0 | 0 | 13 | Number of merged PRs from the fork to the mainline variant |
| prMergedCommitsMLV-FV | 0.2 | 0 | 0 | 30 | Number of merged PR commits from the mainline to the fork variant |
| prMergedCommitsFV-MLV | 1.2 | 0 | 0 | 207 | Number of merged PR commits from the fork to the mainline variant |
| prSquashedMLV-FV | 0 | 0 | 0 | 0 | Number of squashed PRs from the mainline to the fork variant |
| prSquashedFV-MLV | 0 | 0 | 0 | 5 | Number of squashed PRs from the fork to the mainline variant |
| prSquashedCommitsFV-MLV | 0.1 | 0 | 0 | 14 | Number of squashed PR commits from the fork to the mainline variant |
| prRebasedMLV-FV | 0 | 0 | 0 | 0 | Number of rebased PRs from the mainline to the fork variant |
| prRebasedFV-MLV | 0 | 0 | 0 | 0 | Number of rebased PRs from the fork to the mainline variant |
| JavaScript variants | | | | | |
| mergedPRsMLV-FV | 0 | 0 | 0 | 26 | Number of merged PRs from the mainline to the fork variant |
| mergedPRsFV-MLV | 0.4 | 0 | 0 | 4 | Number of merged PRs from the fork to the mainline variant |
| prMergedCommitsMLV-FV | 0.1 | 0 | 0 | 399 | Number of merged PR commits from the mainline to the fork variant |
| prMergedCommitsFV-MLV | 0.57 | 0 | 0 | 28 | Number of merged PR commits from the fork to the mainline variant |
| prSquashedMLV-FV | 0 | 0 | 0 | 2 | Number of squashed PRs from the mainline to the fork variant |
| prSquashedFV-MLV | 0 | 0 | 0 | 21 | Number of squashed PRs from the fork to the mainline variant |
| prSquashedCommitsMLV-FV | 0.4 | 0 | 0 | 52 | Number of squashed PR commits from the mainline to the fork variant |
| prSquashedCommitsFV-MLV | 0 | 0 | 0 | 109 | Number of squashed PR commits from the fork to the mainline variant |
| prRebasedMLV-FV | 0 | 0 | 0 | 2 | Number of rebased PRs from the mainline to the fork variant |
| prRebasedFV-MLV | 0 | 0 | 0 | 3 | Number of rebased PRs from the fork to the mainline variant |
| prRebasedCommitsMLV-FV | 0.4 | 0 | 0 | 4 | Number of rebased PR commits from the mainline to the fork variant |
| prRebasedCommitsFV-MLV | 0 | 0 | 0 | 25 | Number of rebased PR commits from the fork to the mainline variant |

5.1 Pull Request Propagation (Commit Integration Inside GitHub)

We present the results of the pull request integration techniques: merge, rebase and squash (as well as the unclassified PRs) for the mainline–fork pairs in all three ecosystems of Android, JavaScript, and .NET. Table 6 presents the summary statistics, and Table 7 presents the details of the summary statistics. We also present the distributions of the integration in both directions in Fig. 10.

Figure 10 shows box plots of the distributions of the different PR integration techniques. For example, for the variants in the Android ecosystem, the distributions of PR integration in both directions, mainline→fork and fork→mainline, are shown in Fig. 10a. There was only one pull request in each direction of integration. Both pull requests were integrated using the PR merge option. There was no PR integrated using any of the other PR integration options. We can see that in all the box plots the majority of the mainline–fork variant pairs have zero PRs integrated in either direction. This implies that most of the pairs do not integrate PRs between themselves.
Table 7 Number of mainline–fork pairs, pull requests, and commits involved in code propagation in our datasets of 54, 10,357, and 590 mainline–fork pairs from the ecosystems of Android, JavaScript, and .NET, respectively

| | Mainline → Fork | Fork → Mainline |
|------------------|----------------|----------------|
| | Pairs | PRs | Commits | Pairs | PRs | Commits |
| Android variants | | | | | | |
| PR Merged | 1 | 1 | 5 | 1 | 2 | 427 |
| Rebased | 0 | 0 | 0 | 0 | 0 | 0 |
| Squashed | 0 | 0 | 0 | 0 | 0 | 0 |
| Unclassified | 0 | 0 | 0 | 0 | 0 | 0 |
| Git Cherry-pick | 5 | n/a | 250 | 4 | n/a | 136 |
| gitPullMLV-FV | 18 | n/a | 13,198 | n/a | n/a | n/a |
| .NET variants | | | | | | |
| PR Merged | 9 | 13 | 96 | 67 | 139 | 721 |
| Rebased | 0 | 0 | 0 | 0 | 0 | 0 |
| Squashed | 0 | 0 | 0 | 13 | 21 | 72 |
| Unclassified | 0 | 0 | 0 | 3 | 3 | 9 |
| Git Cherry-pick | 15 | n/a | 99 | 16 | n/a | 138 |
| gitPullMLV-FV | 106 | n/a | 5,601 | n/a | n/a | n/a |
| JavaScript variants | | | | | | |
| PR Merged | 99 | 162 | 1,862 | 724 | 1,394 | 4,523 |
| Rebased | 1 | 1 | 4 | 11 | 13 | 67 |
| Squashed | 5 | 6 | 72 | 132 | 250 | 1,048 |
| Unclassified | 7 | 10 | 33 | 23 | 32 | 134 |
| Git Cherry-pick | 95 | n/a | 275 | 91 | n/a | 251 |
| gitPullMLV-FV | 1,180 | n/a | 40,001 | n/a | n/a | n/a |

For example, for the Android apps, in the first row, in the direction of mainline → fork only 1 fork variant merged 1 PR from the mainline containing 5 commits, and in the direction of fork → mainline only 1 mainline merged 2 PRs containing 427 commits.

Table 7 shows the detailed statistics behind the distributions. For example, in the top section of Table 7 (Android variants), in the first row, we observe that 1 of the 54 mainline–fork variant pairs integrated 1 PR, having a total of 5 commits, using the merge pull request option, in the direction of mainline → fork.
In the same row, in the direction of fork → mainline, we observe 1 mainline–fork pair that integrated 2 PRs, having a total of 427 commits, using the merge pull request option.

We can see that for Android variants only 1 of the 54 (1.9 %) mainline–fork pairs integrated commits using the merge pull request option. We observe more or less similar trends for the mainline–fork variant pairs in the other two ecosystems. For the JavaScript mainline–fork variant pairs, we observe 99 of the 10,357 mainline–fork variant pairs (1 %)

Table 8 Git-based (outside GitHub) code propagation practices, at the commit level, for the 54, 10,357, and 590 mainline–fork pairs in the Android, JavaScript, and .NET ecosystems, respectively

| Metric | Mean | Min | Median | Max | Description |
|-------------------------|------|-----|--------|-------|----------------------------------------------------------------------------|
| Android variants | | | | | |
| gitCherrypickedMLV-FV | 4.6 | 0 | 0 | 168 | Number of git cherry-picked commits from the mainline to the fork variant. |
| gitCherrypickedFV-MLV | 2.5 | 0 | 0 | 75 | Number of git cherry-picked commits from the fork to the mainline variant. |
| gitPullMLV-FV | 244 | 0 | 0 | 6,567 | Number of git merged/rebased commits from the mainline to the fork variant. |
| .NET variants | | | | | |
| gitCherrypickedMLV-FV | 1.5 | 0 | 0 | 42 | Number of git cherry-picked commits from the mainline to the fork variant. |
| gitCherrypickedFV-MLV | 0.4 | 0 | 0 | 148 | Number of git cherry-picked commits from the fork to the mainline variant. |
| gitPullMLV-FV | 9.5 | 0 | 0 | 2,317 | Number of git merged/rebased commits from the mainline to the fork variant. |
| JavaScript variants | | | | | |
| gitCherrypickedMLV-FV | 4.6 | 0 | 0 | 168 | Number of git cherry-picked commits from the mainline to the fork variant. |
| gitCherrypickedFV-MLV | 0 | 0 | 0 | 70 | Number of git cherry-picked commits from the fork to the mainline variant. |
| gitPullMLV-FV | 3.7 | 0 | 0 | 6,035 | Number of git merged/rebased commits from the mainline to the fork variant. |

integrating commits, using the merge pull request option, in the direction of mainline→fork, and 724 of the 10,357 mainline–fork pairs (7 %) in the direction of fork→mainline. We observe very few mainline–fork variant pairs in the JavaScript software packaging ecosystem integrating commits using the pull request squash/rebase options in either integration direction. For the mainline–fork variant pairs in the .NET ecosystem, we observe 9 of the 590 mainline–fork pairs (1.5 %) and 67 of the 590 mainline–fork pairs (11.3 %) integrating commits, using the merge pull request option, in the directions of mainline→fork and fork→mainline, respectively. We did not observe any commits integrated using the rebase pull request option in either integration direction, while for the commits integrated using the squash pull request option, we only observed integration in the direction of fork→mainline, accounting for 13 of the 590 mainline–fork pairs (2 %).

We observe that more mainline–fork variant pairs integrate commits in the direction of fork→mainline than in the direction of mainline→fork, irrespective of the PR integration option used. For Android variants we observed 1 pair in each direction (1.9 % each); for JavaScript variants we have 867 of 10,357 mainline–fork pairs (8.4 %) in the direction of fork→mainline compared to 105 of 10,357 mainline–fork pairs (1 %) in the direction of mainline→fork. Regarding the pull request integration options, we can see that the merge pull request option is clearly the most frequently used in all integration directions and in all three ecosystems. In all three software packaging ecosystems, the squash and rebase options are rarely used.
However, comparing the two PR options, squash and rebase, we observe that the squash PR option is used more often.

Table 9 Unique commits and variability percentage for the 54, 10,357, and 590 mainline–fork pairs in the Android, JavaScript, and .NET ecosystems, respectively

| Metric | Mean | Min | Median | Max | Description |
|-----------------------|-------|-----|--------|--------|---------------------------------------------------------------------------------|
| Android variants | | | | | |
| uniqueMLV | 1,122 | 0 | 228 | 18,961 | Number of unique commits in the mainline variant in a given mainline–fork pair. |
| uniqueFV | 98.3 | 1 | 16 | 1,646 | Number of unique commits in the fork variant in a given mainline–fork pair. |
| InheritedCommits | 1,884 | 10 | 755 | 29,110 | Number of common commits between a given fork and the mainline variant. |
| VariabilityPercentage | 15 | 0 | 2.7 | 93.8 | Percentage of unique commits according to (1). |
| .NET variants | | | | | |
| uniqueMLV | 102.2 | 0 | 3 | 10,789 | Number of unique commits in the mainline variant in a given mainline–fork pair. |
| uniqueFV | 16.2 | 0 | 5 | 605 | Number of unique commits in the fork variant in a given mainline–fork pair. |
| InheritedCommits | 224.5 | 0 | 42.1 | 20,538 | Number of common commits between a given fork and the mainline variant. |
| VariabilityPercentage | 20 | 0 | 11 | 99 | Percentage of unique commits according to (1). |
| JavaScript variants | | | | | |
| uniqueMLV | 33.5 | 0 | 3 | 10,223 | Number of unique commits in the mainline variant in a given mainline–fork pair. |
| uniqueFV | 12.8 | 0 | 5 | 1,229 | Number of unique commits in the fork variant in a given mainline–fork pair. |
| InheritedCommits | 111.5 | 14 | 32 | 66,861 | Number of common commits between a given fork and the mainline variant. |
| VariabilityPercentage | 22.3 | 0 | 14 | 99 | Percentage of unique commits according to (1). |
Observation 1–RQ2: Code propagation using PRs is rare among the mainline–fork variant pairs in the three ecosystems that we studied. Unsurprisingly, PRs in the direction of fork → mainline are more numerous than those in the direction of mainline → fork. However, although the numbers are low, there are some PRs in the direction of mainline → fork. We have also observed that, in all three ecosystems, the most used integration option is by far the merge PR option. The squash and rebase PR options are less frequently used in mainline–fork variant pairs in all three ecosystems, although the squash PR option is used more than the rebase PR option. The low numbers could be attributed to the fact that fork variants are created not to submit PRs but to diverge away from the mainline to solve a different problem. A follow-up user study could investigate the motivation behind fork variant creation and why there is limited collaboration between mainline and fork variants.

5.2 Git Propagation (Commit Integration Outside GitHub)

In this section we present the results of commit integration outside GitHub, namely git cherry-pick and git merge/rebase (gitPullMLV-FV). The summary statistics of these two commit integration techniques are presented in Table 8, and the detailed results corresponding to them are presented in Table 7. We first present the results of git cherry-pick, followed by the results of git merge/rebase.

git cherry-pick commit integration: As we stated in Section 3.3, commits can be cherry-picked in two directions: mainline→fork or fork→mainline. The two metrics gitCherrypickedMLV-FV and gitCherrypickedFV-MLV (in Table 8) correspond to the commit integration directions mainline→fork and fork→mainline, respectively, in the three ecosystems. In Fig.
11 we present box plot distributions corresponding to the results in Table 8. We can see that all the distributions show only outliers, meaning that most pairs do not have cherry-picked commits. The detailed statistics in Table 7 reveal the same picture. For example, in the upper part of Table 7 (Android variants), we can see that only 5 of the 54 mainline–fork pairs (9 %) integrated a total of 250 commits in the direction of mainline→fork. In the direction of fork→mainline, 4 of the 54 mainline–fork pairs (7.4 %) integrated a total of 136 commits. Like the results of pull request integration presented earlier, we can clearly see that commit integration using git cherry-pick is rarely used in the mainline–fork variant pairs in all three ecosystems we studied. Unlike pull request integration, where the developer syncs the new changes upstream or downstream, with git cherry-pick the developer has to search for specific commits to integrate. This requires first looking into the pool of new changes and identifying the ones of interest to cherry-pick. If the mainline and fork variant have diverged to solve different problems, then finding the interesting commits among the new changes might be laborious. We hypothesize that this could be one of the reasons why so few cherry-picked commits are observed in mainline–fork variant pairs in the three ecosystems. A follow-up study to confirm or refute this hypothesis would add value to this study.

git merge/rebase commit integration: In Table 8, the metric gitPullMLV-FV represents git merge/rebase commit integration in the direction of mainline→fork in the three ecosystems. Again, we can see that the medians for this metric are zero in all three ecosystems. Figure 11 shows three box plots with the distributions of the gitPullMLV-FV metric for the mainline–fork variant pairs in the three ecosystems.
From the box plots, we can also observe that the medians are all zero. In Table 7 we present the detailed statistics for the metric gitPullMLV-FV. For Android mainline–fork variant pairs, we observe 18 of the 54 pairs (33 %) with a total of 13,198 commits integrated in the direction of mainline→fork. For .NET mainline–fork variant pairs, we observe 106 of the 590 pairs (18 %) with a total of 5,601 commits integrated in the direction of mainline→fork. Finally, for JavaScript mainline–fork variant pairs, we observe 1,180 of the 10,357 pairs (11 %) with a total of 40,001 commits integrated in the direction of mainline→fork. Although git merge/rebase is still rarely used in the mainline–fork variants in all three ecosystems, it is used more than the other two options, pull requests and git cherry-pick. We can conclude that git merge/rebase is the most used code integration mechanism between the variants in variant families. Again, we speculate that the lack of integration in mainline–fork variant pairs could be a result of the variants diverging to solve different problems from those being solved by their mainline counterparts.

Observation 2–RQ2: Like integration using PRs, the git merge/rebase and git cherry-pick integration techniques are less frequently used in the variants in the three ecosystems. However, we observe that git merge/rebase is the most commonly used integration mechanism between the mainline–fork variants in all three ecosystems, and that it occurs in the integration direction of mainline→fork. In general, a follow-up study investigating why most variants do not share code would reveal the reasons for the low numbers of integration.
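One simplified way to detect such outside-GitHub integration (a sketch under our own assumptions, not the paper's exact tooling) is to match commits across a mainline–fork pair by the metadata that cherry-pick and rebase preserve, namely author, author date, and message, while the commit ID changes:

```python
# Simplified sketch: commits propagated via cherry-pick or rebase keep their
# author, author date, and message, but receive a new commit ID. Matching on
# the preserved metadata flags likely propagated commits. Sample data is made up.
from dataclasses import dataclass

@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    author_date: str  # preserved by cherry-pick and rebase
    message: str

def propagated_commits(mainline, fork):
    """Return (mainline_sha, fork_sha) pairs likely propagated with a new ID."""
    index = {(c.author, c.author_date, c.message): c.sha for c in mainline}
    pairs = []
    for c in fork:
        src = index.get((c.author, c.author_date, c.message))
        if src is not None and src != c.sha:  # same change, different ID
            pairs.append((src, c.sha))
    return pairs

mainline = [Commit("a1", "alice", "2021-03-01T10:00:00", "fix crash on resume")]
fork = [
    Commit("b7", "alice", "2021-03-01T10:00:00", "fix crash on resume"),  # picked
    Commit("c3", "bob", "2021-04-02T09:30:00", "fork-only feature"),
]
print(propagated_commits(mainline, fork))  # → [('a1', 'b7')]
```

Commits synced with identical IDs (e.g., via a plain git pull) are deliberately excluded by the `src != c.sha` check, since those are already found by ID comparison.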
5.2.1 Fork Variability Percentage

This section presents the results of the variability percentage (metric VariabilityPercentage) for the fork variants in the three ecosystems. In Table 9, we present the summary statistics for the metrics used to calculate VariabilityPercentage in (1). Figure 12 presents the distributions of the metric VariabilityPercentage for the fork variants in the three ecosystems. We can see that the medians are 2.7 %, 11 %, and 14 % for the variants in the three ecosystems of Android, .NET, and JavaScript, respectively. A high value of the metric VariabilityPercentage implies that the fork differs considerably from its mainline counterpart. For the fork variants in the Android ecosystem, we observe that quite a number of the forks, 35 of the 54 (35 %), have a high VariabilityPercentage (≥ 10 %). For the fork variants from the .NET ecosystem, we also observe that the majority of the forks, 281/590 (53 %), have a high VariabilityPercentage (≥ 10 %). Lastly, for the fork variants in the JavaScript ecosystem, we observe that the majority of the forks, 6,076/10,357 (58 %), have a relatively high VariabilityPercentage (≥ 10 %).

Fig. 12 Distribution of fork variability percentage (VariabilityPercentage) for the variants in the three ecosystems, based on the datasets of 54, 10,357, and 590 fork variants from the Android, JavaScript, and .NET ecosystems, respectively

Observation 3–RQ2: The majority of the fork variants in the three ecosystems of Android, JavaScript, and .NET differ considerably from their mainline counterparts (i.e., they have higher numbers of unique commits). The finding that fork variants differ from their mainlines supports our earlier finding of limited commit integration in the mainline–fork variant pairs in the three ecosystems.
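Equation (1) is not reproduced in this excerpt, so its exact form is assumed here: a definition that relates a fork's unique commits to its total history (unique plus inherited) is consistent with the magnitudes in Table 9 (e.g., the Android medians uniqueFV = 16 and InheritedCommits = 755 give about 2 %, close to the reported 2.7 % median):

```python
# Assumed form of VariabilityPercentage (the paper's equation (1) is not shown
# in this excerpt): the share of the fork's history that is fork-unique.
def variability_percentage(unique_fv: int, inherited: int) -> float:
    total = unique_fv + inherited
    return 100.0 * unique_fv / total if total else 0.0

# Android medians from Table 9: uniqueFV = 16, InheritedCommits = 755
print(round(variability_percentage(16, 755), 1))  # → 2.1
```

The small gap to the reported 2.7 % is expected in any case, since the median of per-pair ratios need not equal the ratio of the medians.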
5.3 Summary

We have presented the results of code propagation practices among mainline–fork variant pairs from the three ecosystems of Android, .NET, and JavaScript. Overall, in all the studied mainline–fork variant pairs of the three ecosystems, we observe infrequent code propagation, regardless of the type of propagation mechanism or direction. The most used code propagation technique is git merge/rebase, which is used in 33 % of Android mainline–fork pairs, 11 % of JavaScript pairs, and 18 % of .NET pairs. For integration using pull requests, developers integrate code more often in the direction of fork → mainline than in the direction of mainline → fork, in all the mainline–fork variants. Code integration in the direction of mainline → fork is often done using the merge pull request option or git merge/rebase outside GitHub. Moreover, the squash and rebase pull request options are less frequently used in mainline–fork variant pairs, although the squash option is used more than the rebase option. Finally, by comparing the fork variability percentages, we observed a high percentage difference between the fork variants and their mainline counterparts, indicated by the higher numbers of unique commits. These results are consistent across all the variants of the three ecosystems (i.e., Android, JavaScript, and .NET) that we studied. Our findings potentially indicate that fork variants are created with the intention of diverging away from the mainline to solve a different problem (i.e., with no intention to sync in any way with the original mainline). Future studies could investigate the motivation behind fork variants' creation and why there is limited collaboration between mainline and fork variants.
6 Discussion and Implications

The observations from our two research questions have several implications for future research on the co-evolution of software families and for respective tool support.

Implications for Identifying Variant Forks As opposed to previous studies that relied on heuristics applied to GitHub repositories to identify variant forks, in this study we ensure that all members of a variant family represent different variants in the marketplace (Google Play, JavaScript, and .NET). Relying only on heuristics applied to GitHub repositories to find variant forks may yield false positives (i.e., a fork classified as a variant fork that is actually a social fork). Our method for identifying divergent forks can be reused by other researchers interested in studying variant families in other ecosystems, including operating-system packages (e.g., Debian packages (Berger et al. 2014)) and ecosystems established for other programming languages. In fact, most popular programming languages today, such as JavaScript, Java, PHP, .NET, Python, and many more, have their own package managers available that host hundreds of thousands of packages. More details on the package managers can be found on Libraries.io, the platform we used to identify and extract details about variant families in the JavaScript and .NET ecosystems. Libraries.io references packages from over 37 package managers from which one can obtain software families in the different ecosystems.

Implications for Forking Studies Observation 1–RQ2 and Observation 2–RQ2 suggest that, in our studied divergent forks, direct integration using git outside of GitHub is more commonly used than GitHub pull requests. This implies that simply relying on pull requests to understand code propagation practices in divergent forks is not enough.
Furthermore, it seems that integration using git rebase is common, as per Observation 2–RQ2. Rebasing complicates the git history, and empirical studies that do not consider rebasing may report skewed, biased, and inaccurate observations (Paixão and Maia 2019). Thus, in addition to looking beyond pull requests when studying code propagation, studies must also consider rebased commits. In this paper, we contribute reusable tooling for identifying these rebased commits.

Implications for Integration Support Tools Regardless of the integration technique used, our findings based on the variants from the three ecosystems studied suggest that code propagation rarely happens between a fork and its mainline. In our datasets, we observe 35 % of 54 mainline–fork pairs, 21 % of 590 mainline–fork pairs, and 11.5 % of 10,357 mainline–fork pairs that integrated commits using at least one of the commit integration techniques in the three ecosystems of Android, .NET, and JavaScript, respectively. The lack of integration may be problematic, since the fork variants may rely on the correct functionality of the existing code from the mainline. This means that any bugs that exist in the mainline will also exist in these forks, unless bug fixes are propagated from one variant to the other. However, current integration techniques (Lillack et al. 2019; Krueger and Berger 2020a; Krueger et al. 2020) do not necessarily facilitate finding such bug fixes. For example, code integration using pull requests and git merge/rebase may not be the best fit when integrating changes in variant forks, since these involve syncing upstream/downstream all the changes missing in the current branch. Alternatively, cherry-picking is probably more suitable for bug fixes, since the developer can choose the exact commits they want to integrate.
However, GitHub’s current setup does not make it easy to identify commits to cherry-pick without digging through the branch’s history to identify relevant changes since the last code integration. As a result of the difficulty of finding commits to cherry-pick, developers may end up fixing the same bugs, which results in duplicated effort and wasted time. To check whether such duplication of effort occurs in our dataset, we looked at the unique commits of the variants and indeed found that developers independently update files shared by the variants. For example, in the mainline–fork variant pair (k9mail/k-9, imaeses/k-9), the shared file ImapStore.java¹³ has been touched by 15 different developers in 142 commits in the mainline variant, while in the fork variant it has been touched by one developer in 9 different commits. It is possible that these developers could be fixing similar bugs existing in these shared artifacts. Moreover, the study of Jang et al. (2012) reports that during the parallel maintenance of cloned code, a bug found in one clone can exist in other clones and thus needs to be fixed multiple times. Furthermore, as a result of different developers changing shared files, it is possible that these developers do not integrate code because of a “fear of merge conflict.” In relation to this conjecture, several studies have reported that merging diverged code between repositories is very laborious as a result of merge conflicts (Stanciulescu et al. 2015; Brun et al. 2011; de Souza et al. 2003; Perry et al. 2001; Sousa et al. 2018; Mahmood et al. 2020; Silva et al. 2020).

¹³ src/com/fsck/k9/mail/store/ImapStore.java. Same path for both mainline and fork.
To this end, it would be interesting for future research to interview the developers of our forks (and further forks) to determine whether the lack of support for cherry-picking bug fixes or specific functionality does indeed contribute to the lack of code propagation. In that case, developing a patch recommendation tool that informs developers of possibly interesting changes as soon as they are introduced in one variant, and recommends them to other variants in a family, could help save developers’ effort. The recent work by Ren et al. (2018), which focused on providing the mainline with facilities to explore non-integrated changes in forks to find opportunities for reuse, is one step in this direction. Our work opens up more opportunities for applying such tools since, as mentioned above with respect to identifying divergent forks, we provide a technique for identifying such forks by combining information from GitHub and the ecosystem’s main delivery platform, and we mention various other ecosystems where a similar strategy can be adopted. Finally, the limited sharing of changes can give rise to quality issues. We did not specifically investigate the propagation of test cases, which might not be propagated either. Developing techniques for propagating test cases within families could significantly enhance the quality of variants within families. The potential of test-case propagation has recently been pointed out in a preliminary study by Mukelabai et al. (2021).

Implications for Future Research

Our work is the first to perform a large-scale empirical study on the practices used to manage software families within software ecosystems. Our results give rise to the following open research questions that could be addressed in follow-up studies to further understand the evolution of such families.
More than two variants in a family: In the results of RQ1, we showed that quite a number of families had a FamilySize of more than two variants (i.e., a mainline with two or more fork variants). However, in this study we only concentrated on the practices used to manage mainline–fork pairs. For example, we did not look at fork–fork pairs in a given family or at the holistic evolution of the families that have more than two variants. It would be interesting to extend the study to those families and study the evolution of the whole family.

Variant dependencies: In RQ1, we observed that in some variant pairs in all three ecosystems, either the mainline or the fork variant in the pair has more dependencies than the other. This implies that the variant with more dependencies implements new functionality, related to the extra dependencies, that is missing in the counterpart. It would be interesting to investigate what this new functionality is and why it is missing in the counterpart variant. Another interesting research direction relating to dependencies would be to investigate whether some variants in a family have updated their code to depend on new releases of the common dependencies, while other variants in the same family still depend on the old releases. Updating code for a new release of a dependency may involve fixing incompatibilities, especially if the new release involves a breaking change. To avoid effort duplication, a tool could be developed to help transplant patches (related to the incompatibility fixes) to other variants in the family that have not yet migrated their code to the new, API-breaking release of the common dependency.
Limited sharing of changes in unique commits: In RQ2 we observed that there is limited sharing of the changes in the unique commits between the mainline–fork variant pairs in the three ecosystems. We hypothesized that one of the possible reasons could be the variants diverging from each other to solve different problems. We also stated that fork variants could be created to support a new technology, serve a different community, target different content, or support a frozen feature of the mainline. Fork variants created for these reasons are likely to have little to share with their mainline variants. It would be interesting to carry out a study combining quantitative methods and user studies to verify our hypothesis.

Impediments in co-evolving variants in software families: Like the study of Robles and González-Barahona (2012), in our dataset we also observed that some mainline–fork variant pairs continue to co-exist, while in others one of the variants in the pair is abandoned as the other continues to evolve. A follow-up study could investigate the impediments to co-evolving these variants. Inspiration can be drawn from the studies of the co-evolution of the Eclipse platform and its third-party plugins (Businge et al. 2012a; 2013; 2010; 2012b; 2015; Businge et al. 2019; Kawuma et al. 2016).

7 Related Work

We discuss related work on (i) variant forking, (ii) code propagation in forked projects, and (iii) general studies on forking.

7.1 Variant Forking

To understand the variants in our variant families, RQ2 explored the reasons forks were created. While there are existing studies on variant forks, most of these were done in the pre-GitHub days of SourceForge, before the advent of social coding environments (Nyman et al.
2012; Robles and González-Barahona 2012; Viseur 2012; Nyman and Lindman 2013; Laurent 2008; Nyman and Mikkonen 2011). These studies reported controversial perceptions around variant forks in the pre-GitHub days (Chua 2017; Dixion 2009; Ernst et al. 2010; Nyman and Mikkonen 2011; Nyman 2014; Raymond 2001). However, Zhou et al. (2020) recently reported that these perceptions have changed with the advent of GitHub. In the pre-GitHub days, variant forks were frequently considered risky to projects, since they could fragment a community and lead to confusion among developers and users. Jiang et al. (2017) state that, although forking is controversial in the traditional open source software (OSS) community, it is encouraged and is a built-in feature in GitHub. The authors further report that developers carry out social forking to submit pull requests, fix bugs, add new features, and keep copies. Zhou et al. (2020) also report that most variant forks start as social forks. Robles and González-Barahona (2012) comprehensively studied a carefully filtered list of 220 potential forks of different projects that were referenced on Wikipedia. The authors assume that a fork is significant if a reference to it appears in the English Wikipedia. They found that technical reasons and discontinuation of the original project were the most common reasons for creating variant forks, accounting for 27.3 % and 20 %, respectively. More recently, Zhou et al. (2020) interviewed 18 developers of variant forks on GitHub to understand the reasons for forking in more modern social coding environments that explicitly support forking. The authors report that the motivations they observed align with the prior studies above.

All the above works studied forks of any type of project, not limited to a specific technological space (e.g., web applications or mobile apps). Our paper is different in that it focuses on Android apps, triangulating data from both GitHub and Google Play to study real-world apps.
Specifically, we study variant reuse practices in RQ2 and, different from both studies (Zhou et al. 2020; Robles and González-Barahona 2012), we investigate additional phenomena, such as code propagation, in RQ3.

Another difference between the current study and the study of Zhou et al. (2020) is the heuristics the two studies employ to determine variant forks. Zhou et al. (2020) classify forks on GitHub as variant forks using the following heuristics: (i) they contain the phrase “fork of” in the description, (ii) received at least three external pull requests, (iii) have at least 100 unique commits, (iv) have at least one year of development, and (v) have changed their name. In our work, we use the external validation of a fork being listed on Google Play under a different package name, and we use the description there to verify that the app is indeed a variant of the mainline.

7.2 Code Propagation Practices

There are only a few studies that investigated code integration between a given repository and its forks. Stanciulescu et al. (2015) studied forking on GitHub using a case study of Marlin, an open source firmware for 3D printers. The authors observed that many forked variants share their changes with the mainline. However, their work does not differentiate between social and variant forks. Thus, we do not know whether this observed prevalent code propagation is simply due to the fact that these are social forks created with the main goal of contributing back to the original project (Zhou et al. 2019). In our current paper, we are interested only in variant forks. Recently, Zhou et al. (2020) observed that only 16 % of their 15,306 studied variant forks ever synchronized or merged changes with their mainline repository. However, based on their discussed threats to validity, it seems that the authors relied only on common commit IDs to identify shared commits.
As we explained in Section 2, there are several integration techniques that result in propagated commits having different commit IDs. Thus, relying only on the commit ID may result in missing other shared commits. To mitigate this problem, our work identifies integrated commits that preserve the commit ID as well as those that may have been integrated using techniques that change the commit ID. Another study on code propagation practices is the work of Kononenko et al. (2018). The authors considered three types of commit integration: GitHub merge, cherry-pick merge, and commit squashing. Compared to their study, the only technique we do not study is commit squashing, but we look at other techniques the authors did not consider: GitHub rebase and squash pull requests, as well as git merge and rebase.

Code propagation practices do not necessarily have to be in the context of forks. For example, German et al. (2016) investigated how Linux uses Git. The authors stated that code changes are difficult to track because of the proliferation of code repositories and because developers modify (“rebase”) and filter (“cherry-pick”) the history of these changes to streamline their integration into the repositories of other developers. To this end, the authors presented a method, \textit{continuousMining}, that crawls all known git repositories of a project multiple times a day to record and analyze all change-sets of a project. The authors state that \textit{continuousMining} not only yields a complete git history, but also catches phenomena that are difficult to study, such as rebasing and cherry-picking. While we do not continuously capture the “live” history of a software project, we are able to capture rebased and cherry-picked commits in the context of forked projects by relying on the commit metadata, after a thorough investigation of how this metadata changes depending on the propagation strategy.
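To make this concrete: cherry-picked and rebased commits receive new commit IDs, but git by default preserves the author name, author email, author date, and commit message. A minimal sketch of metadata-based matching, our own illustration rather than the paper's actual toolchain (the `Commit` fields and function names are assumptions), might look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    # Illustrative fields only, not the schema of the paper's toolchain.
    sha: str
    author: str        # "Name <email>", preserved by cherry-pick/rebase
    author_date: str   # author timestamp, also preserved by default
    message: str


def match_key(c: Commit) -> tuple:
    # Deliberately exclude the SHA and committer fields: both change when a
    # commit is cherry-picked or rebased, while author metadata survives.
    return (c.author, c.author_date, c.message.strip())


def shared_commits(mainline: list[Commit], fork: list[Commit]) -> set[str]:
    """Fork SHAs whose commits also appear in the mainline, either with the
    same SHA (e.g., fast-forward or merge) or as a metadata-equivalent copy."""
    mainline_shas = {c.sha for c in mainline}
    mainline_keys = {match_key(c) for c in mainline}
    return {c.sha for c in fork
            if c.sha in mainline_shas or match_key(c) in mainline_keys}
```

Matching on author metadata recovers commits propagated via cherry-pick or rebase that an ID-only comparison would miss; it still fails if the integrator edits the commit message, a limitation the paper itself acknowledges in its threats to validity.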
7.3 Other Studies About Forking

Gamalielsson and Lundell (2014) studied the long-term sustainability of open source software communities in projects involving a fork. Their study was based on the LibreOffice project, which is a fork of the OpenOffice project. They wanted to understand how open source software communities were affected by a fork. The authors undertook an analysis of the LibreOffice project and the related OpenOffice and Apache OpenOffice projects by reviewing documented project information, performing a quantitative analysis of project repository data, and gathering first-hand experiences from contributors in the LibreOffice community. Their results strongly suggested a long-term sustainable LibreOffice community, with no signs of stagnation 33 months after the fork. They also reported that good practice with respect to governance of open source software projects is perceived by community members as a fundamental challenge for establishing sustainable communities. Nyman (2014) interviewed developers to understand their views on forking. His findings differentiate good forks, which are those that (i) revive abandoned programs, (ii) experiment with and customize existing programs, or (iii) minimize tyranny and resolve disputes by allowing involved parties to develop their own versions of the program, from bad forks, which are those that (i) create confusion among users or (ii) add extra work for developers (including both duplication of efforts and increased work if attempting to maintain compatibility).
----------------------------------------
-------------------------------
Section 169:
8 Threats to Validity

Internal Validity We identify five issues that could threaten the internal validity of our results: (1) In Section 3.1, the heuristics used for app family data identification in Steps 2 & 6 resulted in mismatches in the mapping of some of the forks on GitHub and Google Play. We mitigated this threat by carrying out a thorough manual analysis in Section 3.1–Step 7 and discarding the mismatched apps. Some of the steps we carried out during the Android variants’ data collection are manual, and any errors in those could affect our results. (2) Although we did not observe any cases where the developer changed the message in cherry-picked commits, we acknowledge that our algorithm will not be able to identify such cases; instead, our algorithm will identify them as unique commits in the respective variants. (3) We also acknowledge that our toolchain may miss some commits that are integrated using more than one integration technique. For example, in Section 3.3.3, we presented the unclassified merged pull requests, which were listed on the GitHub API as merged yet were not merged into the master branch. We discovered that these pull requests were integrated into a branch other than the mainline but had all failed the build integration tests. To this end, when integrating commits from a fork → mainline, as a “best practice”, developers may wish to first integrate the commits into a different branch (say, a staging branch), perform an integration test, and only later integrate them into the master. Following this “best practice”, a developer may first integrate into the development branch using one commit integration technique and thereafter integrate the same commits into the master using a different technique that changes the original integrator’s metadata (for example, cherry-picking).
In that case, our toolchain will miss such commits. (4) In Section 2.2, we also stated that our scripts are not able to identify the integrated commits if the integrator uses git commands that rewrite the commit history. However, as we stated in Section 3.3.3, we believe that the practice of rewriting contributions from the community is likely to be rare among experienced developers, since rewriting changes commit authorship. (5) In Step 6 of Section 3.1, we eliminated all Android mainlines that did not have at least one fork with a different package name on the Google Play store. This means that we eliminated fork variants that were created for markets other than Google Play. However, unlike Google Play, where one can use an app’s package name as a unique ID, other markets, such as anzhi, apkmirror, and appsapk, do not implement this strategy, which means we cannot easily identify the correct app for a given GitHub repository. Therefore, we intentionally focus only on Android apps that are distributed on the Google Play store, which limits the number of Android families we are able to identify.

Construct Validity
The calculation of the variability percentage of the fork variants treats commits the same way irrespective of the number of files touched. For example, a commit that has touched 100 files is treated the same as one that has touched just one file. While this may be misleading, the measure provides some indication of unique development activity.

External Validity
We analyzed only 54 Android mainline–fork variant pairs, while there exist millions of Android applications on Google Play and other Android markets, which means that our results might not be representative of all Android applications. However, we also analyze mainline–fork variant pairs from two other ecosystems that show similar results and behavior.
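To illustrate the measure in question, the commit-based variability percentage can be sketched as follows, under our own simplifying assumption that variability is the share of fork commits not shared with the mainline (the function name and signature are illustrative, not the paper's actual implementation):

```python
def variability_percentage(fork_commits: set[str], shared: set[str]) -> float:
    """Share (0-100) of the fork's commits that are unique to the fork.

    Every commit counts equally here, whether it touched one file or a
    hundred, which is exactly the limitation noted above.
    """
    if not fork_commits:
        return 0.0
    unique = fork_commits - shared
    return 100.0 * len(unique) / len(fork_commits)
```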
----------------------------------------
-------------------------------
Section 170:
9 Conclusion

We presented a large-scale exploratory study on reuse and maintenance practices via code propagation between variant forks and their mainline counterparts in software ecosystems. Our subject ecosystems cover different technological spaces: Android, JavaScript, and .NET. As part of our study, we also designed a systematic method to identify real variant forks.

We identified and analyzed families of variants that are maintained together and that exist both on the official package distribution platforms (Google Play, NuGet, and npm) as well as on GitHub, allowing us to analyze reuse practices in depth. For variants in a given ecosystem, we mined both sources of information (GitHub and the package distribution site) to study their characteristics, including their variations and code-propagation practices. In the Android ecosystem we identified 38 software families with a total of 54 mainline–fork pairs, in the .NET ecosystem 526 software families with 590 mainline–fork pairs, and in the JavaScript ecosystem 8,837 software families with 10,357 mainline–fork pairs. We provide a toolchain for analyzing code integration between any mainline–fork variant pair. Regardless of the integration technique used, our findings suggest that code integration rarely happens between a fork and its mainline. In the Android ecosystem, we observed only 19 of the 54 mainline–fork pairs (35%) that integrated commits using at least one of the commit integration techniques we discussed. In the .NET ecosystem, we observed 126 of the 590 mainline–fork pairs (21%) that integrated commits using at least one of the techniques. In the JavaScript ecosystem, we observed 1,189 of the 10,357 mainline–fork pairs (11.5%) that integrated commits using at least one of the techniques.
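The integration rates above follow directly from the reported pair counts; a quick arithmetic check (illustrative only, rounding to one decimal where the text rounds Android and .NET to whole percent):

```python
# Pairs that integrated commits via at least one technique, per ecosystem,
# over the total number of mainline-fork pairs, as reported in the study.
reported = {
    "Android":    (19, 54),
    ".NET":       (126, 590),
    "JavaScript": (1189, 10357),
}

rates = {eco: round(100 * k / n, 1) for eco, (k, n) in reported.items()}
```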
Overall, we analyzed variant forks on GitHub for two main reasons: (1) many previous studies focused on social forks, and (2) the few studies on variant forks were conducted in the pre-GitHub days of SourceForge. In the future, it would be interesting to investigate the middle ground between variant forks and social forks. For example, one could investigate whether the practices observed in variant forks differ from those of social forks.

Acknowledgements We thank Serge Demeyer for comments on earlier drafts of this work.

John Businge’s work is supported by FWO-Vlaanderen and F.R.S.-FNRS via the EOS project 30446992 SECO-ASSIST. Thorsten Berger’s work is supported by the Swedish Research Council and the Wallenberg Academy. Sarah Nadi’s research was undertaken, in part, thanks to funding from the Canada Research Chairs Program.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Online appendix (2020) https://github.com/johnxu21/emse2020

GitHub Inc (2020) About pull request merges.
https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-request-merges + + +Apel S, Batory D, Kastner C, Saake G (2013) Feature-oriented software product lines, Springer, Berlin + + +Berger T, Pfeiffer R, Tartler R, Dienst S, Czarnecki K, Wasowski A, She S (2014) Variability mechanisms in software ecosystems. Inf Softw Technol 56(11):1520–1535 + + +Berger T, Steghöfer JP, Ziadi T, Robin J, Martinez J (2020) The state of adoption and the challenges of systematic variability management in industry. Empir Softw Eng 25:1755–1797 + + +Brun Y, Holmes R, Ernst MD, Notkin D (2011) Proactive detection of collaboration conflicts. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering, ESEC/FSE ’11. Association for Computing Machinery, New York, pp 168–178. https://doi.org/10.1145/2025113.2025139 + + +Businge J, Decan A, Zerouali A, Mens T, Demeyer S (2020) An empirical investigation of forks as variants in the npm package distribution. In: Papadakis M, Cordy M (eds) Proceedings of the 19th Belgium-Netherlands software evolution workshop, BENEVOL 2020, Luxembourg, December 3-4, 2020, CEUR Workshop Proceedings, vol. 2912. CEUR-WS.org. http://ceur-ws.org/Vol-2912/./paper1.pdf + + +Businge J, Kawuma S, Bainomugisha E, Khomh F, Nabaasa E (2017) Code authorship and fault-proneness of open-source android applications: An empirical study. In: Proceedings of the 13th international conference on predictive models and data analytics in software engineering, PROMISE. ACM, New York, pp 33–42. https://doi.org/10.1145/3127005.3127009 + + +Businge J, Kawuma S, Openja M, Bainomugisha E, Serebrenik A (2019) How stable are eclipse application framework internal interfaces? In: 2019 IEEE 26th international conference on software analysis, evolution and reengineering (SANER). pp 117–127. 
https://doi.org/10.1109/SANER.2019.8668018 + + +Businge J, Openja M, Kavaler D, Bainomugisha E, Khomh F, Filkov V (2019) Studying android app popularity by cross-linking github and google play store. In: SANER + + +Businge J, Openja M, Nadi S, Bainomugisha E, Berger T (2018) Clone-based variability management in the android ecosystem. In: 2018 IEEE international conference on software maintenance and evolution, ICSME 2018, Madrid, Spain, September 23-29, 2018, pp 625–634 + + +Businge J, Serebrenik A, van den Brand M (2012) Compatibility prediction of eclipse third-party plug-ins in new eclipse releases. In: 12th IEEE international working conference on source code analysis and manipulation, SCAM 2012, Riva del Garda, Italy, September 23-24, 2012, pp 164–173 + + +Businge J, Serebrenik A, van den Brand M (2012) Survival of eclipse third-party plug-ins. In: 28th IEEE international conference on software maintenance, ICSM 2012, Trento, Italy, September 23-28, 2012, pp 368–377. https://doi.org/10.1109/ICSM.2012.6405295 +Businge J, Serebrenik A, van den Brand M (2013) Analyzing the eclipse API usage: Putting the developer in the loop. In: 17th European conference on software maintenance and reengineering, CSMR 2013, Genova, Italy, March 5-8, 2013. pp 37–46 + + +Businge J, Serebrenik A, van den Brand MGJ (2010) An empirical study of the evolution of Eclipse third-party plug-ins. In: EVOL-IWPSE’10. ACM, pp 63–72 + + +Businge J, Serebrenik A, van den Brand MGJ (2015) Eclipse API usage: the good and the bad. Softw Qual J 23(1):107–141. https://doi.org/10.1007/s11219-013-9221-3 + + +Chacon S, Straub B (2014) git tools - rewriting history. https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History + + +Chacon S, Straub B (2014) Pro Git Apress + + +Chua BB (2017) A survey paper on open source forking motivation reasons and challenges. 
In: Alias RA, Ling PS, Bahri S, Finnegan P, Sia CL (eds) 21st Pacific Asia conference on information systems, PACIS 2017, Langkawi, Malaysia, July 16-20, 2017. p 75

Czarnecki K (2005) Overview of generative software development. In: Banâtre JP, Fradet P, Giavitto JL, Michel O (eds) Unconventional programming paradigms. Springer, Berlin

Decan A, Mens T, Grosjean P (2019) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empir Softw Eng 24(1):381–416. https://doi.org/10.1007/s10664-017-9589-y

Dixion J (2009) Different kinds of open source forks – salad, dinner, and fish. https://jamesdixon.wordpress.com/2009/05/13/different-kinds-of-open-source-forks-salad-dinner-and-fish/

Dubinsky Y, Rubin J, Berger T, Duszynski S, Becker M, Czarnecki K (2013) An exploratory study of cloning in industrial software product lines. In: CSMR

Ernst NA, Easterbrook SM, Mylopoulos J (2010) Code forking in open-source software: a requirements perspective. arXiv:1004.2889

Gamalielsson J, Lundell B (2014) Sustainability of open source software communities beyond a fork: How and why has the libreoffice project evolved? J Syst Softw 89:128–145. https://doi.org/10.1016/j.jss.2013.11.1077. http://www.sciencedirect.com/science/article/pii/S0164121213002744

German DM, Adams B, Hassan AE (2016) Continuously mining distributed version control systems: An empirical study of how linux uses git. Empir Softw Eng 21(1):260–299

Jang J, Agrawal A, Brumley D (2012) Redebug: Finding unpatched code clones in entire OS distributions. In: IEEE symposium on security and privacy, SP 2012, 21-23 May 2012, San Francisco, California, USA. IEEE Computer Society, pp 48–62. https://doi.org/10.1109/SP.2012.13

Jiang J, Lo D, He J, Xia X, Kochhar PS, Zhang L (2017) Why and how developers fork what from whom in github. Empir Softw Eng 22(1):547–578.
https://doi.org/10.1007/s10664-016-9436-6 + + +Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining github. In: MSR + + +Kawuma S, Businge J, Bainomugisha E (2016) Can we find stable alternatives for unstable eclipse interfaces? In: 2016 IEEE 24th international conference on program comprehension (ICPC), pp. 1–10. https://doi.org/10.1109/ICPC.2016.7503716 + + +Kononenko O, Rose T, Baysal O, Godfrey M, Theisen D, de Water B (2018) Studying pull request merges: a case study of shopify’s active merchant. In: Proceedings of the 40th international conference on software engineering: software engineering in practice, ICSE-SEIP ’18. Association for Computing Machinery, New York, pp 124–133. https://doi.org/10.1145/3183519.3183542 + + +Krueger J, Berger T (2020) Activities and costs of re-engineering cloned variants into an integrated platform. In: 14th international working conference on variability modelling of software-intensive systems (VaMoS) + + +Krueger J, Berger T (2020) An empirical analysis of the costs of clone- and platform-oriented software reuse. In: 28th ACM SIGSOFT international symposium on the foundations of software engineering (FSE) + + +Krueger J, Mahmood W, Berger T (2020) Promote-pl: A round-trip engineering process model for adopting and evolving product lines. In: 24th ACM international systems and software product line conference (SPLC) + + +Laurent AS (2008) Understanding open source and free software licensing. O’Reilly Media, Newton + + +Li L, Martinez J, Ziadi T, Bissyandé TF, Klein J, Traon YL (2016) Mining families of android applications for extractive spl adoption. In: SPLC + + +Lillack M, Stanciulescu S, Hedman W, Berger T, Wasowski A (2019) Intention-based integration of software variants. In: 41st international conference on software engineering (ICSE) + + +Mahmood W, Chagama M, Berger T, Hebig R (2020) Causes of merge conflicts: A case study of elasticsearch. 
In: 14th international working conference on variability modelling of software-intensive systems (VaMoS) +Mojica IJ, Adams B, Nagappan M, Dienst S, Berger T, Hassan AE (2014) A large scale empirical study on software reuse in mobile apps. IEEE Softw 31(2):78–86 + + +Mukelabai M, Berger T, Borba P (2021) Semi-automated test-case propagation in fork ecosystems. In: 43rd international conference on software engineering, new ideas and emerging results track (ICSE/NIER) + + +Munaiah N, Kroh S, Cabrey C, Nagappan M (2017) Curating GitHub for engineered software projects. Empir Softw Eng 22(6):3219–3253 + + +Nyman L (2014) Hackers on forking. In: Proceedings of The international symposium on open collaboration. pp 1–10 + + +Nyman L, Lindman J (2013) Code forking, governance, and sustainability in open source software. Technol Innov Manag Rev 3:7–12 + + +Nyman L, Mikkonen T (2011) To fork or not to fork: Fork motivations in sourceforge projects. In: Open source systems: grounding research. pp 259–268 + + +Nyman L, Mikkonen T, Lindman J, Fougère M (2012) Perspectives on code forking and sustainability in open source software. In: Open source systems: long-term sustainability. pp 274–279 + + +Openja M, Adams B, Khomh F (2020) Analysis of modern release engineering topics: – a large-scale study using stackoverflow –. In: 2020 IEEE international conference on software maintenance and evolution (ICSME). pp 104–114. https://doi.org/10.1109/ICSME46990.2020.00020 + + +Paixão M, Maia P (2019) Rebasing in code review considered harmful: A large-scale empirical investigation. In: 2019 19th international working conference on source code analysis and manipulation (SCAM). pp 45–55 + + +Parnas DL (1976) On the design and development of program families. IEEE Trans Softw Eng 2(1):1–9. https://doi.org/10.1109/TSE.1976.233797 + + +Perry DE, Siy HP, Votta LG (2001) Parallel changes in large-scale software development: An observational case study. ACM Trans Softw Eng Methodol 10(3):308–337. 
https://doi.org/10.1145/383876.383878 + + +Raymond ES (2001) The Cathedral & the Bazaar: Musings on linux and open source by an accidental revolutionary. Newton, O’Reilly Media Inc + + +Ren L, Zhou S, Kästner C (2018) Poster: forks insight: providing an overview of github forks. In: 2018 IEEE/ACM 40th international conference on software engineering: companion (ICSE-Companion). pp 179–180 + + +Robles G, González-Barahona JM (2012) A comprehensive study of software forks: dates, reasons and outcomes. In: Open source systems: long-term sustainability. pp 1–14 + + +Sattler F, von Rhein A, Berger T, Johansson NS, Hardø MM, Apel S (2018) Lifting inter-app data-flow analysis to large app sets. Autom Softw Eng 25:315–346 + + +Silva LD, Borba P, Mahmood W, Berger T, Moisakis J (2020) Detecting semantic conflicts via automated behavior change detection. In: 36th IEEE international conference on software maintenance and evolution (ICSME) + + +Sousa M, Dillig I, Lahiri SK (2018) Verified three-way program merge. Proc ACM Program Lang 2(OOPSLA). https://doi.org/10.1145/3276535 + + +de Souza CRB, Redmiles D, Dourish P. (2003) Breaking the code, moving between private and public work in collaborative software development. In: Proceedings of the 2003 international ACM SIGGROUP conference on supporting group work, GROUP ’03. Association for Computing Machinery, New York, pp 105–114. https://doi.org/10.1145/958160.958177 + + +Stanciulescu S, Schulze S, Wasowski A (2015) Forked and integrated variants in an open-source firmware project. In: IEEE international conference on software maintenance and evolution (ICSME), ICSME ’15 + + +Sung C, Lahiri SK, Kaufman M, Choudhury P, Wang C (2020) Towards understanding and fixing upstream merge induced conflicts in divergent forks: An industrial case study. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering: software engineering in practice, ICSE-SEIP ’20. 
Association for Computing Machinery, New York, pp 172–181. https://doi.org/10.1145/3377813.3381362 + + +Vandehey S (2019) Rebase and merge. https://cloudfour.com/thinks/squashing-your-pull-requests/ + + +Viseur R (2012) Forks impacts and motivations in free and open source projects. Int J Adv Comput Sci Appl - IJACSA 3(2) + + +Zhou S, Stănciulescu C, Leßenich O, Xiong Y, Wasowski A, Kästner C (2018) Identifying features in forks. In: Proceedings of the 40th international conference on software engineering. pp 105–116 + + +Zhou S, Vasilescu B, Kästner C (2019) What the fork: A study of inefficient and efficient forking practices in social coding. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. pp 350–361 + + +Zhou S, Vasilescu B, Kästner C (2020) How has forking changed in the last 20 years? a study of hard forks on github. In: Proceedings of the 42nd international conference on software engineering. Accepted +Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. + + +John Businge is a Postdoctoral fellow in the LORE lab at the University of Antwerp, Belgium. He received his PhD from Eindhoven University of Technology, the Netherlands in 2013. After receiving his PhD, he was a lecturer at Mbarara University of Science and Technology, Uganda. For six months in 2016 he was a Fulbright research scholar at the University of California, Davis in the U.S.A. His research focuses on mining software repositories, clone detection, program analysis, variability management, and empirical software engineering. + + +Moses Openja is a PhD student and a member of SWAT Lab Polytechnique Montreal, Canada. He received his bachelor’s degree in 2017 from Mbarara University of Science and Technology, Uganda and his masters degree in 2021 from Polytechnique Montreal, Canada. 
His research areas include software quality of machine learning applications, empirical software engineering, software maintenance and evolution in software ecosystems, and release engineering.

Sarah Nadi is an Assistant Professor in the Department of Computing Science at the University of Alberta, and a Tier II Canada Research Chair in Software Reuse. She obtained her Master’s (2010) and PhD (2014) degrees from the University of Waterloo in Canada. Before joining the University of Alberta in 2016, she spent approximately two years as a post-doctoral researcher at the Technische Universität Darmstadt in Germany. Sarah’s research focuses on providing intelligent support for software maintenance and reuse, including creating recommender systems to guide developers through correctly and securely reusing individual functionality from external libraries.

Thorsten Berger is a Professor in Computer Science at Ruhr University Bochum in Germany. After receiving the PhD degree from the University of Leipzig in Germany in 2013, he was a Postdoctoral Fellow at the University of Waterloo in Canada and the IT University of Copenhagen in Denmark, and then an Associate Professor jointly at Chalmers University of Technology and the University of Gothenburg in Sweden. He received competitive grants from the Swedish Research Council, the Wallenberg Autonomous Systems Program, Vinnova Sweden (EU ITEA), and the European Union. He is a fellow of the Wallenberg Academy—one of the highest recognitions for researchers in Sweden. He received two best-paper awards and one most influential paper award. His service was recognized with distinguished reviewer awards at the tier-one conferences ASE 2018 and ICSE 2020. His research focuses on model-driven software engineering, program analysis, and empirical software engineering.
+ + +Affiliations + + +John Businge\textsuperscript{1,2} · Moses Openja\textsuperscript{3} · Sarah Nadi\textsuperscript{4} · Thorsten Berger\textsuperscript{5,6} + + +Moses Openja +openjamosesopm@gmail.com + + +Sarah Nadi +nadi@ualberta.ca + + +Thorsten Berger +thorsten.berger@rub.de + + +\textsuperscript{1} Mbarara University of Science and Technology, Mbarara, Uganda +\textsuperscript{2} University of Antwerp, Antwerp, Belgium +\textsuperscript{3} SWAT Lab., École Polytechnique de Montréal, Montréal, Canada +\textsuperscript{4} University of Alberta, Edmonton, Canada +\textsuperscript{5} Ruhr University Bochum, Bochum, Germany +\textsuperscript{6} Chalmers | University of Gothenburg, Gothenburg, Sweden +---------------------------------------- +------------------------------- +Section 171: +What to Expect from Code Review Bots on GitHub? A Survey with OSS Maintainers + + +Mairieli Wessel + +mairieli@ime.usp.br + +University of São Paulo + + +Alexander Serebrenik + +a.serebrenik@tue.nl + +Eindhoven University of Technology + + +Igor Wiese + +igor@utfpr.edu.br + +Universidade Tecnológica Federal do Paraná + + +Igor Steinmacher + +igor.steinmacher@nau.edu + +Northern Arizona University + + +Marco A. Gerosa + +marco.gerosa@nau.edu + +Northern Arizona University + + +ABSTRACT +Software bots are used by Open Source Software (OSS) projects to streamline the code review process. Interfacing between developers and automated services, code review bots report continuous integration failures, code quality checks, and code coverage. However, the impact of such bots on maintenance tasks is still neglected. In this paper, we study how project maintainers experience code review bots. We surveyed 127 maintainers and asked about their expectations and perception of changes incurred by code review bots. 
Our findings reveal that the most frequent expectations include enhancing the feedback bots provide to developers, reducing the maintenance burden for developers, and enforcing code coverage. While maintainers report that bots satisfied their expectations, they also perceived unexpected effects, such as communication noise and newcomers’ dropout. Based on these results, we provide a series of implications for bot developers, as well as insights for future research.

CCS CONCEPTS
• Human-centered computing → Open source software; • Software and its engineering → Software creation and management.

KEYWORDS
software bots, pull-based model, open source software, code review

ACM Reference Format:
Mairieli Wessel, Alexander Serebrenik, Igor Wiese, Igor Steinmacher, and Marco A. Gerosa. 2020. What to Expect from Code Review Bots on GitHub? A Survey with OSS Maintainers. In 34th Brazilian Symposium on Software Engineering (SBES ’20), October 21–23, 2020, Natal, Brazil. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3422392.3422459

1 INTRODUCTION
Code review is a software quality assurance practice [8] common in Open Source Software (OSS) projects [3]. Since open source development involves a community of geographically dispersed developers [23], projects are often hosted on social coding platforms, such as GitHub [7]. To receive external contributions, repositories are shared through forks and modified via pull requests. In the pull-based development model, project maintainers spend a non-negligible amount of time inspecting code changes and engaging in discussion with contributors to understand and improve the modifications before integrating them into the codebase [15, 33].

Open source software communities use software bots to assist and streamline the code review process [9, 29].
In short, bots are software applications that integrate with human tasks, serving as interfaces that connect developers and other tools [26], and providing additional value to human users [12]. Accomplishing tasks that were previously performed solely by human developers, and interacting in the same communication channels as their human counterparts, bots have become new voices in the code review conversation [17]. According to Wessel et al. [29], code review bots differ from other bots by guiding contributors to provide necessary information before maintainers review the pull requests. On GitHub, these bots are responsible for leaving comments on pull requests, reporting continuous integration failures, code quality checks, and code coverage. + + +In theory, the automation provided by these bots should save maintainers effort and time [25], and lead them to focus on higher priority aspects of code review [2]. Nevertheless, the adoption of a code review bot, similar to any technological adoption, can bring unexpected consequences. Since, according to Mulder et al. [18], many effects are not directly caused by the new technology itself, but by the changes in human behavior that it provokes, it is important to assess and discuss the effects of new technology. In the case of the effect of software bots on project maintainers, this is often neglected. + + +In this paper, we aim to understand why open source maintainers integrate code review bots into the pull request workflow and how they perceive the changes these bots induce. In short, we answer the following research questions: + + +RQ1. What motivates maintainers to adopt code review bots? +RQ2. How do maintainers perceive the changes code review bots introduce to the software process? + + +To achieve our goal, we conducted a survey with 127 maintainers of OSS projects hosted on GitHub that adopted code review bots. 
We investigate the maintainers’ perceptions of whether project activity indicators change after bot adoption, such as the number of pull requests received, merged, and non-merged, the number of comments, and the time to close pull requests.

Analyzing the survey results, we found that maintainers were predominantly motivated by reducing their effort on tedious tasks to allow them to focus on more interesting ones, and by enhancing the feedback communicated to developers. Regarding the changes introduced by the bot, we noted that less manual effort was required after adoption, high-quality code was enforced, and pull request reviews sped up. However, four maintainers also reported unexpected aspects of bot adoption, including communication noise, more time spent on tests, newcomers’ dropout, and bots impersonating maintainers, which stressed out contributors.

Our contributions are twofold: (i) a set of maintainers’ motivations for using a bot to assist the code review process; and (ii) a discussion of how maintainers see the impact of bot introduction and support. These contributions may help maintainers anticipate bots’ effects on a project, and guide bot developers to consider the implications of new bots as they design them. Our findings, while preliminary, can suggest research hypotheses on the impact of code review bots on the code review process in open source projects, which follow-up studies can support or refute.

2 BACKGROUND AND RELATED WORK

Software bots have been designed to assist with the technical and social aspects of software development activities [13], including communication and decision-making [25]. Basically, these bots act as a conduit between software developers and other tools [25]. Wessel et al. have shown that bot adoption is indeed widespread in OSS projects hosted on GitHub [29].
GitHub bots have been developed to be integrated into the pull request workflow to perform a variety of tasks beyond code review support [31]. These tasks include repairing bugs [17, 27, 28], refactoring the code [32], recommending tools [4], detecting duplicated development [20], updating dependencies [16], and fixing static analysis violations [5]. + + +Despite their increasing popularity, understanding the effects of bots is a major challenge. Storey and Zagalsky [25] and Paikari and van der Hoek [19] highlight that the potential negative impact of task automation through bot technology is still neglected. While bots are often used to avoid interruptions to developers’ work, they may lead to other, less obvious distractions [25]. Additionally, Liu et al. [14] claim that bots may have negative impacts on the user experience of open source contributors, since the needs and preferences of maintainers and contributors are not the same. While previous studies provide recommendations on how to evaluate bots’ capabilities and performance [1, 4], they do not draw attention to the impact of bot adoption on software development or on how software engineers perceive the bots’ effects. + + +Wessel et al. [29] investigated the usage and impact of software bots to support contributors and maintainers with pull requests. After identifying bots on popular GitHub repositories, the authors classified these bots into 13 categories according to the tasks they perform. The third most frequently used bots are code review bots. Wessel et al. [30] also employed a regression discontinuity design on OSS projects, revealing that the bot adoption increases the number of monthly merged pull requests, decreases monthly non-merged pull requests, and decreases communication among developers. + + +Prior work has also investigated the impact of continuous integration (CI) and code review tools on GitHub projects [6, 11, 34]. While Zhao et al. [34] and Cassee et al. 
[6] investigated the impact of the Travis CI tool’s introduction on development practices, Kavaler et al. [11] turned to the impact of linters, dependency managers, and coverage reporter tools. Our work extends the literature by providing an understanding of why code review bots are being adopted and the effects of such adoption, focusing on the perceptions of open source maintainers.

3 STUDY METHODOLOGY

We conducted a survey to obtain insights on how open source maintainers perceive the impact of using code review bots on pull requests and the effects of these bots on project activities.

3.1 Survey Design

We first identified OSS projects hosted on GitHub that at some point had adopted at least one code review bot [29]. To find these projects, we queried the GHTorrent dataset [10], searching for projects that had received comments on pull requests from any of the code review bots identified by Wessel et al. [29]. For each project, we determined when a bot was introduced based on the date of the bot’s first comment. Afterwards, we contacted maintainers who merged more than one pull request before and after bot adoption. To avoid duplicate invitations, we kept only the first record of maintainers who appeared in more than one project. Our initial target population comprised 1,960 maintainers of projects that adopted code review bots and made their e-mail addresses publicly available via the GitHub API.

To increase survey participation, we followed the best practices described by Smith et al. [21], such as sending personalized invitations and allowing participants to remain anonymous. The survey was set up as an online questionnaire, and it was sent on September 18, 2019. We received answers for 3 months and sent a reminder in October 2019. Participation was voluntary, and the estimated time to complete the survey was 10 minutes.
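The selection procedure described above (the bot’s first pull request comment marks the adoption date; only maintainers who merged more than one pull request both before and after that date are kept) can be sketched as follows. This is a minimal illustration: the record fields and bot logins are hypothetical assumptions, not the actual GHTorrent schema or the bot list from Wessel et al. [29].

```python
from datetime import date

# Hypothetical bot account names, NOT the real list used in the study.
BOT_LOGINS = {"codecov-io", "coveralls"}

def adoption_date(pr_comments):
    """Date of the first pull request comment left by a known bot, or None."""
    bot_dates = [c["date"] for c in pr_comments if c["author"] in BOT_LOGINS]
    return min(bot_dates) if bot_dates else None

def eligible_maintainers(pr_comments, merges):
    """Maintainers who merged more than one pull request both before
    and after the bot's first comment (the survey's inclusion criterion)."""
    adopted = adoption_date(pr_comments)
    if adopted is None:
        return set()
    eligible = set()
    for who in {m["maintainer"] for m in merges}:
        before = sum(1 for m in merges
                     if m["maintainer"] == who and m["date"] < adopted)
        after = sum(1 for m in merges
                    if m["maintainer"] == who and m["date"] >= adopted)
        if before > 1 and after > 1:
            eligible.add(who)
    return eligible
```

Deduplicating maintainers who appear in several projects would then amount to keeping the first occurrence of each e-mail address across the per-project eligible sets.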
We received answers from 127 maintainers, while the delivery of 26 messages failed. For this survey, we had a response rate of $\approx 6.55\%$, which is consistent with other studies in software engineering [22].

Our maintainers’ survey had three main questions, which we made publicly available.1 In summary, we asked maintainers about their expectations and perception of changes caused by the adoption of a code review bot. Regarding the changes at the software process level, we asked maintainers about the same activity indicators studied by Wessel et al. [29]: the number of opened, merged, and non-merged pull requests, the number of comments, and the time to close pull requests.

3.2 Data analysis

We used a card sorting approach [35] to qualitatively analyze the answers to the open-ended questions Q1 and Q3. Two researchers conducted card sorting in two steps. In the first step, each researcher analyzed the answers (cards) independently and applied codes to each answer, sorting them into meaningful groups. This step was followed by a discussion meeting until reaching a consensus on the code names and categorization of each item. At the end of this process, the answers were sorted into high-level groups.

1https://zenodo.org/record/3992379#.Xz1_iSlKg3E

Table 1: Reasons for adoption of code review bots

| Reasons | # of answers (%) |
|--------------------------------|------------------|
| Enhance feedback to developers | 31 (24.4%) |
| Reduce maintainers’ effort | 30 (23.6%) |
| Enforce high code coverage | 22 (17.3%) |
| Automate routine tasks | 20 (15.7%) |
| Ensure high-quality standards | 20 (15.7%) |
| Detect change effects | 7 (5.5%) |
| Curiosity | 5 (3.9%) |
| Improve interpersonal communication | 5 (3.9%) |
| Lack of available tools | 5 (3.9%) |
| Outside contributor’s suggestion | 2 (1.6%) |
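The percentages in Table 1 are shares of the 127 survey respondents; since the mention counts sum to more than 127, some answers were evidently coded with more than one reason. A quick check of the reported figures (showing the three largest categories):

```python
# Verify the Table 1 percentages: each mention count divided by 127 respondents.
respondents = 127
counts = {
    "Enhance feedback to developers": 31,
    "Reduce maintainers' effort": 30,
    "Enforce high code coverage": 22,
}
for reason, n in counts.items():
    print(f"{reason}: {n} ({n / respondents:.1%})")
# 31/127 -> 24.4%, 30/127 -> 23.6%, 22/127 -> 17.3%, matching Table 1
```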
In the second step, the researchers analyzed the categories, aiming to refine the classification and group related codes into more significant, higher-level categories and themes. We used open card sorting, meaning we had no predefined codes or groups; the codes emerged and evolved during the analysis process. In addition, we quantitatively analyzed the closed-ended question (Q2) to understand developers’ perceptions of the impact of bots on pull requests.

4 RESULTS

In this section, we report our main findings.

4.1 Maintainers’ Motivations to Adopt a Code Review Bot

We asked maintainers what made them decide to start using bots to support code review activities. Four participants (3.15%) did not report any reason. The other answers were grouped into 10 categories, as can be seen in Table 1.

From the maintainers’ perspective, the most recurrent motivation relates to enhancing the feedback to developers (31 mentions). This category includes cases in which the respondents desired to see both code review metrics and additional information “in a pretty and automated fashion” and “without having to go to another tool.” Several respondents recognized the value of bot feedback for both reviewers and contributors: “bots write useful information as comments and you can analyze it without switching the context.” In addition, other respondents pointed out the importance of “giving uniform feedback to all contributors” and “let[ting] contributors see how they affect the code.” Another two respondents mentioned that this kind of feedback might also increase contributors’ public accountability, giving reviewers “confidence that the author cares about testing” and about the quality of the code contribution.

Another recurrent reason regards reducing maintainers’ effort (30 mentions).
Several maintainers were motivated by the necessity to save time and reduce their own effort during the code review process. Most of them said that reducing maintainers’ effort on trivial tasks, such as finding syntax errors and checking code style and coverage requirements, allows them to “spend more time on the important parts.” Moreover, the feedback provided by a code review bot helps maintainers avoid “repeating the same comments for each pull request.” + + +With 22 mentions, enforcing high code coverage during the code review process was the third most common reason. In general, respondents mentioned that code review bots were adopted to help detect and prevent reduction in code coverage. They also mentioned that these bots “ensure good coverage to allow changes on the code base with high confidence that the project will continue to function as expected” since they “don’t want to drop (significantly) in coverage.” Respondents (20) also reported another related reason: ensure high-quality standards. Respondents said that using code review bots for “automating repetitive tasks ensures they get done, increasing code quality” and “reduce[s] the risk of bugs being missed by reviewers.” + + +Several maintainers (20) were also motivated by automating routine tasks that previously were manually performed. Respondents mentioned the desire to automate routine tasks in order to structure the process of code review and “make the process more repeatable.” The routine tasks include tracking the coverage and “automatically upload[ing] code coverage results to a 3rd-party service.” Others provided more generic answers, briefly mentioning “automation.” + + +Maintainers were also motivated by curiosity to test a new technological tool and by a suggestion of an outside contributor. 
In the other five cases, our respondents were motivated by improving interpersonal communication, since “an automatic answer by a bot isn’t taken personally” and “it is a friendly way to ensure quality.” Moreover, a code review bot “improves interpersonal communication on pull requests and thus may reduce the chance a pull request is abandoned by the author.”

Answer to RQ1. Maintainers reported 10 reasons for using code review bots. We found that several maintainers were motivated by enhancing the feedback to developers (24.4%), reducing their own efforts (23.6%), and enforcing high code coverage (17.3%).

4.2 Maintainers’ Perceptions of Bots’ Effects

We also asked maintainers about their perspective on the potential changes to their projects that the code review bot introduced. The answers followed a 5-point Likert scale with a neutral option, ranging from “Strongly disagree” to “Strongly agree.” In Figure 1, we observe that most of the respondents did not agree with the expected impact of bot adoption on pull requests, considering the five studied activity indicators: the number of pull requests received, merged, and non-merged; the number of comments; and the time to close pull requests.

Most of the respondents claimed that there is no relation between the number of pull requests and the presence of the bot; they stated that the number of opened pull requests “depends on bugs or features for the software.” However, one respondent claimed that it could lead to an increase in the number of pull requests, and “a better experience for everyone involved (which might eventually lead to repeat contributors).” Regarding merged and non-merged pull requests, maintainers claimed that these trends are typically due to “human factors” unrelated to bot adoption. One maintainer believed that the ability to filter out contributions that reduce code quality also reduces the merge rates of pull requests.
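The per-indicator distributions summarized in Figure 1 amount to tallying the five Likert levels over the respondents’ answers. A minimal sketch, using hypothetical answers for a single indicator rather than the actual survey data:

```python
from collections import Counter

SCALE = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

# Hypothetical answers for one indicator (e.g., "time to close pull requests");
# the real data are the 127 survey responses.
answers = ["Agree", "Agree", "Neutral", "Strongly agree", "Disagree"]

tally = Counter(answers)
shares = {level: tally.get(level, 0) / len(answers) for level in SCALE}
print(shares)  # e.g., a 0.4 share of "Agree" for this toy sample
```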
Respondents (36%) perceived an increase in the number of comments made on pull requests after bot adoption. One respondent claimed that this increase occurs because contributions that drastically reduce the coverage stimulate the exchange of comments between maintainers and contributors. Another maintainer explained that the number of comments increased because maintainers and “contributors started discussing how to best test something.”

Forty-one percent of maintainers believe that the code review bot helped decrease the time to close pull requests. One respondent did not agree with the statement, and left a comment telling us that the code review bot actually increased the time to merge pull requests, due to the need for additional time to write tests and obtain stable code. Another maintainer commented that the bot increases the time to merge the contributions, though to them “it is not perceived as a bad thing.”

We also openly asked maintainers about the changes introduced by the adoption of code review bots on the maintenance process and in the project itself. Twenty-three participants (18.1%) did not report any change. The other responses were grouped into 13 categories, as can be seen in Table 2.

The most recurrent reported change is that the adoption of code review bots requires less manual labor from maintainers (33 mentions).
In general, respondents mentioned that the maintenance process is easier when they have fewer manual tasks to perform, because they “need to spend less time on it.” The maintainers also suggest that bots could help reduce the number of human resources necessary to complete a task, which makes “it easier by reducing the number of review comments, general feedback and manual quality assurance required for a successful merge.” Nevertheless, maintainers are also aware of the implications that “automation like this is always prone to non-fatal error.” + + +Several maintainers (20) noticed changes in the quality of the contributions received, reporting that the bot helps to enforce high-quality code. In one example, a respondent mentioned that “the introduction of bots increased the quality of the code seen by maintainers in the initial review since contributors got timely (a few minutes) feedback about parts that failed basic quality standards such as missing tests, missing documentation, incorrect style, or broken functionality.” Another 6 respondents also realized positive effects on the quality of the code review process, which “translate in a more efficient code review and more robust codebase in the long term.” + + +Since one of the most common reasons to adopt a code review bot is to enforce code coverage, unsurprisingly, 16 respondents mentioned the increase in the code coverage after adoption. Most of the respondents reported that these bots help to “encourage to add more tests” when “the coverage is not good enough.” One respondent stated the importance of the awareness of code coverage: “the effects are visible to the contributors, and they will generally resolve any decreased coverage in the pull request.” Additionally, one respondent claimed that the bot feedback also “spurred further pull requests to increase coverage.” + + +Another bot adoption effect is that reviewing pull requests became faster, which was reported by 16 maintainers. 
Three respondents mentioned that faster reviews lead to faster merging. A respondent stated that high-quality pull requests were more quickly identified since “the human review step was always started with a baseline level of quality” and thus merged faster. In addition, another maintainer reinforced the efficiency of this process: “some of the bots do it so well, that we can merge pull requests immediately after opening it.” Moreover, 7 maintainers also reported that the quality of the code review process improved.

Other categories, although less recurrent, called our attention to the negative effects reportedly caused by bot adoption. One respondent said that bots intimidate newcomers, since some newcomers close their pull requests after a bot comment. Another believes that, for a newcomer, receiving an assessment such as “you let coverage go down,” instead of a “thanks for your contribution,” “can be a little daunting.” Respondents also mentioned that after adoption testing started to require more time than development, and that the bot’s comments introduced noise. Another respondent said that a bot can impersonate human developers due to bots’ strict rules, which stressed out contributors.

Answer to RQ2. Among the positive changes brought by code review bots, maintainers reported that less manual labor was required after bot adoption (25.9%) and that bots enforced high-quality code (15.7%). The negative effects include communication noise, more time spent on tests, newcomers’ dropout, and bots impersonating maintainers.

5 DISCUSSION AND IMPLICATIONS

Adding a code review bot to a project can represent the desire to better communicate with developers, helping contributors and maintainers be more effective, and achieving improved interpersonal communication, as already discussed by Storey and Zagalsky [25].
In fact, our results reveal that the predominant reason for using a code review bot is to improve the feedback communicated to developers. Moreover, maintainers are also interested in automating code review tasks to reduce the maintenance burden and enforce high code coverage.

Most of the maintainers’ perceptions of how bots impact maintenance are in line with the reported motivations. Indeed, maintainers started to spend less effort on trivial tasks, allowing them to focus on more important aspects of code review. Furthermore, code review bots guide contributors toward detecting change effects before maintainers triage the pull requests [29], ensuring high-quality standards and a faster code review. Bots’ feedback provides an immediate and clear sense of what contributors need to do to have their contribution reviewed. Maintainers also noted that contributors’ confidence increased when a code review bot provided situational awareness [25], indicating standards, language issues, and coverage to contributors.

On the one hand, adopting a bot saves maintainers costs, time, and effort during code review activities. On the other hand, our study also reports four unexpected and negative effects of adopting a bot to assist the code review process. Such effects include communication noise, more time spent on tests, newcomers’ dropout, and bots impersonating maintainers. Although less recurrent, these effects are non-negligible to the OSS community.

Previous work by Wessel et al. [29] has already mentioned the support for newcomer onboarding both in terms of challenges and as a feature maintainers desire. In our survey, maintainers claim it is easier for newcomers to submit a high-quality pull request with only the intervention of bots. However, another maintainer pointed out that when newcomers and casual contributors receive feedback from the bot, it can lead to rework, discussions, and ultimately dropping out from contributing.

Our study suggests practical implications for practitioners as well as insights and suggestions for researchers.

Awareness of bot effects. Indeed, the maintenance activities changed following the adoption of code review bots. This change can directly affect contributors’ and maintainers’ work. Hence, understanding how code review bot adoption affects a project is important for practitioners, mainly to avoid unexpected or even undesired effects. Awareness of unexpected bot effects can lead maintainers to take countermeasures and/or decide whether or not to use a code review bot.

Improving bots’ design. Anyone who wants to develop a bot to support the code review process needs to consider the impact the bot may have on both technical and social contexts. Based on our results, further bot improvements can be envisioned. For example, in order to prevent bots from introducing communication noise, bot developers should know when and to what extent the bot should interrupt a human [14, 24].

Improving newcomer support. As mentioned above, previous literature on bots already noted a lack of support for newcomers [29]. It is reasonable to expect that newcomers who receive friendly feedback will have a higher engagement level and thus sustain their participation in the project. Hence, future research can help bot designers by providing guidelines and insights to support new contributors.

6 THREATS TO VALIDITY

Since we leverage qualitative research methods to categorize the open-ended questions asked in our survey, we may have introduced categorization bias. To mitigate this bias, we conducted this process in pairs and carefully discussed categorization among the authors. Regarding our survey, the order in which we presented the questions to the respondents may have influenced the way they answered them.
In addition, we cannot guarantee that maintainers correctly understood sentences 4 and 5. We tried to order the questions based on the natural sequence of actions to help respondents understand the questions’ context.

7 FINAL CONSIDERATIONS

In this work, we conducted a preliminary investigation into maintainers’ perceptions of the effects of adopting bots to support the code review process on pull requests. The most frequently mentioned motivations for using bots include automating repetitive tasks, improving tools’ feedback to developers, and reducing maintenance effort (RQ1). Moreover, maintainers cite several benefits of bots, such as decreasing the time to close pull requests and reducing the workload of laborious and repetitive tasks. However, maintainers also stated negative effects, including the introduction of communication noise and newcomers’ dropout (RQ2). Based on these preliminary findings, future research can focus on better supporting and understanding bots’ influences on social interactions in the context of OSS projects. Moreover, future work can investigate the effects of adopting a bot and expand our analysis to other types of bots, activity indicators, and social coding platforms.

ACKNOWLEDGMENTS

We thank all the participants of this study, who volunteered to support our research. This work was partially supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001, CNPq (grant 141222/2018-2), and the National Science Foundation (grants 1815503 and 1900903).

REFERENCES

[1] Ahmad Abdellatif and Emad Shihab. 2020. MSRBot: Using Bots to Answer Questions from Software Repositories. Empirical Software Engineering (EMSE) 25 (2020), 1834–1863. https://doi.org/10.1007/s10664-019-09788-5

[2] Alberto Bacchelli and Christian Bird. 2013. Expectations, outcomes, and challenges of modern code review.
In 2013 35th International Conference on Software Engineering (ICSE). IEEE, 712–721.

[3] Olga Baysal, Oleksii Kononenko, Reid Holmes, and Michael W. Godfrey. 2016. Investigating technical and non-technical factors influencing modern code review. Empirical Software Engineering 21, 3 (2016), 932–959.

[4] Chris Brown and Chris Parnin. 2019. Sorry to Bother You: Designing Bots for Effective Recommendations. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE ’19). IEEE Press, Piscataway, NJ, USA, 54–58. https://doi.org/10.1109/BotSE.2019.00021

[5] A. Carvalho, W. Luz, D. Marcilio, R. Bonifácio, G. Pinto, and E. Dias Canedo. 2020. C-3PR: A Bot for Fixing Static Analysis Violations via Pull Requests. In 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE Computer Society.

[6] Nathan Cassee, Bogdan Vasilescu, and Alexander Serebrenik. 2020. The silent helper: the impact of continuous integration on code reviews. In 27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 49–60.

[7] Linda Erlenhov, Francisco Gomes de Oliveira Neto, Riccardo Scandariato, and Philipp Leitner. 2019. Current and Future Bots in Software Development. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE ’19). IEEE Press, Piscataway, NJ, USA, 7–11. https://doi.org/10.1109/BotSE.2019.00009

[8] Georgios Gousios and Diomidis Spinellis. 2012. GHTorrent: GitHub’s data from a firehose. In 2012 9th IEEE Working Conference on Mining Software Repositories (MSR). IEEE, 12–21.

[9] David Kavaler, Asher Trockman, Bogdan Vasilescu, and Vladimir Filkov. 2019. Tool choice matters: JavaScript quality assurance tools and usage outcomes in GitHub projects. In Proceedings of the 41st International Conference on Software Engineering. IEEE Press, 476–487.

[10] Carlene Lebeuf, Alexey Zagalsky, Matthieu Foucault, and Margaret-Anne Storey. 2019. Defining and Classifying Software Bots: A Faceted Taxonomy. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE ’19). IEEE Press, Piscataway, NJ, USA, 1–6. https://doi.org/10.1109/BotSE.2019.00008

[11] Bin Lin, Alexey Zagalsky, Margaret-Anne Storey, and Alexander Serebrenik. 2016. Why developers are slacking off: Understanding how software teams use Slack. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion. ACM, 333–336.

[12] Dongyu Liu, Micah J. Smith, and Kalyan Veeramachaneni. 2020. Understanding User-Bot Interactions for Small-Scale Automation in Open-Source Development. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI EA ’20). Association for Computing Machinery, New York, NY, USA, 1–8. https://doi.org/10.1145/3334480.3382998

[13] Shane McIntosh, Yasutaka Kamei, Bram Adams, and Ahmed E. Hassan. 2014. The impact of code review coverage and code review participation on software quality: A case study of the Qt, VTK, and ITK projects. In Proceedings of the 11th Working Conference on Mining Software Repositories. 192–201.

[14] Samim Mirhosseini and Chris Parnin. 2017. Can Automated Pull Requests Encourage Software Developers to Upgrade Out-of-date Dependencies?. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (Urbana-Champaign, IL, USA) (ASE ’17). IEEE Press, Piscataway, NJ, USA. http://dl.acm.org/citation.cfm?id=3155562.3155577

[15] Martin Monperrus. 2019. Explainable Software Bot Contributions: Case Study of Automated Bug Fixes. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE ’19). IEEE Press, Piscataway, NJ, USA, 12–15.
https://doi.org/10.1109/BotSE.2019.00010

[16] K.F. Mulder. 2013. Impact of new technologies: how to assess the intended and unintended effects of new technologies. Handbook of Sustainable Engineering (2013).

[17] Elahe Paikari and André van der Hoek. 2018. A Framework for Understanding Chatbots and Their Future. In Proceedings of the 11th International Workshop on Cooperative and Human Aspects of Software Engineering (Gothenburg, Sweden) (CHASE ’18). ACM, New York, NY, USA, 13–16. https://doi.org/10.1145/3195836.3195859

[18] Luyao Ren, Shurui Zhou, Christian Kästner, and Andrzej Wąsowski. 2019. Identifying Redundancies in Fork-based Development. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 230–241.

[19] Edward Smith, Robert Loftin, Emerson Murphy-Hill, Christian Bird, and Thomas Zimmermann. 2013. Improving developer participation rates in surveys. In 2013 6th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE). IEEE, 89–92.

[20] Igor Steinmacher, Gustavo Pinto, Igor Scaliante Wiese, and Marco A. Gerosa. 2018. Almost There: A Study on Quasi-contributors in Open Source Software Projects. In Proceedings of the 40th International Conference on Software Engineering (Gothenburg, Sweden) (ICSE ’18). ACM, New York, NY, USA, 256–266. https://doi.org/10.1145/3180155.3180208

[21] Igor Fábio Steinmacher. 2015. Supporting newcomers to overcome the barriers to contribute to open source software projects. Ph.D. Dissertation. Universidade de São Paulo.

[22] Margaret-Anne Storey, Alexander Serebrenik, Carolyn Penstein Rosé, Thomas Zimmermann, and James D. Herbsleb. 2020. BOTse: Bots in Software Engineering (Dagstuhl Seminar 19471). Dagstuhl Reports 9, 11 (2020), 84–96.

[23] Margaret-Anne Storey and Alexey Zagalsky. 2016. Disrupting Developer Productivity One Bot at a Time.
In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (Seattle, WA, USA) (FSE 2016). ACM, New York, NY, USA, 928–931. https://doi.org/10.1145/2950290.2938989

[24] Margaret-Anne Storey, Alexey Zagalsky, Fernando Figueira Filho, Leif Singer, and Daniel M. German. 2017. How Social and Communication Channels Shape and Challenge a Participatory Culture in Software Development. IEEE Trans. Softw. Eng. 43, 2 (Feb. 2017), 185–204. https://doi.org/10.1109/TSE.2016.2584053

[25] Simon Urli, Zhongxing Yu, Lionel Seinturier, and Martin Monperrus. 2018. How to Design a Program Repair Bot? Insights from the Repairnator Project. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice (Gothenburg, Sweden) (ICSE-SEIP ’18). ACM, New York, NY, USA, 95–104. https://doi.org/10.1145/3183519.3183540

[26] Rijnard van Tonder and Claire Le Goues. 2019. Towards s/engineer/bot/: Principles for Program Repair Bots. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE ’19). IEEE Press, Piscataway, NJ, USA, 43–47. https://doi.org/10.1109/BotSE.2019.00019

[27] Mairieli Wessel, Bruno Mendes de Souza, Igor Steinmacher, Igor S. Wiese, Ivanilton Polato, Ana Paula Chaves, and Marco A. Gerosa. 2018. The Power of Bots: Characterizing and Understanding Bots in OSS Projects. Proceedings of the ACM on Computer Supported Cooperative Work and Social Computing 2, CSCW, Article 182 (Nov. 2018), 182:1–182:18. https://doi.org/10.1145/3274451

[28] Mairieli Wessel, Alexander Serebrenik, Igor Scaliante Wiese, Igor Steinmacher, and Marco Aurelio Gerosa. 2020. Effects of Adopting Code Review Bots on Pull Requests to OSS Projects. In IEEE International Conference on Software Maintenance and Evolution. IEEE Computer Society.

[29] Mairieli Wessel and Igor Steinmacher. 2020.
The Inconvenient Side of Software Bots on Pull Requests. In Proceedings of the 2nd International Workshop on Bots in Software Engineering (BotSE). https://doi.org/10.1145/3387940.3391504

[30] Marvin Wyrich and Justus Bogner. 2019. Towards an Autonomous Bot for Automatic Source Code Refactoring. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE ’19). IEEE Press, Piscataway, NJ, USA, 24–28. https://doi.org/10.1109/BotSE.2019.00015

[31] Yue Yu, Huaimin Wang, Vladimir Filkov, Premkumar Devanbu, and Bogdan Vasilescu. 2015. Wait for It: Determinants of Pull Request Evaluation Latency on GitHub. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. 367–371. https://doi.org/10.1109/MSR.2015.42

[32] Yangyang Zhao, Alexander Serebrenik, Yuming Zhou, Vladimir Filkov, and Bogdan Vasilescu. 2017. The impact of continuous integration on other software development practices: a large-scale empirical study. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, 60–71.

[33] Thomas Zimmermann. 2016. Card-sorting: From text to themes. In Perspectives on Data Science for Software Engineering. Elsevier, 137–141.
----------------------------------------
-------------------------------
Section 178:
Why Do Developers Use Trivial Packages?

An Empirical Case Study on npm

Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab

Data-driven Analysis of Software (DAS) Lab

Department of Computer Science and Software Engineering

Concordia University

Montreal, Canada

{rab_abdu,o_nourry,s_alweha,s_mujahi,eshihab}@encs.concordia.ca

ABSTRACT

Code reuse is traditionally seen as good practice. Recent trends have pushed the concept of code reuse to an extreme, by using packages that implement simple and trivial tasks, which we call ‘trivial packages’.
A recent incident where a trivial package led to the breakdown of some of the most popular web applications, such as Facebook and Netflix, made it imperative to question the growing use of trivial packages.

Therefore, in this paper, we mine more than 230,000 npm packages and 38,000 JavaScript applications in order to study the prevalence of trivial packages. We found that trivial packages are common and are increasing in popularity, making up 16.8% of the studied npm packages. We performed a survey with 88 Node.js developers who use trivial packages to understand the reasons and drawbacks of their use. Our survey revealed that trivial packages are used because they are perceived to be well implemented and tested pieces of code. However, developers are concerned about the maintenance effort and the risk of breakage due to the extra dependencies trivial packages introduce. To objectively verify the survey results, we empirically validate the most cited reason and drawback and find that, contrary to developers’ beliefs, only 45.2% of trivial packages even have tests. However, trivial packages appear to be ‘deployment tested’ and to have similar test, usage and community interest as non-trivial packages. On the other hand, we found that 11.5% of the studied trivial packages have more than 20 dependencies. Hence, developers should be careful about which trivial packages they decide to use.

CCS CONCEPTS

• Software and its engineering → Software libraries and repositories; Software maintenance tools;

KEYWORDS

JavaScript; Node.js; Code Reuse; Empirical Studies

ACM Reference Format:

Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab. 2017. Why Do Developers Use Trivial Packages? An Empirical Case Study on npm.
In Proceedings of the 2017 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Paderborn, Germany, September 4–8, 2017 (ESEC/FSE’17). 11 pages.

https://doi.org/10.1145/3106237.3106267
----------------------------------------
-------------------------------
Section 179:
1 INTRODUCTION

Code reuse is often encouraged due to its multiple benefits. In fact, prior work showed that code reuse can reduce the time-to-market, improve software quality and boost overall productivity [3, 32, 37]. Therefore, it is no surprise that emerging platforms such as Node.js encourage reuse and do everything possible to facilitate code sharing, often delivered as packages or modules that are available on package management platforms, such as the Node Package Manager (npm) [7, 39].

However, it is not all good news. There are many cases where code reuse has had negative effects, leading to an increase in maintenance costs and even legal action [2, 29, 35, 41]. For example, in a recent incident, code reuse of a Node.js package called left-pad, which was used by Babel, caused interruptions to some of the largest Internet sites, e.g., Facebook, Netflix, and Airbnb. Many referred to the incident as the case that ‘almost broke the Internet’ [33, 45]. That incident led to many heated discussions about code reuse, sparked by David Haney’s blog post: “Have We Forgotten How to Program?” [26].

While the real reason for the left-pad incident was that npm allowed authors to unpublish packages (a problem which has since been resolved [40]), it raised awareness of the broader issue of taking on dependencies for trivial tasks that can be easily implemented [26]. Since then, there have been many discussions about the use of trivial packages.
Loosely defined, a trivial package is a package that contains code that a developer can easily code him/herself and hence, is not worth taking on an extra dependency for. Many developers agreed with Haney’s position, which stated that every serious developer knows that ‘small modules are only nice in theory’ [8], suggesting that developers should implement such functions themselves rather than taking on dependencies for trivial tasks. Other work showed that npm packages tend to have a large number of dependencies [13, 14] and highlighted that developers need to use caution since some dependencies can grow exponentially [4]. In fact, in our dataset, we found that more than 11% of the trivial packages have more than 20 dependencies.

So, the million-dollar question is “why do developers resort to using a package for trivial tasks, such as checking if a variable is an array?” At the same time, other questions, such as how prevalent trivial packages are and what the potential drawbacks of using them are, remain unanswered. Therefore, we performed an empirical study involving more than 230,000 npm packages and 38,000 JavaScript applications to better understand why developers resort to using trivial packages. Our empirical study is qualitative in nature and is based on survey results from 88 Node.js developers. We also quantitatively validate the most commonly developer-cited reason and drawback related to the use of trivial packages.

Since, to the best of our knowledge, this is the first study to examine why developers use trivial packages, we first propose a definition of what constitutes a trivial package, based on feedback from JavaScript developers. We also examine how prevalent trivial packages are in npm and how widely they are used in Node.js applications. Our findings indicate that:

Trivial packages are common and popular. Of the 231,092 npm packages in our dataset, 16.8% of them are trivial packages.
Moreover, of the 38,807 Node.js applications on GitHub, 10.9% of them directly depend on one or more trivial packages.

Most developers do not consider the use of trivial packages as bad practice. In our survey of the 88 JavaScript developers, 57.9% of them said they do not consider the use of trivial packages as bad practice, whereas only 23.9% consider it to be a bad practice. This finding shows that there is not a clear consensus on the issue of trivial package use.

Trivial packages provide well implemented and tested code and increase productivity. Developers believe that trivial packages provide them with well implemented/tested code and increase productivity. At the same time, the increase in dependency overhead and the risk of breakage of their applications are the two most cited drawbacks.

Developers need to be careful which trivial packages they use. Our empirical findings show that many trivial packages have their own dependencies. In fact, we found that 43.7% of trivial packages have at least one dependency and 11.5% of trivial packages have more than 20 dependencies.

In addition to the aforementioned findings, our study provides the following key contributions:

We provide a way to quantitatively determine trivial packages.

To the best of our knowledge, this is the first study to examine the prevalence, reasons for and drawbacks of using trivial packages in Node.js applications. Our study is also one of the largest studies on JavaScript applications, involving a survey of more than 80 JavaScript developers, 231,092 npm packages and 38,807 Node.js applications.

We perform an empirical study to validate the most commonly cited reasons for and drawbacks of using trivial packages in our developer survey.

We make our dataset of the responses provided by the npm developers publicly available.

The paper is organized as follows: Section 2 provides the background and introduces our datasets.
Section 3 presents how we determine what a trivial package is. Section 4 examines the prevalence of trivial packages and their use in Node.js applications. Section 5 presents the results of our developer survey, covering the reasons and perceived drawbacks for developers who use trivial packages. Section 6 presents our empirical validation of the most commonly cited reason for and drawback of using trivial packages. The implications of our findings are noted in Section 7. We discuss related work in Section 8 and the limitations of our study in Section 9, and present our conclusions in Section 10.
----------------------------------------
-------------------------------
Section 180:
2 BACKGROUND AND DATASETS

JavaScript is used to write client- and server-side applications. Its popularity has steadily grown, thanks to popular frameworks such as Node.js and an active developer community [7, 46]. JavaScript projects can be classified into two main categories: packages that are used in other projects, and applications that are used as standalone software. The Node Package Manager (npm) provides tools to manage Node.js packages. npm is the official package manager for Node.js and its registry contains more than 250,000 packages [25].

To perform our study, we gather two datasets from two sources. We obtain Node.js packages from the npm registry and applications that use npm packages from GitHub.

Packages: Since we are interested in examining the impact of ‘trivial’ packages, we mined the latest version of all the Node.js packages from npm as of May 5, 2016. For each package, we obtained its source code from GitHub. In some cases, the package publisher did not provide a GitHub link, in which case we obtained the source code directly from npm. In total, we mined 252,996 packages.

Applications: We also want to examine the use of the packages in JavaScript applications. Therefore, we mined all of the Node.js applications on GitHub.
To ensure that we are indeed only obtaining the applications from GitHub, and not npm packages, we compare the URL of the GitHub repositories to all of the URLs we obtained from npm for the packages. If a URL from GitHub was also in npm, we flagged it as being an npm package and removed it from the application list. To determine that an application uses npm packages, we looked for the ‘package.json’ file, which specifies (amongst others) the npm package dependencies used by the application.

To eliminate dummy applications that may exist on GitHub, we chose non-forked applications with more than 100 commits and more than 2 developers. Similar filtering criteria were used in prior work by Kalliamvakou et al. [31]. In total, we obtained 115,621 JavaScript applications and, after removing applications that did not use the npm platform, we were left with 38,807 applications.
----------------------------------------
-------------------------------
Section 181:
3 WHAT ARE TRIVIAL PACKAGES ANYWAY?

Although what a trivial package is has been loosely defined in the past (e.g., in blogs [27, 28]), we want a more precise and objective way to determine trivial packages. To determine what constitutes a trivial package, we conducted a survey, where we asked participants what they considered to be a trivial package and what indicators they used to determine if a package is trivial or not. We devised an online survey that presented the source code of 16 randomly selected Node.js packages that range in size between 4 and 250 JavaScript lines of code (LOC). Participants were asked to 1) indicate if they thought the package was trivial or not and 2) specify what indicators they use to determine a trivial package. We opted to limit the size of the Node.js packages in the survey to a maximum of 250 JavaScript LOC since we did not want to overwhelm the participants with the review of excessive amounts of code.

We asked the survey participants to indicate trivial packages from the list of Node.js packages provided. We provided the survey participants with a loose definition of what a trivial package is, i.e., a package that contains code that they can easily code themselves and hence, is not worth taking on an extra dependency for. Figure 1 shows an example of a trivial package, called is-positive, which simply checks if a number is positive. The survey questions were divided into three parts: 1) questions about the participant’s development background, 2) questions about the classification of the provided Node.js packages and 3) questions about what indicators the participant would use to determine a trivial package. We sent the survey to 22 developers and colleagues who were familiar with JavaScript development and received a total of 12 responses.

module.exports = function (n) {
    return toString.call(n) === '[object Number]' && n > 0;
};

Figure 1: Package is-positive on npm

Participants Background and Experience. Of the 12 respondents, 2 are undergraduate students, 8 are graduate students and 2 are professional developers. Ten of the 12 respondents have at least 2 years of JavaScript experience and half of the participants have been developing with JavaScript for more than five years.

Survey Responses. We asked participants to list what indicators they use to determine if a package is trivial or not and to indicate all the packages that they considered to be trivial. Of the 12 participants, 11 (92%) state that the complexity of the code and 9 (75%) state that the size of the code are indicators they use to determine a trivial package.
Another 3 (25%) mentioned that they used code comments and other indicators (e.g., functionality) to indicate if a package is trivial or not. Since it is clear that size and complexity are the most common indicators of trivial packages, we use these two measures to determine trivial packages. It should be mentioned that participants could provide more than one indicator, hence the percentages above sum to more than 100%.

Next, we analyze all of the packages that were marked as trivial. In total, we received 69 votes for the 16 packages. We ranked the packages in ascending order, based on their size, and tallied the votes for the most voted packages. We find that 79% of the votes consider packages that are less than 35 lines of code to be trivial. We also examine the complexity of the packages using McCabe’s cyclomatic complexity, and find that 84% of the votes marked packages that have a total complexity value of 10 or lower to be trivial. It is important to note that although we provide the source code of the packages to the participants, we do not explicitly provide the size or the complexity of the packages, so they are not biased by any metrics, i.e., size or complexity, in their classification.

Based on the aforementioned findings, we used the two indicators JavaScript LOC (≤ 35) and complexity (≤ 10) to determine trivial packages in our dataset. Hence, we define trivial packages as {X : X_LOC ≤ 35 ∧ X_Complexity ≤ 10}, where X_LOC represents the JavaScript LOC and X_Complexity represents McCabe’s cyclomatic complexity of package X. Although we use the aforementioned measures to determine trivial packages, we do not consider this to be the only possible way to determine trivial packages.

Our survey indicates that size and complexity are commonly used measures to determine if a package is trivial.
Based on our analysis, packages that have ≤ 35 JavaScript LOC and a McCabe’s cyclomatic complexity ≤ 10 are considered to be trivial.
----------------------------------------
-------------------------------
Section 182:
4 HOW PREVALENT ARE TRIVIAL PACKAGES?

In this section, we want to know how prevalent trivial packages are. We examine prevalence from two aspects: the first aspect is from npm’s perspective, where we are interested in knowing how many of the packages on npm are trivial. The second aspect considers the use of trivial packages in JavaScript applications.

4.1 How Many of npm’s Packages are Trivial?

We now use the two measures from Section 3, LOC and complexity, to quantify the number of trivial packages in our dataset. Our dataset contained a total of 252,996 npm packages. For each package, we calculated the number of JavaScript code lines and removed packages that had zero LOC, which removed 21,904 packages. This left us with a final number of 231,092 packages. Then, for each package, we removed test code since we are mostly interested in the actual source code of the packages. To identify and remove the test code, similar to prior work [22, 44, 48], we look for the term “test” (and its variants) in the file names and file paths.

Out of the 231,092 npm packages we mined, 38,845 (16.8%) packages are trivial packages. In addition, we examined the growth of trivial packages in npm. Figure 2 shows the percentage of trivial to all packages published on npm per month. We see an increasing trend in the number of trivial packages over time and approximately 15% of the packages added every month are trivial packages. We investigated the spike around March 2016 and found that this spike corresponds to the time when npm disallowed the un-publishing of packages [40].

npm posts the most depended-upon packages on its website [38].
We measured the number of trivial packages that exist in the top 1,000 most depended-upon packages; we find that 113 of them are trivial packages. This finding shows that trivial packages are not only prevalent and increasing in number, but they are also very popular among developers, making up 11.3% of the 1,000 most depended-upon npm packages.

Trivial packages make up 16.8% of the studied npm packages. Moreover, the proportion of trivial packages is increasing and trivial packages make up 11.3% of the top 1,000 most depended-upon npm packages.
----------------------------------------
-------------------------------
Section 183:
4.2 How Many Applications Depend on Trivial Packages?

Just because trivial packages exist on npm, it does not mean that they are actually being used. Therefore, we also examine the number of applications that use trivial packages. To do so, we examine the package.json file, which contains all the dependencies that an application installs from npm. However, in some cases, an application may install a package but not use it. To avoid counting such instances, we parse the JavaScript code of all the examined applications and use regular expressions to detect the require dependency statements, which indicate that the application actually uses the package in its code. Finally, we measured the number of packages that are trivial in the set of packages used by the applications. Note that we only consider npm packages since it is the most popular package manager for Node.js packages and other package managers only manage a subset of packages (e.g., Bower [9] only manages front-end/client-side frameworks, libraries and modules). We find that of the 38,807 applications in our dataset, 4,256 (10.9%) directly depend on at least one trivial package.

Of the 38,807 Node.js applications in our dataset, 10.9% of them depend on at least one trivial package.
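[Editor's illustration] The require-statement detection described in Section 4.2 can be sketched as follows. This is our reconstruction, not the authors' actual tooling; the regular expression and the handling of sub-path imports are assumptions:

```javascript
// Illustrative sketch: extract the npm packages that an application's
// source code actually uses via require(), as opposed to packages that
// are merely listed in package.json.
function requiredPackages(source) {
  // Matches require('pkg') and require("pkg").
  const re = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
  const pkgs = new Set();
  let m;
  while ((m = re.exec(source)) !== null) {
    const name = m[1];
    // Skip relative and absolute paths; those are local files, not packages.
    if (name.startsWith('.') || name.startsWith('/')) continue;
    // Reduce sub-path imports like 'lodash/fp' to the top-level package.
    // (Scoped packages such as '@scope/pkg' would need extra handling.)
    pkgs.add(name.split('/')[0]);
  }
  return [...pkgs];
}
```

Intersecting the returned names with the dependencies declared in package.json would approximate the "installed and used" check the paper describes.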
----------------------------------------
-------------------------------
Section 184:
5 SURVEY RESULTS

We surveyed Node.js developers to understand the reasons for and the drawbacks of using trivial packages. We use a survey because it allows us to obtain first-hand information from the developers who use these trivial packages. In order to select the most relevant participants, we sent out the survey to developers who use trivial packages. We used Git’s pickaxe command on the lines that contain the require dependency statements in the applications; a procedure that provided us with the email and name of the developer who introduced the trivial package dependency.

Survey Participants. To mitigate the possibility of introducing misunderstood or misleading questions, we initially sent the survey to two JavaScript developers and incorporated their minor suggestions to improve the survey. Next, we sent the survey to 1,055 developers from 1,696 applications. To select the developers, we ranked them based on the number of trivial packages they use. We then took a sample of the 600 developers that use trivial packages the most, and another 600 that use trivial packages the least. The survey was emailed to the 1,200 selected developers; however, since some of the emails were returned for various reasons (e.g., the email account does not exist anymore, etc.), we could only reach 1,055 developers.

Note that if a package is required in the application, but does not exist, it will break the application.

The survey listed the trivial package and the application that we detected the trivial package in. We received 88 responses to our survey, which translates to a response rate of 8.3%. Our survey response rate is in line with, and even higher than, the typical 5% response rate reported in questionnaire-based software engineering surveys [42].
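[Editor's illustration] The pickaxe lookup above relies on `git log -S`, which lists the commits that changed the number of occurrences of a given string; the oldest such commit for a require statement is typically the one that introduced the dependency. A minimal sketch, assuming a helper that only builds the command string (the package name and format placeholders are illustrative, not the authors' script):

```javascript
// Build a `git log` pickaxe command that surfaces the commit, and thus the
// author, that introduced a require() of the given package.
// -S selects commits changing the occurrence count of the string;
// --reverse lists the oldest (introducing) commit first;
// --format extracts the author's name and email for the survey invitation.
function pickaxeCommand(pkgName) {
  return `git log --reverse -S "require('${pkgName}')" --format="%an <%ae>"`;
}

// Running the printed command inside an application's repository and taking
// the first output line would name the introducing developer.
console.log(pickaxeCommand('left-pad'));
```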
Of the 88 respondents, 83 identified as developers working either in industry (68) or as full-time independent developers (15). The remaining 5 identified as casual developers (2) or other (3), including one student and two developers working in executive positions at npm. As for the development experience of the survey respondents, the majority (67) of the respondents have more than 5 years of experience, 14 have between 3 and 5 years, and 7 have 1 to 3 years of experience. The fact that most of the respondents are experienced JavaScript developers gives us confidence in our survey responses.
----------------------------------------
-------------------------------
Section 185:
5.1 Do Developers Consider Trivial Packages Harmful?

The first question of our survey to the participants is: “Do you consider the use of trivial packages as bad practice?” The reason to ask this question so bluntly is that it allows us to gauge, in a very deterministic way, how the Node.js developers felt about the issue of using trivial packages. We provided three possible replies, Yes, No or Other, in which case they were provided with a text box to elaborate. Of the 88 participants, 51 (57.9%) stated that they do NOT consider the use of trivial packages as bad practice. Another 21 (23.9%) stated that they indeed think that using trivial packages is a bad practice. The remaining 16 (18.2%) stated that it really depends on the circumstances, such as the time available, how critical a piece of code is, and if the package used has been thoroughly tested.

Most of the surveyed developers (57.9%) do NOT believe that using trivial packages is a bad practice.
----------------------------------------
-------------------------------
Section 186:
5.2 Why Do Developers Use Trivial Packages?
While we have answered the question as to whether developers think using trivial packages is a bad practice, what we are most interested in is why developers resort to using trivial packages and what they view as the drawbacks of doing so. Therefore, the second part of the survey asks participants to list the reasons why they resort to using trivial packages. To ensure that we do not bias the responses of the developers, the answer fields for these questions were in free-form text, i.e., no predetermined suggestions were provided. After gathering all of the responses, we grouped and categorized the responses in a two-phase iterative process. In the first phase, the first two authors carefully read the participants’ answers and came up with a number of categories that the responses fell under. Next, they discussed their groupings and agreed on the extracted categories. Whenever they failed to agree on a category, a third author was asked to help break the tie. Once all of the categories were decided, the same two authors went through all the answers again and classified them into their respective categories. For the majority of the cases, the two authors agreed on most categories and the classifications of the responses. To measure the agreement between the two authors, we used Cohen’s Kappa coefficient [10]. The Cohen’s Kappa coefficient has been used to evaluate inter-rater agreement levels for categorical scales, and provides the proportion of agreement corrected for chance. The resulting coefficient is scaled to range between -1 and +1, where a negative value means less than chance agreement, zero indicates exactly chance agreement, and a positive value indicates better than chance agreement [18]. In our categorization, the level of agreement measured between the authors was +0.90, which is considered to be an excellent inter-rater agreement.
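[Editor's illustration] For readers unfamiliar with the measure: Cohen's Kappa compares the observed agreement p_o between two raters with the agreement p_e expected by chance, as kappa = (p_o - p_e) / (1 - p_e). A minimal sketch (our illustration; the paper does not describe its tooling):

```javascript
// Cohen's Kappa for two raters labeling the same items with categories.
// po = observed agreement; pe = chance agreement derived from each rater's
// label frequencies; kappa = (po - pe) / (1 - pe).
function cohenKappa(a, b) {
  const n = a.length;
  let po = 0;
  for (let i = 0; i < n; i++) if (a[i] === b[i]) po++;
  po /= n;
  let pe = 0;
  for (const c of new Set([...a, ...b])) {
    pe += (a.filter(x => x === c).length / n) *
          (b.filter(x => x === c).length / n);
  }
  // Undefined when pe === 1 (both raters use a single category).
  return (po - pe) / (1 - pe);
}
```

Two raters who always agree get kappa = 1, while agreement at chance level yields kappa = 0, matching the -1 to +1 scale described above.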
Table 1 shows the five reasons for using trivial packages, as reported by our survey respondents; another category was used to group the ‘no reason’ responses. Table 1 presents the different reasons, a description of each category and its frequency. These reasons are listed below, in order of their popularity:

R1. Well implemented & tested (54.6%): The most cited reason for using trivial packages is that they provide well implemented and tested code. More than half of the responses mentioned this reason. In particular, although it may be easy for developers to code these trivial packages themselves, it is more difficult to make sure that all the details are addressed, e.g., one needs to carefully consider all edge cases. Some example responses that mention these issues are stated by participants P68 and P4, who cite their reasons for using trivial packages as follows: P68: “Tests already written, a lot of edge cases captured [...].” & P4: “There may be a more elegant/efficient/correct/cross-environment-compatible solution to a trivial problem than yours”.

R2. Increased productivity (47.7%): The second most cited reason is the improved productivity that using trivial packages enables. Trivial tasks or not, writing code on your own requires time and effort; hence, many developers view the use of trivial packages as a way to boost their productivity. In particular, early on in a project, a developer does not want to worry about small details; they would rather focus their efforts on implementing the more difficult tasks. For example, participants P13 and P27 state: P13: “[...] and it does save time to not have to think about how best to implement even the simple things.” & P27: “Don’t reinvent the wheel! if the task has been done before.”. The aforementioned are clear examples of how developers would rather not code something, even if it is trivial. Of course, this comes at a cost, which we discuss later.

R3. Well maintained code (9.1%): A less common, but cited reason for using trivial packages is the fact that the maintenance of the code need not be performed by the developers themselves; in essence, it is outsourced to the community or the contributors of the trivial packages. For example, participant P45 states: “Also, a highly used trivial package is probable to be well maintained.”. Even tasks such as bug fixes are dealt with by the contributors of the trivial packages, which is very attractive to the users of the trivial packages, as reported by participant P80: “[...], leveraging feedback from a larger community to fix bugs, etc.”

R4. Improved readability & reduced complexity (9.1%): Participants also reported that using trivial packages improves the readability and reduces the complexity of their code. For example, P34 states: “immediate clarity of use and readability for other developers for commonly used packages [...]” & P47 states: “Simple abstract brings less complexity.”

R5. Better performance (3.4%): A few of the participants stated that using trivial packages improves performance since it alleviates the need for their application to depend on large frameworks. For example, P35 states: “[...] you do not depend on some huge utility library of which you do not need the most part.”

Only a small percentage (8.0%) of the respondents stated that they do not see a reason to use trivial packages.

The two most cited reasons for using trivial packages are 1) they provide well implemented and tested code and 2) they increase productivity.
----------------------------------------
-------------------------------
Section 187:
5.3 Drawbacks of Using Trivial Packages

In addition to knowing the reasons why developers resort to trivial packages, we wanted to understand the other side of the coin: what they perceive to be the drawbacks of their decision to use these packages.
The drawbacks question was part of our survey and we followed the same aforementioned process to analyze the survey responses. In the case of the drawbacks, the Cohen’s Kappa agreement measure was +0.86, which is considered to be an excellent agreement. Table 2 lists the drawbacks mentioned by the survey respondents along with a brief description and the frequency of each drawback.

Table 2: Drawbacks of using trivial packages.

| Drawback | Description | # Resp. | % |
|--------------------------|----------------------------------------------------------------------------------------------------------------------------------|---------|-------|
| Dependency overhead | Using trivial packages results in a dependency mess that is hard to update and maintain. | 49 | 55.7% |
| Breakage of applications | Depending on a trivial package could cause the application to break if the package becomes unavailable or has a breaking update. | 16 | 18.2% |
| Decreased performance | Trivial packages decrease the performance of applications, which includes the time to install and build the application. | 14 | 15.9% |
| Slows development | Finding a relevant and high quality trivial package is a challenging and time consuming task. | 11 | 12.5% |
| Missed learning opportunities | The practice of using trivial packages leads to developers not learning and experiencing writing code for trivial tasks. | 8 | 9.1% |
| Security | Using trivial packages can open a door for security vulnerability. | 7 | 8.0% |
| Licensing issues | Using trivial packages could cause licensing conflicts. | 3 | 3.4% |
| No drawbacks | - | 7 | 8.0% |

I1. Dependency overhead (55.7%): The most cited drawback of using trivial packages is the increased dependency overhead, e.g., keeping all dependencies up to date and dealing with complex dependency chains, that developers need to bear [7]. This situation is often referred to as ‘dependency hell’, especially when the trivial packages themselves have additional dependencies. This drawback came through clearly in many comments; for example, P41 states: “[...] people who don’t actively manage their dependency versions could [be] exposed to serious problems [...]” & P40: “Hard to maintain a lot of tiny packages”. Hence, while trivial packages may provide well implemented/tested code and improve productivity, developers are clearly aware that the management of the additional dependencies is something they need to deal with.

Breakage of applications (18.2%): Developers also worry about the potential breakage of their application due to a specific package or version becoming unavailable. For example, in the left-pad issue, the main reason for the breakage was the removal of left-pad; P4 states: “Obviously the whole ‘left-pad crash’ exposed an issue”. However, since that incident, npm has disabled the possibility of a package to be removed [40]. Although disallowing the removal solves part of the problem, packages can still be updated, which may break an application. For a non-trivial package, it may be worth it to take the risk; however, for trivial packages, it may not be worth taking such a risk.

Decreased performance (15.9%): This issue is related to the dependency overhead drawback. Developers mentioned that incurring the additional dependencies slowed down the build time and increased application installation times. For example, P64 states: “Too many metadata to download and store than a real code.” & P34 states: “[...], slow installs; can make project noisy and unintuitive by attempting to cobble together too many disparate pieces instead of more targeted code.” As mentioned earlier, in some cases it is not just the fact that the trivial package adds a dependency, but in some cases the trivial package itself depends on additional packages, which negatively impacts performance even further.
I4. Slows development (12.5%)
: In some cases, the use of trivial packages may actually have the reverse effect and slow down development. For example, P23 states: “Can actually slow the team down as, no matter how trivial a package, if a developer hasn’t required it themselves they will have to read the docs in order to double check what it does, rather than just reading a few lines of your own source.” and P15 states: “[...], we have the problem of locating packages that are both useful and "trustworthy" [...]”; it can be difficult to find a relevant and trustworthy package. Moreover, for others trying to build on your code, it is much more difficult to fetch a package and learn it than to read a few lines of your code.

I5. Missed learning opportunities (9.1%)
: In certain cases, the use of these trivial packages is seen as a missed learning opportunity for developers. For example, P24 states: “Sometimes people forget how to do things and that could lead to a lack of control and knowledge of the language/technology you are using”. This is a clear example of where just using a package, rather than coding the solution yourself, will lead to less knowledge about the code base.

I6. Security (8.0%)
: In some cases, the trivial packages may have security flaws that make the application more vulnerable. This is an issue pointed out by a few developers; for example, as P15 mentioned earlier, it is difficult to find packages that are trustworthy. P57 also mentions: “If you depend on public trivial packages then you should be very careful when selecting packages for security reasons”. As in the case of any dependency one takes on, there is always a chance that a security vulnerability could be exposed in one of these packages.

I7. Licensing issues (3.4%)
: In some cases, developers are concerned about potential licensing conflicts that trivial packages may cause.
For example, P73 states: “[...], possibly license-issues” and P62: “[...], there is a risk that the 'trivial' package might be licensed under the GPL must be replaced anyway prior to shipping.”

There were also 8% of the responses stating that they do not see any drawbacks of using trivial packages.

The two most cited drawbacks of using trivial packages are 1) they increase dependency overhead and 2) they may break applications due to a package or a specific version becoming unavailable or incompatible.

6 PUTTING DEVELOPER PERCEPTION UNDER THE MICROSCOPE

The developer survey provided us with great insights into why developers use trivial packages and what they perceive to be their drawbacks. However, whether there is empirical evidence to support their perceptions remains unexplored. Thus, based on our findings in Section 5, we examine the most commonly cited reason for using trivial packages, i.e., the fact that trivial packages are well tested, and the most commonly cited drawback, i.e., the impact of additional dependencies.

6.1 Examining the ‘Well Tested’ Perception

As shown in Table 1, 54.6% of the responses indicate that developers use trivial packages since they are well implemented and tested. And developers have good reason to believe so: npm requires that developers provide a test script name with the submission of their packages (listed in the package.json file). In fact, 81.2% (31,521 out of 38,845) of the trivial packages in our dataset have some test script name listed. However, since developers can provide any script name under this field, it is difficult to know if a package is actually tested.

We examine whether a package is really well tested and implemented from two aspects: first, we check if a package has tests written for it.
Second, since in many cases developers consider packages to be ‘deployment tested’, we also consider the usage of a package as an indicator of it being well tested and implemented [47]. To carefully examine whether a package is really well tested and implemented, we use the npm online search tool (known as npms [11]) to measure various metrics related to how well the packages are tested, used and valued. To provide its ranking of the packages, npms mines and calculates a number of metrics based on development (e.g., tests) and usage (e.g., no. of downloads) data. We use three metrics measured by npms to validate the ‘well tested and implemented’ perception of developers:

1) Tests: considers the tests’ size, coverage percentage and build status for a project. We looked into the npms source code and found that the Tests metric is calculated as: testsSize × 0.6 + buildStatus × 0.25 + coveragePercentage × 0.15. We use the Tests metric to determine if a package is tested and how trivial packages compare to non-trivial packages in terms of how well tested they are. One example that motivated us to investigate how well tested trivial packages are is the response by P68, who says: “Tests already written, a lot edge cases captured [...].”

2) Community interest: evaluates the community interest in the packages, using the number of stars on GitHub & npm, forks, subscribers and contributors. Once again, we find through the source code of npms that Community interest is simply the sum of the aforementioned metrics, measured as: starsCount + forksCount + subscribersCount + contributorsCount. We use this metric to compare how interested the community is in trivial and non-trivial packages. We measure the community interest since developers view the importance of the trivial packages as evidence of their quality, as stated by P56, who says: “[...]
Using an isolated module that is well-tested and vetted by a large community helps to mitigate the chance of small bugs creeping in.”

3) Download count: measures the mean downloads for the last three months. Again, the number of downloads of a package is often viewed as an indicator of the package’s quality; as P61 mentions: “this code is tested and used by many, which makes it more trustful and reliable.”

As an initial step, we calculate the number of trivial packages that have a Tests value greater than zero, i.e., trivial packages that have some tests. We find that only 45.2% of the trivial packages have tests, i.e., a Tests value > 0. In addition, we compare the values of Tests, Community interest and Download count for trivial and non-trivial packages. Our focus is on the values of the aforementioned metrics for trivial packages; however, we also present the results for non-trivial packages to put our results in context.

Figure 3 shows the beanplots for the Tests, Community interest and Download count. The figures show that in all cases trivial packages have, on median, a smaller Tests value, Community interest value and Download count compared to non-trivial packages. That said, we observe from Figure 3 a) that the distribution of the Tests metric is similar for both trivial and non-trivial packages. Most packages have a Tests value of zero, then there are small pockets of packages that have values of approx. 0.25, 0.6, 0.8 and 1.0. In the case of the Community interest and Download count metrics, once again, we see similar distributions, although clearly the median values are lower for trivial packages.

To examine whether the difference in metric values between trivial and non-trivial packages is statistically significant, we performed a Mann-Whitney test to compare the two distributions, considering a difference statistically significant when the p-value < 0.05.
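For illustration, the two npms formulas quoted above can be sketched as plain functions. The field names and weights are taken from the paper's description of the npms source; the actual npms implementation normalizes its inputs before combining them, so this is a simplified sketch rather than a faithful reimplementation:

```javascript
// Sketch of the two npms sub-scores described above. Field names
// (testsSize, buildStatus, etc.) follow the paper's description; the real
// npms code normalizes these inputs before weighting them.

function testsScore({ testsSize, buildStatus, coveragePercentage }) {
  // Weighted combination quoted from the npms source code.
  return testsSize * 0.6 + buildStatus * 0.25 + coveragePercentage * 0.15;
}

function communityInterest({ starsCount, forksCount, subscribersCount, contributorsCount }) {
  // Community interest is simply the sum of the four community metrics.
  return starsCount + forksCount + subscribersCount + contributorsCount;
}
```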
We also use Cliff’s Delta (d), a non-parametric effect size measure, to interpret the magnitude of the difference between trivial and non-trivial packages. As suggested in [23], we interpret the effect size value to be small for d < 0.33 (positive as well as negative values), medium for 0.33 ≤ d < 0.474 and large for d ≥ 0.474.

Table 3 shows the p-values and effect size values.

Table 3: Mann-Whitney test p-values and Cliff’s Delta effect sizes.

| Metrics | p-value | d |
|--------------------|---------|----------------|
| Tests | 2.2e-16 | -0.119 (small) |
| Community interest | 2.2e-16 | -0.269 (small) |
| Download count | 2.2e-16 | -0.245 (small) |

We observe that in all cases the differences are statistically significant; however, the effect sizes are small. The results show that although the majority of trivial packages do not have tests written for them, and have lower metric values than non-trivial packages, the differences are small.

Contrary to developers’ perception, only 45.2% of trivial packages actually have tests. Although trivial packages have lower Tests, Community interest and Download count values, the metric values do not differ greatly from those of non-trivial packages, i.e., trivial packages are similar to non-trivial packages in terms of how well they are tested.

6.2 Examining the ‘Dependency Overhead’ Perception

As discussed in Section 5, the top cited drawback of using trivial packages is the fact that developers need to take on and maintain extra dependencies, i.e., dependency overhead. Examining the impact of dependencies is a complex and well-studied issue (e.g., [1, 12, 15]) that can be examined in a multitude of ways. We choose to examine the issue from both the application and the package perspectives.

Applications: When compared to coding trivial tasks themselves, using a trivial package imposes extra dependencies. One of the most problematic aspects of managing dependencies for applications is when these dependencies update, potentially breaking the application.
Therefore, as a first step, we examined the number of releases for trivial and non-trivial packages. The intuition here is that developers need to put in extra effort to assure the proper integration of new releases. Figure 4 shows that trivial packages have fewer releases than non-trivial packages (the median is 2 for trivial and 3 for non-trivial packages); hence, trivial packages do not require more integration effort than non-trivial packages. The fact that trivial packages are updated less frequently may be attributed to them providing less functionality, and hence needing to be updated less often.

Next, we examined how developers choose to deal with the updates of trivial packages. One way that application developers reduce the risk of a package impacting their application is to ‘version lock’ the package. Version locking a dependency/package means that it is not updated automatically, and that only the specific version mentioned in the package.json file is used, as stated in a few responses from our survey, e.g., P8: “...Also, people who don’t lock down their versions are in for some pain.” There are different types of version locks, i.e., only updating major releases, updating patches only, updating minor releases or no lock at all, which means the package automatically updates. The version locks are specified in the package.json file next to every package name. We examined the frequency at which trivial and non-trivial packages are locked. We find that, on average, trivial packages are locked 14.9% of the time, whereas non-trivial packages are locked 11.7% of the time. However, the Wilcoxon test shows that the difference is not statistically significant (p-value > 0.05). Hence, we cannot say that developers version lock trivial packages more.

Packages: At the package level, we investigate the direct and indirect dependencies of trivial packages.
In particular, we would like to determine if the trivial packages have their own dependencies, which makes the dependency chain even more complex. For each trivial and non-trivial package, we install it and then count the actual number of (direct and indirect) dependencies that the package requires. Doing so allows us to know the true (direct and indirect) dependencies that each package requires. Note that simply looking into the package.json file and the require statements will provide the direct dependencies, but not the indirect dependencies.

Figure 5 shows the distribution of dependencies for trivial and non-trivial packages. Since most trivial packages have no dependencies, the median is 0. Therefore, we bin the trivial packages based on the number of their dependencies and calculate the percentage of packages in each bin. Table 4 shows the percentage of packages and their respective number of dependencies. We observe that the majority of trivial packages (56.3%) have zero dependencies, 27.9% have between 1-10 dependencies, 4.3% have between 11-20 dependencies and 11.5% have more than 20 dependencies. The table shows that some of the trivial packages have many dependencies, which indicates that, indeed, trivial packages can introduce significant dependency overhead.

Table 4: Percentage of packages per number of dependencies (direct & indirect).

| Packages | zero | 1-10 | 11-20 | >20 |
|-------------|-------|-------|-------|-------|
| Trivial | 56.3% | 27.9% | 4.3% | 11.5% |
| Non-trivial | 34.8% | 30.6% | 7.3% | 27.3% |

Trivial packages have fewer releases than non-trivial packages, and developers are no more likely to version lock them. That said, developers should be careful when using trivial packages, since in some cases trivial packages can have numerous dependencies. In fact, we find that 43.7% of trivial packages have at least one dependency and 11.5% of trivial packages have more than 20 dependencies.
7 RELEVANCE AND IMPLICATIONS

A common question asked of empirical studies is: so what? What are the implications of your findings? Why would practitioners care about your findings? We discuss the relevance of our study to the developer community, based on the responses to our survey, and highlight some of the implications of our study.

7.1 Relevance: Do Practitioners Care?

At the start of the study, we were not sure how practically relevant our study of trivial packages would be. However, we were surprised by the interest of developers in our study. In fact, one of the developers (P39) explicitly mentioned the lack of research on this topic, stating: “There has not been enough research on this, but I’ve been taking note of people’s proposed “quick and simple” code to handle the functionality of trivial packages, and it’s surprised me to see the high percentage of times the proposed code is buggy or incomplete.”

Moreover, when we conducted our study, we asked respondents if they would like to know the outcome of our study and, if so, to provide us with an email address. Of the 88 respondents, 66 (approx. 74%) provided their email address for us to send them the outcomes of our study. Some of these respondents hold very high level leadership roles in npm. To us, this is an indicator that our study and its outcomes are of high relevance to the npm and Node.js development community.

7.2 Implications of Our Study

Our study has a number of implications for both software engineering research and practice.

Implications for Future Research: Our study mostly focused on determining the prevalence of, reasons for and drawbacks of using trivial packages. Based on our findings, we see a number of implications/motivations for future work. First, our survey respondents indicated that the choice to use trivial packages is not black and white. In many cases, it depends on the team and the project.
For example, one survey respondent stated that on his team, less experienced developers are more likely to use trivial packages, whereas the more experienced developers would rather write their own code for trivial tasks. The issue here is that the experienced developers are more likely to trust their own code, while the less experienced are more likely to trust an external package. Another aspect is the maturity of the project. As some of the survey respondents pointed out, they are much more likely to use trivial packages early on in a project, so that they do not waste time on trivial tasks and can focus on the more fundamental tasks of their project. However, once their project matures, they start to look for ways to reduce dependencies, since dependencies pose potential points of failure for their project. Hence, our study motivates future work to examine the relationship between team experience, project maturity and the use of trivial packages.

Second, survey respondents also pointed out that using trivial packages is seen favourably compared to using code from Q&A sites such as StackOverflow or Reddit. Compared to using code from StackOverflow, where the developer does not know who posted the code, who else uses it or whether the code has tests, using a trivial package that is on npm is a much better option. In this case, using trivial packages is seen not as the best choice, but certainly as a better choice. Although there have been many studies examining how developers use Q&A sites such as StackOverflow, we are not aware of any studies that compare code reuse from Q&A sites with the use of trivial packages. Our findings motivate the need for such a study.

Practical Implications: A direct implication of our findings is that trivial packages are commonly used by others, perhaps indicating that developers do not view their use as bad practice.
Moreover, developers should not assume that all trivial packages are well implemented and tested, since our findings show otherwise. npm developers should expect more trivial packages to be submitted, making the task of finding the most relevant package even harder. Hence, the issue of how to manage and help developers find the best packages needs to be addressed. To some extent, npms has been recently adopted by npm specifically to address this issue. Developers highlighted that the lack of a decent core or standard JavaScript library causes them to resort to trivial packages. Often, they do not want to install large frameworks just to leverage small parts of the framework, hence they resort to using trivial packages. Therefore, there is a need for the Node.js community to create a standard JavaScript API or library in order to reduce the dependence on trivial packages. However, the issue of creating such a standard JavaScript library is under much debate.

8 RELATED WORK

Studies of Code Reuse. Prior research has shown the many benefits of code reuse, which include improving quality and development speed, and reducing development and maintenance costs [3, 32, 36, 37]. For example, Sojer and Henkel [43] surveyed 686 open source developers to investigate how they reuse code. Their findings show that more experienced developers reuse more source code and that 30% of the functionality of open source software (OSS) projects reuses existing components. Developers also reveal that they see code reuse as a quick way to start new projects. Similarly, Haefliger et al. [24] conducted a study to empirically investigate code reuse in open source software and the development practices of developers in OSS. They triangulated three sources of data (developer interviews, code inspections and mailing list data) from six OSS projects.
Their results showed that developers used tools and relied on standards when reusing components. Mockus [36] conducted an empirical study to identify large-scale reuse of open source libraries. Their study shows that more than 50% of source files include code from other OSS libraries. On the other hand, the practice of reusing source code has some challenging drawbacks, including the effort and resources required to integrate reused code [16]. Furthermore, a bug in the reused component could propagate to the target system [17]. While our study corroborates some of these findings, the main goal is to define and empirically investigate the phenomenon of reusing trivial packages, in particular in Node.js applications.

Studies of Other Ecosystems. In recent years, analyzing the characteristics of ecosystems in software engineering has gained momentum [4, 5, 15, 34]. For example, in a recent study, Bogart et al. [6, 7] empirically studied three ecosystems, including npm, and found that developers struggle with changing versions as they might break dependent code. Wittern et al. [46] investigated the evolution of the npm ecosystem in an extensive study that covers the dependencies between npm packages, download metrics and the usage of npm packages in real applications. One of their main findings is that the number of npm packages and updates to these packages is steadily growing. Also, more than 80% of packages have at least one direct dependency.

Other studies examined the size characteristics of packages in an ecosystem. German et al. [21] studied the evolution of the statistical computing project GNU R, with the aim of analyzing the differences between code characteristics of core and user-contributed packages. They found that user-contributed packages are growing faster than core packages. Additionally, they reported that user-contributed packages are typically smaller than core packages in the R ecosystem.
Kabbedijk and Jansen [30] analyzed the Ruby ecosystem and found that many small and large projects are interconnected.

In many ways, our study complements the previous work since, instead of focusing on all packages in an ecosystem, we specifically focus on trivial packages. Moreover, we examine the reasons developers use trivial packages and what they view as their drawbacks.

We study the reuse of trivial packages, which is a subset of general code reuse. Hence, we do expect there to be some overlap with prior work. Like many empirical studies, we confirm some of the prior findings, which is a contribution on its own. Moreover, our paper adds to the prior findings through, for example, our validation of the developers’ assumptions. Lastly, we do believe our study fills a real gap, since 74% of the participants said they wanted to know our study outcomes.

9 THREATS TO VALIDITY

Construct validity considers the relationship between theory and observation, in case the measured variables do not measure the actual factors. To define trivial packages, we surveyed 12 JavaScript developers, who are mostly graduate students with some professional experience. However, we found that there was a clear vote for what is considered a trivial package. Also, although our data suggested that packages with $\leq 35$ LOC and a complexity $\leq 10$ are trivial packages, we believe that other definitions of trivial packages are possible. That said, of the 88 survey participants that we emailed about using trivial packages, only 1 mentioned that the flagged package is not a trivial package (even though it fit our criteria). To us, this is a confirmation that our definition applies in the vast majority of cases, although clearly it is not perfect.

We use the LOC and complexity of the code to determine trivial packages.
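For illustration, the definition above reduces to a simple predicate over the two measured values. The function name and inputs here are our own; actually computing a package's LOC and cyclomatic complexity is assumed to be done by separate tooling:

```javascript
// Hypothetical helper illustrating the paper's definition of a trivial
// package: at most 35 lines of code and a cyclomatic complexity of at
// most 10. Measuring loc and complexity is assumed to happen elsewhere.
function isTrivialPackage(loc, complexity) {
  return loc <= 35 && complexity <= 10;
}
```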
In some cases, these may not be the only measures that need to be considered to determine a trivial package. For example, some of the trivial packages have their own dependencies, which may need to be taken into consideration. However, our experience tells us that most developers only look at the package itself, and not its dependencies, when determining if it is trivial or not. That said, it would be interesting to replicate this questionnaire with another set of participants to confirm or enhance our definition of a trivial Node.js package.

Our list of reasons for and drawbacks of using trivial packages is based on a survey of 88 Node.js developers. Although this is a large number of developers, our results may not hold for all Node.js developers. A different sample of developers may result in a different list or ranking of advantages and disadvantages. To mitigate the risk due to this sampling, we contacted developers from different applications and, as our responses show, most are experienced developers. Also, there is the potential that our survey questions may have influenced the replies from the respondents. However, to minimize such influence, we made sure to ask for free-form responses (to minimize any bias) and we publicly share our survey and all of our anonymized survey responses.

We used npms to measure various quantitative metrics related to testing, community interest and download counts. Our measurements are only as accurate as npms; however, given that it is the main search tool for npm, we are confident in the npms metrics.

We do not distinguish between the domains of the npm packages, which may impact the findings. However, to help mitigate any bias, we analyzed more than 230,000 npm packages that cover a wide range of domains.

We removed test code from our dataset to ensure that our analysis only considers JavaScript source code.
We identified test code by searching for the term ‘test’ (and its variants) in the file names and file paths. Even though this technique is widely accepted in the literature [22, 44, 48], to confirm that our technique is correct, i.e., that files with the term ‘test’ in their names and paths actually contain test code, we took a statistically significant sample of the packages (to achieve a 95% confidence level and a 5% confidence interval) and examined them manually.

External validity considers the generalization of our findings. All of our findings were derived from open source Node.js applications and npm packages; hence, our findings may not generalize to other platforms or ecosystems. That said, historical evidence shows that examples of individual cases contributed significantly in areas such as physics, economics, social sciences and even software engineering [19]. We believe that strong empirical evidence is built from both studies on individual cases and studies on large samples.

10 CONCLUSION

The use of trivial packages is an increasingly popular trend in software development. Like any development practice, it has its proponents and opponents. The goal of our study is to examine the prevalence, reasons for and drawbacks of using trivial packages. Our findings indicate that trivial packages are commonly and widely used in Node.js applications. We also find that the majority of developers do not oppose the use of trivial packages, and that the main reason developers use trivial packages is that they are considered to be well implemented and tested. However, developers do cite the overhead of the additional dependencies as a drawback of using these trivial packages.
That said, our empirical study showed that considering trivial packages to be well tested is a misconception, since more than half of the trivial packages we studied do not even have tests written. However, these trivial packages seem to be ‘deployment tested’ and have similar Tests, Community interest and Download count values to non-trivial packages. In addition, we find that some of the trivial packages have their own dependencies and, in our studied dataset, 11.5% of the trivial packages have more than 20 dependencies. Hence, developers should be careful about which trivial packages they use.

ACKNOWLEDGMENTS

The authors are grateful to the many survey respondents who dedicated their valuable time to respond to our surveys.

REFERENCES

[1] Pietro Abate, Roberto Di Cosmo, Jaap Boender, and Stefano Zacchiroli. 2009. Strong Dependencies between Software Components. In Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement (ESEM ’09). IEEE Computer Society, 89–99.

[2] Rabe Abdalkareem, Emad Shihab, and Juergen Rilling. 2017. On Code Reuse from StackOverflow: An exploratory study on Android apps. Information and Software Technology 88, C (2017), 148–158.

[3] Victor R. Basili, Lionel C. Briand, and Walcélio L. Melo. 1996. How Reuse Influences Productivity in Object-oriented Systems. Commun. ACM 39, 10 (October 1996), 104–116.

[4] Gabriele Bavota, Gerardo Canfora, Massimiliano Di Penta, Rocco Oliveto, and Sebastiano Panichella. 2013. The Evolution of Project Inter-dependencies in a Software Ecosystem: The Case of Apache. In Proceedings of the 2013 IEEE International Conference on Software Maintenance (ICSM ’13). IEEE Computer Society, 280–289.

[5] Remco Bloemen, Chintan Amrit, Stefan Kuhlmann, and Gonzalo Ordóñez Matamoros. 2014. Gentoo Package Dependencies over Time. In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR ’14). ACM, 404–407.
+ + +[6] Christopher Bogart, Christian Kästner, and James Herbsleb. 2015. When It Breaks, It Breaks: How Ecosystem Developers Reason About the Stability of Dependencies. In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW ’15). IEEE Computer Society, 86–89. + + +[7] Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. 2016. How to Break an API: Cost Negotiation and Community Values in Three Software Ecosystems. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE ’16). ACM, 109–120. + + +[8] Stephan Bonnemann. 2015. Dependency Hell Just Froze Over. https://speakerdeck.com/bonnemann/dependency-hell-just-froze-over. (September 2015). (accessed on 08/10/2016). + + +[9] Bower. 2012. Bower a package manager for the web. https://bower.io/. (2012). (accessed on 08/23/2016). + + +[10] J. Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological measurement 20 (1960), 37–46. + + +[11] Andre Cruz and Andre Duarte. 2017. npmjs. https://npmjs.org/ (01/2017). (accessed on 02/20/2017). + + +[12] Cleidson R. B. de Souza and David F. Redmiles. 2008. An Empirical Study of Software Developers’ Management of Dependencies and Changes. In Proceedings of the 30th International Conference on Software Engineering (ICSE ’08). ACM, 241–250. + + +[13] Alexandre Decan, Tom Mens, and Maëlick Claes. 2016. On the Topology of Package Dependency Networks: A Comparison of Three Programming Language Ecosystems. In Proceedings of the 10th European Conference on Software Architecture Workshops (ECSAW ’16). ACM, Article 21, 4 pages. + + +[14] Alexandre Decan, Tom Mens, and Maëlick Claes. 2017. An Empirical Comparison of Dependency Issues in OSS Packaging Ecosystems. In Proceedings of the 24th International Conference on Software Analysis, Evolution, and Reengineering (SANER ’17). IEEE. 
Deliberate change without hierarchical influence?

The case of collaborative OSS communities

Abstract

Purpose – Deliberate change is strongly associated with formal structures and top-down influence. Hierarchical configurations have been used to structure processes, overcome resistance and get things done. But is deliberate change also possible without formal structures and hierarchical influence?

Design/Methodology/Approach – This longitudinal, qualitative study investigates an open-source software (OSS) community named TYPO3.
This case exhibits no formal hierarchical attributes. The study is based on mailing lists, interviews, and observations.

Findings – The study reveals that deliberate change is indeed achievable in a non-hierarchical, collaborative OSS community context. However, it presupposes the presence and active involvement of informal change agents. The paper identifies and specifies four key drivers of change agents' influence.

Originality/value – The findings contribute to organizational analysis by providing a deeper understanding of the importance of leadership in making deliberate change possible in non-hierarchical settings. They point to the importance of 'change by conviction', which is essentially based on voluntary behaviour. This can open the door to reducing the negative side effects of deliberate change in hierarchical organizations as well.

Keywords

Open-source communities, deliberate change, change agents, change by conviction, hierarchical influence

Introduction

There is widespread agreement in research as well as in management practice that deliberate change is key to an organisation's success, if not to its long-term survival (By, 2005; Teece, Pisano, & Shuen, 1997). On the other hand, it is also generally acknowledged that deliberate change challenges organisations and potentially stresses their members. It disturbs existing structures and causes disorder (Schumpeter, 1934), violates the truce of existing routines (Nelson & Winter, 1982), drives people out of their comfort zones, and evokes resistance (Hon, Bloom, & Crant, 2011; Waddell & Sohal, 1998). Deliberate change is therefore typically associated with strong leaders and execution power (Kotter, 2007), and there is general agreement that hierarchical influence is particularly needed during the implementation stage in order to get things done and overcome resistance (Somech, 2006).
Strong leaders are also needed to promote change in organisations and create a sense of urgency (Higgs & Rowland, 2011; Yates, 2000).

But what happens if there are only informal leaders with no formal, positional power, and organisational members are essentially free to do whatever they want? This is exactly the situation in many collaborative communities, such as open-source software (OSS) communities. In many of these communities participation is voluntary, so leaders have only a very limited version of the formal power known from hierarchical organizations. How do these communities successfully handle the challenges of deliberate change without formal power? How do they secure efficient and consistent planning procedures? How do they overcome resistance and get things done? Are collaborative communities able to change at all, or are they doomed to fail in the long term? Put differently, what does it mean for OSS communities to change deliberately?

Organisational scholars have already shown extensive interest in OSS communities and collaborative communities in general (Martinez-Torres & Diaz-Fernandez, 2014). Key topics of interest include the motivation to participate in and contribute to collaborative communities (Cromie & Ewing, 2009; Hars & Ou, 2002; Lerner & Tirole, 2002), structures and the division of labour (Mockus, Fielding, & Herbsleb, 2002), governance structures and processes in communities (Demil & Lecocq, 2006; Markus, 2007), and coordination and communication mechanisms (Lee & Cole, 2003). While extant research thus provides a detailed picture of how OSS communities work, no studies have yet examined deliberate change in OSS communities. The few studies that address change have found that most change in OSS communities is fluid, tacit, and emergent, because task execution typically depends on informal structures and the voluntary contributions of members (Sharma, Sugumaran, & Rajagopalan, 2002).
The aim of this study is to investigate how deliberate change is accomplished in OSS communities. More specifically, the empirical foundation for this research is a longitudinal single-case study: data were collected about one OSS community, TYPO3, during 2006–2010. We refer to deliberate change as change that is intended and planned. Change is therefore not the residual outcome of a multitude of processes, even though there might be disparities between plans and outcomes (Burnes, 1996, 2009; Kanter, Stein, & Jick, 1992). In our data collection we observed various deliberate change initiatives in TYPO3 at the strategic as well as the organisational level. The focus of this paper is on one strategic change initiative carried out in order to redirect the project's focus towards more product usability. Our results show that deliberate change is possible in OSS communities and that change agents play an essential role in change processes. We summarise our findings in a model that structures the success factors of change agents.

Two main contributions are offered. First, our paper advances knowledge about change processes in non-hierarchical structures such as OSS communities. Because of their increasing relevance for economic activity, it is important to know whether informal, non-hierarchical organisations allow for executing deliberate change; if not, such organisations are unlikely to survive in the long term. Second, and more importantly, our investigation of changes in OSS communities gives new insights into how deliberate change is possible in non-hierarchical organisational settings. It shows how organisations can master 'change by conviction', i.e., change that organisational members are not forced into but accept and adapt to voluntarily. We will discuss how the insights of this study may be used to reduce the tensions and frictions of change in traditional business organisations as well.
Structure and governance of OSS communities

An OSS community consists of individuals who voluntarily contribute to the development of open-source software (Martinez-Torres & Diaz-Fernandez, 2014). Open-source software is freely available to the public under an open license and is based on unrestricted access to source code (Bonaccorsi & Rossi, 2003). Well-known examples of OSS are Linux, Firefox, and Apache (Lakhani & von Hippel, 2003). OSS communities typically demonstrate classic textbook principles of organisations in that they (i) form an entity distinguishable from their environment (Lawrence & Lorsch, 1967), (ii) have specific goals (Etzioni, 1964), (iii) take purposive actions to realise these goals (Mooney & Reiley, 1939), and (iv) are dependent on and affected by the external environment (Scott, 1981). At the same time, however, OSS communities distinguish themselves from traditional business organisations in that they are basically open to anyone to participate, participation is voluntary, there is a high degree of self-assignment, and they have no physical location such as a headquarters. This is enabled by the modularization of the software and by distributed activities, which allow for rather loosely managed and structured development processes that leave developers free to choose which tasks to execute (Vujovic & Ulhøi, 2008). Demil and Lecocq (2006) argue that the open license is indeed a unique contractual framework that has generated a new type of governance structure, distinct from the familiar governance modes of hierarchy, network, and market. Although OSS communities differ in terms of structure, size, and formalisation, an 'ideal type ground architecture' has been identified for many of these communities. The main characteristics of this architecture also apply to TYPO3.

OSS communities are often managed through a two-layer task structure comprising a core and a peripheral layer (Lee & Cole, 2003).
The core consists of project leaders and maintainers. While leadership in some projects (e.g., Linux) is more centralised and there is one undisputed project leader, in other projects (e.g., Apache) a committee handles particular leadership tasks, such as disagreements and conflicts, through voting or consensus (Lerner & Tirole, 2002). On the one hand, these communities align with the definition of shared leadership as a "distributed phenomenon in which there can be several (formally appointed and/or emergent) leaders within a group", a literature which generally focuses on the emergence of such leaders (Mehra, Smith, Dixon, & Robertson, 2006, p. 233). On the other hand, investigations of shared leadership stem mainly from the context of organizational teams and emphasize the importance of formal leaders in setting the stage for informal leadership roles to arise and in creating the conditions that will maximize the successful outcome of shared leadership in teams (Denis, Langley & Sergi, 2012). This stands in contrast to OSS communities, which are not based on formal leadership in the traditional sense. Such leadership is in fact not required for informal leaders to emerge in OSS communities.

In OSS, informal leadership positions emerge through reputational gains based on "technical acumen and managerial skill" (Fleming & Waguespack, 2007, p. 165). In addition, trust is a requirement for leaders to be selected by the community (O'Mahony & Ferraro, 2007). Usually, the founders act as project leaders, having earned the credibility to do so by contributing the initial source code and demonstrating their expertise. Project leaders typically act as visionaries, providing recommendations, work tasks, milestones, etc., to the community. Another important leadership task is to attract new members by posing challenging programming problems for potential contributors (Lerner & Tirole, 2002, p. 220).
The nature of leadership in OSS communities changes as communities grow and mature (O'Mahony & Ferraro, 2007). Over time, project leaders perform fewer technical tasks, such as programming, and more organisation-building tasks (ibid.). The periphery of an OSS community is often made up of the development and bug-fixing teams (Lee & Cole, 2003). Members of the periphery are more loosely connected with the community, and task assignment here is almost completely voluntary (ibid.).

Participation in OSS communities is driven by intrinsic (e.g., fun and enjoyment) and extrinsic (e.g., peer recognition, signalling of skills for career benefits) rewards (Lerner & Tirole, 2002). Lakhani and von Hippel (2003, p. 923) emphasize three motivations for participation in OSS communities: need-driven participation (e.g., the need for software), enjoyment-driven participation, and reputation enhancement. Reputation is a low-ranking incentive to join and contribute to an OSS community (ibid.). However, once a reputation is achieved, the member's desire to maintain it encourages him or her to continue to provide quality contributions (Sharma et al., 2002).

This structure is supported by a number of governance mechanisms that help direct, control, and coordinate individual efforts in OSS communities (Markus, 2007). These mechanisms include the self-assignment of tasks (Crowston, Li, Wei, Eseryel, & Howison, 2007), peer review (Lee & Cole, 2003), bug reporting, voting procedures, and the process of determining software requirements (Scacchi, 2002). Collaboration is enabled through software platforms, which provide the infrastructure for sharing solutions, asking for help, etc. Services and tools such as mailing lists, discussion forums, archives, and blogs are the key infrastructures that enable communication and collaboration in OSS communities (Fjeldstad, Snow, Miles, & Lettl, 2012; O'Mahony & Ferraro, 2007).
To sum up, OSS communities have well-developed structures resembling project structures in traditional business organisations. They also have leaders involved in organising and structuring processes. The major difference is that such leaders have no formal authority and thus no execution power. Participation in OSS communities is voluntary, and tasks are self-assigned. Leaders therefore cannot exert hierarchical influence but can only lead on the basis of expertise, persuasion power, and reputation among peers. The literature has called this type of influence informal leadership (De Souza & Klein, 1995; Hongseok, Labianca, & Myung-Ho, 2006). Lakhani and von Hippel (2003, p. 923) found that the informal leaders of OSS communities are capable of organising the "mundane but necessary" tasks of day-to-day business. But are they also capable of mastering the challenges of change, which are difficult to master even in formal companies and for which leadership and power are needed?

Deliberate change in organisations

As in other organisations, change in OSS communities concerns the "organisation's direction, structure, and capabilities" (Moran & Brightman, 2001). In this sense, there is nothing unusual about the basic nature and substance of change in OSS communities; it resembles the basic structure and demands of other organisational change processes.

Many researchers have emphasised the process character of organisational change (Bullock & Batten, 1985; Hayes, 2010; Lewin, 1951). Van de Ven and Poole (1995) identified 20 models that structure change processes in different ways. However, the vast majority of these models identify three key tasks with which deliberate change processes have to deal. First, the need for change has to be recognised and the change process initiated (Kirzner, 1997). This need typically results from opportunities or threats that can be addressed by change.
Further, the change initiative has to be put on the organisation's agenda in order to ensure that action is taken (Kotter, 2012). Organisational change at the strategic level is a genuine management task. The recognition of change needs might come from 'ordinary' employees, but it is the exclusive right of management to acknowledge these initiatives and put them on the agenda (Kesting & Ulhøi, 2010), at least in traditional business organisations. The main rationale behind such a governance structure is to ensure consistency, both between the different initiatives and organisational activities and with shareholder and stakeholder interests.

Second, deliberate change tends to be based on planning and decision-making activities (By, 2005). Goals have to be defined, and information has to be acquired and analysed. The results of this process are management decisions and documents such as road maps or business plans. In traditional business organisations, leaders have to drive and structure this process by creating a sense of urgency, involving organisational members, and keeping track of the process (Kotter, 2012).

A distinction between deliberate and emergent change is acknowledged both in the strategy literature (Mintzberg & Waters, 1985) and in the change management literature (Liebhart & Garcia-Lorenzo, 2010). Other aspects, such as contingency and choice, have also been included in this discussion. The review by By (2005) shows how complex, heterogeneous, and inconsistent this distinction is. In this paper we do not intend to contribute to this discussion. For the argumentation of this paper, it is sufficient to specify the substance of deliberate change by the two attributes above: purpose and reason. In our understanding, deliberate change implies neither that everything goes according to plan nor that goals are realised exactly in the planned way.
As Dunphy and Stace (1993) argue, organizational change takes place in a dynamic environment, and organizations have to adapt their plans accordingly. Against this background, we posit that deliberate change does not rule out the emergent element. Rather, it implies that change is grounded in the intention to change. This view corresponds to Mintzberg's (1994) view of change as an element of the strategy process. In contrast, change is (completely) emergent if it is simply the accumulated result of a series of unrelated decisions and events that have no change or strategic perspective.

Third, change has to be executed and decisions implemented. This means that organisational members have to make an effort to bring about the change, and routines have to be altered in order to adapt to it. The literature on conflict and resistance caused by change (del Val, 2003; Huy, Corley, & Kraatz, 2014) emphasises leadership and execution power as particularly necessary to get things done, overcome resistance, and resolve conflicts.

Leadership power is thus required for all three tasks, most of all, however, for implementation. Change often burdens organisations and stresses people, and leadership power is needed to change behaviour and overcome resistance. Traditional business organisations therefore often rely on a top-down implementation of planned change (Howell & Avolio, 1993), and leadership vision is needed to motivate organisational members.

But how can these challenges be handled by informal leaders? How can resistance be overcome without the use of any formal power? How does the governance structure of OSS communities handle deliberate organisational change? Currently, there is no research addressing these questions systematically. However, there is one concept of change leadership that offers some theoretical grounding for an answer and that will also be important for the analysis in this article: the concept of the change agent.
Based on Caldwell's (2003) findings, we define change agents as individuals who initiate, direct, manage, and/or implement specific change initiatives. Like many other concepts, the concept of change agents is used heterogeneously (Wylie, Sturdy, & Wright, 2014), and there are closely related concepts in the literature, such as (product) champions (Ginsberg & Abrahamson, 1991). The key point for our study is that change agents are individuals who drive change initiatives, i.e., create momentum and ensure that decisions are made and actions are taken. In doing so, change agents can assume complex sensemaking (Brown, Colville, & Pye, 2015) and sensegiving (Petkova, Rindova, & Gupta, 2013) roles that can be essential in attracting collective attention and gaining legitimacy for their change initiatives. Change agents do not have to be assigned leaders with formally given responsibilities; they can even be outsiders, such as consultants (Volberda, Van Den Bosch, & Mihalache, 2014). In traditional business organisations, however, they have to be authorised and supported by formal leaders. The activity of change agents is therefore also based on hierarchical influence, even if mostly indirectly. While change agents thus might not have the power to order change, the supporting formal leaders do possess such power. In this case, sensegiving, i.e., "the processes by which strategic change is framed and disseminated to an organization's constituents" (Fiss & Zajac, 2006, p. 1173), can be particularly relevant for change agents to attract management attention and promote initiatives.

As outlined above, deliberate change cannot be decided and enforced by management in OSS communities as it can in traditional business organisations. Even when initiatives come from the core, they have to be based on initiative and promoted in the community.
Here, sensegiving may be particularly relevant for change agents as a way to attract the attention of the community and/or even attract media attention in order to promote change initiatives. Sensegiving can support positions in the "symbolic struggles over the purpose and direction of an organization" (Fiss & Zajac, 2006, p. 1173). When initiatives come from the periphery, even more initiative is required to change an OSS community deliberately. Therefore, it can be expected that change agents play an important role here. However, conditions are fundamentally different because in OSS communities there is no management support or hierarchical influence upon which to draw. So, how can change agents realise change initiatives here?

Methods

Two main criteria guided the selection of our focal case. First, the case had to be a representative example of an OSS community. Second, the community had to be a mature case that had already established and formalised work procedures, guidelines, and rules. Studying change in a developed, growing community held promise of providing an intensive and rich case that would "manifest the phenomenon of interest intensely (but not extremely)", because extreme cases may distort the manifestation of the phenomenon (Patton, 2002, p. 234). Accordingly, we selected an OSS community named TYPO3 for this study.

In line with the research objective, we first identified deliberate changes at their various stages. Then, we followed the process underlying those changes before tracing the mechanisms used to address them. The unit of analysis is the community, i.e., the focus is on the intraorganisational level.

Study setting

TYPO3 has been public since 2000. At the time of the study, the community was experiencing continuous growth (see Figure 1). The TYPO3 system is an enterprise-class content management system (CMS) offering out-of-the-box operation with standard modules (http://typo3.org/).
The system is aimed at two different groups: (i) authors and (ii) administrators and content managers. TYPO3's core team members play a central role in the community because they contribute most of the source code and manage the design and development of the project on a voluntary basis. When the study started, approximately half of the core team members (i.e., nine individuals) comprised the project's R&D committee, whose members also belonged to the project's other teams and working groups. Moreover, the members of this committee could be described as the project's central coordination body, as their responsibilities included (i) supervising and coordinating the development of the software; (ii) providing knowledge, contacts, and financial support; and (iii) supervising and supporting the community-driven teams. We chose the committee as a point of departure for the study because of these responsibilities. With 85.5% of their discussions focussing on governance issues (Table 1), the relevance of the R&D committee members as informants was undeniable. In addition to interviewing seven R&D committee members, two core team members were interviewed because they were directly involved with specific organisational changes before joining the core team (i.e., when they still belonged only to the community's periphery). As the study unfolded, hundreds of other informants from the community's periphery became involved through observations of relevant mailing lists on the TYPO3 website (Table 2).

--- Table 1 ---

--- Figure 1 ---

Starting in 2003, TYPO3 began to grow fast, and the number of registered developers doubled each year from 2003 to 2005. This continuous growth trend set the stage for the community changes that are the focus of this study.
The time lag between the growth registered from 2003 to 2005 (Figure 1) and the start of the data collection process in 2006 was necessary to see how the community would respond to this growth.

Data sources

Multiple sources of data (Table 2) were employed to strengthen the design of the study and to capture the complexities of the case in question. These data sources allowed us to triangulate the data and validate the theoretical constructs. The data were collected on several occasions between 2006 and 2010. When the study began, TYPO3 was addressing organisational issues that had surfaced because of the growing size of the community. However, we soon discovered that TYPO3 had experienced other organisational challenges in the past. Therefore, learning about the project's history and its prior development was just as important as illuminating its current development.

We collected our data through interviews, observations of face-to-face R&D committee meetings, three relevant community mailing lists, and archival data. An introductory interview with the project founder, who also acted as the project leader from 2000 to 2007, provided a deeper understanding of the community, its history, its development up to that point, its structure, its internal work processes, its products, and its current and future strategies. The remaining interviews with the community manager of the TYPO3 Association, the R&D committee, and core team members, some of whom had only recently moved from the periphery to the core of the community, were focussed on managing deliberate changes in TYPO3. The interviews addressed the following main themes: (i) change initiatives; (ii) activities, roles, and practices related to the identified change initiatives; (iii) motivation; and (iv) background.
The same interview guide was used throughout the process, but as new relevant information emerged about specific community changes, additional questions were incorporated into the following interviews. The interviews, which lasted about 60 minutes on average, were recorded and transcribed.


Furthermore, over a two-day period in 2006, more than 18 hours were spent observing face-to-face meetings among R&D committee members. This method yielded insights into a range of organisational issues related to the community's development and the background for the deliberate change initiatives.


A review of 235 posts from the R&D committee mailing list gave access to the content and type of discussions, the contributions and roles of various individuals, and work coordination and delegation. In particular, this source of information allowed us to obtain a deeper understanding of the organisational challenges facing the community during that time period and how those challenges were resolved.


The interviews, the observations of the R&D committee's meetings, and the R&D committee mailing list together uncovered a number of change processes in the TYPO3 community. Additional relevant mailing list data (namely, the human-computer interaction (HCI) team's mailing list and the core team's mailing list) were included in the data collection. Using archival data allowed us to cross-check some of the facts uncovered during the observation activities and interviews.


Data analysis


Since we were interested in whether, how, and why deliberate changes are possible in a specific context, a case study design was deemed appropriate. More specifically, when studying contemporary activities and/or events over which the researcher has no (or very limited) control, case study research is the obvious choice (Yin, 1994). Qualitative techniques were used to analyse the data (Eisenhardt, 1989; Miles & Huberman, 1984; Strauss & Corbin, 1998).
Overall, the analysis focussed on organisational practices, change, and structuring while paying specific attention to grounded concepts, and it proceeded in three steps. First, we constructed case studies (Eisenhardt, 1989) for each identified organisational change initiative. We focussed on major change initiatives that affected the entire community. At the time of the study, four change initiatives were ongoing: (i) reorganisation of product development, (ii) establishment of a non-profit organisation called the TYPO3 Association (a central hub from which to support active developers), (iii) installation of usability as a mindset (thus replacing the strong technical mindset... in the community), and (iv) restructuring of the entire community to create more efficiency through a more transparent structure with clear responsibilities and increased team autonomy. Although the general character of three of the initiatives was structural and one of them was cultural (the usability initiative), all of the changes involved changes in both structures and practices.


Second, we divided the coding process into open, axial, and selective coding and employed a constant comparative method within each coding phase to identify the concepts and relationships relevant to each type of change (Locke, 2001; Strauss & Corbin, 1998). Third, a cross-case analysis (Eisenhardt, 1989; Miles & Huberman, 1984) was used to identify any similarities and differences across the three change types. This process was repeated several times. Each time, the resulting conceptual insights were refined and further developed. The analysis generated four core categories that represent the mechanisms employed by TYPO3 to address deliberate changes (Table 4).


The interviews, the observations from the R&D committee meetings, and the data from the three mailing lists enabled us to determine precisely the timing and order of deliberate changes and their intended effects.
The same data sources were used to trace the unintended, emergent effects of the identified deliberate changes. Here, the three mailing lists, which documented the reactions (or lack of reactions) of the entire community, were especially important. The interviews played a central role in establishing the timeline for the parts of the change processes (e.g., decision making) that took place offline. The preliminary findings were presented and discussed with the project leader and two core team members, who provided valuable comments that confirmed and elaborated upon the uncovered theoretical constructs.

Findings


We observed multiple change initiatives in the community, some of them successful, some less so. The most significant of these are summarised in Table 3. Change agents played a decisive role in all key tasks of the observed change management processes: recognition, decision making, and implementation. In the observed initiatives, all but one change agent originated from the community's core. One reason for the prevalence of core member change agents might be that the identified initiatives were major and, as such, were expected to have a wide-scale effect on the community.


--- Table 3 ---


Below, we sketch the four change initiatives (Table 3) by elaborating (i) the aims of each initiative, (ii) what made them deliberate, (iii) who the change agents were, and (iv) whether the implementation was successful.


The first change initiative, "Reorganization of product development", was launched because the product development process was inefficient. It was characterized by a lack of release discussions between the core and the community, the community's failure to test enough different software versions, failure to read existing instructions about different project contributions (i.e. release management procedures, testing instructions), and poor planning of subprojects (e.g. too many postponements, unrealistic deadlines).
For its part, the Core Team did not have the capacity to respond to all of the inquiries, project proposals, and general input. A meeting was arranged where potential solutions were discussed, demonstrating explicit intent to plan and execute the needed change. A Core Team and R&D Committee member, who was in charge of the software release process at that time, proposed a solution, which was subsequently adopted. Release management was consequently improved by introducing a rotating release manager function in July 2007. During this change process, the R&D Committee's tasks were taken over by the Core Team and one hierarchical layer was removed. This created more flexibility and readiness for the Core Team, and easier access for new contributions. Additionally, the core development mailing list was opened, creating a direct communication channel between the core and the periphery. The activity level on the mailing list increased drastically, and the initiative more than doubled the amount of incoming patches to the core list, thus freeing the Core Team members to pursue larger projects to a much greater extent than before. The initiative was thus successfully implemented.


The second change initiative, "Founding of a non-profit organization called the TYPO3 Association", was intended to create a committee structure resembling a functional organizational structure. It consisted of establishing a non-profit organization called the TYPO3 Association and was initiated by the project founder. This complex task demanded deliberate action and took many discussions, especially during the Core Team meetings and TYPO3 conferences. The main goals of the Association were to support core development on a steadier basis and improve the efficiency of the project by "providing a central hub from which to support active developers as well as to concentrate its members into a pool of regular contributors" (mailing list).
The TYPO3 Association was meant to support core development by providing funds for development that was not covered by commercial interests. One way was through donations, i.e. individuals who earned their income (or part of it) by using this open source software could give some of this income back to the community in the form of donations. Another way was membership, i.e. firms and individuals could become members of the Association by paying an annual fee, which was used to sponsor software development in TYPO3. Furthermore, the Association was able to create transparency regarding decision making, roles, and activities. The change initiative was thus successfully implemented, and the Association ushered in a period of growth under the goal-oriented and integrative leadership of the board, whose chairman was the project leader.


The third change initiative, "New team structure", was a deliberate and direct response to the rapid community growth. The project founder was the change agent behind this initiative, which sought to make particular responsibilities and tasks explicit in order to create more transparency in project activities (and not only at the upper echelons of the Association). At the team level, therefore, it was determined that the following should apply to team leaders' tasks: (i) leaders are solely responsible for the team; (ii) members are appointed/accepted by the leader; (iii) decisions are made by the leader (however, agreement is sought with the team members as far as possible); (iv) delegation of tasks is encouraged; and (v) a minimum timeframe is set for the leader's response to team members' requests. By defining responsibilities, the community attempted to introduce a measure of accountability in team performance, which was considered vital in this virtual context due to the voluntary nature of participation. To formalize responsibilities and tasks, the project founder thus introduced "team contracts".
These contracts served the purpose of creating synergy between the already existing teams through the elaboration of a written mission statement, which, as a minimum, contained the following team information: the team's position in the organizational structure (i.e. to which committee or project does the team belong?), a description of the team's mission, a specification of the team's responsibilities, the name of the team leader, and the rules for becoming a team member. Although these contracts were introduced, tasks were still taken on by self-assignment. The motive underlying the team contracts was to define two aspects: responsibility and authority. However, team contracts never really gained momentum, and attempts at introducing formal authority at the team level did not succeed either. The initiative failed because the attempted structure left too few degrees of freedom to the project contributors. The type of authority exercised resembled that of a hierarchy (Demil & Lecocq, 2006; Powell, 1990) and unintentionally led to authority erosion. This accentuated the need for more autonomy with regard to following one's own "personal itch".


Finally, the aim of the fourth initiative, "Installing usability as a mindset", was to redirect the project's focus towards product usability. At the time, the project's focus was almost entirely technical in nature, which limited the product's appeal to those customer segments with low technical skills, e.g., a secretary who edits the content on a company website: "A lot of OSS is created by technicians for technicians. […] And then there are those [users] who use [the software] every third week. They don't demand that many functions; they demand that they don't need to remember how [the software] works because they are only using it every third week" (interview, project founder).


The wish to introduce a greater degree of product usability was put forward by a newcomer to the TYPO3 community in 2001. This newcomer, i.e.
a periphery member of the community, became the change agent who made an explicit decision to launch a process of change, making this initiative a case of deliberate change. He was a software designer by profession and realised the need for TYPO3 to improve its design. The idea remained in the background until 2006, when the project leader established the human-computer interaction (HCI) team and an accompanying mailing list, which was intended to act as "the melting pot for ideas about usability improvements" (the HCI team mailing list). However, progress was slow. A breakthrough came only when the change agent started making a more focussed effort to implement the usability idea. In the end, the change initiative was successfully implemented.


While our findings are based on the analysis of all the observed initiatives in the community, we selected the fourth initiative, "Installing usability as a mindset", as a representative initiative to illustrate the general traits of the organisational change mechanisms that drove the success of the change initiatives. By focussing the presentation of the study's results on one particular change initiative, our intention was to promote the clarity and comprehensibility of the findings.


In the following, we present our findings, which consist of the four mechanisms that our analysis revealed as central drivers of successful, deliberate change management in the community (Table 4).


--- Table 4 ---


Individual initiative


Our data first of all reveal that the community cannot be expected to embrace a change initiative—regardless of its inherent value to the community—unless there is a persistent change agent who will bring the initiative from the point of inception to successful implementation. This is a direct consequence of the absence of formal power and hierarchical influence in OSS communities. Since community members cannot be ordered to do something, they have to be persuaded to become active.
The change agent of the HCI project expressed the difficulties in doing so by saying, "You can find developers that are interested in [design] topics, but you don't really get very far. And that's what we experienced with the HCI team…a lot" (interview, change agent).


Even if a change agent has the right idea and engages with the right community members, this is not enough to set the change in motion. Accordingly, the change agent persevered for four years before the concept of usability penetrated the prevailing mindset and culture of the community. Persistence involves a high dose of patience, primarily because the community also needs time to adapt to organisational changes. This need was pointed out by one core member of TYPO3: "There is a gap between the design of the organisation and letting the organisation accumulate around the design…giving time to people to flock to the teams" (R&D committee meeting, core member).


We found clear indications that what motivates community members to contribute to a change initiative is less organisational planning and decision making and more individual effort and achievement.


Do decisions matter in OS [communities]? No. The one thing that matters is what is actually done. Post factum situation. By doing things, people make decisions. If we make a decision, it doesn't mean that people will be motivated to implement it, work by it. The only thing that matters is action. Consult people, hook them up with knowledge and resources, and hope that they do what you would like, what you expect… we should think of ourselves as service providers. (R&D committee meeting, core member)


This was one of the key statements of our investigation, outlining the structure of an individual initiative as clearly as possible. This view was also supported by the project founder of TYPO3 with the short statement, "First you have to do things yourself, and then others will follow" (interview, project founder).
Before taking action, the change agent of the HCI project reflected upon what motivated him and other developers to do work for the project leader. He found that a key driver was the project leader's "front guy and guru status" and the fact that "he usually keeps his promises and is able to do huge workloads" (interview, change agent). Based on this insight, the change agent tried to motivate others to participate in the HCI team: "I tried to find guys who were motivated by my work and then do work for me" (interview, change agent). The success of this approach was already evident in 2007, when the change agent became the HCI team leader. This success was also recognised by other community members:


Someone from the usability mailing list comes up with a nifty and good-looking screenshot and proposes his usability changes to the core developers. They are fascinated and go implement it because it seems like a really great idea to them. Especially [the change agent] has been very successful with this way of getting his suggestions implemented, and now he's the HCI team leader. (interview, core member)


And even:


I don't know how many have seen the PDF [the change agent] produced, but I saw it and also met him in Frankfurt before the PHP conference ([core team member name] and I joined a meeting of him and [project leader])—and there is hard and impressive work being done. (core team mailing list, core member)


In the end, we found that the role of change agents in communities is similar to that of product champions, who experience progress over time only through persistent and enthusiastic effort (Tushman & Anderson, 1986). Persistence and leading by example are traits that define a change agent's degree of individual initiative.
Persistent change agents who are able to self-motivate and self-direct their performance, i.e., to exercise self-leadership (Manz, 1986), are an essential part of any organisational change initiative in OSS communities because it takes a great deal of time and persuasion to garner acceptance and support for any organisational change. A change agent demonstrating high levels of commitment (personal motivation and skills) may develop mutual, cognition-based trust, which, in turn, may strengthen the community members' readiness to engage and collaborate (Chowdhury, 2005; McAllister, 1995). Thus, we put forward the following proposition, which is grounded in the above and similar behaviours observed in the other three change initiatives (Table 3):


Proposition 1: The individual initiative of change agents is positively related to a successful implementation of deliberate organisational change initiatives in communities.


Reputation and reputation lending


Power struggles were visible during the change process for each initiative. For instance, during the observed R&D committee meeting, one member left the room because he was frustrated that the rest of the group did not support his views. He was arguing against an excessively predetermined team structure, which was about to be implemented. However, he lost the debate because he was arguing against the stance of the change agent responsible for the particular change initiative, who had a higher status within the community. It was later revealed that the opposing member was actually right and that the team structure was, in fact, too prescriptive. This example shows how difficult it is to accomplish anything without the support of community members with higher social statuses. This difficulty exists even when the difference in social status between the change agent and the supporting high-status member is rather low (e.g., when both are members of the core team).

We find that, by lending their reputations to lower-status members, high-status members can share their influence. This was clearly recognised by the project founder: "And then, it is clear that for those individuals who have that kind of naturally given power, as I for example have, it is natural that other individuals whom we appoint and those close to us easily gain influence" (interview, project founder).


In situations where a change agent has a rather low status in the community, as was the case in the early days of the HCI team, the change agent can gain influence by teaming up with one or more community members who enjoy a high-status reputation.


In the case of the HCI team, the change agent "did a lot of work for [the project founder]" to establish himself as a worthy community member. Eventually, he was invited to a TYPO3 Board meeting to discuss usability issues: "With [the project founder] at [the] T3 Board we talked about why Drupal is easier than TYPO3 or why WordPress is easier than TYPO3". By linking to high-status members in this way, the change agent gained respect and support from the high-status core members. They addressed the change agent in complimentary terms and praised his work: "As the usability guru, please give me your feedback on the description of the two mentioned features in the page tree below…" (core team mailing list, core member).


But after he was appointed HCI team leader, it was evident that he had not yet gained the same respect from other members, as they were systematically circumventing the HCI team and instead discussing usability issues on the core team's mailing list. An effort was made to redirect the attention towards the HCI team, in particular towards the role of the change agent, endorsing him and building his authority. Some examples of this include:


[By the way], this is [user interface] change, so it can be committed only if you get approval from [the change agent].
(core team mailing list, core member) +I agree with all this but we do not have anyone else properly educated in these questions. I do not trust anyone else in [the] HCI field for TYPO3 because no one showed good HCI skills so far. [The change agent] is the only one who did. (core team mailing list, core member) + + +You might also have watched the podcast issue [2] where [the change agent] demonstrates some great ideas about usability improvements in TYPO3 or have seen the PDF [3]. (core team mailing list, core member) + + +In the subsequent period the activity levels in the HCI team increased significantly. However, there seemed to be no obvious relationship between the content of the change initiatives and the skills of the high-status members supporting the initiatives. This finding implies a potential spillover effect between reputations rooted in technical contributions and reputations rooted in organisational contributions. + + +There were also instances when high-status members (e.g., project and team leaders, core team members, and other respected members) met the change agents halfway. Our data show the leaders in TYPO3 work with the community’s initiatives through a process of mutual adjustments. The leaders notice promising initiatives, assess them, and try to provide them with the necessary resources: + + +I tried to motivate him to build a team around that. I just noticed him. In this way, I try to enable people to work. It’s a bit intuitive also. I [have been] working already for ten years on this system, so the foundation for something like this was probably already laid a couple of years back. (interview, community manager) + + +This type of leadership emphasises intuition and alertness. The main task consists of providing support for change initiatives in the form of knowledge and resources without making decisions on behalf of the community members. 
Rather, the leaders establish the infrastructure and framework that will hopefully assist the community change agents in paving the way for the intended improvements and changes. High-status members lend their lateral authority and reputation to a change agent by providing any type of visible support, even if it is only verbal in nature. One reason this method works is that high-status members' support provides the change agent with credibility, which is crucial if the initiative is to stand a chance of being implemented (Markus & Benjamin, 1996). This finding further suggests that community leadership is shared via reputation lending, which also facilitates organisational changes in communities. Therefore, based on the above and similar behaviours observed in the other three initiatives (Table 3), we make the following prediction:


Proposition 2: Reputation lending (from high-status to lower-status members) is positively related to a successful implementation of deliberate organisational change initiatives in communities.


Change-oriented communication


We found that communication about change initiatives was essential to their successful implementation. Through meetings and presentations to small and large target audiences at various community events, change agents in TYPO3 communicated the rationales and arguments behind the initiatives. Still, it took the change agent behind the HCI initiative a long time to realise that communicating the idea about usability was vital to its success. The change agent attracted support for the usability initiative by communicating (in a change-oriented fashion) the basic ideas behind the concept in several rounds of presentations to the developer community: "This is why [the project founder] and I decided that maybe we just need to find out how we can change that point of view to guide developers in a different direction—so a typical marketing and communication thing" (interview, change agent).
+From 2007 to 2008, the change agent tried to motivate the community by communicating the relevance of usability to TYPO3 through presentations at the community’s main yearly events. + + +The first presentation was just about usability flaws, ten major usability flaws […] at the Developer Days in 2007. Then, in 2008, at T3Con, I held a presentation about what can be done in a positive way with usability, solutions and future interfaces like, for example, the interfaces in “Minority Report” […]. If I look back, that was the second phase to motivate [people], saying, “Look, that’s possible if we work together”, and “Wouldn’t it be fun to have some amazing interfaces in there?” (interview, change agent) + + +In all observed projects, the presentations helped change agents to gain the community’s trust in them and their capabilities. + + +After I showed them [through presentations] that it could really get done, they kind of trusted in the words I said. Because usually it’s a very inner circle, only developers with developers, so they could trust each other. They have the same language. But now, there comes this strange design guy and he says, “You are doing everything wrong; you have to change everything, and you don’t even have the knowledge to understand what you are doing wrong.” That doesn’t really end in trust. (interview, change agent) + + +In addition to establishing the trustworthiness of the change agent (Gurtman, 1992), the change-oriented communication process in TYPO3 also helped stimulate the community members to participate because the process also aimed to educate the target audience about the attempted changes. The community developers were the target: “Then through the Usability Week, we started, in some way, to educate [people]” (interview, change agent). 

This facilitation of community participation resembles a particular dimension of shared leadership, called voice, which is known to increase a person's social influence among the members of a community (Carson, Tesluk, & Marrone, 2007). During the change initiatives that had a successful outcome, the change agents excelled at initiating and facilitating constructive, change-oriented dialogue and debates around how the community should achieve the needed changes. Thus, voice boosted the change agents' level of social influence by increasing immersion and participation through various means, such as opening the core team's mailing list (under a set of rules) to the rest of the community, implementing rotating release managers, presenting ideas at community events, and establishing Usability Week. Voice in the form of change-oriented communication may be associated with successful change implementations because voice is based on interpersonal events that promote communication and feedback, which, according to Ryan and Deci (1985), catalyse feelings of competence and thereby stimulate intrinsic motivation. Based on the above and on similar behaviours exhibited in the other three initiatives (Table 3), we make the following prediction:


Proposition 3: Change-oriented communication is positively related to a successful implementation of deliberate organisational change initiatives in communities.


Motivation through challenging tasks


Because of the self-assignment principle (Crowston et al., 2007), one of the major challenges in open-source communities is motivating developers to work on tasks that are uninteresting but necessary to complete (Lakhani & von Hippel, 2003). We can see this problem extends to organisational change initiatives. This was also recognised by the change agent of the HCI project, "[…] usability topics are not really challenging for developers usually.
It's about removing stuff, making stuff simple, and that's usually not the challenge for developers. It's a challenge for me as designer" (interview, change agent). The resulting challenge was put more generally by one member of the core team: "We were uncertain how to get people to do some of the more boring and time-consuming, but essential, tasks" (interview, core team member).


Working with usability demanded that the developers overcome three fundamental tasks. First, the developers needed to become motivated to work on usability issues. Second, the TYPO3 community had to attract skilled software designers who possessed the necessary knowledge regarding usability. Third, the change agent had to find a way to stimulate the developers to follow the designers' recommendations.


To motivate developers to work on usability issues, the change agent came up with the idea to create "fake challenges […] to motivate them to finish the goals" (interview, change agent). His approach was based on the idea that developers would be more willing to work on their tasks if they perceived them to be challenging.


After a while I came up with the idea to have a 'Usability Week'. The concept was pretty simple. I rented a castle for one week, and I locked 30 developers in that castle, and they had a certain task they needed to solve within that one week. So, the challenge was there in some way because they needed to solve the problem in one week, which is kind of tough because the problems I took [on] were too huge to solve in one week. So, there was a challenge even if the task was simple because they had time pressure. (interview, change agent)


During Usability Week, five mixed teams were created. Each team consisted of three developers, one core developer, one manager, and one designer. Each day of the event, three meetings took place. The meetings were designed to streamline the tasks and motivate the teams.

To attract designers to the TYPO3 community and the usability project, the change agent used a different set of tools. He created an entrance barrier that the designers needed to overcome before they could join the community.


My major wish through that Usability Week wasn't to solve those tasks but to find more designers who [were] able and motivated to join the TYPO3 community. My idea to make it more interesting to them was, again, to make it a little bit more complicated because they had to apply to the Usability Week. So, we had about 60 or 70 applications and only 30 places. In the end, only five designers out of 50 could join, and they were somehow charmed because they could attend and others couldn't. It really worked out and they really stuck to the project and until today [are] doing some design work. (interview, change agent)


Finally, to motivate the developers, the change agent needed to make the tasks related to usability issues more challenging. He achieved this by incorporating into simple problems (i) a novel task structure and content and (ii) the freedom to execute the tasks in a different way than usual. By doing so, the change agent successfully motivated the developers to solve those problems.


For example, to structure a website we have something called a 'page tree', which looks like the tree in Explorer on your Windows machine, and that's kind of very old style, how it is done […]. However, there is a framework called XJS, written in JavaScript, and that is interesting for developers because it's a new technology in some way and a new framework, and it's hard to implement, and they need to change a lot. So, I decided that they should use XJS for that page tree, even if we don't need it, but then I would be sure that in the end I would have the page tree I wished to have and they would have a challenging task to actually do it instead of writing some lines by themselves to change [the page tree].
(interview, change agent)


We really had the freedom to totally change the core… Actually, the way […] we worked… we [were] taking the beta version of 3.9 back in time, and we just coded anything we liked inside the core. Usually, someone who creates an extension is [told] “never touch any core file”, [but here] we could really go deeply inside and delete files, replace files totally, and we did not have to focus on keeping [it] compatible with the old code and being compatible with the old […] extensions. (interview, developer)


In the case of the HCI project, Usability Week turned out to be quite successful:


They were challenged by whether they could reach the goals. This really moved the project hugely forward in one week […] In the end, I have to say, we didn’t reach any of our goals […] But they got pretty far, and it really gave the whole [usability] project a new motivation. (interview, change agent)


The self-assignment of tasks, which is the prime mechanism for work division and task allocation in OSS communities, becomes a problem if tasks do not attract enough interest and, consequently, remain undone. Task challenge here refers to a continuum ranging from low- to high-stimulation tasks (e.g., highly routinized tasks versus non-standardized, original tasks). The case of TYPO3 shows that increases in task challenge due to, for example, entrance barriers, competition, level of within-task stimulation, task novelty, or freedom to execute a task in a new way can compensate for an initial lack of personal desire, which would normally drive the self-assignment of tasks. Our analysis shows that in the case of tasks related to the implementation of organisational change initiatives, the change agent needs to increase the perceived task challenge in accordance with the skills and interests of the targeted members. Thus, task challenge should be seen as a dynamic factor dependent on the person-task interaction (Campbell, 1988).
Task challenge is associated with increased participation because it appeals to intrinsic motivation, the primary motivational factor in open-source communities (Lakhani & Wolf, 2005). In turn, increased participation improves performance (Hackman & Oldham, 1976; Herzberg, 1959). Furthermore, creating entrance barriers to team membership proved effective at activating a sense of achievement and recognition as stimuli (Herzberg, 1959). Hence, based on the above and the other three observed change initiatives (Table 3), we make the following prediction:


Proposition 4: Increased task challenge is positively related to a successful implementation of deliberate organisational change initiatives in communities.


Discussion

This study offers the first comprehensive investigation of deliberate change in OSS communities. It presents clear indications that OSS communities are indeed capable of changing deliberately and are, therefore, not doomed to fail in the long run. A change is deliberate because it is desired by a community member—the change agent—and then supported by a sufficient coalition within the community; in the observed HCI project, the change initiative was carried out with the clear goal of improving the usability of TYPO3.


Our study also shows that in OSS communities deliberate change is highly dependent on change agents, who play an essential role in managing the key tasks of change processes: (i) change agents recognise the need for change and translate it into organisational goals; (ii) they create a sense of urgency and convince community members to make decisions in this matter; and (iii) they push the change process and ensure things get done, often by doing things on their own. This is a clear contrast to hierarchical business organisations, where change is mostly driven by leaders with positional power and/or special functions and change agents only play a secondary role.
Against this background, this study of deliberate change in OSS communities focuses on the investigation of change agents and the success drivers of their initiatives. The insights of this study can be summarised in a simple model:


--- Figure 2 ---


These findings are first of all relevant for research on non-hierarchical organizational settings such as OSS communities. They provide insights into an area that has so far been largely under-researched. In addition, knowledge of change is as important for collaborative communities as it is for traditional business organisations because (i) it allows change processes to be designed more purposefully and (ii) it provides insights into the long-term behaviour of collaborative communities in relation to their (competitive) environment. As long as they are based on a similar governance structure, there is good reason to assume these findings also apply to other types of communities of practice not related to software development (Bridwell-Mitchell, 2015). This gives a broader relevance to our findings, since the importance of communities is increasing in an information- and knowledge-based economy (O’Mahony & Ferraro, 2007).


However, this study also yields some quite interesting and relevant findings that go beyond communities and concern change processes in traditional business organisations as well. In this way, our paper can also contribute to the broader change literature. The elements of the above change model are not all completely new. We already know about change agents, informal power, and leadership from investigations of other contexts. What is new and important, however, is that the complete absence of formal power does not prevent the execution of deliberate change, and that change agents play a critical role in driving the process. OSS project leaders and core team members do not have formal command authority to enforce decisions (von Hippel & von Krogh, 2003).
This is clearly illustrated especially by the third change initiative, “New team structure” (Table 3), in which the project leader and founder was the change agent. Although he kept the team contracts on the agenda for two years, he was unable to implement this initiative. Had he had any kind of formal fiat in the community, this initiative would probably have led to a different outcome. But OSS communities “do not rely on employment contracts and so are unable to be governed by formal authority, as is the case in a hierarchy” (Demil & Lecocq, 2006, p. 1454). This allows for some quite interesting perspectives and insights.


The first important finding is the apparent irrelevance of decision making in a hierarchical sense, as expressed by community members. This point needs some clarification. It does not mean there is no deliberate planning or decision making taking place in OSS communities. Instead, these statements relate to their power structure. In his article, Finkelstein (1992) distinguished various forms of management power. As outlined above, OSS communities are characterised by the inherent absence of formal power (‘structural power’ in the terminology of Finkelstein, 1992, p. 509, i.e., the “legislative right to exert influence” over others). Other forms of informal power, like ‘expert power’ and ‘prestige power’, not only exist in OSS communities but also play an important role in the informal leadership that provides the foundation for the significance of the community’s core team (Fleming & Waguespack, 2007; O’Mahony & Ferraro, 2007). Individual initiative (proposition 1) as a mechanism of change resembles some change factors observed in ‘traditional’ organizations with formal leadership (i.e., hierarchies; Demil & Lecocq, 2006). Like community change agents, agents in hierarchies make use of exemplary change, or leading by example (Kotter, 2012).
Also, individual initiative bears resemblance to the tasks performed by change champions (Ulrich, 1997) and product champions (Day, 1994), such as providing impetus for and strongly promoting the change initiative. However, the apparent irrelevance of decision making in community change points to a structural power deficit of change agents with regard to change initiatives. Change agents are able to convince relevant community members, decisions are made, and tasks are distributed, but this often does not result in action. In these situations, decisions are only relevant to legitimise the activities of change agents, not to trigger action. Often, change agents have to keep pushing to get things done; in other cases, they have to complete the tasks themselves. Against this background, individual initiative is a strategy to exert influence without formal power. Yet, it has to be noted that this strategy only works locally, and change agents still need informal power at other points. Individual initiative might even result in the acquisition of expert and prestige power because it makes change agents and their abilities visible. To date, the meaning of individual initiative and the structure of low-power contexts are not very well understood. It might be expected that individual initiative also plays a role in high-power contexts as a strategy to exert influence without power. However, more research is needed in this regard.


Another interesting point is the observation of what we have named ‘reputation lending’ (proposition 2). There is already some research on reputation and advancement in communities and other organisations without vertical lines of authority (Fleming & Waguespack, 2007). Much is already known about (i) what authority means in flat hierarchies and (ii) how authority is acquired there (Dahlander & O’Mahony, 2011).
In the context of hierarchies, reputation lending parallels coalition formation, support building, and gaining sponsorship from individuals with organizational clout, formal authority, and access to resources (Connor, 1998; Day, 1994; Kanter, 1994; Kotter, 2012). Such actions help legitimize the change initiative and the change agent, as well as create acceptance of change by those affected (Buchanan & Boddy, 1992). Conceptually, reputation lending is also somewhat close to leader support in hierarchies (Amabile, Schatzel, Moneta, & Kramer, 2004). Leader support means using the formal power of managers to support activities by less-powerful organisational members, often in relation to innovation and change activities. This support can include resources and time, autonomy, and support in organisational decision making (Mumford, Scott, Gaddis, & Strange, 2002). In contrast, reputation lending implies using the informal power of community leaders to support change agents in their activities, mostly by giving them recognition, letting them participate in board meetings and decision-making procedures, and making them and their initiatives more visible in the community. This informal form of support has not been described in the literature so far. Still, it is interesting because the elements of visibility and acceptance play only a minor role in leader support. This finding indirectly confirms research showing the importance of informal networks and policy systems for change agent success (Battilana & Casciaro, 2012).


We also discovered interesting findings with regard to the motivation of community members to carry out change-related tasks. As discussed in the conceptual section above, motivation has already been the focus of previous research. Lakhani and von Hippel (2003) found that participation in OSS communities is quite rewarding since “98% of the effort expended by information providers in fact returns direct learning benefits to those providers” (p.
923). However, we observed that there are change-related tasks that are not rewarding and that it is rather challenging to motivate community members to work on them. In this regard, we observed the strategy of so-called ‘fake challenges’ (proposition 4). The underlying approach is to combine unattractive tasks with motivating elements like competitions or social gatherings. There is an interesting early description of this principle: the fence episode in the novel The Adventures of Tom Sawyer by Mark Twain (1876). Most readers perhaps remember: Tom had to paint Aunt Polly’s fence as a punishment after he dirtied his clothes in a fight. He hated this work; however, when one of his friends came to the spot, Tom was able to create the impression that it was a privilege and a pleasure to paint the fence. After a while, he was even able to sell painting permissions to his fellows. In this sense, the change agent succeeded in creating a sense of exclusivity by restricting places at the challenge, transforming boring work into a socially attractive event. To our knowledge, this strategy has not been described by research on OSS communities so far. Ultimately, the strategy of creating challenging tasks is expected to improve community members’ understanding and sense of ownership of the change initiative, and eventually enhance their motivation to participate in executing change. In that sense, this approach has the same objective as, for instance, the empowerment of organizational members, which is an important element in the change leadership literature within the context of hierarchies (Caldwell, 2003; Gill, 2003; Goffee & Scase, 1992). While both strategies thus seek to remove obstacles to change, they are in fact each other’s opposites. One strategy uses task design to deal with the downsides of an innate characteristic of OSS communities, i.e., member autonomy.
The other, however, seeks to increase member autonomy in a hierarchical setting, where strong administrative controls provide formal powers to supervise and regulate the behaviour of organizational members (Demil & Lecocq, 2006).


Although change processes have been theorized about and practiced in a variety of ways, the finding that deliberate change in OSS communities has most in common with change in hierarchies relates to change-oriented communication (proposition 3). Through frequent communication, change agents create opportunities for organizational members to understand and give input to the change process (Kotter, 2012). Practicing openness and widespread communication (Buchanan & Boddy, 1992) during a change process increases the chance of successful implementation because organizational communication plays a central role in eroding existing path dependencies (Cohen & Levinthal, 1990), thus paving the way for organizational change.


Yet, the most important finding of this study is perhaps the very observation that OSS communities succeed in handling deliberate change processes without any formal or pre-assigned power. Certainly, informal power, persuasion, and group pressure are relevant to managing deliberate change in OSS communities to a certain extent. Situations can arise in which organisational members are faced with the decision to accept change or leave the community. Still, no community member can be ordered to accept change as in traditional business organisations. Nobody can be laid off, and sanctioning possibilities are generally very limited. If community members comply with change, they do so because they believe in it or at least accept the majority decision. If a change project is not supported by a critical mass of the community, it will not be successful. We call this type of deliberate change ‘change by conviction’. Why is that relevant?
If people comply with change voluntarily, there is a good chance that negative side effects resulting from enforcement will be reduced (even though not completely eliminated, because group members might submit to change unwillingly or leave the community). Indeed, we found some indications for this in our data, even though we were not directly looking for it. We are convinced these findings may also be applicable to hierarchical business organisations and that the latter can learn a lot from OSS communities about reducing the level of enforcement in change processes, thereby decreasing the levels of demotivation, insecurity, and resistance. Consequently, the relevance of our findings is much broader: it does not only concern non-hierarchical settings such as OSS communities but also helps shed additional light on deliberate organisational change in general. More research is, however, needed to substantiate these findings, clarify the impact of different elements of change on negative side effects, and explore possibilities for traditional business organisations.


Managerial implications


The most obvious managerial implication is that communities need to be aware of the central role of change agents in deliberate change in order to organise change processes accordingly. This study emphasizes the role and importance of individuals taking initiative and responsibility by outlining some critical success factors for realizing deliberate change in non-hierarchical settings such as OSS communities.

Another implication is that hierarchical organizations also need to reconsider their use and appreciation of change agents, including self-appointed ones. Change agents are already being used in hierarchical business organisations, but often in an unsystematic way. However, the results of this study suggest it would be useful to base all major change projects on change agents here as well.
After decisions have been made, change agents can simply be assigned and endowed with the necessary power or supported by top managers. Contrary to the non-hierarchical case analysed in this study, no specific individual initiative is needed at this point in hierarchical organisations. Still, it might be important for change agents to care more than usual about the second driver in our model and build a reputation for being the right person to organise the change process among all organisational members involved in it. The two last drivers point to communication and education, as well as to motivation. We are convinced a lot can be done to smooth change projects in hierarchical business organisations, and it might even be possible to establish a regime of change by conviction there.


Limitations and future research


The first limitation of this study is of a theoretical nature. When investigating deliberate change in OSS communities, we touch on a variety of different themes, including leadership, reputation building, informal power, motivation, innovation, and others. Each of these themes can be further developed, and many of them might potentially offer new insights. For the sake of rigour, we decided to focus on change, the meaning of change agents, and the drivers of change agent success. We have targeted this study primarily toward the research conversations on communities and on change. This decision was made to keep the study focused and detailed.


Second, in this study we did not look at organisational context factors that mediate the effect of the success drivers of change agent activities, such as the cultural context, size and age of the community, or degree of formalisation. We also did not look at the antecedents of change agent activities. This means our study is far from offering a complete model of change agent activity in communities.
Still, we think our propositions can be useful stepping stones towards a more holistic model.


Analysing classic concepts and/or phenomena such as deliberate change under entirely different and newer organizational regimes is important, as it not only helps to clarify how such organizational settings work but also sheds new light on the phenomenon under investigation. In our study, this manifested itself in the form of the self-appointment of change agents. While self-appointment was necessary for the phenomenon to exist in a completely different and non-hierarchical organizational setting, it also holds potential for application in hierarchical settings.


Conclusion


This study provides evidence that it is indeed possible to change complex organisations deliberately without formal power and hierarchical influence. All change initiatives we observed were grounded in the individual commitment of change agents. However, we also found that the success of change agents’ initiatives depended on their ability to gain sufficient support within the organisation. Key drivers of this are individual initiative, reputation and reputation lending, change-oriented communication and education, and motivation through challenging tasks. There is reason to assume these insights also hold for a broader range of organisations, including hierarchical business organisations. This is relevant because there are indications that change by conviction reduces the negative side effects of deliberate change.


References


Amabile, T. M., Schatzel, E. A., Moneta, G. B., & Kramer, S. J. (2004). Leader behaviors and the work environment for creativity: Perceived leader support. Leadership Quarterly, 15(1), 5-32.

Battilana, J., & Casciaro, T. (2012). Change Agents, Networks, and Institutions: A Contingency Theory of Organizational Change. Academy of Management Journal, 55(2), 381-398.

Bonaccorsi, A., & Rossi, C. (2003).
Why Open Source Software Can Succeed. Research Policy, 32, 1243-1258.

Bridwell-Mitchell, E. N. (2015). Collaborative Institutional Agency: How Peer Learning in Communities of Practice Enables and Inhibits Micro-Institutional Change. Organization Studies.

Brown, A. D., Colville, I., & Pye, A. (2015). Making sense of sensemaking in organization studies. Organization Studies, 36(2), 265-277.

Buchanan, D., & Boddy, D. (1992). The expertise of the change agent. London: Prentice Hall.

Bullock, R. J., & Batten, D. (1985). It's Just a Phase We're Going Through: A Review and Synthesis of OD Phase Analysis. Group & Organization Studies, 10(4), 383-412.

Burnes, B. (1996). No such thing as ... a "one best way" to manage organizational change. Management Decision, 34(10), 11.

Burnes, B. (2009). Managing change: A strategic approach to organisational dynamics (5th ed.). Harlow, England; New York: Prentice Hall/Financial Times.

By, R. T. (2005). Organisational change management: A critical review. Journal of Change Management, 5(4), 369-380.

Caldwell, R. (2003). Models of change agency: A fourfold classification. British Journal of Management, 14, 131-142.

Campbell, D. J. (1988). Task complexity: A review and analysis. The Academy of Management Review, 13(1), 40-52.

Carson, J. B., Tesluk, P. E., & Marrone, J. A. (2007). Shared leadership in teams: An investigation of antecedent conditions and performance. Academy of Management Journal, 50(5), 1217-1234.

Chowdhury, S. (2005). The role of affect- and cognition-based trust in complex knowledge sharing. Journal of Managerial Issues, 17(3), 310-327.

Cohen, W. M., & Levinthal, D. A. (1990). Absorptive capacity: A new perspective on learning and innovation. Administrative Science Quarterly, 35(1), 128-152.

Connor, D. R. (1998). Managing at the speed of change. Chichester, UK: John Wiley & Sons.

Cromie, J. G., & Ewing, M. T. (2009). The rejection of brand hegemony. Journal of Business Research, 62, 218-230.

Crowston, K., Li, Q., Wei, K., Eseryel, U. Y., & Howison, J. (2007). Self-organization of teams for free/libre open source software development. Information and Software Technology, 49, 564-575.

Dahlander, L., & O'Mahony, S. (2011). Progressing to the Center: Coordinating Project Work. Organization Science, 22(4), 961-979. doi: 10.1287/orsc.1100.0571

Day, D. (1994). Raising radicals: Different processes for championing innovative corporate ventures. Organization Science, 5, 148-173.

De Souza, G., & Klein, H. J. (1995). Emergent leadership in the group goal-setting process. Small Group Research, 26(4), 475-496.

del Val, M. P. (2003). Resistance to change: A literature review and empirical study. Management Decision, 41(2), 148.

Demil, B., & Lecocq, X. (2006). Neither market nor hierarchy nor network: The emergence of bazaar governance. Organization Studies, 27(10), 1447-1466.

Dunphy, D., & Stace, D. (1993). The strategic management of corporate change. Human Relations, 46(8), 905-920.

Eisenhardt, K. M. (1989). Building theories from case study research. Academy of Management Review, 14(4), 532.

Etzioni, A. (1964). Modern organization. Englewood Cliffs, NJ: Prentice-Hall.

Finkelstein, S. (1992). Power in top management teams: Dimensions, measurement, and validation. Academy of Management Journal, 35(3), 505-538.

Fiss, P. C., & Zajac, E. J. (2006). The symbolic management of strategic change: Sensegiving via framing and decoupling. Academy of Management Journal, 49(6), 1173-1193.

Fjeldstad, Ø. D., Snow, C. C., Miles, R. E., & Lettl, C. (2012). The architecture of collaboration. Strategic Management Journal, 33, 734-750.

Fleming, L., & Waguespack, D. M. (2007). Brokerage, Boundary Spanning, and Leadership in Open Innovation Communities. Organization Science, 18(2), 165-180.

Gill, R. (2003). Change management — or change leadership? Journal of Change Management, 3(4), 307-318.

Ginsberg, A., & Abrahamson, E. (1991). Champions of change and strategic shifts: The role of internal and external change advocates. Journal of Management Studies, 28(2), 173-190.

Goffee, R., & Scase, R. (1992). Organizational change and the corporate career: The restructuring of managers' job aspirations. Human Relations, 45(4), 363-384.

Gurtman, M. B. (1992). Trust, distrust, and interpersonal problems: A circumplex analysis. Journal of Personality & Social Psychology, 62, 989-1002.

Hackman, J. R., & Oldham, G. R. (1976). Motivation through the design of work: Test of a theory. Organizational Behavior and Human Performance, 16(2), 250-279.

Hars, A., & Ou, S. (2002). Working for free? Motivations for participating in open-source projects. International Journal of Electronic Commerce, 6(3), 25-39.

Hayes, J. (2010). The theory and practice of change management (3rd ed.). New York: Palgrave Macmillan.

Herzberg, F. (1959). The motivation to work. New York: John Wiley and Sons.

Higgs, M., & Rowland, D. (2011). What Does It Take to Implement Change Successfully? A Study of the Behaviors of Successful Change Leaders. Journal of Applied Behavioral Science, 47(3), 309-335.

Hon, A. H. Y., Bloom, M., & Crant, J. M. (2011). Overcoming Resistance to Change and Enhancing Creative Performance. Journal of Management, 40(3), 919-941.

Hongseok, O., Labianca, G., & Myung-Ho, C. (2006). A multilevel model of group social capital. Academy of Management Review, 31(3), 569-582.

Howell, J. M., & Avolio, B. J. (1993). Transformational leadership, transactional leadership, locus of control, and support for innovation: Key predictors of consolidated-business-unit performance. Journal of Applied Psychology, 78(6), 891-902.

Huy, Q. N., Corley, K. G., & Kraatz, M. S. (2014). From Support to Mutiny: Shifting Legitimacy Judgments and Emotional Reactions Impacting the Implementation of Radical Change. Academy of Management Journal, 57(6), 1650-1680.

Kanter, R. M. (1994). The change masters. London: Allen & Unwin.

Kanter, R. M., Stein, B., & Jick, T. (1992). The challenge of organizational change: How companies experience it and leaders guide it. New York: Free Press.

Kesting, P., & Ulhøi, J. P. (2010). Employee-driven innovation: Extending the license to foster innovation. Management Decision, 48(1), 65-84.

Kirzner, I. M. (1997). Entrepreneurial Discovery and the Competitive Market Process: An Austrian Approach. Journal of Economic Literature, 35(1), 60-85.

Kotter, J. P. (2007). Leading Change: Why Transformation Efforts Fail. Harvard Business Review, 85(1), 96-103.

Kotter, J. P. (2012). Leading change. Boston, MA: Harvard Business Review Press.

Lakhani, K. R., & von Hippel, E. (2003). How open source software works: "Free" user-to-user assistance. Research Policy, 32, 923-943.

Lakhani, K. R., & Wolf, R. G. (2005). Why hackers do what they do: Understanding motivation and effort in free/open source software projects. In S. A. Hissam, B. Fitzgerald, J. Feller & K. R. Lakhani (Eds.), Perspectives on free and open source software (pp. 3-21). Cambridge, MA: MIT Press.

Lawrence, P. R., & Lorsch, J. W. (1967). Organization and environment: Managing differentiation and integration. Cambridge, MA: Harvard University Press.

Lee, G. K., & Cole, R. E. (2003). From a firm-based to a community-based model of knowledge creation: The case of the Linux Kernel Development. Organization Science, 14(6), 633-649.

Lerner, J., & Tirole, J. (2002). Some Simple Economics of Open Source. The Journal of Industrial Economics, 50(2), 197-234.

Lewin, K. (1951). Field theory in social science: Selected theoretical papers (1st ed.). New York: Harper.

Liebhart, M., & Garcia-Lorenzo, L. (2010). Between planned and emergent change: Decision makers' perceptions of managing change in organisations. International Journal of Knowledge, Culture and Change Management, 10(5), 214-225.

Locke, K. (2001). Grounded theory in management research. London: Sage Publications.

Manz, C. C. (1986). Self-leadership: Toward an expanded theory of self-influence processes in organizations. Academy of Management Review, 11, 585-600.

Markus, M. L. (2007). The governance of free/open source software projects: Monolithic, multidimensional, or configurational? Journal of Management and Governance, 11(2), 151-163.

Markus, M. L., & Benjamin, R. I. (1996). Change agentry - The next IS frontier. MIS Quarterly, 20(4), 385-407.

Martinez-Torres, M. R., & Diaz-Fernandez, M. C. (2014). Current issues and research trends on open-source software communities. Technology Analysis & Strategic Management, 26(1), 55-68.

McAllister, D. J. (1995). Affect- and cognition-based trust as foundations for interpersonal cooperation in organizations. Academy of Management Journal, 38(1), 24-59.

Mehra, A., Smith, B., Dixon, A., & Robertson, B. (2006). Distributed leadership in teams: The network of leadership perceptions and team performance. Leadership Quarterly, 17, 232-245.

Miles, M. B., & Huberman, A. M. (1984). Qualitative data analysis: A sourcebook of new methods. Beverly Hills, CA: Sage Publications.

Mintzberg, H. (1994). The rise and fall of strategic planning. New York, NY: The Free Press.

Mintzberg, H., & Waters, J. A. (1985). Of strategies, deliberate and emergent. Strategic Management Journal, 6(3), 257-273.

Mockus, A., Fielding, R. T., & Herbsleb, J. (2002). Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology, 11(3), 309-346.

Mooney, J. D., & Reiley, A. C. (1939). The principles of organization. New York: Harper and Brothers.

Moran, J. W., & Brightman, B. K. (2001). Leading organizational change. Career Development International, 6(2), 111-118.

Mumford, M. D., Scott, G. M., Gaddis, B., & Strange, J. M. (2002). Leading creative people: Orchestrating expertise and relationships. Leadership Quarterly, 13(6), 705.

Nelson, R. R., & Winter, S. G. (1982). An evolutionary theory of economic change. Cambridge, MA: Belknap Press of Harvard University Press.

O'Mahony, S., & Ferraro, F. (2007). The emergence of governance in an open source community. Academy of Management Journal, 50(5), 1079-1106.

Patton, M. Q. (2002). Qualitative research and evaluation methods (3rd ed.). Thousand Oaks, CA: Sage Publications.

Petkova, A. P., Rindova, V. P., & Gupta, A. K. (2013). No news is bad news: Sensegiving activities, media attention, and venture capital funding of new technology organizations. Organization Science, 24(3), 865-888.

Powell, W. W. (1990). Neither market nor hierarchy: Network forms of organization. Research in Organizational Behavior, 12, 295-336.

Ryan, R. M., & Deci, E. L. (1985). Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary Educational Psychology, 25, 54-67.

Scacchi, W. (2002). Understanding the requirements for developing open source software systems. IEE Proceedings--Software, 149(1), 24-39.

Schumpeter, J. A. (1934). The theory of economic development: An inquiry into profits, capital, credit, interest, and the business cycle. Cambridge, MA: Harvard University Press.

Scott, W. R. (1981). Organizations: Rational, natural and open systems. Englewood Cliffs, NJ: Prentice Hall.

Sharma, S., Sugumaran, V., & Rajagopalan, B. (2002). A framework for creating hybrid-open source software communities. Information Systems Journal, 12, 7-25.

Somech, A. (2006). The Effects of Leadership Style and Team Process on Performance and Innovation in Functionally Heterogeneous Teams. Journal of Management, 32(1), 132-157.

Strauss, A., & Corbin, J. (1998). Basics of qualitative research: Techniques and procedures for developing grounded theory (2nd ed.). London: SAGE Publications.

Teece, D. J., Pisano, G., & Shuen, A. (1997). Dynamic capabilities and strategic management. Strategic Management Journal, 18(7), 509-533.

Tushman, M. L., & Anderson, P. (1986). Technological Discontinuities and Organizational Environments. Administrative Science Quarterly, 31(3), 439-466.

Twain, M. (1876). The adventures of Tom Sawyer. Toronto: Belford Bros.

Ulrich, D. (1997). Human resource champions. Cambridge, MA: Harvard University Press.

Van De Ven, A. H., & Poole, M. S. (1995). Explaining development and change in organizations. Academy of Management Review, 20(3), 510-540.

Volberda, H. W., Van Den Bosch, F. A. J., & Mihalache, O. R. (2014). Advancing Management Innovation: Synthesizing Processes, Levels of Analysis, and Change Agents. Organization Studies, 35(9), 1245-1264.

von Hippel, E., & von Krogh, G. (2003). Open Source Software and the 'Private-Collective' Innovation Model: Issues for Organization Science. Organization Science, 14(2), 209-223.

Vujovic, S., & Ulhøi, J. P. (2008). Online innovation: The case of open source software development. European Journal of Innovation Management, 11(1), 142-156.

Waddell, D., & Sohal, A. S. (1998). Resistance: A constructive tool for change management. Management Decision, 36(7/8), 543.

Wylie, N., Sturdy, A., & Wright, C. (2014). Change agency in occupational context: Lessons for HRM. Human Resource Management Journal, 24(1), 95-110.

Yates, M. (2000). Developing leaders in a global landscape. In D. J. Giber, L. Carter & M.
Goldsmith (Eds.), +Linkage Inc.'s best practices in leadership development handbook: Case studies, instruments, training + (1st ed.). San Francisco, CA: Jossey-Bass/Pfeiffer. + + +Yin, R. K. (1994). +Case study research: design and methods + (2nd ed.). Thousand Oaks, CA: Sage Publications. +Biographies: + + +Sladjana Nørskov is an External Lecturer at the Department of Management, Aarhus University. She received her Ph.D. from Aarhus School of Business. Her research interests include organizational development, user-centered innovation processes, community governance, and new organizational forms. + + +Peter Kesting is an Associate Professor of Management at Aarhus University, Denmark. His research interests primarily concern innovation management, the cognitive and conceptual foundations of routine and decision-making, negotiations, and the life and work of Joseph A. Schumpeter. + + +John Parm Ulhøi is a Professor of Organization and Management Theory at Aarhus University. His research interests include organisational development, new forms of organising, human and social capital, and innovation and entrepreneurship. Over the years, he has served as TIM-Division Board Member of the Academy of Management and as Editorial Board member of various journals. He has served as member of various International Expert Boards such as, for example, Directorate-General Research, The European Commission; Israel Science Foundation; European Science Foundation; The Belgian Office for Scientific, Technical and Cultural Affairs; The Research Council of Norway. +Figure 1. The growth of TYPO3 depicted as the number of registered developers, references, and extensions (2003-2005).\textsuperscript{1} Source: http://typo3.com/ + + +\begin{figure} +\centering +\includegraphics[width=\textwidth]{typo3_growth.png} +\caption{The growth of TYPO3 depicted as the number of registered developers, references, and extensions (2003-2005). 
Source: http://typo3.com/}
\end{figure}


\textsuperscript{1} The graph shows the number of registered developers from 2003 to 2005. Unfortunately, reliable statistics for the ensuing years could not be obtained.


\begin{figure}
\centering
\includegraphics[width=\textwidth]{change_initiatives.png}
\caption{Model of the moderators of change initiatives in OSS communities}
\end{figure}


Table 1. Topics discussed in the R&D Committee’s mailing list

| | Governance-related postings | Technical postings | Other | Sum |
|---|---|---|---|---|
| Number, # | 201 | 21 | 13 | 235 |
| Percent, % | 85.5 | 9.0 | 5.5 | 100 |


Table 2. Data sources

| Data source | Description | Purpose | Time |
|-------------|-------------|---------|------|
| Mailing list I | 235 postings from the R&D Committee mailing list | Insight into the contributions and role of each Committee member; an in-depth understanding of the organizational tasks and issues and how they were addressed | 2006 |
| Mailing list II | 1,088 postings from the HCI Team mailing list | Understanding organizational developments within the HCI (Usability) Team. Related to a particular change initiative. | 2006-2009 |
| Mailing list III | 1,191 postings (selected for their relevance from a total of 13,587 postings) from the Core Team mailing list | Understanding the interactions between the core and the periphery and how the interactions developed over time. Actions and reactions related to the identified change processes. | 2006-2008 |
| Interviews | 11 interviews: 1 with the project founder, 1 with the community manager, and 9 with Core Team members, of whom 7 were also members of the R&D Committee | Understanding of the community, its history and development, and change in TYPO3.
Managing change in TYPO3; follow-up on specific developments and change initiatives. | 2006-2010 |
| Observation | 18 hours (a two-day R&D Committee face-to-face meeting) | Insight into issues regularly addressed by the R&D Committee. The observations revealed a range of organizational issues and | 2006 |
| Archival documentation | Project description, bylaws, videos of conferences and meetings, summaries of meetings, and news | Learning about the formal regulations and structures of the community. Crosschecking some of the facts uncovered during the observation activities and interviews. | 2006-2010 |


Table 3. The four change initiatives

| Change initiative | Components of the change initiative | Rationale behind changes | Change agent | Outcome |
|---|---|---|---|---|
| Reorganization of product development | New work processes; feedback; gate keeping; closer interactions; release management | Motivate contributors via feedback, gate keeping, and closer interactions, which were expected to act as rewards and retention mechanisms. Release management improved after setting up strict development phases. | Core member | Successfully implemented |
| Founding of a non-profit organization called the TYPO3 Association | Create a committee structure (similar to a functional structure) | Support core development on a steadier basis. Improve the efficiency of the project by "providing a central hub from which to support active developers as well as to concentrate its members into a pool of regular contributors."
| Project founder | Successfully implemented |
| New team structure | Establishing 'Team Contracts' for each team; implementing a more transparent structure with clear responsibilities, increased team autonomy, and a more elaborate structure | Ensure responsibility and accountability for each task and role. | Project founder | Unsuccessful |
| Installing usability as a mindset | Usability as a mindset; changing the mindset of developers; bringing software developers and designers together | Create a team that would work to increase the usability of the TYPO3 system. Developers usually lack the user perspective. Designers are needed to create more user-friendly software. | Periphery member | Successfully implemented |


| Individual initiative | |
|---|---|
| Persistence | You need to be extremely enthusiastic and not afraid of setbacks because you will experience many, and it will take a long time to make changes happen. (Interview, core member) |
| Leading by example (creating credibility and merit in the community to gain followers for the change initiative) | But what didn’t work out is that I couldn’t motivate persons just to follow the guidance of my changes. So I created about, I would say, 200 mock-ups. And about 10 percent have been realized in TYPO3 until today. (Interview, change agent) So you need to prove to them that you have the skills and that you are able to assess their solutions.
(Interview, change agent) |


| Reputation and reputation lending | |
|---|---|
| Endorsement of change agents by high-status members | I also realized that [change agent’s name] – who is one of the most active participants in here – has been continuously working on a lot of TYPO3 HCI topics: […] New Installer 2.0; backend interface improvements for TYPO3 4.2; TemplaVoilá 2 (together with [name]); starting to work on Extension Manager 2 (with [name]); and finally, [change agent’s name] is also an active member of the TYPO3.org redesign group. (Core Team mailing list, core member) |
| Redirecting attention and work efforts towards the initiative | > Could you tell us a bit more about this? Maybe in the [developer list]? Answer: Or HCI, that is. Please continue the discussion there. (…) Can you re-send your mail in the HCI list please, once you feel like you want to continue the discussion. (Core Team mailing list) |
| Proactive recognition and support of initiatives by high-status members | It’s more of keeping this big overview and picking the cherries. It is a dynamic system. I never have an idea all of a sudden. (…) It’s mostly about things that are already under way. (Interview, core member) |
| | You work mostly with the things that are going on and try to find little suggestions or ask someone else: “What do you think about this idea, about this project? Do you have anything to add to that?” (…) It’s mostly that there are already ongoing projects. As a community manager I see, okay, this guy is working on it and this guy is working on it, and I try to connect them.
(Interview, community manager) |


| Change-oriented communication | |
|---|---|
| Inform and educate the community about the rationale and arguments behind the initiatives | The breakthrough was the presentation for 5.0 with a guy called [name]. After that presentation, the spirit in the community changed because they saw that it is really possible to do this. […] (Interview, change agent) |
| | I just watched the HCI podcast and was really impressed. Once we get there, we can all be very proud of not only a flexible product but a user-friendly product as well! As an ‘outsider’ to the HCI team, it produced two random thoughts I would like to share with you. […] After viewing the presentation I was overwhelmed when thinking about what it would mean to achieve all this. To really get a consistent look and feel, it would require rewriting a lot of code and adapting tons of extensions. Some things like the installer might be easier since it is better modularized. But to achieve major changes, I strongly feel that it would be best to focus on the 5.0 development. (HCI mailing list, developer) |


| Motivation through challenging tasks | |
|---|---|
| Novel task structure and content | It was exciting for the developers to use a framework that is so powerful, so new, that has so many functions already inside. By just using the framework, we could use a lot of things out of the box that we could never just pluck into the old system. (Interview, core member) |
| Freedom to work in new ways | So removing everything and replacing them with totally new components for the whole frame and for the page tree, this was really [going] to bring something totally new in there. Our coding was driven by the huge set of features that were there.
Every one of us was coding in the past and was in a position of coding extensions for a customer […] and to create new menu items was never possible in the past […] So we really at some point had the freedom to drop compatibility and this was quite helpful to go fast forward to say, ok, let’s delete everything and create new. (Interview, core developer) | +---------------------------------------- +------------------------------- +Section 193: +When and How to Make Breaking Changes: Policies and Practices in 18 Open Source Software Ecosystems + + +CHRIS BOGART, CHRISTIAN KÄSTNER, and JAMES HERBSLEB, +Carnegie Mellon University, USA +FERDIAN THUNG, Singapore Management University, Singapore + + +Open source software projects often rely on package management systems that help projects discover, incorporate, and maintain dependencies on other packages, maintained by other people. Such systems save a great deal of effort over ad hoc ways of advertising, packaging, and transmitting useful libraries, but coordination among project teams is still needed when one package makes a breaking change affecting other packages. Ecosystems differ in their approaches to breaking changes, and there is no general theory to explain the relationships between features, behavioral norms, ecosystem outcomes, and motivating values. We address this through two empirical studies. In an interview case study, we contrast Eclipse, NPM, and CRAN, demonstrating that these different norms for coordination of breaking changes shift the costs of using and maintaining the software among stakeholders, appropriate to each ecosystem’s mission. In a second study, we combine a survey, repository mining, and document analysis to broaden and systematize these observations across 18 ecosystems. We find that all ecosystems share values such as stability and compatibility, but differ in other values. Ecosystems’ practices often support their espoused values, but in surprisingly diverse ways. 
The data provides counterevidence against easy generalizations about why ecosystem communities do what they do.


CCS Concepts: • Software and its engineering → Collaboration in software development; Software development process management; Software libraries and repositories; • Human-centered computing → Empirical studies in collaborative and social computing;


Additional Key Words and Phrases: Software ecosystems, dependency management, semantic versioning, collaboration, qualitative research


ACM Reference format:
Chris Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. 2021. When and How to Make Breaking Changes: Policies and Practices in 18 Open Source Software Ecosystems. ACM Trans. Softw. Eng. Methodol. 30, 4, Article 42 (July 2021), 56 pages.
https://doi.org/10.1145/3447245


This work has been supported by NSF awards 1901311, 1546393, 1302522, 1322278, 0943168, 1318808, 1633083, and 1552944, the Science of Security Lablet (H9823014C0140), the U.S. Department of Defense through the Systems Engineering Research Center, and a grant from the Alfred P. Sloan Foundation.


Authors’ addresses: C. Bogart, C. Kästner, and J. Herbsleb, Carnegie Mellon University, Institute for Software Research, TCS Hall 430, 4665 Forbes Avenue, Pittsburgh, PA 15213; emails: {cbogart, ckaestner, jherbsleb}@cs.cmu.edu; F. Thung, Singapore Management University, School of Computing and Information Systems, 80 Stamford Road, Singapore 178902; email: ferdiant.2013@smu.edu.sg.


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.
To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


© 2021 Association for Computing Machinery.
1049-331X/2021/07-ART42 $15.00
https://doi.org/10.1145/3447245


1 INTRODUCTION


Software ecosystems are communities built around shared programming languages, shared platforms, or shared dependency management tools, which allow developers to create packages that import and build on each other’s functionality. Software ecosystems have become an important paradigm for organizing open source software development and maintaining and reusing code packages. Development within ecosystems is efficient in the sense that common functionalities need only be developed, maintained, and tested by a single team, instead of many authors reimplementing the same functionality.


Coordination is a major challenge in software ecosystems, since packages tend to be highly interdependent yet independently maintained [2, 3, 6, 21, 55, 68]. In at least some ecosystems, such as JavaScript, transitive dependency networks are growing rapidly [46]. Improvements that a maintainer makes to a shared package may affect many users of that package, for example, by incorporating new features, making APIs simpler, and improving maintainability [10]. Any of these actions may require rework from developers whose software depends on that package. Package users may invest in regular rework to keep up with changes, collaborate with upstream projects to minimize the impact of those changes, decline to update to the latest versions (at the risk of missing bug fixes or security updates), or replicate functionality to avoid dependencies in the first place [6, 17, 19, 72]. Package maintainers, in turn, have many ways to reduce the burden on their users.
For example, they can refrain from performing changes, announce and clearly label breaking changes, or help their users to migrate from old to new versions [6, 36, 65, 67]. Many different practices can contribute to managing change, and adopting various practices can shift some of the cost (in the form of effort) among different classes of ecosystem participants such as maintainers, package users, and end-users (e.g., Reference [28]). + + +While much is known about some individual practices for managing change, we do not yet understand how these practices occur in the wild, nor how they can combine to establish the full design space of practices. Managing change takes time and effort from both upstream and downstream developers, and depending on their community’s practices, this cost may be distributed differently. However, we do not fully understand the distributions of costs that result from various practices, nor how practices are related to ecosystem culture and technologies. This is important not only from a research perspective, to acquire an understanding of ecosystem coordination mechanisms, but also for practitioners and sponsors who may need to tune the distribution of costs to accommodate changing conditions. For example, as an ecosystem accumulates a large and rapidly growing base of applications that use particular packages, its community may wish to adopt practices to increase the stability of those packages to avoid imposing the costs of change on a large and growing base of users. What practices could accomplish this? Of this set of practices, which are likely to be compatible with the adopting ecosystem’s culture and values? + + +We perform two studies to address questions like this. First, we conducted a multiple case study (Study 1) of three open source software ecosystems with different philosophies toward change: Eclipse, R/CRAN, and Node.js/npm. 
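One practice mentioned above, clearly labeling breaking changes through version numbers, pairs with dependency version ranges on the downstream side: a range can admit updates labeled compatible while excluding those labeled breaking. A simplified Python sketch of an npm-style caret (`^`) range check follows; it is an illustration only, assuming plain `MAJOR.MINOR.PATCH` versions with major ≥ 1 (real resolvers also handle pre-release tags and `0.x` special cases):

```python
def parse(version: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a comparable tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def satisfies_caret(version: str, constraint: str) -> bool:
    """Check an npm-style caret range: '^1.2.3' accepts >=1.2.3 and <2.0.0.

    Simplified sketch: assumes major >= 1 and plain numeric versions.
    """
    base = parse(constraint.lstrip("^"))
    candidate = parse(version)
    return candidate[0] == base[0] and candidate >= base

# A downstream project declaring "^1.2.3" takes fixes and features
# automatically, but a release labeled breaking (a major bump) is excluded:
assert satisfies_caret("1.2.4", "^1.2.3")      # patch update: accepted
assert satisfies_caret("1.9.0", "^1.2.3")      # minor update: accepted
assert not satisfies_caret("2.0.0", "^1.2.3")  # major update: excluded
```

Under such a range, a downstream project receives patch and minor releases automatically but must opt in to a major (potentially breaking) release, which is one way labeling practices shift the cost of change between upstream and downstream developers.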
We studied how developers plan, manage, and coordinate change within each ecosystem, how change-related costs are allocated, and how developers are influenced by and influence change-related expectations, policies, and tools in the ecosystem. In each ecosystem, we studied public policies and policy discussions and interviewed developers about their expectations, communication, and decision-making regarding changes. We found that developers employ a wide variety of practices that shift or delay the costs of change within an ecosystem. Expectations about how to handle change differ substantially among the three ecosystems and influence cost-benefit tradeoffs among those who develop packages used by others (who we will call upstream developers), the developer-users of such packages (who we will call downstream developers), and end-users. We argue that these differences arise from different values in each community and are reinforced through peer pressure, policies, and tooling. For example, long-term stability is a central value of the Eclipse community, achieved by their “prime directive” practice of never permitting breaking changes. This practice imposes costs on upstream developers, who may accept substantial opportunity costs and technical debt to avoid breaking client code. In contrast, the Node.js/npm community values ease and simplicity for upstream developers and has a technical infrastructure in which breaking changes are accepted, but signaled clearly through version numbering.


Our second study builds on and expands the scope of the first, investigating the prevalence of the practices, and attitudes toward the ecosystem values, from Study 1 in a larger set of 18 ecosystems.
We combine several methods to accomplish this, including data mining of software repositories to identify those practices that leave visible traces, document analysis to identify policy-level practices that are stated explicitly, and a large-scale survey to ask developers about many other practices as well as the importance of various values within the ecosystem. In Study 2, we find that practices and values are indeed often cohesive within an ecosystem, but diverse across different ecosystems. We also find that even when ecosystems share similar values, they often achieve them in different ways, or sometimes fail to achieve them at all, promoting practices that are never widely adopted or do not work well. Together, our results provide a map of the distribution of values and practices across these ecosystems and allow us to examine the relationships between values and practices. Beyond these findings, we make our full anonymized results available to the research community, in hopes they will be useful in future studies, for example, by providing a basis for selecting cases with particular combinations of practices and values.


This work builds on and extends our previously published conference paper [6], including much of the material in Section 4. The data is available as an archived dataset [7] as well as an interactive web page.1


Our contributions include a description of breaking change-related values and practices in three ecosystems, a taxonomy of values and of practices, and a mapping of those values and practices across 18 ecosystems derived from a survey, data mining, and policy analysis.
----------------------------------------
-------------------------------
Section 194:
2 CONCEPTS AND DEFINITIONS


Software ecosystems.
For this study, we define software ecosystems as communities built around shared programming languages, shared platforms, or shared dependency management tools, allowing developers to create packages that import and build on each other’s functionality. In line with definitions of Lungu [50] and Jansen and Cusumano [43], we focus on “collection[s] of software projects which are developed and which co-evolve together in the same environment” [50, p. 27], which have interdependent but independently developed packages, which generally share a technology platform or a set of standards [43]. Such ecosystems typically center on some means to package, version, and often host software artifacts, and to manage dependencies among them [1, 47, 51, 61, 74].


Note that the term “software ecosystem” is overloaded and used with different definitions in different lines of research [52], including ones that focus on commercial platforms that can be enhanced with third-party contributions [40, 56, 81, 83]. We focus especially on open-source communities developing interdependent libraries (e.g., Maven, npm, CPAN), rather than more centralized platforms where usually independent extensions provide a single application but do not build on each other (e.g., Photoshop plugins, Android apps); we also exclude ecosystems that repackage software projects and their dependencies for deployment (e.g., Debian packages, homebrew), as they are often managed by independent volunteers rather than the original software developers.


1 http://breakingapis.org.


Breaking changes. There are many relevant software development concerns when maintaining interdependent artifacts as a community. We focus on the coordination issue of deciding whether and how to perform breaking changes and how downstream developers respond.


In this article, we define a breaking change as any change in a package that would cause a fault in a dependent package if it were to blindly adopt that change.
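This definition covers more than compile-time API breakage; a change can keep every signature intact and still break dependents. A minimal hypothetical sketch (the `parse_size` function is invented for illustration, not taken from the article):

```python
# Hypothetical upstream function: version 1 of a package interprets
# the suffix 'k' as 1024 bytes.
def parse_size_v1(text: str) -> int:
    """v1 behavior: 'k' means 1024."""
    return int(text[:-1]) * 1024 if text.endswith("k") else int(text)

# Version 2 keeps the exact same name and signature but reinterprets
# 'k' as 1000, so dependent code still imports and runs without error.
def parse_size_v2(text: str) -> int:
    """v2 behavior: same API, 'k' now means 1000."""
    return int(text[:-1]) * 1000 if text.endswith("k") else int(text)

# A dependent package that blindly adopts v2 compiles and runs, yet now
# silently computes different results: a breaking change under this
# definition even though no API signature changed.
assert parse_size_v1("2k") == 2048
assert parse_size_v2("2k") == 2000
```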
We thus include not only cases where a change in API would cause a downstream package to fail to compile, but also cases where program behavior would change, leading to incorrect results or unacceptable performance. We examine breaking-change related practices quite broadly, including not only reactions to actual breaking changes, but practices meant to signal, mitigate, or prevent breaking changes.


Maintaining dependencies and updating one’s own code to react to breaking changes is a significant cost driver when using otherwise free open-source dependencies. Breaking changes are common in practice [3, 5, 6, 14, 22, 29, 39, 44, 48, 53, 54, 66–68, 89, 90]. For example, Decan et al. [22] found that 5% of package updates in CRAN were backward incompatible, causing 41% of the errors in released dependent packages. Xavier et al. [90] report that 28% of releases of frequently used Java libraries break backward compatibility, with the rate of breaking changes increasing over time. Information hiding [63], centralized change control [29, 73], and change impact analysis [8, 84] can all guide decision making, but cannot entirely prevent the need for breaking changes in practice, given the large-scale, open, and distributed nature of software ecosystems [6, 59, 62, 76, 90].


Package managers structure the problem and make dependencies and versions explicit [3, 47, 51], and practices like semantic versioning assign semantics to version numbers (e.g., breaking vs. nonbreaking changes) [65, 67], but these only help to manage change; they neither prevent the problem nor support decision making about when to perform breaking changes.


Values and practices. Values and practices are, respectively, the “why” and the “how” of managing breaking changes in software ecosystems.


Shared values—judgments of what is important or preferred—can explain how developers make similar decisions.
Values have been studied at societal scale in psychology [4], ethics [16], and related fields [12, 37] (e.g., how education influences personal value systems); however, values and their influence on practices have been studied mostly in narrow contexts in software engineering: Pham et al. studied testing culture [64], and Murphy-Hill et al. found that creativity and communication with non-engineers are valued more by game developers than by application developers, resulting in less use of testing and architecture practices in game development [58]. We use the concept of values to analyze common shared beliefs about what is important for an ecosystem, with a focus on change-related issues.


With practices, we refer broadly to activities that developers engage in, again primarily with a focus on managing change. Practices may include specific release strategies, deciding not to perform changes, mitigating the impact of changes through documenting migration paths or reaching out to developers, monitoring changes in dependencies, deciding whether and when to update dependencies, and many more [6].


In ecosystems, practices may be encouraged or mandated by policies (for example, npm and Eclipse mandate the use of semantic versioning in their documentation) and may be supported or even enforced by tools (for example, the Eclipse community’s API Tools detect even subtle breaking changes, and CRAN runs automated checks to enforce coding standards and resolve incompatibility issues) [6]. For simplicity, we use the term practice broadly, including policies and tools.


Governance in open source and software ecosystems covers community-wide decisions, e.g., how to integrate third-party contributions [11], which model for decision making is generally appropriate [45, 60], how open an ecosystem should be [85], and how people in different roles should be allowed to participate [86].
While some governance research discusses the need for both evolvability and stability of an organization [83], this research focuses on general market mechanisms or on process documentation and conformance [41, 45], not on the technical steps a software engineer might take.
----------------------------------------
-------------------------------
Section 195:
3 METHODS


3.1 Research Design


As stated in the introduction, our goal in this research is to create a high-level map of values and practices relating to breaking change across many software ecosystems.


We approached this question with an exploratory sequential mixed-methods design [15], beginning with a qualitative preliminary case study to first understand how the community deals with or prevents breaking changes, and why it deals with them in this way. This first study takes a constructivist view, focusing on how the problem of breaking changes looks from the perspective of participants, and asking why they approach this collaboration problem the way they do. We use this to inform a second, primarily quantitative study. The second study is not intended specifically to confirm that the findings generalize (although we do a confirmatory check in Section 5.1), but is rather a broad look at where they generalize, and at whether there is any pattern to the combinations of values and practices we see in the larger landscape outside the three case study ecosystems. Study 2 casts a broad net at the cost of depth, asking high-level questions about many communities; we recognize that particular practices, values, or ecosystems should be followed up in more depth, and we call for research that brings more resources to bear on more focused questions. Study 2 shows that there is not a simple relationship between practices and values—we found that communities often act on the same value in different ways.
3.2 Study 1: Interview Case Study


For our first look at ecosystem practices, we performed a multiple case study, interviewing 28 developers in the three ecosystems. Case studies are appropriate for investigating “how” and “why” questions about current phenomena [92]. We selected three contrasting cases to aim for theoretical replication [92], a means to investigate the proposition that phenomena will differ across contrasting cases for predictable reasons.


Eclipse and Node.js/npm served as cases that contrast sharply in their approach to change: Eclipse has interfaces that have not changed for over a decade, while Node.js/npm is a relatively new and fast-moving platform. We expected that Eclipse’s policies and tools might impose costs on developers in a way that encouraged them to act consistently with the ecosystem’s values of stability. The R/CRAN ecosystem serves as a useful third theoretical replication, since its policy favors compatibility among the latest versions of packages over Eclipse’s long-term compatibility with past versions. In addition, CRAN acts as a gatekeeper for a centralized repository, in contrast to npm’s intentionally low hurdles for contributions.


We began by mining lists of packages and their dependency relationships from these three ecosystems. We assembled a database of packages, their dependency relationships, and version change histories from the npm repository (metadata for which was retrieved from https://registry.npmjs.org/ in JSON format), CRAN repositories (scraping metadata from web pages starting from http://cran.r-project.org/web/packages/available_packages_by_name.html), and git repositories of Eclipse (https://git.eclipse.org/c/).


Table 1. Interviewees.
R2 and N4 Were Pairs of Close Collaborators, Identified as R2a, R2b, N4a, and N4b + + +| Code | Case | Field | Occupation | +|------|------|------------------------|----------------| +| E1 | Eclipse | Programming tools/HCI | University | +| E2 | Eclipse | Soft. Eng./CS Education | University | +| E3 | Eclipse | Soft. Eng./Research | University | +| E4 | Eclipse | CS Education | University | +| E5 | Eclipse | Software engineering | Retired | +| E6 | Eclipse | Software engineering | Industry | +| E7 | Eclipse | Eclipse infrastructure | Industry | +| E8 | Eclipse | Software engineering | Industry | +| E9 | Eclipse | Software engineering | Industry | +| R1 | CRAN | Soil science | Government | +| R2a,b| CRAN | Statistics | University | +| R3 | CRAN | Medical imaging | University | +| R4 | CRAN | Genetics | University | +| R5 | CRAN | Soil science | University | +| R6 | CRAN | Web apps | Industry | +| R7 | CRAN | Data analysis | Industry | +| R8 | CRAN | R infrastructure | Industry | +| R9 | CRAN | R infrastructure | Industry | +| R10 | CRAN | R infrastructure | University | +| N1 | NPM | Telephony | Industry | +| N2 | NPM | Tools for API dev. | Industry | +| N3 | NPM | Web framework | Startup | +| N4a,b| NPM | Web framework | Startup | +| N5 | NPM | Cognitive Science | University | +| N6 | NPM | Database, Node infrastr.| Startup | +| N7 | NPM | Database, Node infrastr.| Industry | + + +All owned packages with both upstream and downstream dependencies. + + +We pursued two complementary recruitment strategies for our interviews. 
To find package maintainers who would have recent, relevant insight about managing dependencies from both sides of the dependency relationship, we used our mined repository datasets to identify packages that had at least two downstream dependencies and two upstream dependencies, and for which both the focal package and at least one of its upstream dependencies had had a version update in the year before the interview (2015).²

We emailed the owners of a random sample of these packages in small batches, handwriting each email using the addresses and details supplied in the npm and CRAN repositories or the Eclipse commit logs, and set up interviews with people who responded. We also interviewed three developers that we or our colleagues knew personally. In all, we contacted 92 people and conducted 26 interviews. Our interviews focused on their personal practices and experiences managing upstream and downstream dependencies.

After 20 interviews, we were hearing similar ideas from each new interviewee, but we recognized the need for deeper experience with the ecosystem-wide origins and impacts of the ecosystem’s policies, so we decided to additionally interview individuals with some role (current or historical) in the development of the ecosystem’s tools or policies. As these individuals are fewer and there are more demands on their time, we only attempted to find a few key people in each ecosystem; thus, we recruited 8 additional developers, asking a few of the same questions but also adding questions about the ecosystem’s history, policy, and values. All 28 interviewees were active software developers with multiple years of experience, but their backgrounds ranged from university research to startup companies; Table 1 gives an overview.

²The code implementing this filtering is available at https://github.com/cbogart/depalyze/blob/1d867cc92d7a5f18274358ae02574915026a30d5/depalyze/versionhistory.py#L354.
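As an illustration, the selection filter just described can be sketched as follows. The dictionary layout and field names (`upstream`, `downstream`, `last_update`) are hypothetical stand-ins for our mined metadata; the actual implementation is the depalyze script linked in the footnote.

```python
from datetime import date

# Hypothetical index shape: package name -> metadata record.
# The real filtering ran over the mined npm/CRAN/Eclipse datasets.
idx = {
    "focal": {"upstream": ["up1", "up2"], "downstream": ["d1", "d2"],
              "last_update": date(2015, 3, 1)},
    "up1":   {"upstream": [], "downstream": ["focal"],
              "last_update": date(2015, 2, 1)},
    "up2":   {"upstream": [], "downstream": ["focal"],
              "last_update": date(2013, 1, 1)},
}

def is_candidate(pkg, index, cutoff=date(2014, 1, 1)):
    """True if pkg has >= 2 upstream and >= 2 downstream dependencies,
    and both pkg and at least one upstream dependency were updated
    since `cutoff` (roughly, the year before the interviews)."""
    info = index[pkg]
    if len(info["upstream"]) < 2 or len(info["downstream"]) < 2:
        return False
    if info["last_update"] < cutoff:
        return False
    return any(index[u]["last_update"] >= cutoff
               for u in info["upstream"] if u in index)
```

Here "focal" qualifies because "up1" released within the window; "up2" alone, last updated in 2013, would not have been recent enough.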
We conducted semistructured phone interviews that lasted 30–60 minutes. We generally followed the interview script shown in Appendix A, but tailored our questions toward the interviewees’ personal experiences. With the interviewees’ consent, we recorded all interviews.

In keeping with our constructivist approach to the first study, we analyzed the interviews using Thematic Analysis [9]. We transcribed the recordings, tentatively coded the transcripts for interesting themes using Dedoose [23], and then iteratively discussed, redefined, and recoded. Codes that emerged in the first round included labels such as “expectations towards change,” “communication channels,” “opportunity costs of backward compatibility,” and “monitoring.” We combined redundant codes, eliminated ones that did not recur or address our research questions, then grouped the remainder into seven high-level themes: “Change planning: reasons for changes,” “Change planning: costs to the changer,” “Change planning: technical means and practices,” “Change planning: reasoning about cost tradeoffs,” “Coping with change,” “Communication,” and “Ecosystem-wide policy and technology.” Next, we gathered tagged quotes from each high-level category, and two researchers checked that they agreed with the low-level tags for each quote in the category, resolving any disagreements through discussion.

Thematic analysis does not claim to find reproducible phenomena within the interviews; for example, we did not attempt to compute interrater reliability, since we make no claim that two researchers trained themselves to reliably identify exactly the same utterances from interviewees as examples of “expectations towards change,” nor that we have exhaustively identified all instances of such an expectation among our interviewees.
As such, we do not apply statistics to our qualitative results or attach much importance to counts; the purpose of the interviews and our thematic analysis is to discover the broad categories of attitudes and strategies towards change that interviewees experienced, with illustrative examples of typical practices and motivations that constitute those strategies.

To complement our interviews, we explored policies, public discussions, meeting minutes, and tools in each ecosystem.

In our analysis, we distinguish between decisions made in the roles of upstream and downstream developer, as depicted in Figure 1.

Validity check. To validate our findings from the case study, we adapted Dagenais and Robillard’s methodology [18] to check fit and applicability as defined by Corbin and Strauss [13, p. 305]. We presented interviewees with both a summary and a full draft of Sections 4.2–4.3, along with questions prompting them to look for correctness and areas of agreement or disagreement (i.e., fit), and any insights gained from reading about experiences of other developers and platforms (i.e., applicability).

Six of our interviewees responded with comments on the results; all six indicated general agreement (e.g., R5: “It brings a structure and coherence to issues that I was loosely aware of, but that are too rarely the centre of focus in my everyday work.”); corrections included small factual errors (e.g., the number of CRAN packages had increased since the initial writeup, and is now over 14,000); and suggestions of ways to sharpen our analysis (e.g., R7 noted that CRAN’s policy to contact downstream developers does not apply to the majority of users outside CRAN). We incorporated their feedback when it was consistent with a recheck of our data and added clarifications otherwise.

3.3 Study 2

We then conducted a systematic mapping of values and practices in a broad sample of ecosystems, primarily making use of a survey.
Because of the large number and diversity of practices (Tables 4, 5, and 6), we could not measure them all with one methodology. We asked about a large subset of them in the survey (e.g., doing research about dependencies before using them; bottom section of Table 6). We also analyzed documentation and policies to identify practices that are enacted ecosystem-wide by organizations or tools (e.g., ecosystem-wide synchronized release; Table 4). Finally, we mined GitHub repositories and the libraries.io package metadata dataset for practices that leave visible traces (e.g., “Continue critical updates to older versions”; Table 5). Out of the 55 practices we identify, there are 19 that we do not attempt to measure in Study 2 (e.g., socially connected developers following each other on Twitter, or going to conferences; top section of Table 6).

First, we describe the survey methods; subsequent subsections then describe the policy analysis (Section 3.3.5) and data mining (Section 3.3.6) methods.

3.3.1 Ecosystems. We solicited survey participants from ecosystems with a dependency network structure, in which packages can depend on other packages and a standardized infrastructure helps with sharing and compatibility. We started with a list of software repositories from Wikipedia’s “Software Repository” page and added additional ecosystems with active communities that we could find.

We excluded ecosystems with a flat structure where packages depend only on a single shared platform (e.g., Android) and ecosystems obviously too small to hope to get at least a few dozen responses. We also excluded ecosystems if they were different enough that it was not possible to write clear questions that would apply across ecosystems. This excluded, for example, operating-system-level package managers such as apt, rpm, and brew, and scientific workflow engines.

We conducted the survey with 31 ecosystems.
For our analysis, we somewhat arbitrarily set the minimum number of participants for each ecosystem at 15, feeling this would give us a reasonable claim to some breadth in the responses. This led us to exclude 13 ecosystems: C++/Boost, Bower, Perl 6, Smalltalk, Tex/CTAN, Julia, Clojure/clojars, Meteor, Wordpress, SwiftPM, PHP’s PEAR, Racket, and Dart/pub, leaving us with 18 ecosystems for our analysis, shown in Table 2. All but 2 had more than 40 complete responses. + + +3.3.2 Survey Goals and Recruitment. The survey consisted of 108 questions: seven long free text questions (marked as optional opportunities for clarification), three short text questions (ecosystem, package name, and gender), and the rest multiple-choice scales. After an informed consent screen, participants first were asked to choose an ecosystem in which they had published or used a package (they could choose from a list, or type in another; we grouped rare answers as “other” for analysis). +3.3.3 Recruitment. We invested in significant outreach activities to recruit participants for the survey. First, we created a web page and Twitter account to describe the state of current research in this area, in a form easily accessible to practitioners. We encouraged readers of the web page to take the survey to contribute additional knowledge about values in ecosystems. Second, we attended community events, including npm.camp 2016, to talk to developers and community leaders from multiple ecosystems about our research; as a result, several prominent community members tweeted about our web page and survey, resulting in surges of responses (CRAN and npm particularly). Third, we promoted our web page and the survey in ecosystem-specific forums and mailing lists to “developers who write + packages,” hoping that our web page would spark interest in the topic. We also posted on Twitter with hashtags appropriate for different ecosystems. 
Finally, for 21 ecosystems in which our outreach activity did not yield sufficient answers, we solicited individuals directly by email. We sent 8,137 emails to package authors, sampled from the authors of packages culled from libraries.io for the targeted ecosystems.

Participants and their demographics. We succeeded in recruiting 2,321 participants to partially or fully complete the survey between August and November of 2016. Of this number, 932 completed the survey; however, we put the value questions near the beginning, so there are 1,466 answers to those questions. Statistical analysis of answers to early questions did not reveal any systematic differences between people who completed the survey and those who did not (the mean difference in answers to 65 Likert-scale questions between respondents who completed the survey and those who did not was 0.13 scale points, out of 4 or 5 depending on the question). The maximum difference was .83 scale points, but the maximum difference among questions where more than one “incomplete” respondent answered was .54 Likert-scale points. Since the partial responses were similar to full responses, we include data for the incomplete responses.

3https://breakingapis.org.

To correct for careless responses in which people appeared to be answering many questions without careful consideration, we excluded as “careless” those sections of a person’s response in which they rated all items exactly the same. We performed this test on eight sections of the survey, and the number of excluded blocks ranged from 11 (for a set of upstream practices) to 76 (for a set of downstream practices). When people were excluded from one block, their responses to other questions did not appear to be outliers (the mean difference in answers to 65 Likert-scale questions between respondents excluded from some other block and respondents who were not was 0.15 scale points, out of 4 or 5 depending on the question.
The maximum difference was .50, for the question “How important do you think the following values are to the + community: stability”). Because the answers were similar for all questions, we did not exclude a person’s entire response if they were apparently careless in only some of the eight blocks.

Table 2 shows participation by ecosystem. Participants averaged 8.8 years of development experience, 7.2 years in open source, and 4.6 in the ecosystem they answered about. Slightly more than half (59%) had college degrees in CS. The most frequently claimed role in the ecosystem was package lead developer (59%); others ranged from the 8.5% who claimed a role in the founding or core team of the ecosystem, to the 11% who only drew on ecosystem packages for their own projects. The average age was 33, with 152 18–24-year-olds and 6 over 65. Of those who gave their gender, 95.9% identified themselves as male, 3.2% as female, and 0.8% gave another gender. These demographic proportions are quite similar to a contemporaneous GitHub community survey [31].

3.3.4 Survey Design. Our goal in the survey was to investigate the prevalence of values and practices across as many ecosystems as was feasible. We asked a larger number of questions than is typical for a survey of this sort. Long surveys often have reduced completion rates; however, we mitigated this by keeping the questions diverse and, we hoped, interesting to the participants, and by putting the questions we were most interested in up front. As a result, we got a reasonably high completion rate (40%) and partial completion rate (62% for the value questions at the beginning) considering the length of the survey, resulting in an encouragingly rich and deep dataset. In this article, we focus on describing the values and practices responses, but additional data is available in the accompanying data release [7].

Values.
To explore as complete a list as possible of values relevant to managing change, we began with values derived from our interviews in Study 1. We then searched the web pages of all our candidate ecosystems for clues of other potential values. For example, “fun” is mentioned as an explicit value in the Ruby community; in an interview, Ruby founder Matsumoto said, “That was my primary goal in designing Ruby. I want to have fun in programming myself” [82]. Note that some values initially seem not directly related to breaking change, but we included them if we thought they could indirectly influence breaking change practices. For example, we expected that if some practices are more efficient but less rewarding to carry out, then a “fun”-valuing ecosystem might avoid them.

We assembled a list of 11 values with the following descriptions:

• Stability: Backward compatibility, allowing seamless updates (“do not break existing clients”).
• Innovation: Innovation through fast and potentially disruptive changes.
• Replicability: Long-term archival of current and historic versions with guaranteed integrity, such that exact behavior of code can be replicated.
• Compatibility: Protecting downstream developers and end-users from struggling to find a compatible set of versions of different packages.
• Rapid Access: Getting package changes through to end-users quickly after their release (“no delays”).
• Quality: Providing packages of very high quality (e.g., secure and correct).
• Commerce: Helping professionals build commercial software.
• Community: Collaboration and communication among developers.
• Openness and Fairness: Ensuring that everyone in the community has a say in decision-making and the community’s direction.
• Curation: Selecting a set of consistent, compatible packages that cover users’ needs.
• Fun and personal growth: Providing a good experience for package developers and users.

In the survey, we asked participants about the perceived values of the community—“How important do you think the following values are to the + community?” We used a seven-point rating scale, adapted from Schwartz’s value study [71]: “extremely important,” “very important,” “important,” “somewhat important,” “not important,” “community opposes this value,” and “I don’t know.” The first five options were separated visually from the last two to make clear that only the former were designed to approximate regular intervals (as recommended by Dillman et al. [27]).

In addition, we asked participants a similar value question on the same scale about their own values with respect to a single package they worked on in the ecosystem. To encourage participants to think about concrete work that they are doing, we asked for the name of a specific package that they worked on and used that package name in the question: “How important are each of these values in development of + to you personally?”

Recognizing that, despite taking values from multiple sources, we may not have captured all values relevant to managing change, we asked survey participants in an open-ended question about other values important to their ecosystem. Their answers are summarized in Section 5.2.

Practices. The practices part of the survey asked about many software-engineering practices, many of which we mention throughout our analysis (Tables 4, 5, and 6); the full list and exact phrasing of our questions can be found in Appendix B. Surveyed practices encompassed the participant’s personal practices and experiences with respect to documentation, support, timing, and version numbering for releases, selecting packages on which to depend, and monitoring dependencies for changes.
These were asked, as appropriate, either on an agreement Likert scale as above or on a frequency scale from “never” to “several times a day.” A subset of 15 questions relating to communication with developers of downstream packages were skipped for participants who indicated that they did not maintain a package used by others. To limit the length of the survey, we focused primarily on questions that cannot be answered or are difficult to answer by mining software repositories or reading explicit policy documents (see “M” and “P” labels in Tables 4, 5, and 6) in the Study 2 Methods column. + + +Survey analysis. + 483 participants (21%) gave an answer to at least one of the seven optional free-response questions; 11 people gave answers to all seven. We used a grounded approach to analyze answers to the question about other values: one researcher performed open coding to identify a set of candidate codes, then two researchers iteratively combined and revised these to achieve a consensus set of codes and to apply them to the responses. + + +Layout of Figures. + Figures 2, 3, and 4 were drawn by eliminating skipped or “don’t know” values, merging “Not important” with “opposed to this value” answers, and drawing a violin plot, with a +diamond symbol at the mean position. The violin bodies are smoothed, so the image portrays the mean and a rough distribution. + + +For Table 10, we wanted to derive a ranking of the importance of the values in each ecosystem and provide an indication of the consensus around the ranking. The method we adopted calculates highest ranked values for each ecosystem by identifying, for each person in the ecosystem, their highest rating of any of the 11 values, then incrementing a count for all values that person assigned that highest rating to. This has the effect of counting the number of people who ranked each value as the highest while accounting for ties. 
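The counting procedure just described can be sketched as follows; the input layout is a hypothetical simplification in which skipped and “don’t know” answers have already been removed.

```python
from collections import Counter

def top_value_counts(ratings_by_person):
    """For each respondent, find the highest rating they gave to any
    value, then increment the count of every value tied at that
    maximum (so ties give credit to multiple values)."""
    counts = Counter()
    for ratings in ratings_by_person:      # one dict per respondent
        if not ratings:
            continue
        top = max(ratings.values())
        counts.update(v for v, r in ratings.items() if r == top)
    return counts

# Two respondents: the first ties Stability and Fun at the top,
# so both of those values get credit for that person.
people = [{"Stability": 5, "Fun": 5, "Quality": 3},
          {"Stability": 4, "Fun": 2}]
```

Because ties are counted for every value at a respondent’s maximum, column totals can exceed the number of respondents.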
The table lists the values with the three highest counts, and the consensus numbers are as described in the caption. + + +3.3.5 Policy Analysis Method. We examined each ecosystem’s online presence and summarized their sanctioned practices. Practices of the ecosystems were derived from documentation pages within each language’s and repository’s websites, specifically seeking out documentation about how to define a package and submit it to the repository, as these documents typically communicate policies to authors in a clear, actionable way. The columns of the table were defined as follows: + + + + + + +Dependencies outside repository. + Standard tools in all but two ecosystems (Stackage and LuaRocks) allow developers to additionally specify packages that are not part of the standard repository, for example by a reference to a GitHub repository or an alternate specialized site. We checked the documentation for each package manager’s syntax about how +to declare dependencies, to see if there was a way to specify a URL for a package not formally in the repository. We marked these as having the feature if it could be specified directly as a URL; as “alternate repo” if this could be accomplished only through an alternate repository, or a custom server that mimics the repository’s API. + + + + + + +Central Repository. + This captures whether the ecosystem supplies packages in a central repository or simply provides an index to author-hosted download sites. + + + + + + +Access to dependency versions. + This denotes whether ecosystem documentation recommends (through examples in the documentation page) for packages to refer to dependencies by version number, or to simply assume the latest version of a dependency is desired (R/CRAN and Go). In two cases (Stackage and Bioconductor), a set of mutually compatible versions is provided to be used together as a set. + + + + + + +Gatekeeping Standards. 
+ Ecosystem repositories vary in the amount of vetting of the packages they include. We determined this by looking at the submission requirements for packages. An open circle in the table means that no more than cursory metadata such as name of the package and list of dependencies are required; a closed circle means that platform tools or volunteers perform some deeper investigation of the package: vetting of the submitter, automated or manual tests (of the package or of other packages that depend on it), or virus checks. Two were marked as “staged releases,” because submissions are tested collectively along with a cohort of packages being released simultaneously. + + + + + + +Synced Ecosystem. + This simply denotes whether ecosystem packages (or some important subset) are released all at once on a regular, synchronized schedule. +---------------------------------------- +------------------------------- +Section 196: +3.3.6 Data Mining. + + +We mined data from two sources to capture data about the prevalence of seven additional practices. + + +First, the list of packages to query was derived from the libraries.io ( +libraries.io/data +) cross-ecosystem package index. Libraries.io lists versions, their release dates, dependencies with their version constraints, and their source repositories. It was only available for a subset of our 18 ecosystems (Atom, R/CRAN, Perl/CPAN, Ruby/Rubygems, Rust/Cargo, Python/Pypi, NuGet, Maven, PHP/Packagist, Node.js/NPM, Erlang, Elixir/Hex). Partial information was available for CocoaPods. + + + + +4Recommendations have evolved since 2016 for Go: see +https://blog.gopheracademy.com/advent-2016/saga-go-dependency-management/ +. +Table 3. Ecosystem Statistics + + +| Ecosystem | Founded | Num. Pkgs | Avg. 
deps | >3 deps | >0 deps |
|----------------------------|---------|-----------|-----------|---------|---------|
| Atom (plugins) | 2014 | 4,424 | 1.2 | 10.0% | 38.2% |
| CocoaPods | 2001 | 14,493 | 0.4 | 1.7% | 21.1% |
| Eclipse (plugins) | 2001 | 14,954 | 6.4 | 55.7% | 100% |
| Erlang,Elixir/Hex | 2013 | 1,304 | 1.0 | 5.3% | 50.5% |
| Go | 2013 | 76,632 | 10.6 | 57.1% | 88.3% |
| Haskell (Cabal/Hackage) | 2003 | 8,593 | 6.4 | 57.9% | 91.6% |
| Haskell (Stack/Stackage) | 2012 | 1,337 | 8.3 | 65.0% | 93.9% |
| Lua/Luarocks | 2007 | 966 | 0.8 | 5.7% | 34.7% |
| Maven | 2002 | 114,404 | 2.1 | 20.6% | 41.8% |
| Node.js/NPM | 2010 | 229,202 | 5.6 | 49.8% | 81.2% |
| NuGet | 2010 | 66,486 | 1.6 | 11.4% | 58.3% |
| Perl/CPAN | 1995 | 31,641 | 7.6 | 56.5% | 79.6% |
| Python/PyPi | 2002 | 65,622 | 0.2 | 2.0% | 8.1% |
| PHP/Packagist | 2012 | 63,860 | 3.1 | 28.1% | 82.7% |
| R/Bioconductor | 2001 | 1,104 | 4.9 | 48.9% | 74.2% |
| R/CRAN | 1997 | 7,922 | 2.9 | 27.9% | 86.7% |
| Rust/Cargo | 2014 | 3,727 | 2.1 | 20.1% | 71.5% |

Package dependency and founding year data for ecosystems. Num. Pkgs = number of packages in the repository we checked as of January 2016; Avg. deps = average number of dependencies sampled packages had; >3 deps = percentage of packages having more than three dependencies; >0 deps = percentage having any dependencies.

and Hackage, but not dependencies. Dependency counts for Bioconductor, Hackage, Stackage, Lua, Eclipse, and CocoaPods were scraped from their respective repository websites. We did not find Go dependencies listed centrally in any repository, so we extracted this information from World of Code [57], a massive mirror of GitHub, GitLab, Bitbucket, and other open source software repositories, indexed and searchable in ways that make it more convenient for data mining than GitHub’s APIs allow.
One data product World of Code provides is dependencies of packages, parsed from source code files; we used this to count Go dependencies. Table 3 shows that packages in the ecosystems are interdependent, but to widely differing degrees.

Beyond package counts and dependencies, further information about these packages, across all ecosystems, was queried from World of Code [57].

Dependency Version Constraints. We ran pattern matching on the dependency constraints of all packages in libraries.io that released during 2016, and flagged for each package whether it used a particular type of constraint on any one or more of its dependencies at any time during the year. Note that percentages add up to over 100%, since a package may use more than one kind of dependency constraint.

— Exact: The dependency version is constrained to a fully specified version number, such as 1.3.2.

— Min only: Version constraints with only a lower bound, such as >1.3.2.

— Range: Constraints with a minimum and maximum version, like >1.3.2,<2.0; or use of shorthands like npm’s caret (^) and tilde (~), which imply an upper bound (e.g., ^1.3.2 means >=1.3.2,<2.0.0, and ~1.3.2 means >=1.3.2,<1.4.0).

— Unconstrained: The dependency name is specified with no version constraints; either the constraint is blank or some symbol like “*” is used.(^5)

For a more fine-grained analysis of version constraints across many ecosystems, see Dietrich et al. [26].

Lock files. Using World of Code [57], we examined files committed during 2016 in each of the ecosystem’s packages, looking for references to a lock file, which specifies exact versions of all dependencies, direct and transitive (i.e., dependencies of dependencies). These differ by ecosystem and vary in how canonical their use is. The filenames we used in this search are shown in Table 11 in Appendix D.
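As a rough illustration, the constraint pattern matching can be sketched as a small classifier. Real constraint grammars differ per ecosystem (see Dietrich et al. [26]); this sketch only recognizes npm-like shapes and is not a full semver-range parser.

```python
import re

def classify_constraint(spec):
    """Classify one dependency constraint string into the buckets
    discussed above. Simplified, npm-style syntax only."""
    spec = spec.strip()
    if spec in ("", "*"):
        return "unconstrained"
    if re.fullmatch(r"\d+(\.\d+)*", spec):
        return "exact"        # fully specified version, e.g. 1.3.2
    if spec[0] in "^~":
        return "range"        # npm caret/tilde imply an upper bound
    if ">" in spec and "<" in spec:
        return "range"        # explicit minimum and maximum
    if spec.startswith(">"):
        return "min only"
    return "unknown"
```

Per package, one would apply this to every declared dependency and record which buckets occur at least once during the year, which is why the reported percentages can sum to over 100%.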
Including a lock file in an end-user distribution of a program makes it more likely the program will run correctly, since it preserves the exact versions of dependencies that the program was tested on. However, developers including many dependencies in their own projects may prefer not to specify the exact versions of all their transitive dependencies, since those versions may conflict with each other, and the developers have the means and opportunity to resolve any conflicts themselves (then perhaps locking in a consistent set of dependencies when producing a release for their own users) [78].

Maintaining old versions. Making bug fixes to outdated versions of code, or even backporting new features, can be helpful for users who cannot update to the cutting-edge versions for some reason. We define prior-version maintenance operationally as simply any release whose version number is smaller than expected and hence out of sequence: For example, if a sequence of releases was “2.0.1,” “2.0.2,” “1.5.3,” “2.0.3,” then we identify “1.5.3” as a likely backport of a bugfix or feature from the 2.0 series, released as a courtesy to users currently on 1.5.2 who choose not to upgrade. Specifically, this measure captures the percentage of packages in each ecosystem whose version number ever decreased in 2016, per data from Libraries.io.

Cloning. We measured the percentage of packages in each repository whose projects borrowed a file in 2016 from another package. We did this by building a list of SHA hashes of files (blobs) associated with each commit in each project in the ecosystem through World of Code [57], and looking for overlaps. We count a project as having cloned a file if a commit in 2016 incorporates a blob over 1 KB that was previously seen in some other package in the ecosystem.
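The blob-overlap test for cloning can be sketched as follows; the commit-record shape here is a hypothetical simplification, not the actual World of Code schema.

```python
def cloning_packages(commits):
    """Return packages that 'cloned' a file: committed a blob larger
    than 1 KB that was first seen in a different package.
    commits: chronological (package, blob_hash, blob_size) records
    (a hypothetical shape standing in for World of Code data)."""
    first_seen = {}        # blob hash -> package that introduced it
    cloners = set()
    for pkg, blob, size in commits:
        if blob in first_seen and first_seen[blob] != pkg and size > 1024:
            cloners.add(pkg)
        first_seen.setdefault(blob, pkg)
    return cloners

# pkgB reuses pkgA's large blob; pkgC reuses only a small blob,
# which falls under the 1 KB threshold and is not counted.
commits = [("pkgA", "blob1", 2048), ("pkgB", "blob1", 2048),
           ("pkgB", "blob2", 512), ("pkgC", "blob2", 512)]
```

The per-ecosystem statistic is then the share of packages appearing in the returned set.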
We only considered blobs derived from other packages in the ecosystem’s repository, not ones derived from projects in the broader realm of open source. We chose to count these within-repository clones specifically, since the developer could have tried to use the ecosystem’s dependency management system to incorporate the desired code by reference, but chose not to. Previous research has also mapped cloning behaviors [33, 49].
----------------------------------------
-------------------------------
Section 197:
3.4 Threats to Validity

We chose our methods carefully to answer our research questions, and the survey in particular differs from a more typical statistically focused survey technique. We therefore describe the threats to the validity of the study before presenting the results, so readers can have these in mind as they read our findings.

As described, Study 1 used case selection criteria [92] appropriate for contrasting cases, but they may not be typical of all ecosystems, and so one needs to be careful when generalizing beyond the three cases. Our results may be affected by a selection bias, in that developers who did not want to be interviewed may have had different experiences. Finally, the differences we found among cases may be confounded with the reasons we selected them, such as their popularity or the availability of data about them.

(^5)Note that this weighs most heavily the state of packages for which more versions were released or that had more dependencies.

As for Study 2, as is typical of surveys in our field, our survey sample is not truly random; there may be selection bias relating to who we were able to reach via the venues we chose. We tried to mitigate this by recruiting from forums, Twitter, and direct e-mail. The survey was also quite long (and was advertised as such up front).
People with less patience for long surveys, or less interest in questions of breaking changes, values, and practices, may have self-selected out. This could be significant if people with impatience for long surveys also have different software-engineering practices and beliefs.

Another possible concern is that respondents may apply different standards in their ratings. For example, if the expectation of stability is extremely high in a particular ecosystem, then participants may rate the perceived importance of stability lower, because they are applying a very stringent standard for how focused everyone should be on stability. A similar focus on stability in a different ecosystem might lead participants in that ecosystem to rate the importance of stability higher. We tried to mitigate this by requiring at least 15 participants for each ecosystem, which should give some breadth of experience behind the responses.

While we tried to avoid using terminology that differed among ecosystems, we were not always successful. For example, the word “snapshot” means different things in different ecosystems’ practices, which caused some confusion. Even the term “breaking change” may be interpreted differently; for example, respondents might define it more narrowly, as a change that would simply cause downstream compilation to fail, while we intended it to also include changes that would cause wrong behavior in downstream software.

Respondents may also have given answers to a few questions influenced by social desirability. For example, they may have felt obliged to say that “quality” is extremely important because that is the “right” answer, or that people follow certain practices because they are what they know to be expected. Our mitigation approach was ensuring confidentiality of responses and avoiding, to the extent possible, questions with clear desirable and undesirable responses.
We had difficulty recruiting sufficient participants from smaller ecosystems, such as Perl 6 or Clojure; small ecosystems may have different characteristics than large ones. We do have two small ecosystems, Stackage and Lua, and they are outliers in some ways. So, further exploration of small ecosystems, for example with interviews or analysis of artifacts, should be a priority for future work.

4 STUDY 1: QUALITATIVE MULTIPLE-CASE STUDY

In Study 1, we investigated the decision-making involved in making breaking changes and the practices developers adopt to ease the burden:

RQ1.1: How do developers make decisions about whether and when to perform breaking changes, and how do they mitigate or delay costs for other developers?

We also wanted to see how developers responded to breaking changes that affected them:

RQ1.2: How do developers react to and manage change in their dependencies?

Finally, we wanted to know whether developers perceived tensions between platform policies and their intended effects:

RQ1.3: Did platform policies or tools ever have unintended consequences?

4.1 Case Overview

To understand the different practices and policies we identified, it is important to understand the purpose and history of each ecosystem. In the following, we provide a brief description of all three ecosystems and their values, informed by both public documentation and our interviews. Platform-level features or practices relevant to breaking changes are identified in Table 4.

4.1.1 Eclipse. The Eclipse foundation publishes more than 250 open source projects. Its flagship project is the Eclipse IDE, created in 2001. The IDE is built from the ground up around a plugin architecture, which can be used as a general purpose GUI platform and in which plugins can depend on and extend other plugins.
Projects can apply to join the Eclipse foundation through an incubation process in which their project and practices come under the Eclipse management umbrella. It is also common practice to develop both commercial and open-source packages separately from the foundation, and publish them in a common format on a third-party server. In addition, the “Eclipse marketplace” is a popular registry, listing over 1,600 external Eclipse packages that can be installed from third-party servers through a GUI dialog. + + +The Eclipse foundation coordinates a “simultaneous release” of the Eclipse IDE once a year and (as of 2016) three “update releases” for new features in between. Many external developers align with those dates as well. + + +The Eclipse foundation is backed by corporate members, such as IBM, SAP, and Oracle. Its policies are biased toward backward compatibility; packages (e.g., commercial business solutions) developed 10 years ago will often still work in a current Eclipse revision without modification. + + +A core value of the Eclipse community is backward compatibility. This value is evident in many policies, such as “API Prime Directive: When evolving the Component API from release to release, do not break existing Clients” [25]. Although not entirely uncontroversial (as we will explain), this value was confirmed by many interviewees. + + +4.1.2 R/CRAN. The Comprehensive R Archive Network (CRAN) has managed and distributed packages written in the R language since 1997. R is an interpreted language designed for statistics. The R language itself is updated approximately every six months, but new development snapshots are available daily. R has multiple repositories with different policies and expectations, including Bioconductor and R-Forge; we focus on CRAN, the largest one. CRAN formally exists under the umbrella of the R Foundation, but sets its own policies. + + +CRAN contains over 8,000 packages. 
Of these, 29 are either required or “recommended,” and are bundled in binary installs. About 2,200 more are cataloged as useful for 33 different specializations such as finance and medical imaging. Distributing R software as a CRAN package gives it high visibility, since installation from CRAN is automated in the command-line version of R and the popular IDE RStudio [69].

R and CRAN are used by many developers without a formal computer-science or programming background. CRAN pursues snapshot consistency, in which the newest version of every package should be compatible with the newest version of every other package in the repository. Older versions are “archived”: available in the repository, but harder to install. When a new package version is submitted to CRAN, it is evaluated by the CRAN team’s partly automated process. The package must pass its own tests and must not break the tests of any downstream packages in CRAN that depend on it without first alerting those packages’ authors so they can make corresponding fixes. Package owners need to react to changes in the platform or in upstream packages within a few weeks; otherwise their package may be archived.

A core value of the R/CRAN community is to make it easy for end-users to install the most up-to-date packages. Although not explicitly represented in policy documents, this value was apparent from many interviews; for example, R10 said, “CRAN primarily has the academic users in mind, who want timely access to current research.”
Table 4.
Platform and Community-level Practice Choices: Who: (P)latform, (U)pstream, (D)ownstream, (3) Third party; Study 2 Method: (P)olicy Analysis, (S)urvey, (M)ining + + +| Who | Study 2 Method | Practice | +|-----|----------------|----------| +| P | P | Existence of centralized repository or directory of packages | +| P | P | Mechanism for referring to dependencies distributed outside official repositories (e.g., via github directly) | +| P | P | Make historical versions of package easy or difficult to rely on | +| P | P | Mechanism to remove or reassign unmaintained packages (e.g., maintainers do not respond to emails) | +| P | S | Releasing changes on a fixed, advertised schedule per package | +| P | S,P | Ecosystem-wide synchronized release | +| P | P | Repository personnel check standards of submitted code before making available on the repository | +| P | | Allow multiple versions/only one version of a package to be loaded at the same time | +| P/U | | “Stability attributes” (in Rust) saying which API points will not change | +| P | | Use nightly unstable builds to get exciting new features (at cost to compatibility for downstream users) | +| P | | Disallow wildcard dependencies | +| P | | Test compiler changes against all published software using it to prevent breaking things | +| P | | Constrained rules about version numbering (e.g., cargo disallowing wildcards) | +| 3 | P | Third-party curation of sets of useful packages or compatible versions | +| P | | Dynamic language feature to help backward compatibility (optional parameters in R) | +| P | | Centralized testing infrastructure for all packages | +| P | | Vulnerability tracking (e.g., Node security platform) | +| U | S | Private arrangement among package authors to release at the same time | + + +For ecosystem-by-ecosystem breakdown of policies, see Section 5. + + +4.1.3 Node.js/npm. 
Node.js is a runtime environment for server-side JavaScript applications released initially in 2009, and npm is its default package manager. npm provides tools for managing packages of JavaScript code and an online registry for those packages and their revisions. The npm repository contains over 250,000 packages with rapid growth rates. + + +The Node.js/npm platform has the somewhat unusual characteristic that multiple revisions of a package can coexist within the same project. That is, a user can use two packages that each require a different revision of a third package. In that case, npm will install both revisions in distinct places and each package will use a different implementation. + + +A core value of the Node.js/npm community is to make it easy and fast for developers to publish and use packages. In addition, the community is open to rapid change. Ease for developers was one of the principles motivating the designer of npm [75]. Therefore, npm explicitly does not act as a gatekeeper; it does not have review or testing requirements; in fact, the npm repository contains a large number of test or stub packages. The focus on convenience for developers (instead of end-users) was apparent in our interviews. + + +4.2 Study 1 Results: Planning Changes (RQ1.1) + + +We first discuss managing change from the perspective of a developer planning to perform changes that may affect downstream users. While we observed similar forces and concerns regarding +change across all three ecosystems, we observed differences in how the community values affect the ways package maintainers mitigate or delay costs for downstream users. + + +4.2.1 Breaking Changes: Reasons and Opportunity Costs. 
Although breaking changes to APIs are costly to downstream users in terms of interruptions and rework, our interviewees gave many reasons why they had to perform such changes; there are corresponding opportunity costs that arise when deciding not to perform the change, such as the cost of maintaining obsolete code, working around known bugs, or postponing desirable new features. + + +Obvious and expected reasons for breaking changes included requirements and context changes and rippling effects from upstream changes. Beyond that, we found surprisingly frequent mentions of stylistic and performance reasons, as well as difficult bug fixes. + + +Technical debt. Surprisingly, 12 interviewees (E3, E9, R1, R3, R4, R5, R6, R7, R8, N1, N7) mentioned concerns about technical debt, rather than bugs, new features, or rippling upstream changes, as the trigger for breaking changes. By technical debt, we refer to code that is functionally sufficient but has outstanding stylistic issues developers want to fix, such as poorly chosen object models or method names, lack of extensibility or maintainability, or little-used or long-deprecated methods. + + +We conjecture that the reason interviewees brought up these kinds of changes so often in discussion was because they had thought about them in depth. Technical debt often arises from the tension between tools and practices that encourage developers to preserve backward compatibility (e.g., Eclipse’s “prime directive”), versus general pressure for evolution and improvement. Developers often postpone breaking changes until the technical debt becomes intolerable; for example, E3 mentioned as the reason for planning to finally remove some deprecated code: “What we did there was to provide old methods as deprecated. But that gets quite messy. 
At one point almost half of the methods were deprecated.” E9 similarly told us about an upcoming long-postponed major version change: “since we don’t do it often, probably once every five years, [...] let’s take advantage of that opportunity to do some of the things that would be good that we couldn’t do before.”

Old interfaces can come to seem old fashioned and unattractive in a swiftly changing community. Three interviewees said they made breaking changes for syntactic reasons: to harmonize syntax (R1) or improve “weird” or “bad” names (R3, R4) in their interfaces. N7 talked about adopting a new JavaScript programming paradigm that was far more attractive: “You can’t just stay on that old stuff for forever, it’s just not going to work. And so we drastically rewrote the internals at the transport to be a stream, because that’s sort of, essentially what it is, right? Like, it’s a little stream that takes logs and sends them places.” However, four interviewees (E1, E5, E6, R6) talked about the consequences of not being able to make such changes: having to preserve old interfaces over long periods incurred opportunity costs, since it hindered attracting new developers, who are lured by cutting-edge technology. E6, for example, told us: “If you have hip things, then you get people who create new APIs on top of that in order to [for example] create the next graphical editing framework or to build more efficient text editors. These things don’t happen on the Eclipse platform anymore.”

Efficiency. Four interviewees (E6, R1, R4, N1) reported cases in which efficiency improvements required breaking changes. For example, N1’s package offered an API for requesting paged data that the server could not provide efficiently; they deprecated and eventually removed that function rather than spending money on hardware.

Bugs. Bug fixes were another reason for breaking changes (E4, E7, R7, R9).
Bug fixes can break downstream packages if those packages depend on the actual (broken) behavior instead of the intended behavior. A lack of well-defined contracts in most implementations makes assigning blame and responsibilities difficult in practice. As E5 told us, “If someone likes the broken semantics, then they’re not going to like the fixed semantics.” Thus, even fixing an obvious mistake in code under the control of a single person can require significant coordination among many people.

Throughout our interviews, we heard many examples of how bug fixes effectively broke downstream packages, and of the difficulty of knowing in advance which fixes would cause such problems. For example, R7 told us about reimplementing a standard string processing function and finding that it broke the code of some downstream users that depended on bugs that his tests had not caught. R9 commented on the opportunity cost of not fixing a bug in deference to downstream users’ workarounds for it: “If the [downstream package] is implemented on the workaround for your bug, and then your fix actually breaks the workaround, then you sort of have to have a fallback … [pause] It gets nasty.”

4.2.2 Dividing and Delaying Change Costs. Our previous discussion already hinted that there is flexibility regarding who bears the costs of a breaking change. For instance, a package’s developer can decide between making a breaking change, pushing rework costs to maintainers of downstream packages, or not making the change, accepting opportunity costs such as technical debt. Even when deciding to make the change, the developer faces strategic choices about whether to invest more effort to reduce the interruption and rework costs for downstream users, as well as to affect the timing of when those costs are paid (Table 5). For example, by documenting how to upgrade, the developer invests more effort to reduce effort for downstream maintainers.
Different developers and different communities have different attitudes toward who should pay the costs of a change and when, as we will show.

Awareness of Costs to Downstream Users. Almost all (24 out of 28) of our interviewees stated that, when possible, they avoid breaking changes that would affect downstream users. Reasons included looking out for their users’ best interests and knowing that costs to affected users would come back to them, as users ask for help adapting to the change, ask for the change to be reverted, or seek alternative packages. Two interviewees (E1 and R4) specifically mentioned concern for downstream users’ scientific research (R4: “We’re improving the method, but results might change, so that’s also worrying—it makes it hard to do reproducible research”).

Interviewees’ concern for impacts on users was tied to the size and visibility of the user base and the perceived importance and appropriateness of their usage. Nine interviewees across all ecosystems (E4, E5, E6, R1, R4, R6, R7, R9, N7) were aware of their users and were concerned specifically about the number of users affected and the quantity of complaints that a change would imply, e.g., R9: “I wanted to rename it to something that more specifically describes that this is actually a new V8 context, but, you know, I can’t because so many packages are already importing the new context function.” N1: “we happen to know that paging is not the feature that was […] often used from Node module customers.” Another npm developer said, N7: “…that was strictly a breaking change for [feature], and so we really didn’t want to break all the community [feature]. Like, we didn’t want all 700 of these to give out ‘the code you’re using, you have to upgrade… Good luck, bro.’” An R/CRAN developer said, R7: “I’m very cautious about making changes to it, and then when I make changes I often regret it.
Even for a small change on a package used by a lot of people, it improves 90% of people’s lives, but makes 10% of people’s lives worse, and 1% complain, which, with [package] can be a lot of people.” Three interviewees (E1, R4, R8) noted that their sensitivity toward avoiding breaking changes grew with experience and with a growing user base, as they learned from feedback received about earlier breaking changes.

Of course, some developers themselves work on such downstream packages. Four of our interviewees mentioned doing so (E5, N4, N7, R6) (see discussion in Section 4.3.1); they are presumably aware of the impact of the changes they make on their own other packages.

Only four developers were not particularly worried about breaking changes. Three (E6, N1, N5) had strong ties to their users and felt they could help them individually (N5: “We try to avoid breaking their code—but it’s easy to update their code”). Interviewee N6 expressed an “out of sight, out of mind” attitude: “Unfortunately, if someone suffers and then silently does not know how to reach me or contact me or something, yeah that’s bad but that suffering person is sort of [the tree] in the woods that falls and doesn’t make a sound.”

Finally, developers described tradeoffs in fixing mistakes that downstream users had come to depend on. E8 talked about being stuck with a poor design: “If you make a mistake in your API […] sorry, you’re stuck with it, so you have to kind of work around it.” R9 mentioned circumstances where users depended on buggy behavior, but the upstream code had to be fixed anyway: “After upgrading the parser some people complained that their script was no longer working. But the problem was that their syntax was invalid to begin with. It’s obviously their fault.”

Techniques to Mitigate or Delay Costs. Despite a strong general preference for avoiding breaking changes, there are many cases where the opportunity costs of not making a change are too high.
Our interviewees identified several different strategies for how they, as package maintainers, routinely invest effort to reduce or delay the impact from their changes for downstream users.

Maintaining old interfaces. Across all ecosystems, preserving the old interface alongside a new one is a very common approach to mitigate an immediate impact of a change on downstream users. While specifics depend on the language and tools, common strategies to avoid breaking downstream implementations include documenting methods as deprecated and providing default implementations for new extension points or parameters. In these strategies, the package developer invests additional effort now to preserve backward compatibility, accepting technical debt in the form of extra code to maintain for some time, in exchange for preventing an immediate downstream impact of the change. The developer may at some later time clean up the code, affecting downstream users that have not updated in the meantime [68].

Similarly, many interviewees (E2, E3, E5–E8, R1, R6–R9, N1, N7) told us about various techniques to perform changes without breaking binary compatibility. They prevent rework costs for existing users by accepting more complicated implementations and harder maintenance in the changed package, while possibly also creating costs for new downstream users who have to deal with more complicated mechanisms.

Parallel Releases. Seven developers (E5, E6, R1, R2, R4, R7, R8) reported strategies to maintain multiple parallel releases, such that downstream developers can incorporate minor nonbreaking changes (e.g., bug fixes) without having to adopt major revisions.
Node.js/npm’s caret operator(^6) allows package authors to support parallel releases with different version numbers: An author can publish an update 1.0.1 to their version 1.0.0, even after 2.0.0 has been released; users who wish to stay with the 1.* series but still receive updates may refer to version ^1 or ^1.x to receive anything less than 2.0.0. It is a common practice to provide security patches(^7) including for older releases.(^8) In contrast, CRAN only supports sequential version numbering,(^9) causing some developers to fork their own packages (e.g., reshape2 was introduced as a backward incompatible revision to reshape). However, R8 told us this is discouraged by CRAN: “Because [package]2, it’s the second version of [package], at what point can you just freeze an API and leave it there, and jump n+1 version and just continue with that? I think there’s some lingo in [CRAN’s instructions for package authors] that they’d rather not have that.” In each case, the fact that they are adding code to multiple versions suggests that developers are investing significant additional effort to reduce the (immediate) impact on downstream users. For example, N1 told us that they were conservative about making major new versions, since their package “has changed major version numbers a lot over last few years, many things backported to earlier versions; irritating to do major revisions every couple of months.”

(^6)https://docs.npmjs.com/misc/semver.
(^7)Current npm security alerts are listed at https://www.npmjs.com/advisories.
(^8)e.g., https://www.npmjs.com/advisories/1482.
(^9)According to https://cran.r-project.org/web/packages/policies.html, “Updates to previously-published packages must have an increased version.”

A variant of this strategy is to maintain separate interfaces for different user groups with different stability commitments within the same package (see the façade pattern in Reference [30]).
For example, interviewee E5 provided in parallel both a detailed and frequently changing API for expert users and a simpler and stable API that insulated less sophisticated users from most changes. Similarly, interviewee R1 has split packages into smaller packages, with the intention that each user could depend only on the parts relevant to them and would be exposed to less change. In both cases, the developer accepts the higher design and maintenance costs of multiple APIs in exchange for reduced impact on specific groups of users with distinct needs.

Release Planning. Individual developers and communities may show consideration for downstream users by planning when to release changes. R1 keeps versions of his package with a quickly changing API in a separate repository and batches multiple updates together, releasing to CRAN less frequently, when he wants to release a version to a broader audience. While in R/CRAN and Node.js/npm packages are released by individuals whenever they want, the core packages of the Eclipse community coordinate around synchronized yearly releases(^{10}) (a strategy also common in other package systems such as Debian(^{11}) and Bioconductor(^{12})). Delaying releases may incur coordination overhead and opportunity costs in slowing down development for the changer, but reduces the frequency (though not necessarily the severity) with which downstream users are exposed to changes and gives downstream users a planning horizon.

Communication with users. Finally, developers communicate in various ways with users to reduce the impact of a breaking change. Seven interviewees (E6, R4, R7, R8, R9, N6, N7) made early announcements to create awareness and receive feedback.
R7 explained that “two weeks or a month before the actual release, I do sort of a pre-release announcement on Twitter [and] tell people to use the README.” He told us during the validation phase that he has since written a script to email all downstream maintainers before a release.

Another reason for communicating with downstream users was to help them deal with the aftermath of change. In the simplest case, a developer could invest effort in documenting how to upgrade. Nine interviewees (E7, R2, R3, R7–R9, N1, N4, N5) mentioned being aware of their users personally, and could reach out to them individually; for example, N1 contacted users who were still using an old API, to help them migrate, and N5 had most users present on-site and could therefore help them migrate their code. E7 went so far as to create individual patches for all downstream packages within the Eclipse core to get them to adopt a new interface and move away from an old deprecated one. In all these cases, package maintainers invest effort to reduce costs for downstream users.

4.2.3 The Influence of Community Values. The previously discussed techniques are mechanisms that developers can use for tweaking who pays for the costs of a change and when. Individual developers often adopt patterns and, in fact, six interviewees (E1, R3, R4, R5, R8, N6) described gradual adoption of more formal processes over time, as they learned their value through experience. At the same time, we could clearly observe that attitudes and practices differ significantly among the three ecosystems and are heavily influenced by ecosystem values, tools, and policies.

(^{10})https://wiki.eclipse.org/Simultaneous_Release.

(^{11})https://www.debian.org/doc/manuals/debian-handbook/sect.release-lifecycle.ro.html.

(^{12})According to https://www.bioconductor.org/developers/package-submission/, “There are two releases each year, around April and October.”

Table 5. Practices (Mostly Upstream) to Communicate and Mitigate Effects of Change

| Who | Study 2 Method | Practice |
|-----|----------------|----------|
| U | S | Freeze APIs to protect downstream users from change |
| U | | Release a major change as a new package name, rather than a new version |
| U | | Mark API points as deprecated to warn of future removal |
| U | | Remove deprecated API points eventually |
| U | | Parallel releases to protect users who do not want to upgrade |
| U | S | Release changes in a batch rather than as they are made, to make less churn for users |
| U | S | Write new code as backward compatible, possibly at the cost of incurring technical debt |
| U | S | Proactively notify users about upcoming changes |
| U | S | Assist users who are having trouble upgrading to a new version with a breaking change |
| U | S | Write a migration guide to help users upgrade |
| U | S | Write a change log to document compatibility problems with prior releases |
| U | S | Use semantic versioning to signal the kinds of changes being made |
| U/P | S | Platform rules requiring package authors to negotiate compatibility before releasing (snapshot consistency) |
| U | M | Continue critical updates to older versions, to give users a way to avoid an expensive major upgrade |
| U/P | | Ways to check that APIs have not changed, e.g., API tools, @since tags, documentation |

Eclipse. Developers are willing to accept high costs and opportunity costs to further Eclipse’s value of backward compatibility, especially for core packages.
The community has developed educational material explaining Java’s binary compatibility and giving recommendations for backward compatible API design [24, 25]. With API Tools,(^{13}) the community has developed sophisticated tool support to detect even subtle breaking changes and enforce change-related policies, such as adding @since tags to API documentation. Breaking changes in core packages are in fact very rare [38].

Even though they arguably make the platform harder to learn and maintain, Eclipse developers have identified and documented [25, part 3] workarounds for extending an interface while maintaining old interfaces, such as creating additional interfaces to avoid modifying existing ones (e.g., IDetailPane2, IDetailPane3, IHandler2) and runtime weaving. Deprecating interfaces and methods is common, but actually removing them is not;(^{14}) for example, org.eclipse.core.runtime.Plugin.startup(), like many other methods, was still included as of this publication despite being deprecated for over 15 years.(^{15}) E6 noted that this backward compatibility prevents modernizing APIs, such as replacing arrays with collections.

(^{13})https://www.eclipse.org/pde/pde-api-tools/.
(^{14})e.g., a guide published by the Eclipse foundation about evolving APIs says that, “Obsolete API elements should be marked as deprecated and point new customers at the new API that replaces it, but need to continue working as advertised for a couple more releases until the expense of breakage is low enough that it can be deleted.” [25].
(^{15})This method was deprecated in 2004: https://github.com/eclipse/eclipse.platform.runtime/commit/a46e757a1938edb0a7109dafef349c3a3ffc58ea and was still present in 2020: https://github.com/eclipse/eclipse.platform.runtime/blob/9aedff3f2141631a8bc5fa6d1abe005ea633f107/bundles/org.eclipse.core.runtime/src/org/eclipse/core/runtime/Plugin.java.
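The numbered-interface workaround can be sketched in a few lines. The following Python sketch uses hypothetical class names modeled on Eclipse’s IDetailPane/IDetailPane2 idea (the real interfaces are Java); it illustrates how a platform can probe for an extension interface at runtime, so that older implementors keep working unchanged:

```python
# Sketch of the numbered-interface pattern (hypothetical names;
# Eclipse's actual IDetailPane interfaces are Java, not Python).
from abc import ABC, abstractmethod

class IDetailPane(ABC):
    """Original interface: once published, never modified."""
    @abstractmethod
    def display(self, selection): ...

class IDetailPane2(IDetailPane):
    """Extension interface: adds capability without breaking
    existing IDetailPane implementors."""
    @abstractmethod
    def set_focus(self): ...

class LegacyPane(IDetailPane):          # an old plugin, unchanged
    def display(self, selection):
        return f"showing {selection}"

class ModernPane(IDetailPane2):         # a new plugin, opts in
    def display(self, selection):
        return f"showing {selection}"
    def set_focus(self):
        return "focused"

def focus_if_possible(pane):
    # The platform probes for the newer interface at runtime,
    # so old plugins keep working without modification.
    if isinstance(pane, IDetailPane2):
        return pane.set_focus()
    return None

print(focus_if_possible(ModernPane()))  # prints: focused
print(focus_if_possible(LegacyPane()))  # prints: None
```

The cost, as the interviewees note, is permanent clutter: every published interface generation must be kept and supported alongside its successors.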
The Eclipse community invests significant effort into release planning, at the cost of some resulting friction, as reported by multiple interviewees. E9: “Eclipse has a release process, and some projects have to release at the same time as the platform, some projects the day after, some projects the day after, [so] you’re expected to be available a little bit before, so you can make sure that yours builds properly right? [...] So, that’s kinda a complexity.” The required coordination is invested toward ensuring stability and smooth transitions at a few plannable times for downstream users. An Eclipse release is a complex process with steps aimed at maintaining not only technical interoperability with prior versions, but also a consistent level of legal compatibility, usability standards, security, and so on.(^{16}) This culture of conservative change contrasts with what, for example, an R developer told us: R7: “On one hand I try to be careful, but on the other hand I don’t want to inflict harm and be like paralyzed by the fact that anything I do might make someone’s life worse. Sometimes you have to be like go ahead and accept that things are going to break and it’s not the end of the world.”

In Eclipse, maintenance releases for old major revisions are not common (Table 7), presumably because with backward compatibility users can simply be told to update to the latest release.

R/CRAN. As the R/CRAN community values making it easy for users to get a consistent and up-to-date installation, developers invest significant effort to achieve consistency.

There is no policy against CRAN packages making changes that affect the larger body of code outside of CRAN.

(^{16})https://wiki.eclipse.org/Development_Resources/HOWTO/Release_Reviews.
However, when changes affect other CRAN packages, upstream developers are asked to bear the significant extra cost of reaching out to and coordinating with maintainers of affected packages(^{17}) (termed “forward impact management” by De Souza and Redmiles). Downstream maintainers may then also bear the cost of pressure to update their packages before the upstream developer can make a breaking change, to ensure that all CRAN packages are consistent. CRAN’s policy requires (and verifies) that developers maintain constant synchronization with each other, and 5 of our 10 interviewees (R2, R3, R7, R8, R9) specifically mentioned reaching out individually to known downstream developers (in contrast to three Node.js interviewees (N1, N4, and N5) and one Eclipse interviewee (E7)). Synchronization is thus continuous, but more decentralized and localized than with Eclipse’s simultaneous releases.

Among our interviewees, five developers of specialized R packages targeted small and close communities and knew their users personally. For example, R3 mentioned that “no one used” a feature, and when asked how they knew that, they replied that “statisticians working on a lot of medical imaging [...] type of applications in R is a very small community. There’s only so many people to know.” R3 said he got to know those users because of interactions about the dependency. Only one of our Node and Eclipse interviewees (E6) mentioned personal connections with downstream users, but our sample is too small to be sure this is not just sampling bias.

(^{17})https://cran.r-project.org/web/packages/policies.html#Submission.
+ + + Consistency is enforced by manual and automated checks on each package update.\footnote{https://cran.r-project.org/web/packages/policies.html#Submission.} The change management process is collaborative but also demanding of a maintainer’s time; R7 said the timeline to adapt to an upstream change “might be a relatively short timeline of two weeks or a month. And that’s difficult for me to deal with because I try to sort of focus [on] one project for a couple weeks at a time, just so I can remain productive.” Node developers, in contrast, can ignore changes until they feel like updating (N5: “Why don’t we upgrade more often? It’s more work than you’d hope.”), while Eclipse developers rarely need to worry about change (e.g., E1: “When a new version comes out every year in July or whenever, I’d go ahead and test if my plugin works correctly in that new version; if it does, I don’t care much about that. [...] New features were mostly irrelevant. I didn’t care that much about that.”) + + + The platform is not conducive to multiple parallel releases—on CRAN a package revision must have a higher version number than the one it supersedes, so an old major version cannot be updated; policies also discourage forking a project and submitting it under a separate name. There is no central release planning, perhaps because it is perceived to slow down access to cutting-edge research. + + + Overall, we observed much more communication and coordination with downstream users about individual changes than in Eclipse, but also more flexibility with regard to performing breaking changes. + + + Node.js/npm. The Node.js/npm community values ease for upstream developers and the possibility to move fast [75]. It is much less demanding for a developer to make a breaking change. Six of the Node.js interviewees talked about the importance of signaling change through semantic versioning.
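For reference, the semantic-versioning convention these interviewees invoke encodes compatibility intent in a major.minor.patch triple: only a major-version bump signals a breaking change. A minimal sketch of how a downstream developer might classify an upgrade under that convention (an illustration, not npm’s actual implementation):

```python
# Sketch of the semantic-versioning convention: classify an upgrade
# from `old` to `new` by which component of major.minor.patch changed.

def parse(version):
    major, minor, patch = (int(p) for p in version.split("."))
    return major, minor, patch

def change_kind(old, new):
    o, n = parse(old), parse(new)
    if n[0] != o[0]:
        return "breaking"  # major bump: incompatible API change
    if n[1] != o[1]:
        return "feature"   # minor bump: backward-compatible addition
    return "fix"           # patch bump: backward-compatible bug fix

print(change_kind("1.4.2", "2.0.0"))  # breaking
print(change_kind("1.4.2", "1.5.0"))  # feature
```

The value of the convention lies entirely in upstream developers applying it rigorously, which is why, as discussed below, communities differ in how much weight they give the signal.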
+ + +This sharply contrasts with the R developers we asked about this: two R interviewees spoke out against semantic versioning; for example, R7: “I’m familiar with the semantic versioning stuff. It’s just I don’t find that useful personally, because most R users aren’t familiar with that and I think [convention] is a little bit on the ridiculous side. [...] For most R users I don’t think version numbers send a terribly strong signal, and they are likely to not know what version they are using currently anyway.” + + +Semantic versioning in Node allows developers to make breaking changes as long as they clearly indicate their intentions. Because the technical platform allows downstream developers to still easily use the old version without fearing version inconsistencies, breaking changes do not as easily cause rippling effects or immediate costs for downstream users. While they still avoid breaking changes and employ various strategies to maintain old interfaces, in our interviews, Node.js/npm developers were generally willing to perform breaking changes in the name of progress and in fighting technical debt, including experimenting with APIs until they are right. For example, N6 told us that if a downstream user was concerned about a breaking change: “I could tell this person, well look if you have this problem at least for now your workaround is very simple. Change your dependency to be this exact dependency so instead of saying we depend on package foo version *. Change it to just exactly that version [...], and you will still be using the old one that you know and love. And that will postpone your problem until the day that you need some new thing that’s come out which is no longer backported into the old version. [...] 
So knowing that, I do kind of feel kind of confident enough to just say yeah we’re gonna bump the major version, we’re gonna announce or whatever that takes, but I don’t really myself feel too much desire to kind of read for the backward compatible people.” + + + As a mitigation strategy, maintenance releases for old versions are common, made easy by the platform and associated tools. Analyzing the npm repository, we found that 24 of the 100 most “starred” packages did this at least once; this was more common than in Eclipse or R/CRAN (Table 7). + + + Summary of RQ1.1 results: Developers are motivated to change code for many reasons, such as requirements and context changes, bugs and new features, rippling effects from upstream changes, and technical debt from postponed changes. There are also opportunity costs from forgoing or postponing changes. Opposing this motivation is their awareness of costs to downstream users of such changes, especially when their user base is large and visible to them; in most cases developers want to avoid imposing those costs on users. Their choice is not binary, however; there are ways of softening the impacts of change, such as maintaining old interfaces, making parallel releases, and making and communicating plans about upcoming changes. Developers weigh these choices differently depending on the ecosystem’s values: Eclipse core package developers are heavily discouraged from making changes, and thus opt for techniques that allow strictly backward-compatible additions. R/CRAN developers are not officially discouraged from making changes, but they are aware that the ecosystem’s rules (no parallel releases, onus on downstream users to update) are burdensome for downstream users, so they emphasize communication and collaboration in their updates.
Node.js/npm developers are encouraged to make changes by mechanisms that signal downstream users about changes while insulating them from the requirement to adopt the changes; as a result upstream developers are quite likely to opt for change, and to police each other’s rigorous use of the signaling mechanisms for change (semantic versioning). + + + 4.3 Study 1 Results: Coping with Upstream Change (RQ1.2) + + + Just as upstream developers have some flexibility in planning changes that may affect downstream developers, downstream developers have flexibility regarding whether, when, and how to react to upstream change, again influenced by values, policies, and technologies (Table 6). Having to monitor and react to upstream change can be a significant burden on developers (e.g., mismatch between schedules has been shown to be a barrier to collaboration [42]). The urgency of reacting to change can depend significantly on the development context and platform mechanisms. + + + When discussing how frequently they react to upstream change, our interviewees described a spectrum ranging from never updating (E3) to closely monitoring all changes in upstream packages (N1, N2, R9). Two interviewees mentioned explicitly ignoring certain upstream changes (N3, N7); others upgraded dependencies only at the time of their own releases (N3, N5) or during deliberate house-cleaning sweeps (N7, E2). Even when the platform does not require updates, developers often prefer to update their dependencies to incorporate new fixes and features (E3, N2) or to avoid accumulating technical debt (R6, N5). But they avoid updating when updates require too much effort (e.g., by causing complicated conflicts; N5, E3) or cause too much disruption downstream (N7). + + + 4.3.1 Monitoring Change. When developers have to or want to react in a timely fashion to upstream changes, they need to monitor the upstream projects in some way.
The platform itself, e.g., Node.js, R core, and the CRAN infrastructure, is often an additional source of changes that developers need to keep up with. In our interviews, we discovered many different monitoring strategies, both technical and social. Their strategies varied along with the urgency of their needs, from active monitoring of upstream activity, to general social awareness of upstream activities, to a purely reactive stance where developers wait for some kind of notification. + + + Active monitoring. Only four interviewees (E5, R9, N1, N4) reported actively monitoring upstream changes, in the sense of maintaining personal awareness by regularly looking at activity going on in their upstream dependencies. R9, N1, and N2 said they used GitHub’s notification feed with some regularity (N2 only for changes to the Node.js platform, not to upstream packages). N4 kept up by following Twitter feeds, blogs, and attending conferences. R7 indicated that raw notification feeds, in their current form, are a significant burden with a low signal-to-noise ratio, saying that “The quantity of notifications I get on GitHub [on my own project] already is to the point of overwhelming. So I don’t even mostly read them unless I’m actually working on the project at that moment.” He later told us that after our interview he tried scaling back to watching just the three to five projects he is actively working on. Only one interviewee (R9) did not feel overwhelmed, saying that occasional skimming of GitHub feeds was a useful way to get an overview of activity. + + + Upstream participation. In seven cases, developers mentioned monitoring upstream changes not as outsiders following a stream of data, but as active participants in those projects, collaborating to influence them toward their own needs (E5, N4, N7, R6) or providing direct contributions to those packages (E7, E9, R7).
For example, in describing the challenge of getting upstream projects to prioritize changes that he needed, an Eclipse developer said, “I touch everything that I care about, because it’s really hard to convince other people to do things that I need to do. I find it much easier to just learn all the projects and when I need something, to do it myself.” This aligns with de Souza and Redmiles’ observation of exchange of personnel as a common strategy for cooperation among dependent projects [19]. Such developers wear hats in both projects: They maintain active awareness of the upstream project, as downstream developers, and as upstream developers, their downstream work informs their understanding of the upstream project’s requirements. + + + Others, like E5, actively compiled and tested their projects with development versions of upstream dependencies, emphasizing the importance of giving timely reactions: “if you report it within a week there’s a better chance the developer might remember what they did […] which provides a good chance that they can revert their change before they hit their milestone.” + + + Social awareness. Many interviewees tried to maintain a broad awareness of change through various social means. The most frequently mentioned mechanism, especially in the Node.js community, was Twitter (E9, R7–R9, N2, N3, N4a, N4b, N6, N7). For example, N4a commented, “the people who write the actual software are fairly well connected on Twitter, […] like water cooler type of thing. So we tend to know what’s going on elsewhere.” In each ecosystem, interviewees (E5, R9, N4, N6) mentioned the importance of face-to-face interactions at conferences for awareness about important changes in the ecosystem. Other social mechanisms mentioned for learning about change were personal networks (R6, R8), blogs (E1, R4, R7, R8, N4, N7), and curated mailing lists (N1). + + + Reactive monitoring.
Although our research questions led us to probe interviewees about the aforementioned active and social monitoring practices, a reactive strategy is also possible for dependencies. That is, rather than maintain some awareness and understanding of plans and activity in an upstream project, for example, by watching a GitHub feed and keeping track of why they follow each project and which changes might be relevant to them, a developer may instead ignore upstream projects’ activity until they are given actionable evidence that their own project needs to adapt in some way. The developer waits to hear about problems from others (in advance, or after things break): upstream developers contacting them about breaking changes, failing tests after dependency updates, or platform maintainers warning of changes that would affect them. There are tools that enable this reactive stance by generating targeted notifications about certain kinds of changes. The specific tools differ among the platforms and support different practices or policies. Policies and common practices (e.g., testing practices) in the platform in turn strongly affect the reliability of a reactive strategy and its corresponding tools. + + + Four developers (R3, E5, N2, and N7) mentioned the use of continuous integration to detect compile-time issues caused by breaking changes in upstream packages early. The tools gemnasium [32] and greenkeeper [35] allowed Node.js/npm developers to get notifications about new releases of upstream packages. + + + Table 6. Practices (Mostly Downstream) to Monitor Change and Manage or Avoid Its Effects

| Who | Study 2 Method | Practice |
|-----|----------------|----------|
| | | Awareness and coordination |
| D | S | Reactively track what upstream packages are doing (when it breaks; when you’re notified somehow) |
| D | S | Proactively track (maintain awareness via GitHub notifications, mailing lists, etc.) |
| D | S | Submit feature requests and bug reports to upstream package authors |
| D | S | Participate in decision-making about upstream package’s future |
| D | S | Tool-based notifications about upstream changes (e.g., Greenkeeper) |
| D | | Regularly test against unreleased development versions of dependency to give timely feedback |
| P | | Socially connected group of developers following each other on Twitter, going to conferences, etc. |
| P | | Political work among core people to get buy-in on making a breaking change |
| | | Protection against each potential change |
| D | S | Do not update dependencies; just leave them at old versions known to work |
| D | | Upgrade dependencies all at once only when making a new release |
| D | S | Dependency hell: manual manipulation of dependency version constraints to get a set of dependencies to be mutually compatible |
| U | S | Violate semantic versioning for trivial changes to prevent rippling updates that a version change would require |
| D | M | Lock file: fix versions of all upstream packages (incl. transitive dependencies) with release |
| D | | Report wrong semantic versioning as a bug |
| D | M,S | Specify an exact version number of a specific dependency |
| D | M,S | Specify a range of legal version numbers of dependencies (e.g., allow minor but not major upgrades) |
| D | M,S | Specify only a dependency’s name and do not constrain what version of it is to be used |
| | | Protection against dependencies themselves |
| D | S | Do significant research about each dependency, weighing whether to adopt it |
| D | S | Wrap the dependency in an abstraction layer to decrease risk of change |
| D | S | Avoid use of dependencies, roll your own |
| D | S,M | Clone the dependency’s code and maintain the new code yourself |
| D | M,S | Copy dependency code into your own repository (“vendoring”) to get exact version needed |
Gemnasium alerted developers of package releases that fix known vulnerabilities, whereas greenkeeper submitted pull requests to automate a continuous integration run against the new release. In either case, developers could react to notifications by email or pull requests. + + + CRAN’s requirement that upstream developers notify their downstream dependents when a change is coming appears to encourage downstream developers across the ecosystem to take a reactive stance (in contrast to Eclipse and Node.js/npm, where individual downstream developers need to employ optional monitoring tools). R7 defended the practice of waiting to be told about breaking changes as a principled attention-preserving choice, consistent with ecosystem norms, while R2 was apologetic about being reactive: “I guess I’ll sound crass about this and say it. For things like that I would wait to hear from CRAN when something broke. Because I don’t think I can keep up with all of it.” CRAN enforces this policy with manual and automated checking on each package update, running the package’s tests and the tests of all downstream packages in the repository, as well as some static checks. The CRAN team may then warn an affected downstream developer of an upcoming change by email. + + + 4.3.2 Reducing the Exposure to Change. Many developers have developed strategies to reduce their exposure to change from upstream modules and, thus, reduce their monitoring and rework efforts. The degree to which developers adopt such mitigation strategies again depends on the technology, policies, and values, as we will discuss. + + + Limiting dependencies. Most of the CRAN and Eclipse interviewees that we asked (11 interviewees: R1, R2, R3, R4, R6, R7, E1, E2, E4, E5, E9) felt that it was better to have fewer dependencies. Reasons for limiting dependencies included limiting one’s exposure to upstream changes and not burdening one’s users with a lot of modules to install and potential version conflicts (“dependency hell”).
Interviewee E5 represents a common view: “I only depend on things that are really worthwhile. Because basically everything that you depend on is going to give you pain every so often. And that’s inevitable.” Apart from removing no-longer-needed dependencies (for which Eclipse provides tooling), six developers described more aggressive actions to avoid dependencies, including copying (R4) or recreating (R1, R6, R7, N6) the functionality of another package. N6 had to fork and recreate an upstream dependency as a temporary measure because of a licensing issue, but he did not feel dependencies were a burden generally. + + + In contrast, due to Node.js/npm’s ability to use old versions and Eclipse’s stability, three developers (E3, N1, N5) specifically said that they did not see dependencies as a burden. + + + Selecting appropriate dependencies. When limiting themselves to appropriate dependencies, interviewees mentioned a variety of different signals they looked for; these fell into five categories: + + + + + + Trust of developers: Seven interviewees (E4, R1, R5, R6, R7, N4, N6) mentioned basing decisions on personal trust of package maintainers. Criteria included being a large organization (E4), having a reputation for high-quality code (R6, N6), and being consistent with maintenance (R6). One interviewee (R7) deliberately sent bug reports to a package to test whether the developer would be responsive before depending on it. + + + + + + Activity level: Five interviewees (E4, N6, N2, R1, R6) considered the activity level of the community of developers; for example, distinguishing a “real” ongoing project from an abandoned research prototype. Both high and low activity levels can be a positive indicator depending on the state of the project, as stated by N2: “Ones with activity are mostly better maintained; they have lots of people contributing, like express.
It’s likely the community will have eyes on the ball, consider backward compatibility, ramifications […] Ones with little activity are small projects that don’t change often, so change isn’t an issue either.” + + + + + + Size and identity of user base: Four developers mentioned gauging the size of the user base using signals such as daily download counts (E2, N3, N5), whether projects of trusted developers use it (N6), or, as E2 said, “Whether I’ll actually jump on it or not is about how I perceive other software projects are using it.” N5 told us, “We look to see how many people are using it: number of downloads per day. If it’s low, that’s a clue that it’s sketchy, but not a perfect heuristic.” + + + + + + Project history: Four interviewees said they assumed that past stable behavior of a package would predict future stability (R1, R4, R6, E2). Signals included their own experience with the package (N4, E5), its status as part of the platform’s core set of packages (E4), or its visible version history, such as lack of recent updates and a version number above 1.0 (E3, N1, N4). + + + + + + Project artifacts: Finally, developers mentioned signals from project artifacts, including coding style (R1, R6), documentation (R1), good maintenance (N6), perceived ease of adoption (R1), code size (E2, N4, N7), and conflicts with other dependencies (N5). + + + + + + Encapsulating change. Interestingly, there was almost no mention of traditional encapsulation strategies to isolate the impact of changes to upstream modules, contrary to our expectations and typical software-engineering teaching [63, 73, 88]. Only N6 mentioned developing an abstraction layer between his package and an upstream dependency, implemented because of an anticipated change.
Questions about encapsulation were not in our interview protocol, so we did not ask about it specifically, but one possible explanation is that since upstream packages already generally try to avoid gratuitous API changes, the ones that are necessary would require changes to an encapsulating class’s API, obviating the point of the encapsulation. + + + 4.3.3 Platform Values and Developer Values. + + + Because policies, tools, and practices support different values in each ecosystem, they impose different costs on developers depending on whether their attitude towards some particular dependency aligns or conflicts with the community’s broader values. In some situations developers will treat a dependency as a fixed resource to draw functionality from (also termed API as contract [20]), but in other situations, they treat the interface as open to negotiation and change (also API as communication mechanism [20]). + + + Eclipse’s emphasis on backward compatibility and predictable release planning is convenient for developers and corporate stakeholders who wish to rely on the released core platform code as a fixed resource. Stability ensures that most developers relying on the platform packages do not need to monitor upstream changes, reacting at most to the yearly releases. Signals about whether to trust an upstream package are primarily social, in the sense that developers can trust the packages that are part of the core, supported by corporations known to be invested in the stability of the platform. + + + According to E6, developers working within more volatile parts of the Eclipse ecosystem, such as using code outside the stable core, or in-development features of the core, have a greater need for monitoring and may be exposed to more change, sometimes encountering friction associated with that.
E6 told us that “there is a very different understanding of how important compatibility is and what it means, if you start from the platform, and then to the outer circles of Eclipse.” E5 talked about recompiling upstream code often to report bugs to them within a week. Thus, although Eclipse deeply values stability, there is necessarily a sphere of activity with active collaboration and change where that value is appropriately set aside. + + + CRAN’s emphasis on consistency and timely access to research seems to encourage the API as communication rather than the API as contract [20] view of dependencies, in that its snapshot consistency approach forces maintainers to react to breaking upstream changes quickly (typically within a few weeks [87]). This causes some apparent friction with researchers who might otherwise wish to publish their software and move on to other things. Many of the interviewees limited their dependencies, sometimes quite aggressively, by replicating code and reacting to notifications about change rather than actively following a community of upstream developers. However, an active and socially connected subset of developers (R7–R9) seemed to welcome collaboration. Although R7 advocated reacting to upstream changes rather than trying to anticipate them, R7, R8, and R9 emphasized Twitter and conferences to maintain an upstream awareness. + + + Node.js/npm’s emphasis on convenience for developers has led to infrastructure that seems to decouple upstream and downstream developers from having to collaborate, since the downstream can depend on old versions of the upstream for as long as they like. This should logically lead to less urgency to monitor upstream changes, except for patching security vulnerabilities. Developers do nonetheless often choose to take a collaborative approach to development, using tools such as continuous integration and greenkeeper [35] to force themselves to stay up to date despite the platform’s permissiveness.
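The decoupling at work here rests on npm-style version ranges: a downstream package declaring a caret range keeps receiving compatible updates but is never moved to a new major version automatically, so a breaking upstream release simply does not reach it until the maintainer opts in. A simplified sketch of the caret semantics (real npm ranges also special-case 0.x versions, prereleases, and other operators):

```python
# Sketch of npm-style caret-range matching: "^X.Y.Z" accepts any
# version >= X.Y.Z with the same major version, and rejects the next
# major (breaking) release. Simplified; not npm's actual resolver.

def satisfies_caret(spec, version):
    base = tuple(int(p) for p in spec.lstrip("^").split("."))
    v = tuple(int(p) for p in version.split("."))
    return v[0] == base[0] and v >= base

print(satisfies_caret("^1.2.3", "1.9.0"))  # True: compatible minor update
print(satisfies_caret("^1.2.3", "2.0.0"))  # False: breaking major bump
```

Under this rule, an upstream major release is invisible to constrained dependents, which is exactly why tools like greenkeeper were needed to nudge developers into adopting it.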
+ + + Summary of RQ1.2 results: Downstream developers are motivated to update their dependencies to take advantage of bug fixes and new features and avoid technical debt. However, such updates can be complex or risky, can disrupt downstream users, and may require some awareness of ongoing activity in an upstream project. Strategies to balance the costs and risks include different levels of awareness of upstream projects (from social or technical participation, to active or merely reactive monitoring), chunking the work by making all updating decisions at once periodically, or limiting the problem by carefully vetting dependencies to begin with. As with upstream change decisions, the ecosystem’s context affects participants’ choices. Eclipse’s extreme interface stability allows downstream developers, at least outside the core, to trust it and ignore the possibility of change. CRAN’s policy of global consistency among packages creates pressure for package maintainers to actively collaborate with their upstream counterparts; a core community seems to be spurred to active collaboration on Twitter and at conferences, while a peripheral community limits dependencies to avoid this necessity. Finally, npm’s tooling decouples downstream developers from the immediate impact of upstream changes; developers who nonetheless wish to stay up to date adopt tools like greenkeeper to remind and encourage them to update. + + + 4.4 RQ1.3 Unintended Consequences + + + Interviewees told us about instances where policies or their combinations led to unintended consequences. + + + Eclipse. One Eclipse developer said that the “political” nature of making changes can drive away developers and users.
“You have to be very patient and know who to talk with and whatnot; you really have to know how to play that game to get your patches accepted, and I think it’s very intimidating for some new people to come on.” He explained that with many interdependent packages managed by different people, each with a mandate not to change their interfaces, implementing a rippling change can require negotiations among people with conflicting interests. + + + Another consequence of Eclipse’s stability, along with its use of semantic versioning, is that many packages have not changed their major version number in over 10 years. However, as E8 told us, strict semantic versioning is impractical to follow, so even for the few cases of breaking changes that are clearly documented in the release notes, such as removing deprecated functions, major versions are often not increased. Updating a major version number can ripple version updates to downstream packages, which can entail significant work for the many downstream projects that have hard-coded major version numbers for their dependencies. + + + Node.js/NPM. For Node.js/npm, in contrast, the rapid rate of changes and automatic integration of patches can raise concerns about reproducibility in commercial deployments. In many cases, the community then builds tools to work around some of the issues, such as tools that take a specific snapshot of an installation including all transitive package dependencies (e.g., “npm shrinkwrap” or R/CRAN’s packrat). “In npm, if you install today and tomorrow, you’ll get 100s of dependencies, and something may have changed. So even if my version is the same, the servers could be running slightly different code, so customer-facing code will differ and be hard to reproduce.” + + + R/CRAN.
CRAN has a similar issue regarding scientific, rather than deployment, reproducibility: The community’s goal of timely access to current research conflicts with many researchers’ goal to ensure reproducibility of their studies [61]. + + + In R/CRAN, the opposite dynamic from Node is evident in its versioning policy: The official policy on version numbers only requires that version numbers increase with each submission(^{20}), but a permissive form of semantic versioning is used and recommended by many developers [87, 91]. + + + These conflicts and unintended consequences suggest that the design of ecosystem practices is not a solved problem. + + + Summary of RQ1.3 results: Unexpected community responses to policies included creative use of semantic versioning, innovative ways of promoting replicability, and stagnation. + + + 5 STUDY 2: A SURVEY ON VALUES AND PRACTICES: PREVALENCE, CONSENSUS, AND RELATIONSHIPS + + + The research questions for Study 2 emerged in large part from the results of our first study. Study 2 endeavored to expand the scope beyond these three cases and to ask further questions raised by our results. + + + Study 1 revealed substantial differences in our three cases in the practices used to manage breaking changes and in the values these practices appeared to serve. This raises the question of how prevalent such differences are. Some values may be nearly universal, and some practices may be so fundamental, well-known, and effective that they are employed by nearly all ecosystems. However, different ecosystems make use of different technologies, have evolved different cultures, and serve different constituencies, suggesting that at least some values and practices may vary, perhaps dramatically, among ecosystems.
Our questions for Study 2 were therefore: + + + RQ2.1: To what extent are values and practices for managing breaking changes shared among a diverse set of ecosystems? + + + Moreover, we have been making the assumption that ecosystems tend to have a shared view of values and practices across the ecosystem, i.e., that they are characteristics of ecosystems rather than of individual projects or sub-ecosystem clusters of projects. It seems important to test this assumption, hence: + + + RQ2.2: To what extent do individual ecosystems exhibit consensus within the community about values and practices? + + + Finally, as we observed in Study 1, it seems that some practices are designed to serve the ecosystem’s values, e.g., to insulate an installed base of applications from changes (Eclipse), to make it easy for end-users to install and use the latest software (R/CRAN), or to allow developers to contribute code as simply as possible (Node.js). Are particular values always associated with specific practices that further that value? We ask more generally: + + + RQ2.3: What is the relationship between ecosystem values and practices? + + + Anonymized survey data are available [7]. + + + (^{20})https://cran.r-project.org/web/packages/policies.html + + + 5.1 Study 2 Results: Validation of Study 1 + + + Before presenting new results from the survey, we take the opportunity to validate some of the results of Study 1, since we have available hundreds of survey responses covering similar questions from the three ecosystems in that study. + + + Study 1 characterized practices and values of three ecosystems based on interviews with developers in each ecosystem. The values inferred there for Eclipse and Node.js/npm align with our survey data: Eclipse participants did seem to value backward compatibility as postulated: Stability and compatibility were their two highest-ranked values (Table 10).
Aligning with findings from the interviews, Eclipse developers were top-ranked in claiming to make design compromises in the name of backward compatibility (Figure 3(c)). Aligning with the interview result that showed Node.js developers to value ease of contribution for developers, Node.js participants in our survey were top-ranked in valuing innovation and ranked highly both in making frequent changes to their own package (Figure 3(a)) and in facing breaking changes from dependencies (Figure 4(a)), although they were mid-rank in feeling any less constrained from making changes than other ecosystems (Figure 3(b)).

CRAN survey participants did not rank rapid access highly, as expected from the interviews; nor were they more averse to adopting dependencies, as predicted (not shown), although, as predicted, they did claim to clone code more (not shown). Aligning with interview results discussing personal contacts among upstream and downstream developers, they were top-ranked in reporting being personally warned about changes in their dependencies (Figure 4(e)) but, contrary to expectations, were low-ranked in warning their own downstream users (Figure 3(h)). This particular contrast, frequently being warned but rarely issuing warnings, suggests that our R/CRAN interview sample may have been overweighted toward downstream developers.

Although the survey largely validates the interview results, the differences highlight the fact that different methods with different sampling strategies can produce somewhat different results, and that even the design intentions of core members responsible for promulgating practices are not necessarily propagated to the whole community.

5.2 Study 2 Results: To What Extent Are Values and Practices Shared across Ecosystems? (RQ2.1)

The survey, policy analysis, and data mining revealed an interesting pattern of similarities and differences in values and practices across ecosystems.
For those that vary across ecosystems, it is rare to see a clear division of ecosystems into two distinct groups. Rather, sorting tends to generate a smooth curve between the extremes. Visible differences between ecosystems at either end of the spectrum are generally statistically significant, and often a few ecosystems stand out, as we will discuss. We plot answers to many of our survey questions in Figures 2, 3, and 4 and Table 7.

All values, except for commerce (Figure 2), were considered at least “somewhat important” in all ecosystems. Stability, quality, and community are nearly universal values, and compatibility, rapid access, and replicability are also rated highly across most ecosystems (see the bottom rows of Figure 2 for the few exceptions). For quality in particular, participants felt even more strongly, and more consistently, that it was of high importance to them personally and to the ecosystem as a whole (the mean personal value of quality was about 0.8 scale points higher than the mean ecosystem value). Still, we see strong differences between ecosystems at each end of the spectrum. Personal values correlate strongly with perceived community values (Spearman $\rho = 0.416, p < .00001, n = 10878$, comparing the two answers for each of the eleven values, for each person, as a separate observation), but participants, on average, rated quality much higher personally, compared to how they rated it as an ecosystem value (.9 Likert scale points, paired t-test: p<.0001); they also tended to rate fun slightly higher personally (.6 Likert scale points, paired t-test: p<.0001); all other differences were within half a Likert scale point.

Table 7. Comparison of Data-mined Practices (Data from libraries.io and World of Code [57]; see Section 3.3.6 for Details)

| Ecosystem | (a) Exact | (b) Min only | (c) Range | (d) Unconstrained | (e) Cloning | (f) Lock files | (g) Maint. old vers. |
|--------------------|-----------|--------------|-----------|-------------------|-------------|----------------|---------------------|
| Atom (plugins) | 22.5% | 1.55% | 73.7% | 1.29% | 2.62% | 0.1% | 1.8% |
| CocoaPods | – | – | – | – | – | 8.37% | 3.85% |
| Eclipse (plugins) | – | – | – | – | – | n/a | – |
| Erlang,Elixir/Hex | 9.09% | 9.25% | 81.6% | 0.0% | – | 65.7% | 3.95% |
| Go | – | – | – | – | 3.24% | 14.4% v | – |
| Haskell (Cabal/Hackage) | – | – | – | – | – | 0.5% | 1.04% |
| Haskell (Stack/Stackage) | – | – | – | – | – | 0% | n/a |
| Lua/Luarocks | – | – | – | – | 3.21% | 0% | – |
| Maven | 100.0% | 0% | 0% | 0% | 0.72% (Java)| n/a | 25.4% |
| Node.js/NPM | 16.3% | 0.44% | 78.6% | 3.67% | 7.03% | 0.8% | 3.96% |
| NuGet | 5.27% | 88.7% | 6.01% | 0% | – | 7.2% | 17.6% |
| Perl/CPAN | 100.0% | 0.0% | 0.0% | 0.0% | 2.30% | 1.0% | 2.72% |
| PHP/Packagist | 21.3% | 3.72% | 66.7% | 7.99% | 1.16% | 16.9% | 10.6% |
| Python/PyPi | 14.6% | 34.5% | 5.86% | 44.1% | 8.17% | n/a | 6.07% |
| R/Bioconductor | – | – | – | – | 3.59% | 0.2% | n/a |
| R/CRAN | 0.0% | 24.4% | 0.0% | 75.6% | 2.69% | 0.8% | 0.10% |
| Ruby/Rubygems | 3.78% | 49.6% | 46.3% | 0.94% | 1.76% | 17.4% | 4.54% |
| Rust/Cargo | 3.86% | 2.14% | 93.6% | 0.40% | 6.90% | 14.6% | 1.4% |

Dependency version constraints: Over all versions of packages in our data, over each of the packages’ dependencies, what proportion of dependencies were constrained with an exact version number, specified the minimum version only, specified a range of versions, or left the version unconstrained. Dash (–) means no data (dependencies not tracked in libraries.io, or language files not indexed in WoC). The most common type of constraint for each ecosystem is bolded. Cloning is the percent of packages in the repository whose projects borrowed a file from another package. Maint. old vers. is the percent of packages whose version number does not increase monotonically. Lock files is the percentage of packages that use a lock file to set an exact version of transitive dependencies. n/a = no equivalent of a lock file. v = Go includes projects with a “vendor” directory, which has a similar effect to a lock file.

Additional values from open-ended questions. We also asked an open-ended question about other values important to their ecosystem. Common themes are counted in Table 8. Answers included usability (15 responses) and social benevolence (good conduct, altruism, empowerment, making resources available to all; 17 responses). An interesting pair of contrasting values we had not considered was standardization (12 responses) and technical diversity (17 responses). Technical diversity advocates valued freedom to implement things and interact with other developers in a diversity of ways: “the package creator should be in charge of deciding how best to manage his/her package and organize with other contributors […]” (Node.js/NPM respondent), while standardization advocates said their ecosystem limited choice to save developers time and effort by promoting wide adherence to standards: e.g., a Python respondent said the platform’s “open ecosystem proposes commonly used, sensible ways to solve popular problems, enforces de facto standards” and decried the chaos of “NIH [Not Invented Here] syndrome.”

Table 8.
Number of Respondents Suggesting Other Ecosystem Values: Usability, Social Benevolence, Standardization, Technical Diversity, Documentation, Modularity, Testability

| Ecosystem | Usability | Social Benevolence | Standardization | Technical Diversity | Documentation | Modularity | Testability |
|----------------------------|-----------|--------------------|-----------------|---------------------|---------------|------------|-------------|
| Atom (plugins) | | | 1 | | | | |
| CocoaPods | 2 | 2 | | | | | |
| Eclipse (plugins) | | | | | | | |
| Erlang,Elixir/Hex | 1 | 1 | 1 | | | | |
| Go | 1 | 4 | 4 | 2 | 1 | 1 | |
| Haskell (Cabal/Hackage) | | | | | | | |
| Haskell (Stack/Stackage) | | | | | | | |
| Lua/Luarocks | | | 1 | | | | |
| Maven | | | | | | | |
| Node.js/NPM | 1 | 1 | | | 3 | 7 | |
| NuGet | | | | | | | |
| PHP/Packagist | | | | | | | |
| Perl/CPAN | 2 | 2 | 3 | 5 | 2 | 1 | 5 |
| Python/PyPi | 1 | 2 | 1 | 2 | 2 | | |
| R/Bioconductor | | | | | 4 | | |
| R/CRAN | | | | | | | |
| Ruby/Rubygems | 3 | 3 | 2 | 2 | 4 | | |
| Rust/Cargo | 1 | 1 | | | 1 | 1 | |
| other | 1 | 1 | 1 | 1 | 1 | | |

We deemed other responses to this question to be not really ecosystem values, but rather favored technical qualities of code at the package level (64 responses), which might nonetheless be promoted by ecosystem culture: good documentation (11 responses, 4 of which were from Bioconductor participants), high modularity (16 responses, 7 of them in Node.js/NPM), and testability (11 responses, 4 each in Ruby and Perl). Finally, 13 (8%) responses objected to the framing of the question, claiming either that no community existed that could be said to share values (5 respondents, 3 of them in Maven) or that multiple subcommunities existed with differing values (8 respondents, including 2 in Erlang/Hex and 2 in Haskell/Cabal).

Other recent surveys [34, 77] have used similar sets of values.
In light of responses to our survey, we propose the revised list of values in Appendix C. This new list adds Standardization, Technical Diversity, Usability, and Social Benevolence as new values, and removes Quality (since it did not distinguish among ecosystems).

Change planning practices. Participants across all ecosystems indicated in the survey (Figure 3) that they perform breaking changes only rarely: a median of less than once a year, both for the breaking changes our participants perform (Figure 3(a)) and for the breaking changes their packages face from dependencies (Figure 4(a)). Although prior research suggests that breaking changes are “frequent” (Section 2), this is relative to the overall frequency of change. Applying a back-of-envelope estimate to Decan et al. [21]’s findings, for example: They report that about 5% of updates actually caused breakages, against a background rate of about 1.2 updates per year per package (1,029 updates to 1,710 packages in a six-month window), or one breakage every 17 years. Given that breakages may not be evenly distributed, that packages have multiple, recursive dependencies, and that developers work on multiple packages, experiencing a breakage once a year is within the range of plausibility. This is perhaps why developers’ actual experience of dealing with a breaking change may be infrequent even if breaking changes are frequent overall in the ecosystem.

Table 9. Comparison of Sanctioned Practices and Features

| Ecosystem | (a) Dependencies outside repository | (b) Central repository | (c) Access to old dependency versions | (e) Gatekeeping standards | (f) Synced ecosystem |
|----------------------------|-------------------------------------|------------------------|--------------------------------------|--------------------------|----------------------|
| Atom (plugins) | ● | ● | ● | ● | ● |
| CocoaPods | ● | ● | ● | ● | ● |
| Eclipse (plugins) | ● | ● | ● | ● | ● |
| Erlang,Elixir/Hex | ● | ● | ● | ● | ● |
| Go | ● | ● | ● | ● | ● |
| Haskell (Cabal/Hackage) | ● alt repo | ● | ● | ● | ● |
| Haskell (Stack/Stackage) | ● | ● | ● | ● | ● |
| Lua/Luarocks | ● | ● | ● | ● | ● |
| Maven | ● | ● | ● | ● | ● |
| Node.js/NPM | ● | ● | ● | ● | ● |
| NuGet | ● alt repo | ● | ● | ● | ● |
| Perl/CPAN | ● alt repo | ● | ● | ● | ● |
| PHP/Packagist | ● | ● | ● | ● | ● |
| Python/PyPi | ● | ● | ● | ● | ● |
| R/Bioconductor | ● alt repo | ● | ● | ● | ● |
| R/CRAN | ● alt repo | ● | ● | ● | ● |
| Ruby/Rubygems | ● | ● | ● | ● | ● |
| Rust/Cargo | ● | ● | ● | ● | ● |

● = ecosystem has feature, ○ = does not have feature, □ = has feature, but for a group of packages, not for individual packages. alt repo = through reference to an alternative repository; staged releases = groups of packages are debugged together and released as a group. submitter = the author, not the package, is vetted. core = core packages only. See Section 3.3.5 for details.
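The back-of-envelope breakage estimate above can be reproduced in a few lines; the figures (1,029 updates to 1,710 packages in six months, ~5% of updates breaking) are those quoted from Decan et al. [21]:

```python
# Back-of-envelope check of the breakage-rate estimate,
# using the figures quoted from Decan et al. [21].
updates = 1029          # updates observed in a six-month window
packages = 1710         # packages observed in that window
breaking_share = 0.05   # ~5% of updates caused breakages

updates_per_pkg_per_year = updates / packages * 2       # two half-years
breakages_per_pkg_per_year = updates_per_pkg_per_year * breaking_share
years_between_breakages = 1 / breakages_per_pkg_per_year

print(round(updates_per_pkg_per_year, 1))   # 1.2
print(round(years_between_breakages))       # 17
```

Multiple direct and transitive dependencies multiply this per-package rate, which is why a developer can plausibly see about one breakage a year.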
Respondents in every ecosystem agreed, on average, that they use semantic versioning or comparable versioning strategies (Figure 3(f)), batch multiple changes into a single release (Figure 3(d)), document their changes (Figure 3(e)), and are conservative about adding dependencies to their projects (Figure 4(c)). These seem to be generally considered good software-engineering practices, independent of programming language or ecosystem.

Answers that varied more dramatically among ecosystems included reluctance to make breaking changes (Figure 3(b)), willingness to compromise design for backward compatibility (Figure 3(c)), and synchronizing with users before releasing changes (Figure 3(h)). Data mining reveals that ecosystems also vary considerably in how often they make updates to previous versions, ranging from as high as 25% of Maven projects doing this at least once down to 0.1% of R/CRAN projects.

Turning to shared community resources, all but two of the ecosystems we studied supply a central repository server from which packages can be downloaded automatically as needed (Table 9(b)). Two (Go and Eclipse) only maintain indexes pointing to maintainers’ own servers, which must supply the package and metadata in some standard way. Advertised submission requirements for packages show that ecosystems differed in the level of vetting (Table 9(e)) that these repositories apply. Haskell’s Cabal/Hackage system is unusual in that it vets maintainers, who apply for accounts that are hand-checked by human reviewers, but does not apply more than minimal automated standards to submitted packages. CRAN has very strict standards for package submissions and updates, which are vetted by hand as well as by automated tests.

Three ecosystems are released all at once on a regular, synchronized schedule (Table 9(f)): the core set of packages in Eclipse, as well as the whole of Bioconductor (synchronized with releases of the R runtime), and CPAN.
These work by having a staged sequence in which a development build is worked on until it is consistent; then parts or all of it are released as a group into the official supported release. Other ecosystems allow developers to release packages whenever they wish. This is similar to the practices of operating-system-level software ecosystems, such as Debian’s APT, that repackage software from a variety of languages and ecosystems into compatible releases for an operating system.

Note that Stackage’s sets of compatible packages are curated together post hoc; their development is not synchronized unless developers collaborate on their own to do so.

Practices for coping with dependency changes. Sixteen of the 18 ecosystems offer an optional (Table 9(a)) but widely used central repository (Table 9(b)) for packages, usually encouraging packages to refer to dependencies by name and version number.

When asked specifically about their package’s exposure to breaking changes from upstream packages, participants across all ecosystems again reported low frequencies (Figure 4(a)); only a quarter of our participants indicated that they saw a breaking change per year. Participants in ecosystems with more conservative change practices (e.g., Eclipse, Erlang, Perl) are exposed to slightly fewer breaking changes. Participants across all ecosystems indicated that they are conservative in adding dependencies (Figure 4(c)) and perform significant research first (Figure 4(d)). In contrast, how they learn about updates (Figures 4(e)–(g); e.g., through personal contacts or tools), the rate at which they may skip them (Figure 4(h)), and how they declare version constraints on dependencies (Figure 4(i)) depend significantly on the ecosystem.

Data mining (Table 7) reveals that file cloning is rare (less than 10% of projects) in every ecosystem in which we measured it; developers instead rely on the package dependency infrastructure (Table 7(e)).
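To make the four constraint categories of Table 7 concrete, the following is a deliberately crude classifier for npm-style version requirements (a sketch only; a real semver parser such as node-semver handles many more forms than these):

```python
def classify_constraint(req: str) -> str:
    """Bucket an npm-style version requirement into the four
    categories of Table 7. A rough sketch, not a full semver parser."""
    req = req.strip()
    if req in ("", "*", "x", "latest"):
        return "unconstrained"          # Table 7(d): any version accepted
    if req.startswith(">="):
        return "min only"               # Table 7(b): floor, no ceiling
    # caret/tilde requirements and explicit bounds pin an interval
    if req.startswith(("^", "~", "<", ">")) or " - " in req:
        return "range"                  # Table 7(c)
    return "exact"                      # Table 7(a): e.g., "3.2.1"

print(classify_constraint("3.2.1"))     # exact
print(classify_constraint("^1.2.3"))    # range
print(classify_constraint(">=1.4.0"))   # min only
print(classify_constraint("*"))         # unconstrained
```

In npm’s actual semver syntax, `^1.2.3` permits `>=1.2.3 <2.0.0` and `~1.2.3` permits `>=1.2.3 <1.3.0`, which is why caret/tilde requirements are counted as ranges here.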
Mining also confirmed survey answers about how users of packages chose to constrain the versions of packages they depended on: While Maven almost universally relies on a fixed version number (e.g., package A might depend on precisely version 3.2.1 of package B), other ecosystems typically constrain dependencies to version number ranges (Node.js/NPM, Atom, PHP, and Rust/Cargo), specify only a minimum version (NuGet, Ruby/RubyGems), or leave versions unconstrained (Python/PyPi, R/CRAN). Survey and mining results differed for one ecosystem, however: Perl/CPAN users claimed the ecosystem’s typical practice was to specify just the name (43% of respondents) or a version range (36%) of dependencies, yet mining of libraries.io revealed nearly 100% use of exact version numbers. This may be a matter of developer perception: libraries.io apparently measures the precise dependencies captured in the published repository, but tools such as Dist::Zilla::Plugin::DistINI generate these from less-constrained numbers specified by developers.

Universal or distinctive. While there is considerable nuance in the differences among ecosystems, overall our results suggest that several values seem to be universal, at least in the 18 ecosystems we surveyed. Chief among these are stability, quality, and community, while compatibility, rapid access, and replicability have achieved near-universal status. The unique personality of each ecosystem, however, seems to derive from a few key distinctions (in values or in practices) that set it apart. There are many examples of this, including:

Bioconductor and Eclipse stand out in coordinating releases on a synchronized and fixed schedule (Figures 3(i) and (j), Table 9(f)) and in valuing curation (Figure 2, Table 9(e)).

(^{21})https://cran.r-project.org/web/packages/policies.html
(^{22})https://wiki.debian.org/Apt
(^{23})https://github.com/commercialhaskell/stackage#frequently-asked-questions
Go has a distinctive version numbering practice that does not require version updates on all changes (Figure 3(g), Table 9(c)).

CRAN and Bioconductor have strict requirements for submission and update of packages (Figure 3(k), Table 9(e)).

Lua developers value fun, feel least constrained from making changes in their code, and generally do not coordinate much with others (Figures 3(b), (h), and (i)).

Rust has a strong stance on openness and is the least prone to make design compromises for backward compatibility (Figures 3(b) and (c)). Data mining of Cargo projects shows that they rarely port fixes to earlier code releases (Table 7(g)).

CPAN developers universally claim to write change logs (Figure 3(e)).

Value differences by ecosystem are statistically significant for each of the values (Kruskal-Wallis, run separately on each value to check whether it differs by ecosystem: $p < 0.00001$, with $\chi^2$ ranging from 53.704 for quality to 178.69 for commerce).

Summary of RQ2.1 results: Stability, quality, community, compatibility, rapid access, and replicability are important across all ecosystems, while openness, curation, standardization, and technical diversity are values that are not universal but differ by ecosystem. Breaking changes are experienced only rarely by any one developer (on the order of yearly), even though they are common within an ecosystem as a whole. Differing ecosystem circumstances lead to great variety in developers’ willingness to make breaking changes, or conversely to compromise their designs to ensure backward compatibility, and in turn in consumers’ eagerness to incorporate upstream changes.

5.3 Study 2 Results: To What Extent Is There a Consensus within Ecosystems about Values and Practices?
(RQ2.2)

The distribution of value ratings within each ecosystem was particularly wide for the values replicability, openness, and curation, indicating generally less consensus on these values. There is evidence of broad consensus about the highest-ranked value(s) for some ecosystems (Table 10), most conspicuously in cases in which a value clearly aligns with the core purpose of an ecosystem. An illustrative example is Stackage and Cabal/Hackage, two Haskell-based ecosystems, which contrast strongly with each other in compatibility and curation; participants rated these values as much more important in Stackage than in Hackage/Cabal. Stackage was also rated markedly lower in rapid access than all other ecosystems. These values are consistent with the stated goals of Stackage (“to create stable builds of complete package sets”). Stackage is built on top of Cabal for the express purpose of curating compatible sets of versions, while Hackage submissions only require that they be submitted by a developer whose identity has been manually vetted (Table 9(e)). Volunteer curators wait until a set of consistent package versions can be assembled and release them as a unit, trading rapid release for tested compatibility.

Table 10. Values Most Commonly Rated Highest, by Ecosystem

| Ecosystem | Top 3 values | Consensus in % |
|-----------------|-------------------------------|----------------|
| Haskell/Stack | compatibility > replicability > curation | 75 55 45 |
| Perl/CPAN | stability > replicability > quality | 64 40 31 |
| Maven | replicability > stability > quality | 64 38 32 |
| Lua/Luarocks | fun > replicability > quality | 64 35 17 |
| Eclipse | stability > compatibility > quality | 62 48 37 |
| NuGet | replicability > compatibility > stability | 59 37 20 |
| Go | quality > stability > fun | 56 37 19 |
| R/Bioconductor | replicability > quality > compatibility | 52 32 26 |
| CocoaPods | quality > stability > compatibility | 52 30 17 |
| Rust/Cargo | replicability > stability > community | 51 31 23 |
| PHP/Packagist | quality > stability > compatibility | 50 32 23 |
| Node/NPM | rapid.access > community > innovation | 50 24 15 |
| Atom | rapid.access > fun > openness | 50 26 17 |
| Erlang | quality > fun > stability | 46 24 18 |
| Haskell/Cabal | quality > innovation > replicability | 43 17 8 |
| Python | replicability > quality > stability | 42 20 14 |
| Ruby | fun = community = rapid.access | 41 18 12 |
| R/CRAN | replicability > compatibility > innovation | 36 20 8 |

Consensus Cn is the percent of respondents in each ecosystem who did not rate any value higher than any of the ecosystem’s highest n values. The top three values are listed for each ecosystem; > indicates relative popularity of the values; = indicates ties.

The Stackage/Hackage choice is controversial in the Haskell community, which may make their perceived differences in values and practices more visible.

A few more examples include:

Maven is primarily a build tool that comes with a centralized hosting platform for Java packages and was not designed as a collaborative platform.
This purpose is reflected in strongly valuing replicability but valuing community, openness, and fun least.

Bioconductor is a platform for scientific computation (specifically, analysis of genomic data in molecular biology) where replicability of research results is a key asset, but commerce is clearly not a focus.

Lua is widely used as an embedded scripting language for games; prior work has shown that the culture of game developers is significantly different from that of application developers [58]; for example, game development communities value creativity and communication with designers over rigid specifications, which makes extensive automated testing impractical.

Others, like R/CRAN, have markedly less consensus, at least regarding the set of values that we surveyed.

Some, but not all, practice differences can be explained by enforced policies or design choices in platform tools. For example, Node.js/npm sets a version range for dependencies by default when a dependency is added (Figure 4(i)); Bioconductor and the core packages of Eclipse have a synchronized, central release (Figures 3(i) and (j), Table 9(f)); and Bioconductor and CRAN require reviews before packages are included in the repository (Figure 3(k), Table 9(e)). Some practices are supported by optional tooling in the ecosystem, such as tools that create notifications on dependency updates in the Node.js and Ruby communities (Figure 4(i); e.g., gemnasium and greenkeeper.io). Other practices seem to be mere community conventions—for example, providing change logs is encouraged in the documentation of CPAN but not enforced, yet the practice is apparently universal (Figure 3(e)).

Interestingly, there are some cases of practices with surprisingly little consensus in some ecosystems, given what we know about the tools and policies in those ecosystems.
For example, 26.6% of Node.js respondents indicated that a “package has to meet strict standards to be accepted into the repository” (Figure 3(k)), even though that community’s npm repository does not have any such checks (Table 9(e)) and in fact contains many junk packages. It may be that ecosystem members are not aware of the design space and what practices other ecosystems employ, so they have a biased interpretation of what a “strict standard” is. Alternatively, participants may be members in subcommunities with contrasting values and practices. For example, there may be vetting of revisions among the developers within a specific project or subcommunity that is also hosted on npm. + + +The role of roles. We wanted to explore the possibility that survey respondents’ differences in perceived values and practices may be explained by the role of a respondent in their ecosystem. The ecosystem may appear different depending on one’s responsibilities and perspective. The survey asked people what their role was in the ecosystem: choices were user, committer, submitter, package lead, central package lead (a.k.a. lead+), and founder. We analyzed how core (lead+ and founder) roles differed from the rest within each ecosystem. We suspected that core and peripheral ecosystem participants may have different values, but we found little evidence that that was the case. We tested their ratings on the perceptions of all 11 values and found that only for one value, replicability, was there a statistically significant difference (t-test, p = 0.044, n = 1,504); however, this difference was small (an average rating 3.5 out of 5 for core, 3.68 for non-core, thus a difference of 0.18 scale points), and there was no evidence that value perceptions differed for other values (t-test, p between .13 and .73, n ranging from 1,492 to 1,504). 
Core people seemed to be more enmeshed in the community than those in other roles, in the sense that they were more likely to collaborate with upstream packages ($\chi^2(1, N=932) = 16.571$, $p < .0001$; 21% more likely to answer yes to the question, “In the last 6 months I have participated in discussions, or made bug/feature requests, or worked on development of another package in [ecosystem] that one of my packages depends on.”), to contribute code to upstream dependencies ($\chi^2(1, N=925) = 24.132$, $p < .0001$; 18% more likely to answer yes to the question, “Have you contributed code to an upstream dependency of one of your packages in the last 6 months (one where you’re not the primary developer)?”), and to claim to know their users’ needs ($\chi^2(1, N=932) = 62.947$, $p < .0001$; 29% more likely to answer “Strongly” or “Somewhat agree” to the question, “I know what changes users of [package] want”). People in core roles also felt very slightly more confident in their answers to the community values questions ($\chi^2(1, N=932) = 6.2247$, $p < .05$; 8% more likely to answer “Confident” or “Very confident” to the question, “How confident are you in your ratings of the values of [ecosystem] above?”); this difference was statistically significant, but not very large.

In short, there are a few features that distinguish core community members from the rest, but they seem to be culturally a part of their communities in that they perceive its values to be the same.

Summary of RQ2.2 results: Ecosystems tend to have many of the same values but distinguish themselves by virtue of a few distinctive values strongly related to their purpose and audience. Consensus in practices is largely, but not entirely, driven by the affordances of shared tooling and the policies that those tools enforce or encourage. Core and peripheral members of the ecosystem community share their ecosystem’s values, but core members are more collaborative in their practices.
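One literal reading of the consensus measure Cn reported in Table 10 (the percent of respondents who did not rate any value higher than any of the ecosystem’s top n values) can be sketched as follows; the toy ratings below are invented for illustration, not drawn from our data:

```python
def consensus(responses, top_values):
    """Cn, as defined in Table 10: percent of respondents who rated no
    value strictly higher than any of the ecosystem's top-n values.
    `responses` is a list of dicts mapping value name -> Likert rating."""
    agreeing = sum(
        1 for r in responses
        if max(r.values()) <= min(r[v] for v in top_values)
    )
    return 100.0 * agreeing / len(responses)

# Toy example (invented data): two respondents, top-1 value "stability".
toy = [
    {"stability": 5, "fun": 3, "quality": 5},  # nothing beats stability
    {"stability": 4, "fun": 5, "quality": 4},  # fun rated higher
]
print(consensus(toy, ["stability"]))  # 50.0
```

Under this reading, C1 counts a respondent only if the ecosystem’s top value ties for that respondent’s own maximum rating, which is why Cn drops quickly as ratings disperse.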
5.4 Study 2 Results: What Is the Relationship between Values and Practices: The Case of Stability (RQ2.3)

One might expect that ecosystems that share similar values would adopt similar practices that support those values, but for most practices that is not the case. We averaged each value and practice answer within each ecosystem to get a summary of mean answers for each ecosystem, and looked for correlations between any value and any practice among the columns within these 18 rows. There were few strong correlations between values and practices. Out of 418 such value-practice comparisons, only 29 were significantly correlated (Spearman test, $p < 0.05$); however, even these may be due to chance: Because of the small sample size ($n = 18$) and the large number of comparisons, applying a Holm-Bonferroni correction rules out taking any of these correlations as conclusive.

The fact that practices are not universally associated with particular values implies that the same value can be associated with the adoption of different practices. For example, of the practices shown in the violin plots above, only one, the perception of the ecosystem’s use of exact version numbers to refer to dependencies (Figure 4(i), choice E), significantly correlated with the perceived value of stability to the ecosystem (Spearman correlation of mean answers within each ecosystem: $\rho = 0.506$, $p < .05$, $n = 18$ ecosystems). We investigate this relationship further with a comparison of the practices associated with stability in three ecosystems that had high ratings and high consensus for stability: Eclipse, Perl, and Rust (Figure 2 and Table 10). Our survey results indicate that these ecosystems achieved stability with different, sometimes nearly opposite, practices.

Eclipse: stability through strict standards and gatekeeping. Eclipse’s leadership very strongly promotes stable plugin APIs.
As we mentioned earlier, official developer documentation includes this “prime directive”: “When evolving the Component API from release to release, do not break existing Clients” [25]. Eclipse developers rated stability higher than any other ecosystem, with the smallest variance in their mean ratings of stability (Figure 2) and strong consensus that stability was the highest value (cf. Table 10).

Survey answers about practices show that Eclipse relies on gatekeeping (Figure 3(k)) and that its developers claim to make design compromises to achieve backward compatibility (Figure 3(c)); they police each other’s backward compatibility and release together when they can be sure they will not break legacy code (Figure 3(i)); and developers feel constrained in making changes (Figure 3(b)).

Rust: stability through dependency versioning and stability attributes. Rust, in contrast, ranked lowest in design compromises for backward compatibility (Figure 3(c)) and rarely maintains outdated versions (Table 7(g)), but is high in semantic versioning (Figure 3(f)). Rust’s Cargo infrastructure prevents the use of wildcards for dependency versions, although it allows ranges (Figure 4(i)), which are almost universally used (93.6% of Cargo packages, Table 7(c)).

Users were thus prodded to use older versions of dependencies, rather than letting their tools upgrade them automatically and burdening upstream packages with bug reports when things change. Other stability features include a “lock” file that records exact versions of dependencies used by a version (Table 7(f)), and a feature called “stability attributes,” which tags API elements that are guaranteed to be stable, in contrast to new features that might change [80].

(^{24})Figures 3(b)–(k) and Figures 4(c)–(h) and (j), and the four answers of Figure 4(i) taken separately.
Survey results show that Rust developers acknowledged the community's stated value of stability (Figure 2), even though participants also perceived the ecosystem's packages to be relatively unstable (Figure 4(b)). The Rust language developers had been consistent in promising stability for the "stable" branch of the language, to the extent that they test any compiler changes against the entire corpus of Rust programs they can find on GitHub. But their analysis of their community's 2016 user survey [79] summarized why many users complained about instability: too many packages ("crates") relied on unstable "nightly" development versions of the compiler to take advantage of interesting new features. They concluded that "consensus formed around the need to move the ecosystem onto the stable language and away from requiring the nightly builds of the compiler."

CPAN: stability through centralized testing. Finally, Perl, unlike Rust, is low in semantic versioning (Figure 3(f)), and in fact its participants were the most likely of any ecosystem's to say that they refer to dependencies by name only, not version number (Figure 4(i)). They indicate some gatekeeping and design compromises, but not to the extent of Eclipse (Figure 3(c) and (k)). However, in response to the open-ended question about what other values were not covered by the survey, 12 (40%) of the 30 Perl/CPAN participants who gave comments mentioned testability, many referring to Perl's extensive battery of tests run on CPAN packages by volunteers; one explicitly claimed this test facility helped with the stability of Perl packages. CPAN stages changes and releases packages together (Table 9(f)), almost entirely specifying fixed version numbers of their dependencies (Table 7(a)).
A Haskell/Hackage participant mentioned CPAN's kwalitee metric, an operationalization of quality employed by these testing facilities, and attributed it to the ecosystem's "focus on stability and compatibility."

The three ecosystems work towards stability in very different ways. Eclipse, with its long-standing corporate support, is able to dictate that upstream developers pay the cost of maintaining backward compatibility. Rust/Cargo, although its users clamor for stability, is eager to attract developers and cannot impose the cost of stability by fiat as Eclipse does; instead, it applies gentle pressure to upstream developers in various ways, while easing the pressure from downstream developers by discouraging automatic major updates. CPAN, finally, has a large cadre of volunteers (CPAN Testers) and built infrastructure taking on the task of thorough testing.

This comparison of stability practices demonstrates that the relationships between practices and values are context-dependent and thus hard to generalize. A comprehensive theory incorporating such insights is a task for future work. We hope our dataset and the questions it suggests provide a useful launching point. Contrasts revealed by the survey are ripe for further investigation: researchers can find appropriate subjects for case studies of values being pursued in contrasting ways, or, conversely, of practices associated with contrasting values. In this case, analyzing the differences between these three ecosystems suggests that a theory of how practices can further values should take into account other factors, including the presence, availability, and motivations of different kinds of developers. This should be confirmed, however, with more exhaustive study of these and other ecosystems and with other practice contrasts.

Footnote 25: Testability was not a value we surveyed, but we recommend it as a new value in an expanded list, since many survey takers suggested it.
Ecosystem communities dissatisfied with their practices can themselves use our dataset as a starting place to find alternative combinations of practices that others are using.

Summary of RQ2.3 results: Many ecosystems show clear distinctions in a few key values and practices. Often the consensus on important values is high; some practices are actually enforced by policies and platform tools. However, some values, particularly quality, are nearly universal among software engineers, with little variance among ecosystems. Breaking changes are also generally avoided, though the strategies by which this is achieved, and how difficult it is perceived to be, depend on the specifics of the ecosystem.

6 DISCUSSION AND FUTURE WORK

Our article makes several contributions toward understanding how ecosystems go about the critical task of managing breaking changes and how those practices reflect the culture and values of the ecosystem participants. Study 1 contributes a qualitative accounting of the very different ways that three contrasting ecosystems manage change and how these differences relate to different values and different ideas about which classes of participants should bear the costs. Prior work [19, 36, 67, 72] has examined particular practices for change management and noted the prevalence of breaking changes [22, 48, 54, 90]. Our contribution is to characterize the types of change negotiation practices found in three different ecosystems and to show how these different sets of practices require varying amounts of effort from different classes of ecosystem participants. We also show how these different sets of practices reflect ecosystem values about the software, the community, and which community needs take precedence. Study 2 builds on this, examining practices and values in a larger set of 18 ecosystems.
We find that some values appear to be universal, or nearly so, within this set of ecosystems, perhaps reflecting a broader open source culture. Other values show considerable divergence, which appears to be a substantial component of ecosystems' distinctive "personalities." Within ecosystems, some values appear to reflect a consensus among participants, while views of others are highly variable, perhaps reflecting diverse views of subsets of projects or individuals rather than ecosystem-wide values. We also show that the relationship between practices and values is not simple, and we illustrate the apparent nature of such relationships by contrasting the very different practices that several ecosystems employ in pursuit of stability, which all of them value highly.

In the following subsections, we outline new and interesting research questions brought to light by this work.

6.1 When Are Practices in Conflict or Complementary?

It seems highly unlikely that practices can be treated as independent of one another. If an ecosystem is considering adopting a new practice, e.g., to enhance stability, the outcome of trying to implement various stability-enhancing practices is likely to be contingent on the set of other practices already in place. For example, introducing semantic versioning to signal breaking changes would not make sense where snapshot consistency (current versions of everything must be compatible) is already enforced. Complementarity is the other side of the coin: certain practices may be more effective if certain other practices are adopted as well. For example, centralized testing is likely to be more effective where an ecosystem has a repository with a strong gatekeeping mechanism and a norm that dissuades developers from using alternative repositories.
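The semantic-versioning signal discussed above amounts to a simple convention, sketched here in Python (illustrative only; real ecosystems layer policy and tooling on top of it):

```python
# Minimal sketch of the semantic-versioning convention: once a package
# reaches 1.0.0, a potentially breaking change must be advertised by
# incrementing the leftmost (major) digit of the version number.
# Illustrative only; this is not any ecosystem's actual tooling.

def major(version: str) -> int:
    return int(version.split(".")[0])

def signals_breaking_change(old: str, new: str) -> bool:
    """True if the version bump itself advertises a possible break."""
    return major(old) >= 1 and major(new) > major(old)

print(signals_breaking_change("1.4.2", "2.0.0"))  # True: major bump
print(signals_breaking_change("1.4.2", "1.5.0"))  # False: minor bump
```

Under snapshot consistency, by contrast, the repository itself guarantees that current versions are mutually compatible, so this per-package signal carries little additional information.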
We suspect that many conflicts and complementarities among practices are much more subtle, and greater insight into these relations among practices would be very helpful in clarifying feasible paths for achieving ecosystem goals. Our survey data contains many starting points for such investigations, for example, by allowing researchers to identify ecosystems with various combinations of values and practices as targets for further exploration.

6.2 Assimilation or Ecosystem Selection?

Our survey indicates that developers' personal values usually align well with the values of the ecosystems in which they operate (Figure 2). Understanding how this alignment comes about would help to predict the outcomes of attempted interventions and to design interventions that are more likely to be effective. There are at least two major possibilities. Developers may join ecosystems for reasons unrelated to values, e.g., the application domain or technical characteristics of the software. Being exposed to the ecosystem's values, they may then assimilate over time, adapting their behavior and personal values to what they experience around them. Alternatively, the alignment may come about primarily through value-based selection, where developers join ecosystems because they resonate with the ecosystem's values.

These two possibilities will often carry different implications for interventions. If developers tend to assimilate the ecosystem's values, an existing community might be steered toward different practices with the expectation that developers will adapt over time. In contrast, if developers pick ecosystems based on compatible values, then substantial changes would likely attract new value-aligned developers but risk significant disruption if long-term contributors rebel or leave.
While one might expect some degree of both selection and assimilation, understanding which values and practices are more easily adapted, and which tend to be resistant to change, could be a big help in designing effective interventions.

Our survey data does not provide insights into causation, but it can provide starting points for further investigations and can be combined with external data to approach these questions. We took a small step in this direction to illustrate some of the possibilities. If developers tend to assimilate practices and values from those around them, we would expect values and practices to be shared more among ecosystems with a relatively large overlap of participating developers than among those with a relatively small overlap. As a preliminary study, we investigated whether ecosystems that share many developers²⁶ have similar practices or values. Over all pairs of ecosystems, we found a sizable correlation between similarity of average responses on ecosystem practice questions (those depicted in Figures 3 and 4) and overlap in committers to those ecosystems (Spearman ρ = 0.341, p < .00001, n = 289 pairs of ecosystems, correlating similarity of average practice answers for each pair of ecosystems with the developer overlap between them). Interestingly, perceived values of the ecosystem do not seem to align with developer overlap (ρ = -0.05, p = 0.44, n = 289, correlating similarity of average value answers for each pair of ecosystems with the developer overlap between them).

While a number of interpretations of these relationships are possible, the data are consistent with the idea that practices diffuse among ecosystems that have large developer overlap, but values do not.
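The pairwise analysis described above can be illustrated as follows. The data here are hypothetical stand-ins (three toy ecosystems), not the survey dataset; in practice a library routine such as scipy.stats.spearmanr would do the rank correlation, but a self-contained version keeps the idea visible:

```python
# Sketch of the pairwise analysis: for every pair of ecosystems, compare
# similarity of mean practice answers with the fraction of shared
# committers, then rank-correlate the two pairwise measures (Spearman's
# rho). All data below are hypothetical.

from itertools import combinations

def ranks(xs):
    # Average ranks (1-based), with ties sharing their mean rank.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the rank vectors.
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical mean practice answers (one vector per ecosystem) and
# committer-overlap fractions between ecosystems.
practices = {"A": [4.0, 2.0, 3.5], "B": [3.8, 2.1, 3.4], "C": [1.0, 4.5, 2.0]}
overlap = {("A", "B"): 0.30, ("A", "C"): 0.05, ("B", "C"): 0.04}

def similarity(p, q):
    # Negative Euclidean distance as a simple similarity proxy.
    return -sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

pairs = list(combinations(sorted(practices), 2))
sims = [similarity(practices[a], practices[b]) for a, b in pairs]
ovls = [overlap[(a, b)] for a, b in pairs]
print(round(spearman(sims, ovls), 2))  # prints 0.5
```

With only three pairs the coefficient is meaningless; the point is the shape of the computation: one similarity number and one overlap number per ecosystem pair, rank-correlated across all pairs.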
Future work using time series data about developer overlap and historic participation in ecosystems would allow researchers to identify specific developers who moved to ecosystems with different or similar practices and values (according to our survey data) and to use interviews, surveys, or data mining to see if and how their behavior changed.

Footnote 26: To measure developer overlap, we assembled a list of all packages in each ecosystem from libraries.io, crates.io, and luarocks.org, and we identified Eclipse plugins as non-fork packages on GitHub containing a "plugin.xml" file. Using the authors of commits to those packages' GitHub projects as archived by Mockus [57], we counted what percentage of each ecosystem's contributors also contributed to each other ecosystem. We excluded Bioconductor, because we had no clear mapping to GitHub repositories.

6.3 When Are Attempted Changes Broadly Adopted?

Collecting cases of effective and ineffective past changes in ecosystems can help us understand the conditions that favor broadly adopted changes. Examples of attempted policy or practice changes can often be found through surveys. In our survey, text answers about contrasting ecosystems often explained how practices were deliberately designed. Five Perl developers, for example, described how an extensive centralized testing infrastructure (CPAN Testers) was added to improve the quality and compatibility of CPAN modules. Perhaps beginning with our results and then conducting new interviews or surveys, it should be possible to unearth many examples of attempted change and to determine their outcomes. A second approach could identify conflicts between values and practices to suggest ineffective changes.
In the case of Rust, for example, the high value placed on stability (Figure 3(a)) combined with the high perception of instability (Figure 4(b)) led us to investigate Rust's struggle, as mentioned above, to promote practices leading to stable versions of libraries despite the community's eagerness to innovate with new features.

In Edgar Schein's work on organizational culture, his recommendations [70, p. 323ff] for changing an organization include strong role models for new behaviors, lowering learning anxiety, and raising survival anxiety (i.e., making people confident that they can learn new practices and aware that the community will fail if they do not). Elements of this advice are visible in the practices of ecosystems that have tried to change their values. In Rust, for example, the compiler team models stability practices that packages might follow [80]. Rust's stability attributes for packages may reduce learning anxiety by making it easier for downstream users to create stable interfaces, and Rust's annual survey helps developers see each other's agreement that there are problems with stability.

7 CONCLUSION

While managing change has long been an important topic in software engineering, it is particularly interesting in the context of open source ecosystems, since projects tend to be highly interdependent yet independently maintained. The variety of practices used to manage change is considerable, but perhaps most interesting is what we might think of as the political dimension in the selection of practices. Whose interests are served by the adoption of one set of practices rather than another? How are the costs (primarily effort) distributed over types of ecosystem participants? What values do these practices actually serve?
We have attempted to provide a somewhat detailed description of practices used in three ecosystems, as well as a broader characterization of 18 ecosystems. We believe these studies just scratch the surface, however; much work remains to be done in understanding how practices fit with values, and with each other, and how effective changes can be made to address ecosystem weaknesses. We hope through this work, and through the data we are making publicly available, to have contributed to a better understanding of these issues.

APPENDICES

A STUDY 1 INTERVIEW PROTOCOL

The following lists the questions from our interview script. We did not ask every question of every interviewee; instead we directed each interview towards areas where the interviewee had personal experience. Given our iterative approach, some questions in this script were added or modified after earlier interviews. Placeholders in angle brackets (e.g., <package>, <change>) were filled in with the specifics of the interviewee's situation.

For maintainers of upstream packages:

• Why do you work on <package>?
• Do you have any plan or strategy for how the interface of <package> will evolve as people come to depend on it?
• Think about a recent larger change in your project. Was it backward-compatible? What impact did you expect it would have on packages that depend on <package>?
 — Follow up: Did you consider alternative ways of making <change> that would have more or less impact on users of <package>?
 — Follow up: If you had not made <change>, what would have happened differently for <package>'s future?
 — Follow up: What is your position on backward compatibility?
• Does the platform help/hinder you in evolution decisions such as <change>? What if the platform had <mechanism>?

For developers with upstream dependencies:

• Why do you work on <package>?
• If there's a useful-looking package that claims to provide some functionality you need, how do you decide whether to adopt it?
• What's your general strategy for choosing which version of a package to depend on?
• When do you think it's reasonable and expected for a package to change its interface?
• Do you prefer a stable but stale dependency, or a rapidly evolving but unstable one? What rate of interface change is too often?
• Is it a burden to have too many dependencies for a project?
• Can you give an example of a package you've considered where its stability was a consideration (positively or negatively)?
• How do you keep up with changes to packages you depend on?
• When <change> happened in <package>, how did you first find out about it?
• Are you ever watching for development activity between releases?
• Are you using the GitHub notification mechanism, and why or why not?
• If you could have an ideal notification system for important changes: what would such a system look like, and what changes would it notify you about?
• Did you think <change> was an appropriate change, or should they have left it alone?

For developers having experience working on the platform, we asked questions about specific policies, their intentions, and their consequences. Here are some example questions about CRAN:

• CRAN differs from some other repositories in that it asks package authors to notify reverse-dependency packages before submitting an update that breaks its API.
 — Was there anything specific that precipitated that policy?
 — Did you consider other options for solving the problem? What were the tradeoffs you thought about?
 — How successful has that policy been so far?
• More generally, CRAN has stricter requirements for authors than some other package repositories do. What factors does the CRAN team take into consideration when deciding if a quality standard is worth the effort of instituting and enforcing?
• Bioconductor does coordinated releases of all the packages at once, while CRAN lets packages update on their own schedule.
 — How and why did the two repositories end up having different policies?
 — What have been the consequences for the two repositories?
 — Will they likely stay that way?
• CRAN makes it easy to install only the latest version of a package; some repositories let users install old versions. Why is it done that way?
• CRAN has more permissive expectations about version number changes than some platforms. Has the current system been sufficient, or have you considered altering the policies about numbering?
• Can you tell me something about how potential breaking changes are handled among the developers of the base and recommended packages?
 — How do developers communicate to coordinate and synchronize changes?
 — Does it work differently for base and recommended packages than among ordinary packages in the CRAN repository?

B STUDY 2 SURVEY QUESTIONS

For transparency and replicability, we list all evaluated questions of the survey, including their exact phrasing. We exclude a small number of questions about power structures, community health, and motivation that we have not used in this article.

Part I: Ecosystem.

• Please choose ONE software ecosystem in which you publish a package*. If you don't publish any packages, then pick an ecosystem whose packages you use.
 "Software ecosystem" = a community of people using and developing packages that can depend on each other, using some shared language or platform.
 * "Package": a distributable, separately maintained unit of software. Some ecosystems have other names for them, such as "libraries," "modules," "crates," "cocoapods," "rocks," or "goodies," but we'll use "package" for consistency.
 — [selection or text field, substituted for <ecosystem> in the remainder of the survey]

Ecosystem Role.

• Check the statement that best describes your role in this ecosystem.
 — I'm a founder or core contributor to <ecosystem> (i.e., its language, platform, or repository).
 — I'm a lead maintainer of a commonly-used package in <ecosystem>.
 — I'm a lead maintainer of at least one package in <ecosystem>.
 — I have commit access to at least one package in <ecosystem>.
 — I have submitted a patch or pull request to a package in <ecosystem>.
 — I have used packages from <ecosystem> for code or scripts I've written.

• About how many years have you been using <ecosystem> in any way?
 — < 1 year
 — 1–2 years
 — 2–5 years
 — 5–10 years
 — 10–20 years
 — > 20 years

Ecosystem values.

• How important do you think the following values are to the <ecosystem> community? (Not to you personally; we'll ask that separately.) [See Section 3.3.2 for the 11 value questions; results shown in Figure 2.]

• How confident are you in your ratings of the values of <ecosystem> above?
 — Not confident
 — Slightly confident
 — Confident
 — Very confident

• Is there some other value the <ecosystem> community emphasizes that was not asked about above? If so, describe it here:

Part II: Package.

• In the following, we are going to ask about your experience working on one particular package. Please think of one package in <ecosystem> you have contributed to recently and are most familiar with. If you haven't contributed to a package in <ecosystem>, then name some software you've written that relies on <ecosystem> packages. You may use a pseudonym for it if you are concerned about keeping your responses anonymous. — [text fields, substituted for <package> in the remainder of the survey]

• Do you submit the package you chose to a/the repository associated with <ecosystem>? (Choose "no" if the ecosystem does not have its own central repository.) — [yes/no]

• Is there any software maintained by other people that depends on the package you chose? — [yes/no]

• Is the package you chose installed by default as part of a standard basic set of packages or platform tools? — [yes/no]

• How important are each of these values in development of <package> to you personally? [See Section 3.3.2 for the 11 value questions.]

• (OPTIONAL) Is there some other value important to you personally for <package> that was not mentioned?
 — [text fields]

• How often do you face breaking changes from any upstream dependencies (that require rework in <package>)? [Results shown in Figure 4(a).]
 — Never
 — Less than once a year
 — Several times a year
 — Several times a month
 — Several times a week
 — Several times a day

• How often do you make breaking changes to <package>? (i.e., changes that might require end-users or downstream packages to change their code) — [frequency scale as above] [Results shown in Figure 3(a).]

Making changes to <package>.

• I feel constrained not to make too many changes to <package> because of potential impact on users. [Results shown in Figure 3(b).]
 — Strongly agree
 — Somewhat agree
 — Neither agree nor disagree
 — Somewhat disagree
 — Strongly disagree
 — I don't know

• I know what changes users of <package> want. — [agreement + don't know scale as above]

• If I have multiple breaking changes to make to <package>, I try to batch them up into a single release. — [agreement + don't know scale as above] [Results shown in Figure 3(d).]

• I release <package> on a fixed schedule, which <package> users are aware of. — [agreement + don't know scale as above] [Results shown in Figure 3(j).]

• Releases of <package> are coordinated or synchronized with releases of packages by other authors. — [agreement + don't know scale as above] [Results shown in Figure 3(i).]

• When working on <package>, I make technical compromises to maintain backward compatibility for users. — [agreement + don't know scale as above] [Results shown in Figure 3(c).]

• When working on <package>, I often spend extra time working on extra code aimed at backward compatibility (e.g., maintaining deprecated or outdated methods). — [agreement + don't know scale as above]

• When working on <package>, I spend extra time backporting changes, i.e., making similar fixes to prior releases of the code, for backward compatibility. — [agreement + don't know scale as above]

Releasing Packages.
• A large part of the community releases updates/revisions to packages together at the same time. — [agreement + don't know scale as above]

• A package has to meet strict standards to be accepted into the repository. — [agreement + don't know scale as above] [Results shown in Figure 3(k).]

• Most packages in <ecosystem> will sometimes have small updates without changing the version number at all. — [agreement + don't know scale as above]

• Most packages in <ecosystem> with version greater than 1.0.0 increment the leftmost digit of the version number if the change might break downstream code. — [agreement + don't know scale as above]

• I sometimes release small updates of <package> to users without changing the version number at all. — [agreement scale, without "don't know"] [Results shown in Figure 3(g).]

• For my packages whose version is greater than 1.0.0, I always increment the leftmost digit if a change might break downstream code (semantic versioning). — [agreement as above] [Results shown in Figure 3(f).]

• When making a change to <package>, I usually write up an explanation of what changed and why (a change log). — [agreement as above] [Results shown in Figure 3(e).]

• When working on <package>, I usually communicate with users before performing a change, to get feedback or alert them to the upcoming change. — [agreement as above] [Results shown in Figure 3(h).]

• When making a breaking change to <package>, I usually create a migration guide to explain how to upgrade. — [agreement as above]

• After making a breaking change to <package>, I usually assist one or more users individually to upgrade (e.g., reaching out to affected users, submitting patches/pull requests, offering help). — [agreement as above]

Part IV: Dependencies.

• In the last 6 months I have participated in discussions, made bug/feature requests, or worked on development of another package in <ecosystem> that one of my packages depends on.
 — [yes/no]

• Have you contributed code to an upstream dependency of one of your packages in the last 6 months (one where you're not the primary developer)? — [yes/no]

• About how often do you communicate with developers of packages you depend on (e.g., participating in mailing lists, conferences, Twitter conversations, filing bug reports or feature requests, etc.)? — [frequency scale, as above] [Results shown in Figure 4(f).]

• For most dependencies that my packages rely on, the way I typically become aware of a change to the dependency that might break my package is:
 — I read about it in the dependency project's internal media (e.g., dev mailing lists, not general public announcements). — [agreement scale, as above]
 — I read about it in the dependency project's external media (e.g., a general announcement list, blog, Twitter, etc.). — [agreement scale, as above]
 — A developer typically contacts me personally to bring the change to my attention. — [agreement scale, as above] [Results shown in Figure 4(e).]
 — Typically I get a notification from a tool when a new version of the dependency is likely to break my package. — [agreement scale, as above] [Results shown in Figure 4(f).]
 — Typically, I find out that a dependency changed because something breaks when I try to build my package. — [agreement scale, as above] [Results shown in Figure 4(g).]

• How do you typically declare the version numbers of packages that <package> depends on? — [Results shown in Figure 4(i).]
 — I specify an exact version number.
 — I specify a range of version numbers, e.g., 3.x.x, or [2.1 through 2.4].
 — I specify just a package name and always get the newest version.
 — I specify a range or just the name, but I take a snapshot of dependencies (e.g., shrinkwrap, packrat).

• What is the common practice in <ecosystem> for declaring version numbers of dependencies? — [same scale as previous + "don't know"]

Using or avoiding dependencies.
• When adding a dependency to <package>, I usually do significant research to assess the quality of the package or its maintainers before relying on a package that seems to provide the functionality I need. — [agreement scale, as above] [Results shown in Figure 4(d).]

• It's only worth adding a dependency if it adds a substantial amount of value. — [agreement scale, as above] [Results shown in Figure 4(c).]

• I often choose NOT to update <package> to use the latest version of its dependencies. — [agreement scale, as above] [Results shown in Figure 4(h).]

• When adding a dependency, I usually create an abstraction layer (i.e., facade, wrapper, shim) to protect the internals of my code from changes. — [agreement scale, as above]

• When working on <package>, I often copy or rewrite segments of code from other packages into my package, to avoid creating a new dependency. — [agreement scale, as above]

• When working on <package>, I must expend substantial effort to find versions of all my dependencies that will work together. — [agreement scale, as above]

• (OPTIONAL) Compare <ecosystem> with other ecosystems you've used or heard about – does one have some features that the other should adopt? If so, name the other ecosystem(s) and describe the feature(s). — [text field]

• (OPTIONAL) Why do you think people chose to design these other ecosystem(s) differently from <ecosystem>? — [text field]

Part V: Demographics and motivations.

• Age
 — 18–24
 — 25–34
 — 35–44
 — 45–54
 — 55–64
 — 65+

• Gender — [male/female/other]

• Formal computer science education/training
 — None
 — Coursework
 — Degree

• How many years have you been contributing to open source? (in any way, including writing code, documentation, engaging in discussions, etc.) — [same time scale as "years used ecosystem" above]

• How many years have you been developing or maintaining software?
 — [same as previous]

• (OPTIONAL) Is there anything else we should have asked that would help us better understand your experience with community values and breaking changes in <ecosystem>? If so, tell us about it: — [text field]

C SUGGESTED SET OF VALUES FOR FUTURE STUDIES

We propose the following list of values that appear to distinguish software ecosystems. They are derived from Study 1 results plus examination of ecosystem webpages, then modified based on survey results: adding values that were suggested by survey respondents (Standardization, Technical Diversity, Usability, and Social Benevolence) and removing one that does not distinguish meaningfully among developers or ecosystems (Quality).

• Stability: Backward compatibility, allowing seamless updates ("do not break existing clients").
• Innovation: Innovation through fast and potentially disruptive changes.
• Replicability: Long-term archival of current and historic versions with guaranteed integrity, such that the exact behavior of code can be replicated.
• Compatibility: Protecting downstream developers and end-users from struggling to find a compatible set of versions of different packages.
• Rapid Access: Getting package changes through to end-users quickly after their release ("no delays").
• Commerce: Helping professionals build commercial software.
• Community: Collaboration and communication among developers.
• Openness and Fairness: Ensuring that everyone in the community has a say in decision-making and the community's direction.
• Curation: Selecting a set of consistent, compatible packages that cover users' needs.
• Fun and personal growth: Providing a good experience for package developers and users.
• Standardization: Promoting standard tools and practices, limiting developers' choices to save them time and effort.
• Technical Diversity: Allowing developers freedom to develop and interact in a diversity of ways.
• Usability: Ensuring that tools and libraries are easy for developers to use; ensuring resulting software is easy for end-users to use.
• Social Benevolence: An ethical community empowering others by making software and other resources available.
----------------------------------------
-------------------------------
Section 206:
D LOCK FILE NAMES IN EACH ECOSYSTEM

| Ecosystem | Lock file | Notes |
|---|---|---|
| Atom (plugins) | package-lock.json, npm-shrinkwrap.json | (see Node.js/NPM below) |
| CocoaPods | Podfile.lock | |
| Eclipse (plugins) | N/A | This function would be done within the project’s regular metadata files (plugin.xml and pom.xml) and so could not be measured readily with this technique |
| Erlang, Elixir/Hex | mix.lock | |
| Go | Gopkg.lock, vendor/ | Preceding the Gopkg.lock file, a canonical method of locking down dependency versions was to simply include a snapshot of their source code, so we looked for a “vendor/” directory in the project. |
| Haskell (Cabal/Hackage) | cabal.config | |
| Haskell (Stack/Stackage) | cabal.config | Although possible, this was never used, since Stackage’s main distinguishing feature is to constrain the versions of a set of packages |
| Lua/Luarocks | N/A | We could not find evidence of a canonical or even common-practice way of locking down Lua versions |
| Maven | N/A | This function would be done within the project’s regular metadata file (pom.xml) and so could not be measured readily with this technique |
| Node.js/NPM | package-lock.json, npm-shrinkwrap.json | These are both npm lockfiles with some semantic differences: npm-shrinkwrap is intended to be published; package-lock is not. However, both can be found in GitHub projects. |
| NuGet | project.lock.json | The NuGet blog suggests saving this file to a repository to lock in dependency versions. |
| Perl/CPAN | cpanfile.snapshot | We could not find evidence of a canonical way to do this in CPAN, but one recommendation was a third-party package called Carton that creates this snapshot file. |
| PHP/Packagist | composer.lock | |
| Python/PyPI | N/A | We could not find evidence of a canonical way to do this in PyPI; a StackOverflow post suggested that there are several nonstandard alternatives. |
| R/Bioconductor | packrat.lock | Not canonically standard, but common and well-known. However, it is mostly irrelevant for Bioconductor, since a set of mutually compatible packages are released as a unit. |
| R/CRAN | packrat.lock | Not canonically standard, but common and well-known. |
| Ruby/Rubygems | Gemfile.lock | |
| Rust/Cargo | Cargo.lock | |

27. https://docs.npmjs.com/files/package-lock.json
28. https://blog.nuget.org/20181217/Enable-repeatable-package-restores-using-a-lock-file.html
29. https://metacpan.org/pod/Carton
30. https://stackoverflow.com/questions/8726207/what-are-the-python-equivalents-to-rubys-bundler-perls-carton

ACKNOWLEDGMENTS

We want to thank Audris Mockus and the WoC project at the University of Tennessee, Knoxville, for access to the WoC archive [57] for data mining; the many people interviewed and surveyed; and those who helped with the design and promotion of the survey.

REFERENCES

[1] Pietro Abate, Roberto Di Cosmo, Ralf Treinen, and Stefano Zacchiroli. 2011. MPM: A modular package manager. In Proceedings of the International Symposium on Component Based Software Engineering (CBSE’11). ACM Press, New York, 179–188. DOI: https://doi.org/10.1145/2000229.2000255

[2] Rabe Abdalkareem. 2017. Reasons and drawbacks of using trivial npm packages: The developers’ perspective. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE’17). ACM, New York, NY, 1062–1064.

[3] Cyrille Artho, Kuniyasu Suzaki, Roberto Di Cosmo, Ralf Treinen, and Stefano Zacchiroli.
2012. Why do software packages conflict? In Proceedings of the IEEE International Working Conference on Mining Software Repositories (MSR’12). 141–150.

[4] Anat Bardi and Shalom H. Schwartz. 2003. Values and behavior: Strength and structure of relations. Personal. Soc. Psychol. Bull. 29, 10 (2003), 1207–1220.

[5] Gabriele Bavota, Gerardo Canfora, Massimiliano Di Penta, Rocco Oliveto, and Sebastiano Panichella. 2015. How the Apache community upgrades dependencies: An evolutionary study. Empir. Softw. Eng. 20, 5 (2015), 1275–1317.

[6] Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. 2016. How to break an API: Cost negotiation and community values in three software ecosystems. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’16). ACM Press, New York.

[7] Christopher Bogart, Anna Filippova, James Herbsleb, and Christian Kästner. 2017. Culture and Breaking Change: A Survey of Values and Practices in 18 Open Source Software Ecosystems. DOI: https://doi.org/10.1184/R1/5108716.v1

[8] Shawn A. Bohner and Robert S. Arnold. 1996. Software Change Impact Analysis. IEEE Computer Society Press, Los Alamitos, CA.

[9] Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualit. Res. Psychol. 3, 2 (2006), 77–101. DOI: https://doi.org/10.1191/1478088706qp063oa

[10] A. Brito, L. Xavier, A. Hora, and M. T. Valente. 2018. Why and how Java developers break APIs. In Proceedings of the IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER’18). 255–265.

[11] Javier Luis Cánovas Izquierdo and Jordi Cabot. 2015. Enabling the definition and enforcement of governance rules in open source systems. In Proceedings of the International Conference on Software Engineering (ICSE’15). 505–514. DOI: https://doi.org/10.1109/ICSE.2015.184

[12] Jaepil Choi and Heli Wang. 2007. The promise of a managerial values approach to corporate philanthropy. J. Bus.
Ethics 75, 4 (2007), 345–359.

[13] Juliet Corbin and Anselm Strauss. 2014. Criteria for evaluation. In Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory (3rd ed.). Sage Publications, Inc.

[14] Bradley E. Cossette and Robert J. Walker. 2012. Seeking the ground truth: A retroactive study on the evolution and migration of software libraries. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’12). ACM Press, New York, 55.

[15] John W. Creswell and J. David Creswell. 2014. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches (4th ed.). Sage Publications.

[16] Mary Crossan, Daina Mazutis, and Gerard Seijts. 2013. In search of virtue: The role of virtues, values and character strengths in ethical decision making. J. Bus. Ethics 113, 4 (2013), 567–581.

[17] Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. 2012. Social coding in GitHub: Transparency and collaboration in an open software repository. In Proceedings of the Conference on Computer Supported Cooperative Work (CSCW’12). 1277–1286.

[18] Barthélémy Dagenais and Martin P. Robillard. 2010. Creating and evolving developer documentation: Understanding the decisions of open source contributors. In Proceedings of the ACM International Symposium on Foundations of Software Engineering. 127–136. DOI: https://doi.org/10.1145/1882291.1882312

[19] Cleidson R. B. de Souza and David F. Redmiles. 2008. An empirical study of software developers’ management of dependencies and changes. In Proceedings of the International Conference on Software Engineering (ICSE’08).

[20] Cleidson R. B. de Souza and David F. Redmiles. 2009. On the roles of APIs in the coordination of collaborative software development. Comput. Supp. Coop. Work 18, 5-6 (2009), 445–475. DOI: https://doi.org/10.1007/s10606-009-9101-3

[21] Alexandre Decan, Tom Mens, Maëlick Claes, and Philippe Grosjean. 2016.
When GitHub meets CRAN: An analysis of inter-repository package dependency problems. In Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering. 493–504. DOI: https://doi.org/10.1109/SANER.2016.12 +[22] Alexandre Decan, Tom Mens, and Maëlick Claes. 2017. An empirical comparison of dependency issues in OSS packaging ecosystems. In Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER’17). + + +[23] Dedoose. 2016. Version 7.0.23. Web Application for Managing, Analyzing, and Presenting Qualitative and Mixed Method Research Data. SocioCultural Research Consultants, LLC, Los Angeles, CA. Retrieved from www.dedoose.com + + +[24] Jim des Rivières. 2005. API First. Retrieved from http://www.eclipsecon.org/2005/presentations/EclipseCon2005_12.2APIFirst.pdf + + +[25] Jim des Rivières. 2007. Evolving Java-based APIs. Retrieved from https://wiki.eclipse.org/Evolving_Java-based_APIs + + +[26] Jens Dietrich, David J. Pearce, Jacob Stringer, and Kelly Blincoe. 2019. Dependency versioning in the wild. In Proceedings of the Conference on Mining Software Repositories (MSR’19). 349–359. DOI: https://doi.org/10.1109/MSR.2019.00061 + + +[27] Don A. Dillman, Jolene D. Smyth, and Leah Melani Christian. 2014. Internet, Phone, Mail, and Mixed-mode Surveys: The Tailored Design Method. John Wiley & Sons. + + +[28] Alexander Eck. 2018. Coordination across open source software communities: Findings from the rails ecosystem. In Tagungsband Multikonferenz Wirtschaftsinformatik (MKWI’18). 109–120. + + +[29] Stephen G. Eick, Todd L. Graves, Alan F. Karr, J. S. Marron, and Audris Mockus. 2001. Does code decay? Assessing the evidence from change management data. IEEE Trans. Softw. Eng. 27, 1 (Jan. 2001), 1–12. DOI: https://doi.org/10.1109/32.895984 + + +[30] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. 
Addison-Wesley, Boston, MA. + + +[31] R. Stuart Geiger. 2017. Summary analysis of the 2017 GitHub open source survey. CoRR abs/1706.02777 (2017). + + +[32] Gemnasium. 2017. Gemnasium. Retrieved on 28 April, 2021 from https://web.archive.org/web/20180324121439/https://gemnasium.com/ + + +[33] Mohammad Gharehyazie, Baishakhi Ray, and Vladimir Filkov. 2017. Some from here, some from there: Cross-project code reuse in GitHub. In Proceedings of the IEEE International Working Conference on Mining Software Repositories. 291–301. DOI: https://doi.org/10.1109/MSR.2017.15 + + +[34] GitHub, Inc. 2017. Open Source Survey 2017. Retrieved from http://opensourcesurvey.org/2017/ on 4/28/2021. + + +[35] The Neighbourhoodie Software GmbH. 2017. Greenkeeper.io. Retrieved on 28 April, 2021 from https://web.archive.org/web/20180224075015/https://greenkeeper.io/ + + +[36] Johannes Henkel and Amer Diwan. 2005. CatchUp!: Capturing and replaying refactorings to support API evolution. In Proceedings of the International Conference on Software Engineering (ICSE’05). ACM Press, New York, 274–283. + + +[37] Steven Hitlin and Jane Allyn Piliavin. 2004. Values: Reviving a dormant concept. Ann. Rev. Sociol. 30, 1 (2004), 359–393. + + +[38] Reid Holmes and Robert J. Walker. 2010. Customized awareness: Recommending relevant external change events. In Proceedings of the International Conference on Software Engineering (ICSE’10). ACM Press, New York, 465–474. DOI: https://doi.org/10.1145/1806799.1806867 + + +[39] Daqing Hou and Xiaojia Yao. 2011. Exploring the intent behind API evolution: A case study. In Proceedings of the Working Conference on Reverse Engineering (WCRE’11). IEEE Computer Society, Los Alamitos, CA, 131–140. + + +[40] Marco Iansiti and Roy Levien. 2004. The Keystone Advantage: What the New Dynamics of Business Ecosystems Mean for Strategy, Innovation, and Sustainability. Harvard Business Press, Boston, MA. + + +[41] Javier Luis Cánovas Izquierdo and Jordi Cabot. 2015. 
Enabling the definition and enforcement of governance rules in open source systems. In Proceedings of the International Conference on Software Engineering (ICSE’15). IEEE, 505–514. + + +[42] Steven J. Jackson, David Ribes, Ayse G. Buyuktur, and Geoffrey C. Bowker. 2011. Collaborative rhythm: Temporal dissonance and alignment in collaborative scientific work. In Proceedings of the Conference on Computer Supported Cooperative Work (CSCW’11). 245–254. + + +[43] Slinger Jansen and Michael A. Cusumano. 2013. Defining software ecosystems: A survey of software platforms and business network governance. In Software Ecosystems: Analyzing and Managing Business Networks in the Software Industry. Edward Elgar Publishing. + + +[44] Puneet Kapur, Brad Cossette, and Robert J. Walker. 2010. Refactoring references for library migration. In Proceedings of the International Conference on Object-oriented Programming, Systems, Languages and Applications (OOPSLA’10). ACM Press, New York, 726–738. DOI: https://doi.org/10.1145/1869459.1869518 + + +[45] Smitha Keertipati, Sherlock A. Licorish, and Bastin Tony Roy Savarimuthu. 2016. Exploring decision-making processes in Python. In Proceedings of the International Conference on Evaluation and Assessment in Software Engineering. ACM, 43. + + +[46] Riivo Kikas, Georgios Gousios, Marlon Dumas, and Dietmar Pfahl. 2017. Structure and evolution of package dependency networks. In Proceedings of the 14th International Conference on Mining Software Repositories (MSR’17). IEEE Press, Piscataway, NJ, 102–112. +[47] Daniel Le Berre and Pascal Rapicault. 2009. Dependency management for the eclipse ecosystem: Eclipse P2, metadata and resolution. In Proceedings of the International Workshop on Open Component Ecosystems (IWOCE’09). 21–30. DOI: https://doi.org/10.1145/1595800.1595805 + + +[48] Mario Linares-Vásquez, Gabriele Bavota, Carlos Bernal-Cárdenas, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk. 2013. 
API change and fault proneness: A threat to the success of Android apps. In Proceedings of the European Software Engineering Conference/Foundations of Software Engineering (ESEC/FSE’13). ACM Press, New York, 477–487.

[49] Cristina V. Lopes, Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, and Jan Vitek. 2017. DéjàVu: A map of code duplicates on GitHub. Proc. ACM Program. Lang. 1, OOPSLA (2017), 1–28. DOI: https://doi.org/10.1145/3133908

[50] Mircea F. Lungu. 2009. Reverse Engineering Software Ecosystems. Ph.D. Dissertation. University of Lugano.

[51] Fabio Mancinelli, Jaap Boender, Roberto Di Cosmo, Jerome Vouillon, Berke Durak, Xavier Leroy, and Ralf Treinen. 2006. Managing the complexity of large free and open source package-based software distributions. In Proceedings of the International Conference on Automated Software Engineering (ASE’06). 199–208. DOI: https://doi.org/10.1109/ASE.2006.49

[52] Konstantinos Manikas. 2016. Revisiting software ecosystems research: A longitudinal literature study. J. Syst. Softw. 117 (2016), 84–103.

[53] Michael Mattsson and Jan Bosch. 2000. Stability assessment of evolving industrial object-oriented frameworks. J. Softw. Maint.: Res. Pract. 12, 2 (2000), 79–102.

[54] Tyler McDonnell, Baishakhi Ray, and Miryung Kim. 2013. An empirical study of API stability and adoption in the Android ecosystem. In Proceedings of the International Conference on Software Maintenance (ICSM’13). IEEE Computer Society, Los Alamitos, CA.

[55] T. Mens. 2016. An ecosystemic and socio-technical view on software maintenance and evolution. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME’16). 1–8.

[56] David G. Messerschmitt and Clemens Szyperski. 2005. Software Ecosystem: Understanding an Indispensable Technology and Industry. MIT Press Books.

[57] Audris Mockus. 2009. Amassing and indexing a large sample of version control systems: Towards the census of public source code history.
In Proceedings of the IEEE Conference on Mining Software Repositories (MSR’09).

[58] Emerson Murphy-Hill, Thomas Zimmermann, and Nachiappan Nagappan. 2014. Cowboys, ankle sprains, and keepers of quality: How is video game development different from software development? In Proceedings of the International Conference on Software Engineering (ICSE’14). DOI: https://doi.org/10.1145/2568225.2568226

[59] Linda Northrop, Peter Feiler, Richard P. Gabriel, John Goodenough, Rick Linger, Tom Longstaff, Rick Kazman, Mark Klein, Douglas Schmidt, Kevin Sullivan, and Kurt Wallnau. 2006. Ultra-large-scale Systems: The Software Challenge of the Future. Software Engineering Institute.

[60] Siobhán O’Mahony and Fabrizio Ferraro. 2007. The emergence of governance in an open source community. Acad. Manag. J. 50, 5 (2007), 1079–1106.

[61] Jeroen Ooms. 2013. Possible directions for improving dependency versioning in R. R Journal 5, 1 (2013), 1–9.

[62] Klaus Ostermann, Paolo G. Giarrusso, Christian Kästner, and Tillmann Rendel. 2011. Revisiting information hiding: Reflections on classical and nonclassical modularity. In Proceedings of the European Conference on Object-oriented Programming (ECOOP’11) (Lecture Notes in Computer Science), Vol. 6813. Springer-Verlag, Berlin, 155–178.

[63] David L. Parnas. 1972. On the criteria to be used in decomposing systems into modules. Commun. ACM 15, 12 (1972), 1053–1058. DOI: https://doi.org/10.1145/361598.361623

[64] Raphael Pham, Leif Singer, Olga Liskin, Fernando Figueira Filho, and Kurt Schneider. 2013. Creating a shared understanding of testing culture on a social coding site. In Proceedings of the International Conference on Software Engineering (ICSE’13). IEEE Computer Society, Los Alamitos, CA, 112–121.

[65] Tom Preston-Werner. 2013. Semantic Versioning 2.0.0. Retrieved from http://semver.org.

[66] Steven Raemaekers, Arie van Deursen, and Joost Visser. 2012.
Measuring software library stability through historical version analysis. In Proceedings of the International Conference on Software Maintenance (ICSM’12). IEEE Computer Society, Los Alamitos, CA, 378–387.

[67] Steven Raemaekers, Arie van Deursen, and Joost Visser. 2014. Semantic versioning versus breaking changes: A study of the Maven repository. In Proceedings of the International Working Conference on Source Code Analysis and Manipulation (SCAM’14). IEEE Computer Society, Los Alamitos, CA, 215–224. DOI: https://doi.org/10.1109/SCAM.2014.30

[68] Romain Robbes, Mircea Lungu, and David Röthlisberger. 2012. How do developers react to API deprecation? The case of a Smalltalk ecosystem. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’12). ACM Press, New York. DOI: https://doi.org/10.1145/2393596.2393662

[69] RStudio Team. 2015. RStudio: Integrated Development for R. Technical Report. RStudio, Inc., Boston, MA. Retrieved from www.rstudio.com

[70] Edgar H. Schein and Peter Schein. 2017. Organizational Culture and Leadership (5th ed.). Wiley.

[71] Shalom H. Schwartz. 1992. Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. Adv. Exper. Soc. Psychol. 25 (1992), 1–65.

[72] Leif Singer, Fernando Figueira Filho, and Margaret-Anne Storey. 2014. Software engineering at the speed of light: How developers stay current using Twitter. In Proceedings of the International Conference on Software Engineering (ICSE’14). 211–221. DOI: https://doi.org/10.1145/2568225.2568305

[73] Ian Sommerville. 2010. Software Engineering (9th ed.). Pearson Addison Wesley.

[74] Diomidis Spinellis. 2012. Package management systems. IEEE Softw. 29, 2 (2012), 84–86.

[75] Adam Stacoviak, Andrew Thorp, and Isaac Schlueter. 2013. The Changelog. Retrieved from https://changelog.com/101/.

[76] Peri Tarr, Harold Ossher, William Harrison, and Stanley M.
Sutton, Jr. 1999. N degrees of separation: Multi-dimensional separation of concerns. In Proceedings of the International Conference on Software Engineering (ICSE’99). IEEE Computer Society, Los Alamitos, CA, 107–119.

[77] The LibreOffice Design Team. 2017. What Open Source Means To LibreOffice Users. Retrieved from https://design.blog.documentfoundation.org/2017/09/13/open-source-means-libreoffice-users/.

[78] The Rust Team. 2021. The Cargo Book. Retrieved on 28 April, 2021 from https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries.

[79] Jonathan Turner. 2016. State of Rust Survey 2016. Retrieved from https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2016.html.

[80] A. Turon and N. Matsakis. 2014. Stability as a Deliverable (The Rust Programming Language Blog). Retrieved from https://blog.rust-lang.org/2014/10/30/Stability.html.

[81] Ivo van den Berk, Slinger Jansen, and Lútzen Luinenburg. 2010. Software ecosystems. In Proceedings of the European Conference on Software Architecture (ECSA’10). 127–134. DOI: https://doi.org/10.1145/1842752.1842781

[82] Bill Venners. 2003. The Philosophy of Ruby: A Conversation with Yukihiro Matsumoto, Part I. Retrieved from http://www.artima.com/intv/rubyP.html.

[83] Jonathan Wareham, Paul B. Fox, and Josep Lluís Cano Giner. 2014. Technology ecosystem governance. Organiz. Sci. 25, 4 (2014), 1195–1215.

[84] Mark Weiser. 1984. Program slicing. IEEE Trans. Softw. Eng. 10, 4 (1984), 352–357.

[85] Joel West. 2003. How open is open enough?: Melding proprietary and open source platform strategies. Res. Polic. 32, 7 (2003), 1259–1285.

[86] Joel West and Siobhán O’Mahony. 2008. The role of participation architecture in growing sponsored open source communities. Industr. Innov. 15, 2 (2008), 145–168.

[87] Hadley Wickham. 2015. Releasing a Package. O’Reilly Media, Sebastopol, CA.
Retrieved from http://r-pkgs.had.co.nz/release.html.

[88] Wei Wu, Foutse Khomh, Bram Adams, Yann-Gaël Guéhéneuc, and Giuliano Antoniol. 2015. An exploratory study of API changes and usages based on Apache and Eclipse ecosystems. Empir. Softw. Eng. (2015), 1–47. DOI: https://doi.org/10.1007/s10664-015-9411-7

[89] Wei Wu, Foutse Khomh, Bram Adams, Yann-Gaël Guéhéneuc, and Giuliano Antoniol. 2016. An exploratory study of API changes and usages based on Apache and Eclipse ecosystems. Empir. Softw. Eng. 21, 6 (2016), 2366–2412.

[90] Laerte Xavier, Aline Brito, Andre Hora, and Marco Tulio Valente. 2017. Historical and impact analysis of API breaking changes: A large-scale study. In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER’17). IEEE, 138–147.

[91] Yihui Xie. 2013. R Package Versioning. Retrieved from http://yihui.name/en/2013/06/r-package-versioning/.

[92] Robert K. Yin. 2013. Case Study Research: Design and Methods (5th ed.). Sage Publications.

Received August 2019; revised December 2020; accepted January 2021
----------------------------------------
-------------------------------
Section 207:
Code Reuse in Open Source Software Development:
Quantitative Evidence, Drivers, and Impediments

March 2010
Manuel Sojer¹ and Joachim Henkel¹, ²

¹Technische Universität München, Schöller Chair in Technology and Innovation Management, Arcisstr. 21, D-80333 Munich, Germany. sojer|henkel@wi.tum.de
²Center for Economic Policy Research (CEPR), London

Abstract

The focus of existing open source software (OSS) research has been on how and why individuals and firms add to the commons of public OSS code—that is, on the “giving” side of this open innovation process. In contrast, research on the corresponding “receiving” side of the innovation process is scarce.
We address this gap, studying how existing OSS code is reused and serves as an input to further OSS development. Our findings are based on a survey with 686 responses from OSS developers. Most interestingly, our multivariate analyses of developers’ code reuse behavior show that developers with larger personal networks within the OSS community, and those who have experience in a greater number of OSS projects, reuse more, presumably because both network size and broad project experience facilitate local search for reusable artifacts. Moreover, we find that a development paradigm that calls for releasing an initial functioning version of the software early—as the “credible promise” in OSS—leads to increased reuse. Finally, we identify developers’ interest in tackling difficult technical challenges as detrimental to efficient reuse-based innovation. Beyond OSS, we discuss the relevance of our findings for companies developing software and for the receiving side of open innovation processes in general.

Keywords: Innovation, software development, open source software, code reuse, software reuse

We are grateful to Oliver Alexy, Timo Fischer, Stefan Haefliger, Francesco Rullani, and seminar participants at the Pre-ECIS 2009 Open Source and Innovation Workshop, the TUM/Imperial Paper Development Workshop 2009, and the Open Source, Innovation, and Entrepreneurship Workshop 2010 for helpful comments.

1. Introduction

The public development of open source software (OSS)(^1) is a specific instance of open innovation, a term coined by Chesbrough (2003). A large body of empirical work has addressed the “giving” side of this open innovation process, that is, exploring the question of why and how individuals (e.g. Ghosh et al., 2002; Hars and Ou, 2002; Hertel et al., 2003; Lakhani and Wolf, 2005; Henkel, 2009) and firms (e.g.
West, 2003; Dahlander, 2005; Gruber and Henkel, 2005; Bonaccorsi et al., 2006; Henkel, 2006; Rossi Lamastra, 2009) make their developments freely available for others to use and build upon.

In contrast, research on the “receiving” side of the innovation process,(^2) that is, on the extent, drivers, and impediments of reuse of existing OSS code in subsequent OSS development, is scarce and either based on high-level code or dependency analyses (German, 2007; Mockus, 2007; Spaeth et al., 2007; Chang and Mockus, 2008), or on case studies (von Krogh et al., 2005; Haefliger et al., 2008). While this research suggests that code reuse is of major importance for OSS development, a large-scale quantitative study of the phenomenon at the level of individual developers is lacking.

A better understanding of code reuse in OSS is desirable, not only in itself, but also because it will yield insights on reuse beyond OSS. Reuse has long been recognized as crucial to overcoming the “software crisis” (Naur and Randell, 1968), as it allows for more efficient and more effective development of software of higher quality (Krueger, 1992; Kim and Stohr, 1998). More generally, the literature on innovation management points to knowledge reuse as an important factor mitigating the cost of innovation (e.g. Langlois, 1999; Majchrzak et al., 2004). Despite significant advances in reuse research, software reuse, especially in commercial firms, is still not without issues, and its antecedents are not yet fully understood (e.g. Desouza et al., 2006; Sherif et al., 2006). Some scholars suspect that reuse failure is often related to individual developer issues (e.g. Isoda, 1995; Morisio et al., 2002). However,

(^1) For better readability, we will use the term Open Source software in this article, but our work also refers to Libre and Free software, which differs from open source in ideological considerations but not in technical ones.
See http://www.gnu.org/philosophy/free-sw.html for further information.

(^2) The users of OSS obviously also receive code; however, since they do not base their own innovations on it, we do not consider them to be on the “receiving” side of the OSS innovation process.

there is a paucity of research, especially quantitative research, addressing the view of individual developers on reuse (e.g. Sen, 1997; Ye and Fischer, 2005).

Our aim is to fill the above gap regarding the “receiving” side of OSS innovation and, with a survey-based empirical study of code reuse in public OSS development, to augment the general software reuse literature with insights into the perspectives of individual developers on reuse. We quantitatively assess the importance of code reuse as one form of reuse in OSS development, and explore its drivers and impediments at the level of individual developers. Our empirical approach relies on a web-based survey to which we had, via email, invited 7,500 developers from SourceForge.net, the largest OSS development platform.

Our results show that code reuse plays a major role in OSS development; developers reported that, on average, 30 percent of the functionality they have implemented in their current main projects has been based on reused code. Investigating the drivers of reuse in multivariate analyses, we find that developers who believe in the effectiveness, efficiency, and quality benefits of reuse, and developers who see reuse as a means to work on their preferred development tasks, rely more on existing code. Further, presumably because both a larger network and experience in a greater number of projects facilitate local search for reusable artifacts, developers with larger personal networks within the OSS community and experience in a greater number of OSS projects reuse more.
Moreover, we find that a development paradigm that calls for releasing an initial functioning version of the product early, and so delivering a “credible promise”, leads to increased reuse. Finally, developers’ interest in tackling difficult technical challenges is identified as detrimental to efficient reuse-based innovation, while developers’ commitment to the OSS community leads to increased reuse.

The remainder of the paper is organized as follows. The next section reviews relevant literature on software reuse and OSS, followed by a section that presents our research model and hypotheses. After that, we elaborate on our data and measures before we present our analyses and results. The last section concludes with a summary and a discussion. A supplemental appendix contains further tables referred to in the paper but not included in its main body for space considerations.

2. Literature Review

The theoretical foundation of this paper draws on two streams of the literature. First, we review relevant software engineering literature on reuse and its implementation in firms. Second, scholarly work on OSS development provides the context of our work, establishing basic concepts of why developers contribute to OSS projects and how they do so. A summary of the small base of scholarly work on code reuse in OSS development concludes the literature review.

2.1. Reuse in Software Development

Software reuse (as the software-specific form of knowledge reuse (e.g. Langlois, 1999; Majchrzak et al., 2004)) is “[…] the process of creating software systems from existing software rather than building software systems from scratch” (Krueger, 1992, p. 131). The artifacts most commonly reused in software development are components (pieces of software that encapsulate functionality and have been developed specifically for the purpose of being reused) and snippets (multiple lines of code from existing systems) (Krueger, 1992; Kim and Stohr, 1998).
Our study focuses on these two artifacts, and we refer to their reuse as “code reuse.” Software reuse promises not only increased development efficiency and reduced development times, but also improved software quality and better maintainability, because developers do not have to develop everything from scratch but rather can rely on existing, proven, and thoroughly tested artifacts (Frakes and Kang, 2005).

Despite these compelling benefits, software reuse still fails frequently in commercial firms, sometimes for technical, but most often for human and organizational reasons (e.g. Morisio et al., 2002). The importance of the individual developer in successful reuse is undisputed. Isoda (1995, p. 183), for instance, concedes, “unless they [software engineers] find their own benefits from applying software reuse […] they will not […] perform reuse.” Still, there is a paucity of reuse research that focuses on the individual developer (Sen, 1997; Ye and Fischer, 2005).

OSS seems to be a unique opportunity to enhance our knowledge about the role of individuals in successful reuse-based innovation, and software reuse in particular, for two reasons. First, contrary to commercial software developers, who are often restricted to the limited amount of code available in their firms’ reuse repositories, the abundance of OSS code available under licenses that generally permit reuse in other OSS projects provides OSS developers with broad options to reuse existing code if they wish to do so. Second, the broad scholarly knowledge about the motivations and beliefs of OSS developers should be helpful in analyzing the perspectives of individual developers on software reuse. The next section establishes community-based, public OSS development as the empirical setting of our analysis.

2.2. Open Source Software Development

Strictly speaking, software is OSS if it comes under an open source license.
Such a license grants users of the software the right to access, inspect, and modify the source code of the software and distribute modified or unmodified versions of it.(^3) Since much OSS is developed by informal collaboration in public OSS projects (Crowston and Scozzi, 2008), the term “OSS” is often also understood to imply that the software has been developed in the “OSS fashion” (von Krogh et al., 2008). Typically, the development of software in OSS projects differs strongly from the development of traditional software in most commercial setups (Crowston et al., 2009). In this context, the motivation of developers to spend considerable time on their OSS projects and the process of OSS development are of particular relevance to our study.

A large body of literature has emerged that addresses the first topic. Common to most of this work is the finding that OSS developers work on their projects for both intrinsic and extrinsic reasons. As intrinsic motivations, scholars have identified identification with the OSS community and the resulting wish to support it (Hertel et al., 2003), ideological support of the OSS movement (Stewart and Gosain, 2006), the desire to help others (Hars and Ou, 2002), and, most importantly, the fun and enjoyment that developers experience when working on their projects (Lakhani and Wolf, 2005). Based on psychology research (Amabile et al., 1994), Sen et al. (2008) further differentiate fun into the enjoyment and “flow” feelings (Csíkszentmihályi, 1990) that developers perceive when writing code and the satisfaction of solving challenging technical problems. Extrinsic motivations of OSS developers may derive from the wish to enhance their reputation in the OSS community (Lakhani and Wolf, 2005), to hone their software development skills (Hars and Ou, 2002), to develop or adapt software functionality to their own needs (Hertel et al., 2003), and to signal their skills to potential employers and business partners (Lerner and Tirole, 2002). Also, they may be paid directly for their OSS work, for example, if it is part of their job (Ghosh et al., 2002).

(^3) Whether a software license is an open source license is determined by the Open Source Initiative (http://www.opensource.org).

Regarding the process of OSS development, OSS projects are often started by an individual developer who has a need for certain software functionality that does not yet exist (Raymond, 2001). After initialization, the developer typically wants to attract other developers to participate in the project. An incentive for others to join the project is that it offers interesting tasks and also seems feasible (von Krogh et al., 2003). The founder can enhance this recruitment process by delivering a “credible promise”, which Lerner and Tirole (2002, p. 220) describe as “a critical mass of code to which the programming community can react. Enough work must be done to show that the project is doable and has merit.” However, not only does the founder have to prove that the project is worthy of support by others, but also developers interested in joining a project often have to show that they possess the required skills by solving some of the technical issues the project is currently facing (von Krogh et al., 2003).

2.3. Code Reuse in Open Source Software Development

There is scant research on code reuse in OSS, and so far no large-scale quantitative data on the developer level exist. Initial academic work, however, suggests that code reuse is practiced at a high level in OSS projects.
Analyzing the code of a large number of OSS projects, Mockus (2007) and Chang and Mockus (2008) measure the overlap of filenames among the 38.7 thousand OSS projects in their database and conclude that about 50 percent of the components exist in more than one project. Mockus’s (2007) data even suggest that code reuse is more popular in OSS development than in the traditional commercial closed source software arena. Following a different approach, both German (2007) and Spaeth et al. (2007) rely on dependency information available in Linux distributions to show that most packages in these distributions require other packages as they reuse their functionality.

Using case studies on the project and individual developer level rather than large-scale code analyses, von Krogh et al. (2005) and Haefliger et al. (2008) confirm that OSS developers reuse existing code—in the form of components and snippets—as well as abstract knowledge—such as algorithms and methods. Diving into the mechanics of code reuse in OSS, Haefliger et al. (2008) find that OSS developers reuse code because they want to make their development work more efficient, they lack the skills to implement certain functionality by themselves, they prefer some specific development work over other tasks, or they want to deliver a “credible promise” with their project. The authors further point out that equivalents exist for some of the components of corporate reuse programs: OSS repositories like SourceForge.net can substitute for internal reuse repositories within firms, and the reuse frequency of a component can serve as a proxy for the component’s quality and thus substitute for certification.
3. Research Questions and Hypotheses

Building on the existing research on code reuse in OSS presented above, this paper seeks to use large-scale quantitative data obtained through a survey among OSS developers to answer the question: under what conditions do developers prefer reusing existing code over developing their own code from scratch? In this context, the following specific research questions will be addressed:

1. How important is code reuse in OSS development projects?
2. What do OSS developers perceive as the benefits of code reuse and what do they see as the issues and impediments?
3. How is the degree of code reuse in open source developers’ work determined by their characteristics and those of their project?

The first question establishes if and to what extent OSS developers reuse existing code, while the subsequent questions explore how this behavior can be understood and explained. Question three will be addressed using regression analyses. To guide the choice of explanatory variables and formulate hypotheses, a research model is developed in the following section. To provide a solid theoretical base, our research model builds on the well-established Theory of Planned Behavior (TPB) (Ajzen, 1991) and is refined and extended with both interviews and literature on code reuse and OSS.

3.1. Theory of Planned Behavior

Initially developed in the context of social psychology, TPB as a behavioral model has found wide adoption in various fields of information systems (IS) research. TPB is a parsimonious and rather generic model explaining human behavior and thus provides an excellent starting point to investigate code reuse as one particular form of behavior.
Research related to the topic of our study has relied on TPB or its sister model, the Technology Acceptance Model (TAM) (Davis et al., 1989), to explain, for example, software developers’ application of various development methodologies such as CASE tools (Riemenschneider and Hardgrave, 2001), object-oriented software development (Hardgrave and Johnson, 2003), or formalized software development processes in general (Riemenschneider et al., 2002; Hardgrave et al., 2003). Following the encouraging results of this stream of research, we base our research model on TPB.

TPB posits that behavior is determined by intention, which itself is predicted by three factors: (1) attitude toward the behavior, (2) subjective norms, and (3) perceived behavioral control. Attitude is formed by the individual’s beliefs about the consequences and outcomes (both positive and negative) of the behavior. Subjective norms refer to pressure from the social environment, as perceived by the individual, to perform or not perform the behavior. Lastly, perceived behavioral control is individuals’ perception of their own ability to perform the behavior. It can be further broken down into individuals’ “capability” of performing the behavior and the “controllability” (Ajzen, 2002) the individuals have over the behavior, that is, whether the decision to perform the behavior is theirs or not.

3.2. Research Model and Hypotheses

Using TPB as a starting point for our research model (see Figure 1), we argue that developers’ reuse behavior is influenced by their attitude toward code reuse, their subjective norms on code reuse, and the behavioral control they perceive regarding code reuse. Contrary to typical work relying on TPB, we do not employ generic scales to measure these constructs in most cases, but rather operationalize them with unique scales and single items explicitly framed in the OSS and code reuse context.
As a second deviation from typical TPB research, we will test the research model with different regressions which use either intention to reuse or actual reuse behavior as the dependent variable. Since we do not combine intention and behavior into one construct, but rather employ only one of them in each of our regression models, we stay true to the TPB assumption that the two concepts are related but not the same. Comparing the results of the regressions with different dependent variables adds robustness to our findings.

Note that our research model aims at explaining developers’ reuse behavior without explicitly differentiating between component and snippet reuse. In conventional software development, component reuse is typically considered black-box reuse, implying that developers can neither access nor modify the source code of the components they reuse. Thus, component reuse is assumed to follow different drivers than white-box reuse (e.g. snippet reuse), where access to source code is given (Ravichandran and Rothenberger, 2003). In the context of OSS, however, the source code of components is also available to reusing developers, and our survey data indicate that about 50% of the developers exercise the option to modify it. Because of this, we expect no fundamental differences in the drivers of component and snippet reuse and treat both forms of code reuse jointly in our research model.

Based on our interviews(^4) and existing research, we have identified five main drivers that influence developers’ attitude toward code reuse since they determine whether developers expect positive or negative outcomes from reuse. These drivers are developers’ perceptions of (1) the effectiveness of reuse, (2) the efficiency of reuse, (3) the software quality attained by reuse, (4) the task selection benefits resulting from reuse, and (5) the potential loss of control over their project that might come with reuse.
The link between reuse and effectiveness, efficiency, and software quality is straightforward. In addition, code reuse might result in task selection benefits if developers can avoid certain tasks by reusing existing code (Haefliger et al., 2008). As the fifth driver, reuse can lead to control loss, as a developer reusing code from another project might become dependent on this project to develop the code further, fix bugs, and so on. Since developers with a more positive perception of the above drivers should hold a more positive attitude toward reuse, TPB suggests that they rely more on reusing existing code in their work. Based on this logic, the following hypotheses can be derived for the five drivers:

Developers reuse more existing code…

H1a: …the more strongly they perceive the effectiveness benefits of reuse.

H1b: …the more strongly they perceive the efficiency benefits of reuse.

H1c: …the more strongly they perceive the quality benefits of reuse.

H1d: …the more strongly they perceive the task selection benefits of reuse.

H1e: …the less strongly they perceive the loss of control risks of code reuse.

Since the primary interest of our research is to understand how individual developer characteristics influence reuse, both subjective norms and perceived behavioral control, the two other parts of TPB besides attitude, are treated as control variables in our model. The controllability portion of perceived behavioral control is operationalized by six variables relating to project attributes. Two dummy variables indicate whether there exist policies in the project supporting or discouraging code reuse.

(^4) See the next section for an overview of our interviews.
Four Likert-scale variables capture the intensity of general impediments to code reuse: a lack of reusable code for the specific requirements of a developer’s project; conflicts between the license of the developer’s project and the license of the code to be reused; incompatibilities between programming languages, when the code to be reused is written in a different language than the developer’s project (Haefliger et al., 2008) or when the programming language of the focal project makes it difficult to include code in foreign languages; and an architecture of the developer’s project that is not modular enough to allow for easy reuse of existing code (Baldwin and Clark, 2006). The capability portion of perceived behavioral control is operationalized through each developer’s self-reported skill level in software development, the argument being that without some proficiency, developers will not be able to understand and integrate foreign code.

TPB research posits that attitude toward a behavior, subjective norms, and perceived behavioral control explain behavior comprehensively (Ajzen, 1991). We stay true to this assumption when we add further groups of hypotheses and control variables hereinafter, because all of these additional groups could be incorporated into the three original TPB groups of attitude, subjective norms, and perceived behavioral control. We did, however, choose to display some hypotheses as independent groups to better illustrate the ideas behind them. Moreover, some further control variables are shown as a group of their own because their influence on attitude, subjective norms, and perceived behavioral control is rather indirect.

In the first additional hypotheses group, we argue that developers’ access to local search leads to increased code reuse. Banker et al. (1993) show that developers will reuse if their costs for searching and integrating existing code are lower than those for developing it from scratch.
These costs for searching and integrating are lower if OSS developers can turn to their own experience or that of fellow OSS developers who can point them to the code they need, assure them of its quality, and explain to them how it works and how best to integrate it (Haefliger et al., 2008). Consequently, we posit that developers with a larger personal network of other OSS developers will reuse more code because they can reap the benefits of local search (H2a). Similarly, developers who have been active in more OSS projects in the past will also show increased code reuse behavior (H2b). Summarizing, the following two hypotheses can be derived regarding developers’ access to local search.

Developers reuse more existing code…

H2a: …the larger their personal OSS network.

H2b: …the greater the number of OSS projects they have been involved in.

Further, we also conjecture a relationship between the maturity of an OSS project and the code reuse behavior of its developers. As pointed out in the literature review section, OSS developers launching a project strive to deliver a “credible promise” as quickly as possible in order to attract other developers’ support. Code reuse is an excellent tool to accomplish that because it allows the addition of large blocks of functionality to a new project with limited effort (Haefliger et al., 2008). Further, code reuse can help a new project to overcome its “liabilities of smallness” (Aldrich and Auster, 1986) and quickly close the gap to established competing projects in its domain. Lastly, while code reuse is very helpful in the early phases of the life of an OSS project, we expect its importance to decline once the project has reached a certain level of maturity. At that point, the project has implemented all required basic functionality and turns toward fine-tuning the aspects that make it unique, which by definition is difficult with reused code.
Thus, we posit that the less mature an OSS project is, the more code its developers will reuse (H3).

H3: Developers reuse more existing code the less mature their project.

In the final group of hypotheses, we argue that the compatibility of code reuse with developers’ own goals in their project will influence the extent of their code reuse behavior. This is important because the “attitudes” group of our model presented above captures developers’ general attitude toward code reuse, while the “compatibility” group presented in the following will help to link these general attitudes to the developers’ work in one specific project. We follow Moore and Benbasat (1991, p. 195) and define compatibility as the degree to which code reuse “is perceived as being consistent with the existing values, needs, and past experiences” of an OSS developer, and focus primarily on “values” and “needs” (“experiences” being addressed by H2b). Our argumentation regarding compatibility between developers’ project goals and their reuse behavior is based on the motivations of developers to participate in OSS projects described earlier.

Sen et al. (2008) show empirically that developers for whom tackling difficult technical problems is a main motivation to work on their project try to limit the number of team members involved in their project besides themselves because they want to solve the problems themselves and without the help of others. In similar fashion, developers who work on their project to tackle difficult technical challenges should reuse less existing code because reuse would solve some of the challenges for them (H4a). In order to be able to focus on solving these difficult technical challenges by themselves, developers might very well show increased reuse behavior for other parts of their project, but we control for this effect by including developers’ perception of task selection benefits through reuse (see H1d above).
Also supportive of our argumentation is DiBona et al.’s (1999, p. 13) description of the “satisfaction of the ultimate intellectual exercise” which developers feel “after completing or debugging a hideously tricky piece of recursive code that has been a source of trouble for days.” It seems likely that reuse would reduce the joy described, and thus developers for whom challenge seeking is a major motivation should reuse less existing code.

Related to the above effect of challenge seeking, reuse should also be of lower importance to developers who work on their project for the pleasure they experience when writing code (H4b). Code reuse would reduce their need to write their own code and thus reduce the pleasure derived from doing so. Hars and Ou (2002, p. 28) provide a nice illustration of this argumentation when they quote an OSS developer explaining his motivation to work on his project with his “innate desire to code, and code, and code until the day I die.” It seems more than plausible that a developer feeling this way about coding would, ceteris paribus, reuse less. As for challenge seeking, one might argue that developers who code for fun might reuse more in order to focus on the most enjoyable tasks. However, again, this is statistically controlled for by including developers’ perception of task selection benefits through reuse (see H1d above).

The goal to improve one’s software development skills could affect reuse intensity in two directions. One could conjecture that developers who want to hone their skills purposefully reinvent the wheel in order to learn how it is done. Yet, we argue that countervailing effects dominate, such that developers for whom skill improvement is more important also reuse more existing code (H4c). Our rationale is based on DiBona’s (2005) finding that OSS developers leverage existing code as a starting point for their learning and study and modify it to improve their own skills.
We also found confirmation for this stance in our interviews,(^5) in which developers, for example, told us that they have “used code reuse as a way of learning” or pointed out that “reusing code snippets can really help to learn a new programming language.” Also supportive of our argumentation is the finding from our survey(^6) that about 50% of the developers modify the components they reuse and thus do not practice black-box reuse, in which they do not get in touch with the source code of the components.

Regarding community commitment as a motivation, we argue that developers who feel strongly committed to the OSS community and want it to be successful will reuse more code (H4d). Code reuse helps these developers to write better software faster, and allows them to make the community stronger by contributing this software.

As the last two motivations conjectured to influence developers’ reuse behavior, we turn to reputation building, first within the OSS community and second for the purpose of signaling skills to potential commercial partners such as employers. Regarding developers’ reputation within the OSS community, we argue that developers seeking to improve their reputation will reuse more code (H4e). Code reuse should make a project better and thus create more attention for the project within the OSS community and also for the developers associated with the project. This argumentation receives support from Sen et al. (2008), who find that developers for whom OSS reputation building is important prefer to be part of a successful project with many other developers over being one of only a few developers of a less successful project. One could object that an OSS developer’s reputation is grounded in her technical skills, which she best proves with her unique—that is, not reuse-based—contributions to the OSS community.
Yet, this argumentation is refuted by von Krogh et al.’s (2003) finding that developers who need to prove their worthiness to join a project through initial contributions often include reused code in these contributions. Furthermore, Raymond’s (2001, p. 24) famous saying that “good programmers know what to write. Great ones know what to rewrite (and reuse)” also leans toward our hypothesis that developers for whom reputation building in the OSS community is important will reuse more existing code. Finally, and basically following the same argumentation as above, we posit that developers who want to signal their software development skills to potential employers or business partners will reuse more code because parties outside of the OSS community are more likely to become aware of successful OSS projects and their developers (H4f). Summarizing, we posit the following hypotheses addressing the compatibility between developers’ motivations to work on their project and code reuse:

Developers reuse more existing code…

H4a: …the less important challenge seeking…

H4b: …the less important coding fun and enjoyment…

H4c: …the more important skill improvement…

H4d: …the more important community commitment…

H4e: …the more important OSS reputation building…

H4f: …the more important commercial signaling…

…is for them as a motivation to work on their project.

Finally, multiple additional control variables are included in our model to account for further contextual differences in code reuse behavior. These control variables encompass four groups.

(^5) See the next section for an overview of our interviews.

(^6) The survey is introduced in detail in the next section.
First, we account for the project characteristics of project size (number of project team members), technical complexity of the project, the project’s position in the software stack, and whether the project aims at creating a standalone executable application or a reusable component. In addition, we control for the level of professionalism and seriousness with which developers work on their current main project by including the number of years they have already been involved in OSS,(^7) the average weekly hours they invest in their current main project, the share of functionality in their current main project that was developed by them as compared to their project team members, and whether they have worked or currently work as professional software developers. Moreover, we account for developers’ education and training on reuse, which previous research has shown to be a determinant of reuse behavior in software development firms (e.g. Frakes and Fox, 1995). Finally, we accommodate developers’ geographic residence on a continent level. Subramanyam and Xia (2008) have shown that developers from different geographies prefer, for example, different levels of modularity in their OSS projects. Following this line of thought, geographic origin might also be an antecedent of reuse behavior.

4. Research Design, Data and Measures

We collected data for our study using a web-based survey that was developed based on 12 interviews with OSS developers(^8) and on the existing literature. Moreover, all questionnaire items and questions were assessed for clarity by fellow researchers and OSS developers in a qualitative pretest. In the survey, we asked developers about their experiences with code reuse in the context of their current main OSS project. In order to capture the high heterogeneity of OSS projects and their developers, we chose the largest OSS project repository, SourceForge.net, as a platform to select survey participants.
In April 2009, two rounds of quantitative pretests, to which in total 2,000 developers had been invited, were conducted to assess the quality of our questionnaire in terms of content, scope, and language. Following minor refinements based on an analysis of the pretests and feedback from the respondents, the main survey took place in July 2009. An email was sent to 7,500 developers from SourceForge.net inviting them to participate in our survey. The developers were selected at random from all SourceForge.net developers who had been active on the platform in the first half of 2009.

(^7) The number of years a developer has been active in OSS is treated as a control variable and not included in the local search hypotheses because it is not the intensity of experience (as e.g. measured by the number of years), but rather the breadth of experience (as e.g. measured by the number of projects involved in) which is conjectured to facilitate better access to local search and consequently more code reuse.

(^8) Ten of these interviews were conducted via phone or Internet-based voice communication; the other two were conducted via email exchange. Nine of the voice-based interviews were taped and transcribed and had an average length of 49 minutes.

We received a total of 686 responses, equaling a response rate of 9.6 percent (338 invitations could not be delivered). This rate is similar to those obtained by other recent surveys among SourceForge.net developers (e.g. Wu et al., 2007; Sen et al., 2008). Eleven responses had to be eliminated due to inconsistent or corrupt entries, leaving us with 675 completed surveys.

The demographic profile of the developers participating in our study (see Table 1) is largely consistent with that reported by other studies among OSS developers (e.g. Lakhani and Wolf, 2005; Sen et al., 2008).
In particular, we find no indication that nonresponse has biased our sample to overrepresent less serious OSS developers.(^9) Of special relevance to our endeavor is the fact that only 92 percent (or 624) of the developers we surveyed actually write code for their OSS projects. As only developers writing code can practice code reuse, our further analyses will focus on these 624 developers.

Before starting the analysis of our data, we briefly assess the multi-item constructs we have employed to measure developers’ motivation to work on their main project. The items for these constructs were adopted from prior research both in the OSS domain (Hars and Ou, 2002; Lakhani and von Hippel, 2003; Roberts et al., 2006) and in psychological motivation research (Amabile et al., 1994; Clary et al., 1998), and were measured on seven-point Likert scales (“strongly disagree” to “strongly agree”). We took several steps to ensure the validity and reliability of these measures. Content validity was qualitatively assessed through building on existing OSS literature whenever possible, discussions with fellow OSS researchers, and two rounds of pretests. Reliability was assessed via Cronbach’s α for each multi-item variable. Not all Cronbach’s α values exceed Straub’s (1989) rule of thumb of 0.8, but they all exceed Nunnally’s (1978) threshold of 0.6 (see Table A1 in the Appendix). Convergent validity was assessed through factor analysis, which confirms that all items have their highest loading with their respective intended construct and that all loadings are higher than 0.5 (Hair et al., 2006) (see Table A1 in the Appendix). Discriminant validity is demonstrated by showing that the square root of the average variance extracted of each construct is greater than its correlations with the other constructs (see Table A2 in the Appendix), thus satisfying the Fornell-Larcker criterion (Fornell and Larcker, 1981).

(^9) Given the large number of surveys among SourceForge.net developers, one might suspect that especially the more active developers on this platform would show signs of “survey fatigue.” However, comparing the self-reported weekly hours developers spend working on their main project between our survey (mean: 8.8) and the first SourceForge.net survey ever taken by Lakhani and Wolf (2005) (mean: 7.5) mitigates these concerns. The additional finding that 69 percent of the developers in our survey have worked or are still working as professional software developers, with an average tenure of 7.9 years, rules out the further concern that only less skilled programmers took part in our survey.

Table 1: Demographics of Survey Participants

|                                              | Percentage |
|----------------------------------------------|------------|
| **Age (mean: 31.8, median: 30)**             |            |
| 1-19                                         | 5%         |
| 20-29                                        | 42%        |
| 30-39                                        | 35%        |
| 40-49                                        | 13%        |
| 50+                                          | 5%         |
| **Residence**                                |            |
| North America                                | 26%        |
| South America                                | 5%         |
| Europe                                       | 54%        |
| Asia and rest of world (RoW)                 | 15%        |
| **Highest education level**                  |            |
| Non-university education                     | 15%        |
| Undergraduate or equivalent                  | 35%        |
| Graduate or equivalent                       | 30%        |
| Ph.D. and higher                             | 20%        |
| **Task profile in open source projects**     |            |
| Includes writing code                        | 93%        |
| Does not include writing code                | 7%         |
| **Hours spent working on main OSS project per week (mean: 8.8, median: 5)** | |
| 1-4                                          | 48%        |
| 5-9                                          | 19%        |
| 10-19                                        | 21%        |
| 20+                                          | 12%        |
| **Size of personal OSS network (mean: 29.9, median: 8)** | |
| 1-9                                          | 70%        |
| 10-19                                        | 18%        |
| 20+                                          | 12%        |
| **Number of OSS projects ever involved in (mean: 3.7, median: 2)** | |
| 1-4                                          | 65%        |
| 5-9                                          | 26%        |
| 10-14                                        | 6%         |
| 15+                                          | 3%         |

In order to reduce common method bias, we employed several measures during data collection as suggested by Podsakoff et al. (2003).
We have taken care to formulate simple and unambiguous questions for our survey by discussing our questionnaire items with our interview partners and conducting multiple rounds of pretests. Further, survey respondents were assured when the survey was introduced to them that their responses would be treated strictly confidentially. Moreover, many of the survey items address motivations, attitudes, and beliefs, for which by nature there are no right or wrong answers.

To estimate the presence of common method bias in our data after survey completion, we employed Harman’s test, in which all variables of a model are loaded onto a single factor in a principal component factor analysis. A significant amount of common method bias is assumed to exist if this one factor explains a large portion of all the variance in the data (Podsakoff et al., 2003). In our data, we find the maximum variance explained by one factor to be 9.3 percent, which does not hint at strong common method bias.

5. Results and Discussion

Following the research questions presented above, this section consists of four parts. In the first, we establish the importance of code reuse in OSS development. Next, we present perceived benefits and issues of reuse as well as impediments to it, and address the question of why OSS developers do or do not reuse code. The third part presents the core of this study in the form of a multivariate analysis of code reuse behavior used to test our research model. In the final, fourth part, we discuss potential threats to validity and limitations of our study.

5.1. Importance of Code Reuse

When measuring code reuse, we focused on component and snippet reuse. In our survey, component reuse was defined as “reusing of functionality from external components in the form of libraries or included files. E.g., implementing cryptographic functionality from OpenSSL or functionality to parse INI files from an external class you have included.
Please do not count functionalities from libraries that are part of your development language, such as the C libraries.” In a similar fashion, snippet reuse was defined as “reusing of snippets (several existing lines of code) copied and pasted from external sources. If you have modified the code after copying and pasting it by, e.g., renaming variables or adjusting it to a specific library you use, this would still be considered as […] reuse […]”.

Three different measures (depicted in Table 2) were employed to investigate the importance of code reuse. First, in line with, for example, Cusumano and Kemerer (1990) or Frakes and Fox (1995), we asked developers to indicate the share of functionality based on reused code that they added to their current main project. We found that, on average, nearly one third (mean=30%, median=20%) of the functionality OSS developers have added to their project was based on reused code, which indicates that code reuse is indeed an important element of OSS development. This interpretation is further supported by the fact that only six percent of the developers surveyed report that they have not reused any code at all. Furthermore, the maximum share of reused functionality of 99 percent shows that some developers rely very heavily on code reuse and see their role mainly in writing “glue-code” to integrate the various pieces of reused code. As a second measure, we employed a self-developed four-item scale to directly measure the perceived importance of reuse for the individual developers’ work on their main project.(^{10}) On seven-point Likert scales, developers indicated their agreement with four statements that described, in various ways, reuse as “very important.” With a mean of 4.74 (median=5.25) and 58 percent of all developers at least “somewhat agreeing” with the statements, the important role of code reuse in OSS development is again confirmed.
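Footnotes 10 and 11 report Cronbach’s α for the two four-item scales. As a hedged illustration of how such a reliability coefficient is computed, the following sketch uses invented seven-point responses, not the survey data:

```python
# Cronbach's alpha for a k-item scale:
#   alpha = k/(k-1) * (1 - sum(item variances) / variance(respondent totals))
# The response matrix below is invented toy data for illustration only.
def cronbach_alpha(items):
    # items: list of k lists, each holding one item's scores across respondents
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

    totals = [sum(item[r] for item in items) for r in range(n)]
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))

# four items, six hypothetical respondents, seven-point Likert scale
items = [[7, 6, 5, 2, 6, 4],
         [6, 7, 5, 1, 6, 3],
         [7, 6, 4, 2, 5, 4],
         [6, 5, 5, 2, 6, 3]]
print(round(cronbach_alpha(items), 2))
```

Highly consistent items, as in this toy matrix, push α toward 1; the α values of 0.93 and 0.94 reported in the footnotes indicate similarly consistent responses.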
Finally, as the third approach, using a further self-developed four-item scale,(^{11}) we asked developers to indicate their intent to reuse existing code in the future development of their current main project. The results are largely similar to those obtained with the second measure (perceived importance of reuse in past work), once more indicating that code reuse is very important. However, both mean and median are significantly lower (mean=4.57, median=4.75) than for the previous measure. This finding might be a first indication supporting hypothesis $H3$, which states that code reuse is more important in earlier phases of an OSS project.

(^{10}) The scale was developed based on our interviews with developers and on research on general knowledge reuse (Watson and Hewett, 2006). It also draws on the intention and behavior scales commonly employed in TAM or TPB research in the IS domain, for example, by Riemenschneider et al. (2002) or by Mellarkod et al. (2007). The statements of the scale are: “Reusing has been extremely important for my past work on my current main project,” “Without reusing, my current main project would not be what it is today,” “I did reuse very much during my past work on my current main project,” and “My past work on my current main project would not have been possible without reusing.” The scale explains 83.4 percent of the total variance and Cronbach’s $\alpha$ is 0.93.

(^{11}) The statements of the scale are: “Reusing will be extremely important in my future work on my current main project,” “Realizing my future tasks and goals for my current main project will not be possible without reusing,” “I will reuse very much when developing my current main project in the future,” and “Realizing my future tasks and goals for my current main project will be very difficult without reusing.” The scale explains 83.8 percent of the total variance and Cronbach’s $\alpha$ is 0.94.

| Measure | Mean | Median | S.D. | Min. | Max. |
|---------|------|--------|------|------|------|
| Share of implemented functionality based on reused code (in %) | 30.0% | 20.0% | 26.5% | 0.0% | 99.0% |
| Importance of reuse for past work on project (seven-point Likert scale)* | 4.74 | 5.25 | 1.86 | 1.00 | 7.00 |
| Importance of reuse for future work on project (seven-point Likert scale)* | 4.57 | 4.75 | 1.69 | 1.00 | 7.00 |

*Measure is based on four single items. N=624.

Despite the prominent role of code reuse as consistently indicated by all three measures, the high standard deviations also reveal large heterogeneity in developers’ code reuse behavior. Developers’ individual reasons for and against code reuse in their development are suspected to largely drive this heterogeneity and will be explored in the following section.

5.2. Developers’ Reasons For and Against Code Reuse

In our analysis of developers’ reasons for and against code reuse, we differentiate between three sets of factors. First, we analyze the benefits of code reuse as perceived by OSS developers. Second, we investigate the drawbacks and issues that developers see in code reuse, and, finally, we address the importance of general impediments(^{12}) to code reuse.

Based on our interviews, as well as the existing literature, we have identified eight distinct benefits of code reuse. Survey participants were asked to indicate their agreement on a seven-point Likert scale with statements regarding these benefits. Results are displayed in Figure 2 and show that all of the statements received rather high shares of agreement. The two statements with the highest level of agreement both point to efficiency effects of reuse, followed by a statement pertaining to its effectiveness effects. For the benefits on ranks four and higher, agreement drops significantly compared to rank three, yet is still quite high.
Ranked fourth and fifth are statements addressing effects of reuse on the quality of the software being developed, making it more stable and more compatible with standards. The statement ranked eighth, about the effects of code reuse on software security, also pertains to this group; however, it receives considerably less agreement. This could be explained by the fact that many OSS projects develop types of software for which security is not a major concern, for example, games. Ranked sixth and seventh are statements that position reuse as a means for developers to select their project tasks by preference and avoid mundane jobs. An example of this is “outsourcing” maintenance work to the original developers of the reused code, who fix bugs or implement new functionality in the code, from which the reusing developer benefits without having to do this work herself.

(^{12}) While these “general impediments” are rather objective compared to developers’ beliefs about benefits and issues, they may still reflect individual developers’ opinions, having been measured by asking the developers.

| Reuse benefits as perceived by developers (in % of developers) | Share agreement | Share disagreement |
|---------------------------------------------------------------|-----------------|--------------------|
| 1. Reusing helps developers realize their project goals/tasks faster | 92% | 3% |
| 2. Reusing allows developers to spend their time on the most important tasks of the project | 91% | 9% |
| 3. Reusing allows developers to solve difficult problems for which they lack the expertise | 85% | 14% |
| 4. Reusing helps developers create more reliable/stable software, e.g. less bugs | 74% | 12% |
| 5. Reusing ensures compatibility with standards, e.g. the look and feel of GUIs | 72% | 14% |
| 6. Reusing allows developers to spend their time on the development activities they have most fun doing | 67% | 24% |
| 7. Reusing allows developers to “outsource” maintenance tasks for certain parts of their code to developers outside of their project | 60% | 19% |
| 8. Reusing helps developers create more secure software, e.g. less vulnerabilities | 57% | 19% |

Note: The share of developers who are “indifferent” about the statements is not shown. N=624.

Figure 2: Share of Developers that Disagree/Agree to Reuse Benefits

In order to check consistency of responses and to construct factor scores to be used in the multivariate analyses later, an exploratory factor analysis is carried out. With four components,(^{13}) it explains 77.2 percent of total variance and yields good quality measures (KMO: 0.76, p<0.0001). The resulting components can be interpreted as development efficiency (ranks 1, 2), software quality (ranks 4, 5, 8), task selection (ranks 6, 7), and development effectiveness (rank 3).(^{14})

(^{13}) For better interpretability of the resulting components, components with an Eigenvalue of less than 1 were also extracted. The fourth component had an Eigenvalue of 0.79.

(^{14}) The factor analysis uses principal component analysis and Varimax rotation. Cronbach’s $\alpha$ for the components software quality, development efficiency, and task selection is 0.80, 0.72, and 0.47, respectively. See Table A3 in the Appendix for detailed factor loadings.

Following the benefits of code reuse, nine issues and drawbacks identified in our interviews and the existing literature (shown in Figure 3) were presented to participants, who were again asked to indicate their agreement with the respective statements. The highest share of agreement was received by a statement pointing to the loss of control that a developer may have to accept when reusing code. Statements ranked second and third also relate to losing control, however with significantly lower levels of agreement.
The statement ranked second points to software being more difficult to install (build) and use by end users due to technical dependencies, while the statement ranked third reflects the developer’s obligation to check and integrate updates of reused code.(^{15}) Ranked fourth, fifth, and eighth, again with significantly lower levels of agreement than the previous statements, are potential issues of code reuse that point to quality and security risks. The statements ranked sixth, seventh, and ninth all describe situations where development from scratch is more efficient than code reuse. They do, however, receive at least 50 percent disagreement, which emphasizes that most developers do not deem searching, understanding, and adapting reusable code as inefficient.

Figure 3: Share of Developers that Disagree/Agree to Reuse Issues and Drawbacks

(^{15}) Both statements mainly refer to component reuse and are only partially applicable to snippet reuse.

An exploratory factor analysis of these issues and drawbacks explains 69.0 percent of total variance with three components and yields good quality measures (KMO: 0.72, p<0.0001). The resulting components can be interpreted as control loss (ranks 1, 2, 3), quality risks (ranks 4, 5, 8), and inefficiency of reuse (ranks 6, 7, 9).(^{16})

To consolidate the number of variables in the multivariate model employed later, a further factor analysis merged the software quality benefits and the quality risks into one component. Further, the development efficiency benefits were merged with the inefficiency of reuse. The five final components used in the multivariate model are: effectiveness benefits, efficiency benefits, quality benefits, task selection benefits, and loss of control risks.
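The principal component extraction behind these factor analyses boils down to eigenvalues of the item correlation matrix: a component's eigenvalue divided by the number of variables is the share of total variance it explains. A minimal sketch of that idea, using an invented correlation matrix and power iteration rather than a statistics library:

```python
# Toy sketch: share of total variance explained by the first principal
# component, computed as the leading eigenvalue of the correlation matrix
# (found by power iteration) divided by the number of variables.
def leading_eigenvalue(corr, iters=200):
    n = len(corr)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the eigenvalue of the converged vector
    return sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))

# invented 4-variable correlation matrix, not the survey data
corr = [[1.0, 0.6, 0.5, 0.4],
        [0.6, 1.0, 0.5, 0.3],
        [0.5, 0.5, 1.0, 0.4],
        [0.4, 0.3, 0.4, 1.0]]
lam = leading_eigenvalue(corr)
print(f"first component explains {lam / len(corr):.1%} of total variance")
```

A full analysis would extract further components from the residual matrix and rotate the loadings (Varimax, as in footnotes 14 and 16); this sketch only shows where the "percent of total variance" figures come from.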
(^{16}) The factor analysis uses principal component analysis and Varimax rotation. Cronbach’s $\alpha$ for the components control loss, quality risks, and inefficiency of reuse is 0.66, 0.76, and 0.85, respectively. See Table A4 in the Appendix for detailed factor loadings.

While the benefits and issues/drawbacks of code reuse are subjective and perceived by the individual developer, there also exist general impediments to reuse. These general impediments, which resulted from our interviews and the existing literature, make code reuse difficult or impossible even if the individual developer wanted to rely on existing code (see Figure 4). Interestingly, however, all four statements offered to the surveyed developers received more disagreement than agreement. The statement “there exist only very few reusable resources for my current main project” ranked first, with 39 percent of the developers agreeing. A one-way ANOVA used to identify the projects with the fewest reusable resources found that only the target operating system of a project has a significant influence on the availability of reusable code (p=0.0497). Projects that are not developed for POSIX operating systems (e.g., Linux) or Windows have less reusable code at their disposal. Neither the type of the project (e.g., “Software Development,” “Scientific and Engineering,” or “Games and Entertainment”) had a significant influence (p=0.2440), nor did the graphical user interface employed by the project (p=0.1171).

Ranked as the second general impediment to code reuse, with 24 percent agreement, are license incompatibilities. Such a situation would occur, for example, if a programmer wanted to reuse code snippets licensed under the GPL in a project licensed under the BSD license. As expected, the license of the developer’s main project significantly influences this general impediment (one-way ANOVA, p<0.0001), with developers working on GPL-licensed projects least likely to perceive this as an issue. However, the low share of agreement is surprising. Three possible explanations for this finding seem plausible: First, there might exist enough reusable code in each license category. Second, developers might be able to mitigate license incompatibilities through modular project architectures that clearly separate modules under different licenses and thus avoid contamination issues (Henkel and Baldwin, 2009). Third, developers might not be knowledgeable about license incompatibilities and might ignore the potential issues. Ranked third and fourth, with 17 percent and nine percent agreement, respectively, are the architecture of the developer’s current main project not being modular enough to allow for easy integration of reusable code (rank 3) and incompatibilities between the project’s main programming language and the programming language of the code the developer wants to reuse (rank 4). Both are significantly dependent on the programming language of the developer’s project (one-way ANOVA, p=0.0036 and p<0.0001 for rank 3 and rank 4, respectively), with C++ and Java as object-oriented languages posing the least issues.

| General impediments to reuse as perceived by developers (in % of developers) |
|------------------------------------------------------------------------------|
| 1. There exist only very few reusable resources for my current main project |
| 2. License issues make reusing in my current main project very difficult, e.g. reusing a GPL component would require the license of my current main project to be changed to GPL as well |
| 3. The software architecture of my current main project makes reusing very difficult, e.g. the architecture of my current main project is not very modular |
| 4. The programming language of my current main project makes reusing very difficult, e.g. the programming language of my current main project makes including popular libraries difficult |

Note: The share of developers who are “indifferent” about the statements is not shown. N=624.

Figure 4: Share of Developers that Disagree/Agree to General Reuse Impediments

5.3. Multivariate Analysis of Reuse Behavior

Following the descriptive analysis, the objective of our research model is to explain the heterogeneity in developers’ reuse behavior observed earlier with both developer and project characteristics. We test the research model with our three different measures of reuse behavior as dependent variables in three different regression models in order to ensure robustness of results.(^{17}) All three models are tested using Tobit regressions, as their dependent variables are restricted to either [0-100%] or [1-7].(^{18}) A summary of the research model hypotheses and the support they received in the multivariate analyses is presented in Table 3, while the detailed regression tables containing the Tobit models are depicted in Table 4. As a further robustness check, we ran specifications of the three models with successive elimination of insignificant variables. The results of this robustness check, which are largely consistent with the results of the main models, are shown in Table A7 in the Appendix. The results of the multivariate analyses are presented and discussed in the following.

| Hypotheses | Confirmed? |
|------------|------------|
| **Attitude toward reuse:** Developers reuse more existing code… | |
| H1a: …the more strongly they perceive the effectiveness benefits of reuse. | ✓ |
| H1b: …the more strongly they perceive the efficiency benefits of reuse. | ✓ |
| H1c: …the more strongly they perceive the quality benefits of reuse. | ✓ |
| H1d: …the more strongly they perceive the task selection benefits of reuse. | ✓ |
| H1e: …the less strongly they perceive the loss of control risks of code reuse. | ✗ |
| **Access to local search:** Developers reuse more existing code… | |
| H2a: …the larger their personal OSS network. | ✓ |
| H2b: …the greater the number of OSS projects they have been involved in. | (✓) |
| **Project maturity:** | |
| H3: Developers reuse more existing code the less mature their project. | ✓ |
| **Compatibility with project goals:** Developers reuse more existing code… | |
| H4a: …the less important challenge seeking is for them as a motivation to work on their project. | (✓) |
| H4b: …the less important coding fun and enjoyment is for them as a motivation to work on their project. | ✗ |
| H4c: …the more important skill improvement is for them as a motivation to work on their project. | ✗ |
| H4d: …the more important community commitment is for them as a motivation to work on their project. | (✓) |
| H4e: …the more important OSS reputation building is for them as a motivation to work on their project. | ✗ |
| H4f: …the more important commercial signaling is for them as a motivation to work on their project. | ✗ |

Legend: ✓: fully confirmed; (✓): partially confirmed; ✗: not supported

(^{17}) Descriptive statistics of all explanatory variables are depicted in Table A5 in the Appendix. The correlation matrix is shown in Table A6 in the Appendix.

(^{18}) In contrast to an OLS regression, a Tobit model accounts for the censoring of the dependent variable. In the present case this means, for example, that the share of functionality from reused resources cannot be less than zero percent or larger than 100 percent.
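The censoring idea from footnote 18 can be made concrete: in a Tobit likelihood, observations at a bound contribute a cumulative-probability term, while interior observations contribute the usual density term. A hedged toy sketch, with invented observations and a sigma loosely echoing the σ of the percentage-scale model, not the paper's actual estimation:

```python
import math

# Toy Tobit log-likelihood for a dependent variable censored at [0, 100],
# as in the share-of-reused-functionality measure. Illustration only.
def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def tobit_loglik(y, xb, sigma, lo=0.0, hi=100.0):
    ll = 0.0
    for yi, mi in zip(y, xb):
        if yi <= lo:                  # left-censored: P(latent value <= lo)
            ll += math.log(norm_cdf((lo - mi) / sigma))
        elif yi >= hi:                # right-censored: P(latent value >= hi)
            ll += math.log(1 - norm_cdf((hi - mi) / sigma))
        else:                         # uncensored: normal density term
            ll += math.log(norm_pdf((yi - mi) / sigma) / sigma)
    return ll

# invented reuse shares (in %) and linear predictions x'b
y = [0.0, 20.0, 35.0, 99.0, 0.0]
xb = [-5.0, 25.0, 30.0, 80.0, 10.0]
print(tobit_loglik(y, xb, sigma=24.3))
```

Maximizing this likelihood over the coefficients (the `xb` terms) and sigma is what distinguishes the Tobit estimates in Table 4 from OLS, which would treat the pile-up of observations at 0 percent as ordinary data points.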
Table 4: Multivariate Analysis of Developers’ Reuse Behavior

| | (1) Past importance of reuse (Likert scale) | (2) Past reuse (percentage scale) | (3) Future importance of reuse (Likert scale) |
|---|---|---|---|
| **Attitude toward reuse** | | | |
| BenefitEffectiveness (H1a) | 0.222* (0.076) | 2.701* (1.021) | 0.168* (0.063) |
| BenefitEfficiency (H1b) | 0.653* (0.084) | 5.959* (1.114) | 0.517* (0.069) |
| BenefitQuality (H1c) | 0.303* (0.081) | 1.800* (1.073) | 0.250* (0.067) |
| BenefitTaskSelection (H1d) | 0.155* (0.078) | 3.528* (1.041) | 0.132* (0.064) |
| IssueControlLoss (H1e) | -0.030 (0.077) | -0.506 (1.036) | -0.004 (0.064) |
| **Access to local search** | | | |
| DevOSSNetsize (log) (H2a) | 0.165* (0.083) | 2.098* (1.102) | 0.230* (0.069) |
| DevOtherProjects (H2b) | 0.022 (0.016) | 0.398* (0.208) | 0.032* (0.013) |
| **Project maturity** | | | |
| ProjPhase (H3) | -0.149* (0.070) | -3.227* (0.928) | -0.219* (0.057) |
| **Compatibility with project goals** | | | |
| MotChallenge (H4a) | -0.148* (0.083) | -2.559* (1.103) | -0.067 (0.068) |
| MotFun (H4b) | 0.098 (0.080) | 0.575 (1.072) | 0.055 (0.066) |
| MotLearning (H4c) | 0.003 (0.080) | -1.438 (1.053) | -0.015 (0.066) |
| MotCommunity (H4d) | 0.177* (0.086) | 1.964* (1.150) | 0.148* (0.071) |
| MotOSSReputation (H4e) | 0.005 (0.057) | 0.128 (0.758) | 0.065 (0.047) |
| MotSignaling (H4f) | -0.054 (0.061) | 0.336 (0.817) | 0.013 (0.051) |
| **Subjective norms** | | | |
| DevNorm | 0.140* (0.066) | 2.372* (0.887) | 0.197* (0.055) |
| **Perceived behavioral control** | | | |
| ProjPolSupport | 0.440* (0.200) | 0.946 (2.670) | 0.297* (0.165) |
| ProjPolDiscourage | -1.087* (0.457) | -4.977 (6.161) | -1.279* (0.383) |
| ConditionLack | -0.250* (0.044) | -2.317* (0.589) | -0.168* (0.036) |
| ConditionLicense | 0.065 (0.045) | 0.309 (0.599) | 0.018 (0.037) |
| ConditionLanguage | 0.030 (0.060) | -0.071 (0.802) | 0.060 (0.049) |
| ConditionArchitecture | 0.017 (0.052) | 0.481 (0.698) | 0.017 (0.043) |
| DevSkill | -0.075 (0.095) | -0.123 (1.270) | -0.018 (0.078) |
| **Further control variables** | | | |
| ProjSize | 0.000 (0.002) | -0.021 (0.024) | -0.002 (0.001) |
| ProjComplexity | 0.131 (0.092) | 2.194* (1.236) | 0.0190 (0.076) |
| ProjStack | 0.210* (0.091) | 1.499 (1.209) | 0.135* (0.074) |
| ProjStandalone | 0.118 (0.197) | 0.233 (2.633) | 0.203 (0.163) |
| DevOSSExperience | 0.010 (0.018) | 0.076 (0.249) | 0.000 (0.015) |
| DevProjTime | 0.014* (0.008) | -0.039 (0.107) | 0.008 (0.007) |
| DevProjShare | 0.003 (0.002) | 0.031 (0.033) | 0.001 (0.002) |
| DevProf | 0.056 (0.186) | 0.214 (2.492) | 0.184 (0.154) |
| DevEduReuse | -0.127 (0.165) | -1.177 (2.201) | -0.266* (0.136) |
| DevProfEduReuse | 0.603* (0.237) | 5.883* (3.094) | 0.378* (0.193) |
| Residence-N. America | -0.159 (0.181) | -3.310 (2.408) | 0.120 (0.149) |
| Residence-S. America | 0.236 (0.359) | -3.424 (4.743) | -0.013 (0.294) |
| Residence-Asia & RoW | -0.102 (0.226) | 0.764 (3.031) | -0.109 (0.187) |
| Constant | 3.026* (0.888) | 23.275* (11.87) | 2.545*** (0.731) |
| Observations | 624 | 624 | 624 |
| Pseudo R² | 0.107 | 0.029 | 0.119 |
| Likelihood ratio | χ²(35)=267.42, p<0.0001 | χ²(35)=162.74, p<0.0001 | χ²(35)=289.55, p<0.0001 |
| σ | 1.790 | 24.337 | 1.493 |

Notes: All models are Tobit models; standard errors in parentheses; * significant at 10%; ** significant at 5%; *** significant at 1%.

Electronic copy available at: https://ssrn.com/abstract=1489789

5.3.1. Attitude Toward Reuse

The regression results confirm hypotheses $H1a$ to $H1d$.
Developers who perceive higher effectiveness, efficiency, quality, or task selection benefits from code reuse attribute a higher importance to it and practice it more. The coefficients for all four hypotheses are positive and significant for all dependent variables and all specifications. In contrast, hypothesis $H1e$ is not confirmed. The data does not show that developers who fear losing control over their project reuse less code. This is surprising as, in our descriptive analysis, loss of control was ranked as the main issue developers have with code reuse. A plausible interpretation is that developers’ concerns about losing control over their project affect their decision as to which code to reuse, but do not affect the total amount of code they reuse. For example, developers concerned about losing control might choose to reuse only components developed by other projects that have a proven track record of fixing bugs quickly and keeping the structure of their code stable (Haefliger et al., 2008).

5.3.2. Access to Local Search

The effect of developers’ access to local search on their reuse behavior was captured by the logarithm of the size of their OSS network ($H2a$) and the number of other OSS projects they have been involved in ($H2b$). Hypothesis $H2a$ is confirmed in all models, while $H2b$ is confirmed only partially, its coefficient not being significant in model 1. Nonetheless, all coefficients are positive in all models, supporting our assumption that developers who can access, evaluate, understand, and integrate reusable code more easily due to local search practice more code reuse.

The finding that the number of years a developer has been involved in OSS does not exhibit a significant effect on her reuse behavior (see control variable DevOSSExperience) is consistent with our argumentation regarding local search.
We had claimed that developers who can turn to their personal OSS network or their experience in other OSS projects reuse more because of their better access to local search. A greater number of years involved in OSS alone does not yet facilitate such better access: for example, a developer with ten years of OSS work spent in only one project does not have access to local search regarding which code other projects use to solve a particular problem.

5.3.3. Project Maturity

Our hypothesis that developers reuse less code once their project has matured ($H3$) is confirmed across all dependent variables and specifications.(^{19}) Developers do indeed seem to leverage reuse as a tool to deliver a “credible promise” early on and overcome liabilities of newness to get on par with competing existing projects, while later project phases call for specific refinements of their projects for which there is less available code to reuse.

5.3.4. Compatibility with Project Goals

Regarding the compatibility of code reuse with a developer’s individual project goals, hypothesis $H4d$ (community commitment) is confirmed in all models except model 2; $H4a$ (challenge seeking) is confirmed only in models with past reuse as the dependent variable (models 1, 2, and 5). For all other hypotheses (coding fun and enjoyment ($H4b$), skill improvement ($H4c$), OSS reputation building ($H4e$), and commercial signaling ($H4f$)), the null hypothesis cannot be rejected.

The support for hypothesis $H4d$ highlights that developers who feel they are part of the OSS community and want it to grow and be successful rely more on code reuse than other developers. Code reuse is compatible with their goal of contributing to the OSS community because by leveraging code reuse they can contribute more and in higher quality.(^{20}) The partial confirmation of hypothesis $H4a$ supports our assumption that the developers’ goal to seek and tackle technical challenges impedes code reuse.
By reusing existing code, developers would be denied the pleasure of solving a problem by themselves. Thus, they would rather refrain from code reuse if challenge seeking is of major importance to them in their OSS work. The finding that the respective coefficient is not significant when the dependent variable is the developers’ future intent to reuse may be due to the fact that the desire developers may have to solve a problem by themselves, without external help, is something that can occur spontaneously and is thus difficult to predict.

(^{19}) Note that in models 1, 2, 4 and 5, where past reuse behavior is the dependent variable, the amount of reused code reported by developers with projects in later development phases is their average reuse level, including the assumed high levels of code reuse of early phases and the proposed lower levels of later phases. However, if reuse goes down with maturity as proposed, then average reuse also decreases over the lifetime of a project.

(^{20}) Moreover, developers who are more sympathetic toward the OSS community might also be affected by the general positive attitude toward reuse in this community (e.g. Raymond, 2001). This effect is, however, captured via subjective norms as a control variable.

We now turn to those hypotheses that are not supported. We had argued that, similarly to challenge seeking, the fun and enjoyment developers experience when writing code leads them to reuse less code ($H4b$), but we cannot confirm this hypothesis. In fact, the respective coefficients are not negative as expected, but positive, though insignificant. The remaining unconfirmed hypotheses, skill improvement ($H4c$), OSS reputation building ($H4e$), and commercial signaling ($H4f$), partially show varying coefficient signs. This could be because, contrary to our assumptions, code reuse could be both supportive of as well as detrimental to these goals.
While reused code could be used as an example to improve programming skills, it could also hamper learning if developers treat the reused code as a black box. Regarding reputation building and commercial signaling, we had expected that developers who make their projects more successful with the help of code reuse are regarded more highly in the OSS community and can present themselves as better developers to potential employers or business partners. However, it is also possible that in certain situations the code created by developers themselves, without the help of code reuse, is what builds their OSS reputation or signals skills to potential employers and partners. In these situations, developers would refrain from code reuse if reputation building or signaling is a main motivation for their OSS work.

5.3.5. Control Variables

Due to the large number of control variables included in our model, we only point out a few main results. The social norms perceived by developers show a consistently significant and positive influence, as predicted by TPB. Consequently, OSS developers who feel that their peers appreciate them reusing existing code will reuse more. Of the variables describing developers’ perceived behavioral control, the lack of reusable code has a consistently negative and significant influence on reuse behavior. With the exception of one dependent variable, project policies discouraging reuse lead to reduced code reuse, while policies promoting reuse are found to significantly increase reuse behavior in three models (1, 4, 6). Lastly, developers who had received training on reuse in companies practice significantly more code reuse, while developers who had only learned about reuse during their academic education do not differ in their code reuse behavior from developers who had not had reuse in their curriculum.

To summarize, the regression analyses shed light on developers’ code reuse behavior.
In particular, the (partially) confirmed hypotheses $H2$ (access to local search), $H3$ (project maturity), and $H4a$ (challenge seeking) provide interesting findings that are also relevant beyond the scope of OSS.

5.4. Possible Threats to Validity and Limitations of the Study

In the following, we use the four generally accepted criteria of validity (Cook and Campbell, 1979) to structure our discussion: construct validity, internal validity, statistical conclusion validity, and external validity.

Construct validity threats concern the ability to measure what we are interested in measuring. As pointed out in sections 4 and 5, the measures employed in this study are based on existing measures from other studies and on our interviews. All measures were assessed for clarity by other researchers and OSS developers during pretests as described above. Furthermore, all multi-item constructs were quantitatively gauged with regard to reliability, convergent validity, and discriminant validity. We thus consider our study to possess sufficient construct validity. Nonetheless, a potential issue is whether developers are able to accurately estimate their level of code reuse in a questionnaire. While an additional verification of our results using an objective measure of code reuse is certainly worthwhile, developers in our pretests convinced us that they can estimate their degree of code reuse with considerable precision. Furthermore, to ensure robustness of our findings, we employed three different measures of code reuse in the survey. Finally, many other reuse studies also rely on reported reuse levels (e.g. Frakes and Fox, 1995; Lee and Litecky, 1997).
Internal validity, requiring that there be no alternative explanations for the relationships identified between our research model constructs, should also be given, since our research model relies on the well-established TPB and since we have included multiple further control variables derived from our interviews and from the OSS and reuse literature. A potential issue is our approach of dealing with component and snippet reuse simultaneously. If component reuse in OSS development equaled black-box reuse, there might exist different drivers for it than for snippet reuse. However, because we find that about 50% of the surveyed developers modify the components they reuse, we argue that, at least in the OSS context, component reuse does not constitute typical black-box reuse. Consequently, we expect both component and snippet reuse to be influenced by largely the same drivers.


In addition, we consider our results to be valid with regard to our statistical conclusions, since they are based on a sample of considerable size and backed by the significance levels of our hypotheses as well as the largely consistent results across various model specifications and dependent variables.


Finally, external validity threats concern the generalization of our findings. In line with the other main studies of individual OSS developers, we drew our sample from SourceForge.net developers. As pointed out in section 4, we have no reason to believe that our sample is not representative of SourceForge.net developers. Thus, generalization to this most frequently researched group of OSS developers should be feasible. To ensure external validity when generalizing to OSS developers registered on other platforms (where, e.g., projects are larger) or to traditional software developers working on proprietary software in commercial firms, it would be necessary to replicate our study in these settings.
However, both our data and our research model suggest that generalization to other contexts should yield similar results. For example, on the data side we do not find significant differences between the reuse behavior of paid and hobbyist OSS developers. Regarding the research model, it would be surprising to find that rather general hypotheses, such as the effect of network size or challenge seeking, work differently in the context of proprietary software development.


6. Conclusion


In this paper, we set out to use quantitative data obtained through a survey to explain and understand code reuse in OSS projects. Contributing to the emerging stream of scholarly work on code reuse in OSS, we present strong evidence that code reuse is of major importance in OSS development and has contributed to its success. We further show that OSS developers perceive efficiency and effectiveness as the main benefits of code reuse. Of relevance not only to OSS research but also to the domains of software engineering and the receiving side of open innovation processes in general, our investigation of the drivers of code reuse finds that developers with better access to local search, due to a larger personal OSS network or more exposure to different OSS projects, reuse more existing code, presumably because their costs of accessing this code are lower. Further, developers convinced of the benefits of code reuse (efficiency and effectiveness gains, enhanced software quality, and the chance to work on preferred tasks) practice it more, as do developers who can use code reuse to support their goal of serving the OSS community. Moreover, developers see code reuse as a means to kick-start new projects, as it helps them deliver a “credible promise” and close the gap to existing, competing projects more quickly.
Lastly, we find partial support for our hypothesis that developers who desire to solve technical problems for the satisfaction of it refrain from reuse and thus make their projects less efficient and effective than they could be.


As academic work on code reuse in OSS has only just begun, it merits further research. While our study has addressed development with reuse, future work should investigate development for reuse, that is, OSS projects which develop components primarily intended to be reused in other projects. Questions of relevance in this context are: Why do developers bear the reportedly large additional costs of writing reusable code,(^{21}) or have they found ways to mitigate them? Additionally, as has been pointed out by Haefliger et al. (2008), the strategies that OSS developers employ to make their reusable code known and reused deserve investigation. Moreover, the limitations of our work open up several further research avenues. First, our dependent variables reflect developers’ subjective perception of the importance of code reuse for their OSS work. Alternatively, and potentially adding robustness to our findings, the importance of reuse could be captured more objectively by analyzing the code of a project. Similarly, independent variables captured from other data sources could be added to our model. For example, social network data derived from SourceForge.net (e.g. Fershtman and Gandal, 2009) could be employed to further extend and test our hypotheses on local search. Moreover, we have described code reuse in general, not differentiating between its various forms (components, snippets, algorithms). A more fine-grained analysis using these dimensions might yield further insights into the mechanics of code reuse in OSS projects.


(^{21}) For example, Tracz (1995) estimates that writing reusable code requires 100 percent additional effort.
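The idea of capturing reuse more objectively by analyzing a project’s code can be illustrated with a toy clone-detection heuristic, far simpler than the copy-detection methods evaluated by Chang and Mockus (2008): hash overlapping k-line windows ("shingles") of normalized code and measure what fraction of a project’s windows also occur in an external corpus. The function names and the shingling parameters below are illustrative assumptions, not part of the study:

```python
import hashlib


def shingles(code: str, k: int = 5) -> set:
    """Hash every k-line window of normalized (stripped, non-blank) code."""
    lines = [line.strip() for line in code.splitlines() if line.strip()]
    return {
        hashlib.sha1("\n".join(lines[i:i + k]).encode()).hexdigest()
        for i in range(len(lines) - k + 1)
    }


def reuse_share(project_code: str, corpus_code: str, k: int = 5) -> float:
    """Fraction of the project's k-line windows that also appear in the corpus."""
    proj, corp = shingles(project_code, k), shingles(corpus_code, k)
    return len(proj & corp) / len(proj) if proj else 0.0
```

Such a window-based share is only a rough proxy: it misses renamed or reformatted snippets and over-counts boilerplate, which is why dedicated copy-detection tooling normalizes tokens rather than raw lines.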
Finally, while we have focused on developers and their projects as determinants of code reuse, future work could employ an even more detailed approach and analyze single reuse incidents, incorporating developers, their projects, and the artifacts they consider for reuse. Such an approach could, for instance, analyze the impact of the quality of the relationship between the “giving” and the “receiving” side of the open innovation process on code reuse.


Beyond their scholarly implications, our findings are also of relevance to managerial practice. They highlight the high level of reuse within the OSS community, which should motivate firms to likewise leverage existing OSS code in their software development,(^{22}) thereby partly mitigating the typically high upfront investment costs of building an internal reuse library for artifacts that are not firm-specific (Frakes and Kang, 2005). Further, if they intend to pursue this avenue of reusing OSS code, commercial firms should encourage and support their employees to enhance their access to local search for OSS code by building personal OSS networks and by becoming involved in various OSS projects. Beyond reuse of OSS code, modified incentives and development processes based on our findings could support internal corporate reuse activities in software engineering and beyond. As part of such modifications, developers could be given the option to select tasks themselves according to their preference, they could be compensated according to the work results they deliver rather than the time they spend at work, and they could be required to deliver “credible promises” in new development projects (Haefliger et al., 2008). Lastly, to accommodate developers’ desire to tackle difficult technical challenges, which makes them reuse less than they could, firms could consider job enrichment (e.g.
Herzberg, 1968) to integrate challenges into developers’ work that are in the best interest of the firm, thereby accommodating the needs of both developer and firm.


(^{22}) Obviously this has to be in accordance with the licenses of the OSS code. However, well-designed product architectures can mitigate many of the issues potentially arising here (Henkel and Baldwin, 2009).


7. References


Ajzen, I. (1991) "The Theory of Planned Behavior," Organizational Behavior and Human Decision Processes 50 (2), pp. 179-211.

Ajzen, I. (2002) "Constructing a TPB Questionnaire: Conceptual and Methodological Considerations," Manuscript, University of Massachusetts, Available at URL: http://people.umass.edu/aizen/pdf/tpb.measurement.pdf.

Aldrich, H. and E. Auster (1986) "Even Dwarfs Started Small: Liabilities of Age and Size and Their Strategic Implications," in Cummings, L. and B. Staw (Eds.) Research in Organizational Behavior, San Francisco, CA: JAI Press, pp. 165-198.

Amabile, T.M., K.G. Hill, A. Hennessey, and E.M. Tighe (1994) "The Work Preference Inventory: Assessing Intrinsic and Extrinsic Motivational Orientations," Journal of Personality and Social Psychology 66 (5), pp. 950-967.

Armitage, C. and M. Conner (2001) "The Theory of Planned Behavior," British Journal of Social Psychology 40 (4), pp. 471-499.

Baldwin, C.Y. and K.B. Clark (2006) "The Architecture of Participation: Does Code Architecture Mitigate Free Riding in the Open Source Development Model?," Management Science 52 (7), pp. 1116-1127.

Banker, R.D., R.J. Kauffman, and D. Zweig (1993) "Repository Evaluation of Software Reuse," IEEE Transactions on Software Engineering 19 (4), pp. 379-389.

Bonaccorsi, A., S. Giannangeli, and C. Rossi (2006) "Entry Strategies under Competing Standards: Hybrid Business Models in the Open Source Software Industry," Management Science 52 (7), pp. 1085-1098.

Chang, H.-F.A. and A.
Mockus (2008) "Evaluation of Source Code Copy Detection Methods on FreeBSD," International Working Conference on Mining Software Repositories, Leipzig, Germany.

Chesbrough, H.W. (2003) Open Innovation: The New Imperative for Creating and Profiting from Technology. Boston, MA: Harvard Business School Press.

Clary, E.G., M. Snyder, R.D. Ridge, J. Copeland, A.A. Stukas, and J. Haugen (1998) "Understanding and Assessing the Motivations of Volunteers: A Functional Approach," Journal of Personality and Social Psychology 74 (6), pp. 1516-1530.

Cook, T.D. and D.T. Campbell (1979) Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago, IL: Rand McNally.

Crowston, K. and B. Scozzi (2008) "Bug Fixing Practices within Free/Libre Open Source Software Development Teams," Journal of Database Management 19 (2), pp. 1-30.

Crowston, K., K. Wei, J. Howison, and A. Wiggins (2009) "Free/Libre Open Source Software Development: What We Know and What We Do Not Know," (07.07.2009), Working Paper, Available at URL: http://floss.syr.edu/StudyP/Review%20Paper_070709.pdf.

Electronic copy available at: https://ssrn.com/abstract=1489789

Csíkszentmihályi, M. (1990) Flow: The Psychology of Optimal Experience. New York, NY: Harper and Row.

Cusumano, M. and C. Kemerer (1990) "A Quantitative Analysis of U.S. and Japanese Practice in Software Development," Management Science 36 (11), pp. 1384-1406.

Dahlander, L. (2005) "Appropriation and Appropriability in Open Source Software," International Journal of Innovation Management 9 (3), pp. 259-285.

Davis, F.D., R.P. Bagozzi, and P.R. Warshaw (1989) "User Acceptance of Computer Technology: A Comparison of Two Theoretical Models," Management Science 35 (8), pp. 982-1002.

Desouza, K.C., Y. Awazu, and A. Tiwana (2006) "Four Dynamics for Bringing Use Back into Software Reuse," Communications of the ACM 49 (1), pp. 96-100.

DiBona, C.
(2005) "Open Source and Proprietary Software Development," in DiBona, C., D. Cooper, and M. Stone (Eds.) Open Source 2.0: The Continuing Evolution, Sebastopol, CA: O'Reilly Media.

DiBona, C., S. Ockman, and M. Stone (1999) "Introduction," in DiBona, C., S. Ockman, and M. Stone (Eds.) Open Sources: Voices of the Open Source Revolution, Sebastopol, CA: O'Reilly & Associates, pp. 1-17.

Fershtman, C. and N. Gandal (2009) "R&D Spillovers: The 'Social Network' of Open Source," (16.05.2009), Working Paper, Available at URL: http://www.tau.ac.il/~gandal/OSS.pdf.

Fornell, C. and D.F. Larcker (1981) "Evaluating Structural Equation Models with Unobservable Variables and Measurement Error," Journal of Marketing Research 18 (1), pp. 39-50.

Frakes, W.B. and C.J. Fox (1995) "Sixteen Questions About Software Reuse," Communications of the ACM 38 (6), pp. 75-87.

Frakes, W.B. and K. Kang (2005) "Software Reuse Research: Status and Future," IEEE Transactions on Software Engineering 31 (7), pp. 529-536.

German, D.M. (2007) "Using Software Distributions to Understand the Relationship among Free and Open Source Software Projects," 4th International Workshop on Mining Software Repositories, Minneapolis, MN.

Ghosh, R.A., R. Glott, B. Krieger, and G. Robles (2002) "Free/Libre and Open Source Software: Survey and Study - Deliverable D18: Final Report - Part IV: Survey of Developers," Available at URL: http://www.infonomics.nl/FLOSS/report/FLOSS_Final4.pdf.

Gruber, M. and J. Henkel (2005) "New Ventures Based on Open Innovation - an Empirical Analysis of Start-up Firms in Embedded Linux," International Journal of Technology Management 33 (4), pp. 354-372.

Haefliger, S., G. von Krogh, and S. Spaeth (2008) "Code Reuse in Open Source Software," Management Science 54 (1), pp. 180-193.

Hair, J.F., Jr., R.L. Tatham, J.E. Anderson, and W. Black (2006) Multivariate Data Analysis.
Upper Saddle River, NJ: Pearson Prentice Hall.

Hardgrave, B.C., F.D. Davis, and C.K. Riemenschneider (2003) "Investigating Determinants of Software Developers' Intentions to Follow Methodologies," Journal of Management Information Systems 20 (1), pp. 123-151.

Hardgrave, B.C. and R.A. Johnson (2003) "Toward an Information Systems Development Acceptance Model: The Case of Object-Oriented Systems Development," IEEE Transactions on Engineering Management 50 (3), pp. 322-336.

Hars, A. and S. Ou (2002) "Working for Free? Motivations for Participating in Open-Source Projects," International Journal of Electronic Commerce 6 (3), pp. 25-39.

Henkel, J. (2006) "Selective Revealing in Open Innovation Processes: The Case of Embedded Linux," Research Policy 35 (7), pp. 953-969.

Henkel, J. (2009) "Champions of Revealing - the Role of Open Source Developers in Commercial Firms," Industrial and Corporate Change 18 (3), pp. 435-471.

Henkel, J. and C.Y. Baldwin (2009) "Modularity for Value Appropriation: Drawing the Boundaries of Intellectual Property," (March 2009), Working Paper, Harvard Business School.

Hertel, G., S. Niedner, and S. Herrmann (2003) "Motivation of Software Developers in Open Source Projects: An Internet-Based Survey of Contributors to the Linux Kernel," Research Policy 32 (7), pp. 1159-1177.

Herzberg, F. (1968) "One More Time: How Do You Motivate Employees?," Harvard Business Review 46 (1), pp. 53-62.

Isoda, S. (1995) "Experience of a Software Reuse Project," Journal of Systems and Software 30, pp. 171-186.

Kim, Y.E. and E.A. Stohr (1998) "Software Reuse: Survey and Research Directions," Journal of Management Information Systems 14 (4), pp. 113-147.

Krueger, C.W. (1992) "Software Reuse," ACM Computing Surveys 24 (2), pp. 131-183.

Lakhani, K.R. and E. von Hippel (2003) "How Open Source Software Works: "Free" User-to-User Assistance," Research Policy 32 (6), pp. 923-943.
Lakhani, K.R. and R.G. Wolf (2005) "Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects," in Feller, J., B. Fitzgerald, S. Hissam, and K.R. Lakhani (Eds.) Perspectives on Free and Open Source Software, Cambridge, MA: MIT Press, pp. 3-22.

Langlois, R.N. (1999) "Scale, Scope, and the Reuse of Knowledge," in Dow, S.C. and P.E. Earl (Eds.) Economic Organization and Economic Knowledge, Cheltenham, UK: Edward Elgar, pp. 239-254.

Lee, N.-Y. and C.R. Litecky (1997) "An Empirical Study of Software Reuse with Special Attention to Ada," IEEE Transactions on Software Engineering 23 (9), pp. 537-549.

Lerner, J. and J. Tirole (2002) "Some Simple Economics of Open Source," The Journal of Industrial Economics 50 (2), pp. 197-234.

Majchrzak, A., L.P. Cooper, and O.P. Neece (2004) "Knowledge Reuse for Innovation," Management Science 50 (2), pp. 174-188.

Mellarkod, V., R. Appan, D.R. Jones, and K. Sherif (2007) "A Multi-Level Analysis of Factors Affecting Software Developers' Intention to Reuse Software Assets: An Empirical Investigation," Information & Management 44 (7), pp. 613-625.

Mockus, A. (2007) "Large-Scale Code Reuse in Open Source Software," 1st International Workshop on Emerging Trends in FLOSS Research and Development, Minneapolis, MN.

Moore, G.C. and I. Benbasat (1991) "Development of an Instrument to Measure the Perceptions of Adopting an Information Technology Innovation," Information Systems Research 2 (3), pp. 192-222.

Morisio, M., M. Ezran, and C. Tully (2002) "Success and Failure Factors in Software Reuse," IEEE Transactions on Software Engineering 28 (4), pp. 340-357.

Naur, P. and B. Randell (1968) Software Engineering: Report on a Conference by the NATO Science Committee. Brussels, Belgium: NATO Science Affairs Division.

Nunnally, J.C. (1978) Psychometric Theory. New York, NY: McGraw-Hill.

Podsakoff, P.M., S.B. MacKenzie, J.
Lee, and N.P. Podsakoff (2003) "Common Method Biases in Behavioral Research: A Critical Review of the Literature and Recommended Remedies," Journal of Applied Psychology 88 (5), pp. 879-903.

Ravichandran, T. and M.A. Rothenberger (2003) "Software Reuse Strategies and Component Markets," Communications of the ACM 46 (8), pp. 109-114.

Raymond, E.S. (2001) The Cathedral and the Bazaar, 2nd Edition. Sebastopol, CA: O'Reilly & Associates.

Riemenschneider, C.K. and B.C. Hardgrave (2001) "Explaining Software Development Tool Use with the Technology Acceptance Model," Journal of Computer Information Systems 41 (4), pp. 1-8.

Riemenschneider, C.K., B.C. Hardgrave, and F.D. Davis (2002) "Explaining Software Developer Acceptance of Methodologies: A Comparison of Five Theoretical Models," IEEE Transactions on Software Engineering 28 (12), pp. 1135-1145.

Roberts, J.A., I. Hann, and S.A. Slaughter (2006) "Understanding the Motivations, Participation, and Performance of Open Source Software Developers: A Longitudinal Study of the Apache Projects," Management Science 52 (7), pp. 984-999.

Rossi Lamastra, C. (2009) "Software Innovativeness: A Comparison between Proprietary and Free/Open Source Solutions Offered by Italian SMEs," R&D Management 39 (2), pp. 153-169.

Sen, A. (1997) "The Role of Opportunism in the Software Design Reuse Process," IEEE Transactions on Software Engineering 23 (7), pp. 418-436.

Sen, R., C. Subramaniam, and M.L. Nelson (2008) "Determinants of the Choice of Open Source Software License," Journal of Management Information Systems 25 (3), pp. 207-239.

Sherif, K., R. Appan, and Z. Lin (2006) "Resources and Incentives for the Adoption of Systematic Software Reuse," International Journal of Information Management 26 (1), pp. 70-80.

Spaeth, S., M. Stuermer, S. Haefliger, and G.
von Krogh (2007) "Sampling in Open Source Software Development: The Case for Using the Debian GNU/Linux Distribution," 40th Annual Hawaii International Conference on System Sciences, Waikoloa, HI.

Stewart, K.J. and S. Gosain (2006) "The Impact of Ideology on Effectiveness in Open Source Software Teams," MIS Quarterly 30 (2), pp. 291-314.

Straub, D. (1989) "Validating Instruments in MIS Research," MIS Quarterly 13 (2), pp. 147-169.

Subramanyam, R. and M. Xia (2008) "Free/Libre Open Source Software Development in Developing and Developed Countries: A Conceptual Framework with an Exploratory Study," Decision Support Systems 46 (1), pp. 173-186.

Tracz, W. (1995) Confessions of a Used Program Salesman: Institutionalizing Software Reuse. Reading, MA: Addison-Wesley.

von Krogh, G., S. Spaeth, and S. Haefliger (2005) "Knowledge Reuse in Open Source Software: An Exploratory Study of 15 Open Source Projects," 38th Annual Hawaii International Conference on System Sciences, Big Island, HI.

von Krogh, G., S. Spaeth, S. Haefliger, and M. Wallin (2008) "Open Source Software: What We Know (and Do Not Know) About Motives to Contribute," (April 2008), Working Paper, DIME Working Papers on Intellectual Property, Available at URL: http://www.dime-eu.org/files/active/0/WP38_vonKroghSpaethHaefligerWallin_IPROSS.pdf.

von Krogh, G., S. Spaeth, and K.R. Lakhani (2003) "Community, Joining, and Specialization in Open Source Software Innovation: A Case Study," Research Policy 32 (7), pp. 1217-1241.

Watson, S. and K. Hewett (2006) "A Multi-Theoretical Model of Knowledge Transfer in Organizations: Determinants of Knowledge Contribution and Knowledge Reuse," Journal of Management Studies 43 (2), pp. 141-173.

West, J. (2003) "How Open Is Open Enough? Melding Proprietary and Open Source Platform Strategies," Research Policy 32 (7), pp. 1259-1285.

Wu, C.-G., J.H. Gerlach, and C.E.
Young (2007) "An Empirical Analysis of Open Source Software Developers' Motivations and Continuance Intentions," Information & Management 44 (3), pp. 253-262.

Ye, Y. and G. Fischer (2005) "Reuse-Conducive Development Environments," Automated Software Engineering 12 (2), pp. 199-235.


Appendix


Table A1: Factor Analysis and Reliability of Developer Motivation Constructs

| Construct/item | 1 | 2 | 3 | 4 | 5 | 6 | Cronbach’s α |
|----------------|-------|-------|-------|-------|-------|-------|--------------|
| **1. Challenge seeking** | | | | | | | 0.807 |
| Chal1 | 0.052 | 0.794 | 0.137 | 0.203 | 0.007 | 0.043 | |
| Chal2 | -0.031 | 0.891 | 0.119 | 0.135 | 0.034 | 0.019 | |
| Chal3 | 0.020 | 0.794 | 0.075 | 0.172 | -0.026 | 0.026 | |
| **2. Coding fun and enjoyment** | | | | | | | 0.746 |
| Fun1 | 0.021 | 0.176 | 0.122 | 0.763 | -0.024 | 0.111 | |
| Fun2 | -0.008 | 0.284 | 0.217 | 0.718 | 0.100 | 0.005 | |
| Fun3 | 0.038 | 0.165 | 0.077 | 0.839 | 0.010 | 0.002 | |
| **3. Community commitment** | | | | | | | 0.640 |
| Com1 | -0.068 | 0.043 | 0.109 | 0.055 | 0.154 | 0.743 | |
| Com2 | 0.138 | 0.112 | 0.010 | 0.027 | -0.099 | 0.691 | |
| Com3 | -0.051 | -0.017 | 0.089 | 0.033 | 0.186 | 0.832 | |
| **4. Skill improvement** | | | | | | | 0.758 |
| Learn1 | 0.101 | 0.148 | 0.832 | 0.162 | 0.003 | 0.044 | |
| Learn2 | 0.192 | 0.120 | 0.831 | 0.159 | 0.027 | 0.058 | |
| Learn3 | 0.034 | 0.093 | 0.721 | -0.005 | 0.190 | 0.125 | |
| **5. OSS reputation building** | | | | | | | 0.901 |
| OSSRep1 | 0.253 | -0.004 | 0.053 | 0.035 | 0.892 | 0.098 | |
| OSSRep2 | 0.240 | 0.021 | 0.055 | 0.010 | 0.900 | 0.091 | |
| **6. Commercial signaling** | | | | | | | 0.866 |
| ComSig1 | 0.847 | 0.004 | 0.178 | 0.065 | 0.095 | 0.019 | |
| ComSig2 | 0.857 | -0.027 | 0.087 | -0.007 | 0.250 | -0.016 | |
| ComSig3 | 0.800 | 0.056 | 0.045 | -0.009 | 0.359 | -0.031 | |

Notes: The factor analysis uses principal component analysis and Varimax rotation; high factor loadings under each component in the rotated matrix are indicated by bold text and gray shading.

N=624.


Table A2: Discriminant Analysis of Developer Motivation Constructs

| Construct/item | 1 | 2 | 3 | 4 | 5 | 6 |
|----------------|-------|-------|-------|-------|-------|-------|
| 1. Challenge seeking | **0.757** | | | | | |
| 2. Coding fun and enjoyment | 0.444 | **0.705** | | | | |
| 3. Community commitment | 0.112 | 0.132 | **0.657** | | | |
| 4. Skill improvement | 0.285 | 0.323 | 0.207 | **0.751** | | |
| 5. OSS reputation building | 0.033 | 0.064 | 0.194 | 0.189 | **0.906** | |
| 6. Commercial signaling | 0.047 | 0.063 | 0.026 | 0.254 | 0.495 | **0.832** |

Notes: The diagonal bolded entries are square roots of the average variance extracted (AVE) of the respective construct; the off-diagonal entries are standardized correlations between constructs; * correlation significant at 10%; ** correlation significant at 5%; *** correlation significant at 1% level.

N=624.
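The statistics reported in Tables A1 and A2 follow standard formulas: Cronbach’s α from item variances, and the Fornell-Larcker criterion comparing the square root of the AVE (diagonal of Table A2) against a construct’s correlations with all other constructs. A minimal sketch in NumPy, using hypothetical item-score data rather than the study’s survey responses (the study’s exact values depend on its full measurement model and will not be reproduced):

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def sqrt_ave(loadings: np.ndarray) -> float:
    """Square root of the average variance extracted from standardized loadings."""
    return float(np.sqrt(np.mean(loadings ** 2)))


# Hypothetical 7-point Likert responses for a 3-item construct:
# one latent trait plus item-level noise, rounded and clipped to the scale
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 1))
items = np.clip(np.round(4 + latent + rng.normal(scale=0.8, size=(500, 3))), 1, 7)

alpha = cronbach_alpha(items)

# Fornell-Larcker check for challenge seeking: sqrt(AVE) should exceed its
# largest correlation with any other construct (0.444 in Table A2)
construct_sqrt_ave = sqrt_ave(np.array([0.794, 0.891, 0.794]))  # Chal1-3, Table A1
discriminant_ok = construct_sqrt_ave > 0.444
```

Note that computing AVE from rotated PCA loadings, as done here for illustration, gives a slightly higher value than the 0.757 reported on the Table A2 diagonal, which is based on the study’s measurement model.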
Table A3: Exploratory Factor Analysis of Reuse Benefits

| Item (Rank in Figure 2) | 1 | 2 | 3 | 4 |
|----------------------------------|-------|-------|-------|-------|
| Difficult Problem (Rank 3) | 0.081 | 0.171 | 0.090 | **0.948** |
| Faster (Rank 1) | 0.181 | **0.793** | -0.001 | 0.326 |
| Most Important (Rank 2) | 0.176 | **0.834** | 0.236 | 0.062 |
| Most Fun (Rank 6) | -0.021 | 0.414 | **0.743** | 0.021 |
| Outs Maintenance (Rank 7) | 0.332 | -0.029 | **0.779** | 0.162 |
| Reliable SW (Rank 4) | **0.840** | 0.278 | 0.130 | -0.031 |
| Secure SW (Rank 8) | **0.872** | 0.124 | 0.113 | 0.090 |
| Standard SW (Rank 5) | **0.739** | 0.002 | 0.097 | 0.237 |

Notes: The factor analysis uses principal component analysis and Varimax rotation; high factor loadings under each component in the rotated matrix are indicated by bold text and gray shading.

N=624.


Table A4: Exploratory Factor Analysis of Reuse Issues and Drawbacks

| Item (Rank in Figure 3) | 1 | 2 | 3 |
|----------------------------------|-------|-------|-------|
| Finding (Rank 9) | **0.854** | 0.089 | 0.036 |
| Understanding (Rank 7) | **0.876** | 0.125 | 0.073 |
| Adapting (Rank 6) | **0.847** | 0.165 | 0.087 |
| Quality Risks (Rank 5) | 0.156 | **0.934** | 0.100 |
| Security Risks (Rank 4) | 0.088 | **0.935** | 0.084 |
| Performance Loss (Rank 8)* | 0.231 | **0.451** | 0.284 |
| Installation (Rank 2) | 0.152 | 0.089 | **0.764** |
| Dependence (Rank 1) | -0.051 | 0.118 | **0.785** |
| Additional Work (Rank 3) | 0.162 | 0.162 | **0.707** |

Notes: The factor analysis uses principal component analysis and Varimax rotation; high factor loadings under each component in the rotated matrix are indicated by bold text and gray shading.
*The loading of this item on its construct is rather low; however, it is retained due to the good overall Cronbach’s $\alpha$ of the construct (0.76).

N=624.


Table A5: Descriptive Statistics of Explanatory Variables Used in Table 6

| Variable | Dummy variable equal to “1” if… | Frequency of “0” | Frequency of “1” |
|-------------------|----------------------------------|------------------|------------------|
| ProjPolSupport | Developer’s current main project has a policy encouraging its developers to reuse | 438 (70%) | 186 (30%) |
| ProjPolDiscourage | Developer’s current main project has a policy discouraging its developers from reuse | 606 (97%) | 18 (3%) |
| ProjStandalone | Developer’s current main project is a standalone executable application project and not a component project | 162 (26%) | 462 (74%) |
| DevProf | Developer is working or has worked as a professional developer for a firm | 191 (31%) | 433 (69%) |
| DevEduReuse | Developer has received training on reuse during her education | 412 (66%) | 212 (34%) |
| DevProfEduReuse | Developer has received training on reuse when working as a software developer for a firm | 544 (87%) | 80 (13%) |
| Residence-N.America | Developer resides in North America | 455 (73%) | 169 (27%) |
| Residence-S.America | Developer resides in South America | 594 (95%) | 30 (5%) |
| Residence-Asia&RoW | Developer resides in Asia, Africa, Australia, or Oceania | 536 (86%) | 88 (14%) |

| Variable | Explanation | Min. | Max. | Med. | Mean | S.D. |
|-------------------|-------------|-------|-------|-------|-------|-------|
| Benefit-Effectiveness | Factor score from exploratory factor analysis… on developer’s perception of effectiveness effects of code reuse | -4.762 | 2.047 | 0.178 | 0 | 1 |
| Benefit-Efficiency | …on developer’s perception of efficiency effects of code reuse | -3.568 | 2.313 | 0.093 | 0 | 1 |
| BenefitQuality | …on developer’s perception of quality effects of code reuse | -3.972 | 2.909 | -0.027 | 0 | 1 |
| Benefit-TaskSelection | …on developer’s perception of task selection effects of code reuse | -3.884 | 3.026 | 0.033 | 0 | 1 |
| Issue-ControlLoss | …on developer’s perception of control loss effects of code reuse | -3.781 | 2.376 | 0.065 | 0 | 1 |
| DevOSS-Netsize (log) | Size of developer’s personal OSS network (as logarithm) | 0 | 6.217 | 2.197 | 2.001 | 1.033 |
| DevOtherProjects | Number of OSS projects, besides the current main project, that developer has ever been involved in | 0 | 48 | 2 | 3.617 | 5.388 |
| ProjPhase | Development phase of developer’s current main project (1=Pre-Alpha, 2=Alpha, 3=Beta, 4=Stable/Production, 5=Mature) | 1 | 5 | 3 | 3.221 | 1.184 |
| MotChallenge | Index variable constructed from challenge scale (1=Strongly disagree,…, 7=Strongly agree) | 1 | 7 | 5.333 | 5.128 | 1.060 |
| MotFun | Index variable constructed from fun scale (1=Strongly disagree,…, 7=Strongly agree) | 1.667 | 7 | 5.000 | 5.152 | 1.092 |
| MotLearning | Index variable constructed from learning scale (1=Strongly disagree,…, 7=Strongly agree) | 1 | 7 | 5.333 | 5.317 | 1.100 |
| Mot-Community | Index variable constructed from community commitment scale (1=Strongly disagree,…, 7=Strongly agree) | 1 | 7 | 5.667 | 5.614 | 1.003 |
| MotOSS-Reputation | Index variable constructed from OSS reputation scale (1=Strongly disagree,…, 7=Strongly agree) | 1 | 7 | 4.000 | 3.609 | 1.621 |
| MotSignaling | Index variable constructed from signaling scale (1=Strongly disagree,…, 7=Strongly agree) | 1 | 7 | 4.667 | 4.312 | 1.527 |
| DevNorm | Index variable constructed from subjective norms scale (1=Strongly disagree,…, 7=Strongly agree) | 1 | 7 | 4.000 | 3.927 | 1.555 |
| ConditionLack | Developer’s agreement (1=Strongly disagree,…, 7=Strongly agree) to… lack of reusable code as impediment to reuse | 1 | 7 | 4 | 3.784 | 1.823 |
| Condition-License | …issues with license incompatibilities as impediment to reuse | 1 | 7 | 2 | 3.006 | 1.852 |
| Condition-Language | …issues with programming language incompatibilities as impediment to reuse | 1 | 7 | 2 | 2.154 | 1.401 |
| Condition-Architecture | …issues with project architecture as impediment to reuse | 1 | 7 | 2 | 2.630 | 1.597 |
| DevSkill | Self-assessment of developer’s software development skills compared to the average OSS developer (1=Much worse,…, 5=Much better) | 1 | 5 | 3 | 3.269 | 0.989 |
| ProjSize | Size of developer’s current main project in number of developers | 1 | 999* | 2 | 6.091 | 44.420 |
| Proj-Complexity | Complexity of developer’s current main project compared to the average project on SourceForge.net (1=Much less complex,…, 5=Much more complex) | 1 | 5 | 3 | 2.947 | 1.029 |
| ProjStack | Position of developer’s current main project in the software stack (1=Very low,…, 5=Very high) | 1 | 5 | 4 | 3.333 | 0.921 |
| DevOSS-Experience | Number of years developer has been actively working on OSS projects | 1 | 40** | 5 | 5.668 | 4.709 |
| DevProjTime | Average weekly hours developer works on her current main project | 0.5 | 58 | 5 | 8.775 | 10.723 |
| DevProjShare | Share of work that has been done by developer in her current main project, as opposed to other project team members | 5 | 100 | 90 | 67.436 | 36.998 |

*The main project of this developer is Linux, where a very high number of project team members seems reasonable.

**This developer claims to have been involved in OSS even before it got started. We assume that she implies that she had already been working on a project that later became OSS at that point in time.

N=624.

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 1 | BenefitEffectiveness | 1.00 | | | | | | | | | | | | | | | | |
| 2 | BenefitEfficiency | n.m. | 1.00| | | | | | | | | | | | | | | |
| 3 | BenefitQuality | n.m. | n.m.| 1.00| | | | | | | | | | | | | | |
| 4 | BenefitTaskSelection | n.m. | n.m.| n.m.| 1.00| | | | | | | | | | | | | |
| 5 | IssueControlLoss | n.m. | n.m.| n.m.| n.m.| 1.00| | | | | | | | | | | | |
| 6 | DevOSSNetsize | 0.14 | 0.11| | | | 1.00| | | | | | | | | | | | |
| 7 | DevOtherProjects | -0.08| 0.31| 1.00| | | | | | | | | | | | | | | |
| 8 | ProjPhase | 0.07 | -0.10| 0.17| 0.17| 1.00| | | | | | | | | | | | | |
| 9 | MotChallenge | -0.08| | | | | | 1.00| | | | | | | | | | | |
|10 | MotFun | 0.08 | | | | | | | 0.09| -0.08| 0.44| 1.00| | | | | | | |
|11 | MotLearning | 0.08 | | | | | | | 0.16| -0.09| -0.12| 0.29| 0.32| 1.00| | | | | |
|12 | MotCommunity | 0.09 | 0.14| 0.14| -0.07| 0.22| 0.13| 0.10| 0.11| 0.13| 0.21| 1.00| | | | | | | |
|13 | MotOSSReputation | 0.15 | 0.10| -0.08| 0.13| 0.15| 0.09| | 0.19| 0.19| 1.00| | | | | | | | |
|14 | MotSignaling | 0.07 | 0.16| | | | | | | | | | 0.25| 0.50| 1.00| | | | |
|15 | DevNorm | 0.07 | 0.19| 0.26| 0.21| 0.09| 0.07| 0.10| 0.12| 0.18| 0.12| 1.00| | | | | | | |
|16 | DevSkill | 0.12 | 0.07| 0.13| 0.10| 0.16| 0.15| 0.09| | | | | | | | | | | |
|17 | ProjPolSupport | 0.10 | 0.09| 0.23| 0.19| 0.15| 0.10| 0.09| 0.19| 0.11| 0.12| 0.12| 1.00| | | | | | |
|18 | ProjPolDiscourage | -0.09| 0.12| -0.07| -0.08| -0.07| 0.09| -0.11|
1.00| | | | | | | | | | +|19 | ConditionLack | -0.08| -0.19| -0.08| | | | | | | | | | | | | | | +|20 | ConditionLicense | -0.10| -0.15| | | | | | | | | | | | | | | | +|21 | ConditionLanguage | -0.23| | | | | | | | | | | | | | | | | +|22 | ConditionArchitecture| -0.16| -0.08| 0.07| -0.07| | | | | | | | | | | | | | +|23 | ProjSize | -0.07| -0.08| 0.19| 0.11| 0.09| 0.09| 0.08| 0.11| | | | | | | | | | +|24 | ProjComplexity | -0.11| 0.12| 0.09| 0.18| 0.19| 0.21| 0.09| 0.11| 0.38| 0.30| | | | | | | | +|25 | ProjStack | 0.15 | | | | | | | | | | | | | | | | | +|26 | ProjStandalone | 0.07 | | | | | | | | | | | | | | | | | +|27 | DevOSSExperience | 0.13 | -0.08| 0.26| 0.29| 0.29| -0.15| 0.12| -0.13| 0.25| 0.09| | | | | | | | +|28 | DevProjTime | -0.09| 0.13| 0.13| 0.11| 0.12| 0.11| 0.10| 0.21| 0.09| 0.17| 0.29| | | | | | | +|29 | DevProjShare | -0.21| -0.08| -0.22| 0.07| | | | | | | | | | | | | | +|30 | DevEduReuse | -0.09| -0.11| | | | | | | | | | | | | | | | +|31 | DevProfEduReuse | 0.08 | | | | | | | | | | | | | | | | | +|32 | DevProf | 0.13 | 0.11| 0.08| 0.08| -0.08| -0.10| -0.14| 0.07| 0.15| 0.08| 0.39| 0.07| | | | | | +|33 | Residence-N. America | 0.07 | | | | | | | | | | | | | | | | | +|34 | Residence-S. America | 0.07 | | | | | | | | | | | | | | | | | +|35 | Residence-Asia & RoW | -0.08| -0.09| | | | | | | | | | | | | | | | | +| | 18. ProjPolDiscourage | 19. ConditionLack | 20. ConditionLicense | 21. ConditionLanguage | 22. ConditionArchitecture | 23. ProjSize | 24. ProjComplexity | 25. ProjStack | 26. ProjStandalone | 27. DevOSSExperience | 28. DevProjTime | 29. DevProjShare | 30. DevEduReuse | 31. DevProfEduReuse | 32. DevProf | 33. Residence-N. America | 34. Residence-S. America | 35. 
Residence-Asia & RoW | +|---|----------------------|------------------|---------------------|----------------------|--------------------------|-------------|------------------|-------------|------------------|------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------| +| 18. ProjPolDiscourage | | | | | | | | | | | | | | | | | | +| 19. ConditionLack | 1.00 | | | | | | | | | | | | | | | | | +| 20. ConditionLicense | 0.17 | 1.00 | | | | | | | | | | | | | | | | +| 21. ConditionLanguage | 0.23 | 0.24 | 1.00 | | | | | | | | | | | | | | | +| 22. ConditionArchitecture | 0.24 | 0.11 | 0.33 | 1.00 | | | | | | | | | | | | | | +| 23. ProjSize | 0.08 | 1.00 | | | | | | | | | | | | | | | | +| 24. ProjComplexity | -0.09 | 0.18 | -0.09 | 0.16 | 1.00 | | | | | | | | | | | | | +| 25. ProjStack | -0.13 | -0.07 | 0.07 | -0.11 | 1.00 | | | | | | | | | | | | | +| 26. ProjStandalone | -0.101 | 0. | 0.13 | 0.37 | 1.00 | | | | | | | | | | | | | +| 27. DevOSSExperience | 0.13 | 0.07 | 0.23 | -0.08 | 1.00 | | | | | | | | | | | | | +| 28. DevProjTime | -0.08 | 0.14 | 0.15 | 0.38 | 0.11 | 1.00 | | | | | | | | | | | | +| 29. DevProjShare | -0.20 | -0.08 | -0.08 | -0.17 | -0.34 | -0.10 | -0.15 | 1.00 | | | | | | | | | | +| 30. DevEduReuse | -0.07 | 1.00 | | | | | | | | | | | | | | | | +| 31. DevProfEduReuse | 0.09 | 0.08 | 1.00 | | | | | | | | | | | | | | | +| 32. DevProf | -0.12 | -0.08 | 0.08 | -0.09 | -0.14 | 0.09 | 0.18 | 0.25 | 1.00 | | | | | | | | | +| 33. Residence-N. America | 0.15 | 0.09 | 1.00 | | | | | | | | | | | | | | | +| 34. Residence-S. America | -0.08 | 0.08 | n.m. | 1.00 | | | | | | | | | | | | | | +| 35. Residence-Asia & RoW | n.m. | n.m. | 1.00 | | | | | | | | | | | | | | | + + +Notes: Only correlations with p<0.1 are shown; n.m. = not meaningful because variables are dummy variables coding the same characteristic or are scores of the same exploratory factor analysis. 
Table A7: Multivariate Analysis of Developers' Reuse Behavior – Robustness Check

| | (4) Likert scale | (5) Percentage scale | (6) Future importance of reuse (Likert scale) |
|---------------------------------|-----------------|-----------------|-----------------|
| Attitude toward reuse | | | |
| BenefitEffectiveness (H1a) | 0.220 (0.076) | 2.464* (1.010) | 0.146 (0.062) |
| BenefitEfficiency (H1b) | 0.634 (0.080) | 6.047 (1.059) | 0.499 (0.066) |
| BenefitQuality (H1c) | 0.322 (0.079) | 2.262 (1.048) | 0.273 (0.065) |
| BenefitTaskSelection (H1d) | 0.157 (0.077) | 3.368 (1.026) | 0.144 (0.064) |
| IssueControlLoss (H1e) | | | |
| Access to local search | | | |
| DevOSSNetsize (log) (H2a) | 0.172 (0.080) | 2.307 (1.047) | 0.246 (0.066) |
| DevOtherProjects (H2b) | 0.030 (0.015) | 0.465 (0.196) | 0.034 (0.013) |
| Project maturity | | | |
| ProjPhase (H3) | -0.124 (0.066) | -2.984 (0.871) | -0.204 (0.054) |
| Compatibility with project goals | | | |
| MotChallenge (H4a) | | -2.466 (0.962) | |
| MotFun (H4b) | | | |
| MotLearning (H4c) | | | |
| MotCommunity (H4d) | 0.180 (0.081) | 1.912 (1.067) | 0.163 (0.066) |
| MotOSSReputation (H4e) | | | |
| MotSignaling (H4f) | | | |
| Subjective norms | | | |
| DevNorm | 0.120 (0.065) | 2.133** (0.870) | 0.205 (0.054) |
| Perceived behavioral control | | | |
| ProjPolSupport | 0.405 (0.180) | 0.335 (0.143) | |
| ProjPolDiscourage | -1.210 (0.447) | -1.299 (0.375) | |
| ConditionLack | -0.236 (0.042) | -2.355 (0.564) | -0.160 (0.035) |
| ConditionLicense | | | |
| ConditionLanguage | | | |
| ConditionArchitecture | | | |
| DevSkill | | | |
| Further control variables | | | |
| ProjSize | | | |
| ProjComplexity | | | |
| ProjStack | 0.232 (0.083) | 0.172* (0.069) | |
| ProjStandalone | | | |
| DevOSSExperience | | | |
| DevProjTime | 0.016 (0.007) | | |
| DevProjShare | | | |
| DevProf | | | |
| DevEduReuse | | | |
| DevProfEduReuse | 0.573 (0.232) | 5.581 (3.012) | 0.414** (0.189) |
| Residence-N. America | | | |
| Residence-S. America | | | |
| Residence-Asia & RoW | | | |
| Constant | 3.145 (0.622) | 34.228 (8.393) | 2.858 (0.509) |
| Observations | 624 | 624 | 624 |
| Pseudo R² | 0.101 | 0.026 | 0.112 |
| Likelihood ratio | $\chi^2(15)=252.81, p<0.0001$ | $\chi^2(12)=149.36, p<0.0001$ | $\chi^2(14)=272.67, p<0.0001$ |
| $\sigma$ | 1.814 | 24.600 | 1.514 |


Notes: All models are Tobit models; standard errors in parentheses; * significant at 10%; ** significant at 5%; *** significant at 1%. Eliminated variables are also jointly insignificant.
----------------------------------------
-------------------------------
Section 215:
On the impact of using trivial packages: an empirical case study on npm and PyPI


Rabe Abdalkareem · Vinicius Oda · Suhaib Mujahid · Emad Shihab


Published online: 9 January 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020


Abstract
Code reuse has traditionally been encouraged since it enables one to avoid re-inventing the wheel. However, after the npm left-pad package incident, where a trivial package led to the breakdown of some of the most popular web applications, such as Facebook and Netflix, some have questioned such reuse. Reuse of trivial packages is particularly prevalent in platforms such as npm. To date, no study has examined the reasons why developers reuse trivial packages on platforms other than npm. Therefore, in this paper, we study two large platforms, npm and PyPI. We mine more than 500,000 npm packages and 38,000 JavaScript applications, as well as more than 63,000 PyPI packages and 14,000 Python applications, to study the prevalence of trivial packages. We found that trivial packages are common, making up between 10.5% and 16.0% of the studied platforms.
We performed surveys with 125 developers who use trivial packages to understand the reasons for and drawbacks of their use. Our surveys revealed that trivial packages are used because they are perceived to be well-implemented and well-tested pieces of code. However, developers are concerned about the maintenance effort and the risk of breakage introduced by the extra dependencies that trivial packages bring in. To objectively verify the survey results, we validate the most-cited reason and drawback. We find that, contrary to developers’ beliefs, only around 28% of npm and 49% of PyPI trivial packages have tests. However, trivial packages appear to be ‘deployment tested’ and to have similar test, usage, and community-interest characteristics as non-trivial packages. On the other hand, we found that 18.4% and 2.9% of the studied trivial packages have more than 20 dependencies in npm and PyPI, respectively.


Keywords Trivial packages · JavaScript · Node.js · Python · npm · PyPI · Code reuse · Empirical studies


Communicated by: Arie van Deursen


Rabe Abdalkareem
rab_abdu@encs.concordia.ca


Extended author information available on the last page of the article.


1 Introduction
Code reuse, in the form of combining related functionalities in packages, has been encouraged because it can reduce time-to-market, improve software quality, and boost overall productivity (Basili et al. 1996; Lim 1994; Mohagheghi et al. 2004). Therefore, it is no surprise that platforms such as Node.js encourage reuse and attempt to facilitate code sharing, often delivered as packages or modules(^1) that are available on package management platforms such as the Node Package Manager (npm) and the Python Package Index (PyPI) (npm 2016; Bogart et al. 2016).


However, it is not all good news. There are many cases where code reuse has had negative effects, leading to an increase in maintenance costs and even legal action (McCamant and Ernst 2003; Orsila et al. 2008; Inoue et al.
2012; Abdalkareem et al. 2017a). For example, an incident involving the reuse of a JavaScript package called left-pad, which was used by Babel, caused interruptions to some of the largest Internet sites, e.g., Facebook, Netflix, and Airbnb. Many referred to the incident as the case that ‘almost broke the Internet’ (Macdonald 2016; Williams 2016). That incident led to many heated discussions about code reuse, sparked by David Haney’s blog post: “Have We Forgotten How to Program?” (Haney 2016).


While the real reason for the left-pad incident was that npm allowed authors to unpublish packages (a problem which has since been resolved (npm Blog 2016)), it raised awareness of the broader issue of taking on dependencies for trivial tasks that can be easily implemented (Haney 2016). In our previous work (Abdalkareem et al. 2017), we defined and examined trivial packages in npm, and discovered a number of relevant findings:


– Trivial JavaScript packages tend to be small in size and less complex.


– Trivial packages are prevalent, making up approximately 16.8% of all the packages on npm.


– JavaScript developers generally use trivial packages because they believe that trivial packages provide them with well-tested and well-implemented code; however, they are concerned about the management of extra dependencies.


In addition, we found that in some cases, these trivial JavaScript packages can have their own dependencies, imposing significant overhead.


However, one major limitation of the original work was its exclusive focus on JavaScript and npm in particular (Abdalkareem et al. 2017). For example, questions about the existence of trivial packages (and how they are defined) in other package management platforms remain.
Also, whether the perceived advantages (e.g., that trivial packages are well tested) and disadvantages (e.g., management of additional dependencies) of using trivial packages generalize beyond JavaScript developers remains unanswered.


Hence, this paper extends our previous work (Abdalkareem et al. 2017) to strengthen the empirical evidence on the use of trivial packages by replicating and extending our study on the Python Package Index (PyPI). We chose to examine the PyPI package management platform since 1) Python is one of the most popular general-purpose programming languages, 2) Python has only one main well-established package platform, PyPI, and 3) PyPI is a mature package management platform that has been in existence for more than twelve years. Our extended study provides the following key additions:


– We extended our study of the npm package management platform and increased the npm dataset from 231,092 to 501,001 packages.


– We provide a definition of PyPI trivial packages and examine the prevalence of trivial packages in the Python ecosystem.


– We surveyed 37 Python developers to investigate the reasons for and drawbacks of using trivial packages in the PyPI package management platform.


– We examine the top reasons for and drawbacks of using PyPI trivial packages, based on the developer surveys.


(^1)In this paper, we use the term package to refer to a software library that is published on the studied package management platforms.


Altogether, our study involves more than 500,000 npm packages and 38,000 JavaScript applications, as well as 63,000 PyPI packages and 14,000 Python applications. The study also contains survey results from 125 JavaScript and Python developers.
Our findings indicate that:


The definition of trivial packages is the same in JavaScript and Python. Developers from the two different package management platforms tended to have the same definition of trivial packages. While we found in the original paper (Abdalkareem et al. 2017) that npm trivial packages are packages that have $\leq 35$ LOC and a McCabe’s cyclomatic complexity $\leq 10$, we found that PyPI trivial packages fit the same definition.


Trivial packages are common and popular in both the npm and PyPI package management platforms. Of the 501,001 npm and 63,912 PyPI packages in our dataset, 16.0% and 10.6%, respectively, are trivial packages. Moreover, of the 38,807 JavaScript and 14,717 Python applications on GitHub, 26.1% and 6.9%, respectively, directly depend on one or more trivial packages.


JavaScript and Python developers differ in their perception of trivial packages. Only 23.9% of JavaScript developers considered the use of trivial packages a bad practice, whereas 70.3% of Python developers did.


Developers believe that trivial packages provide them with well-implemented and well-tested code and increase productivity. At the same time, the increase in dependency overhead and the risk of breakage of their applications are the two most-cited drawbacks.


Developers need to be careful about which trivial packages they use. Our empirical findings show that many trivial packages have their own dependencies. In npm, 43.2% of trivial packages have at least one dependency and 18.4% of trivial packages have more than 20 dependencies. In PyPI, 36.8% of trivial packages have at least one dependency, and 2.9% have more than 20 dependencies.


To facilitate the replicability of our work, we make our dataset and the anonymized developer responses publicly available (Abdalkareem et al. 2019).
----------------------------------------
-------------------------------
Section 216:
1.1 Paper Organization


The paper is organized as follows: Section 2 provides the background and introduces our datasets. Section 3 presents how we determine what a trivial package is. Section 4 examines the prevalence of trivial packages and their use in JavaScript and Python applications. Section 5 presents the results of our developer surveys, covering the reasons for and perceived drawbacks of using trivial packages. Section 6 presents our quantitative validation of the most commonly cited reason for and drawback of using trivial packages. The implications of our findings are noted in Section 7. We discuss related work in Section 8, the limitations of our study in Section 9, and present our conclusions in Section 10.
2 Background and Case Studies


In this section, we provide background on the two studied package management platforms, npm and PyPI. We also provide an overview of the dataset collected and used in the rest of our study.


2.1 Node Package Manager (npm)


JavaScript is used to write client- and server-side applications. The popularity of JavaScript has steadily grown, thanks to popular frameworks such as Node.js and an active developer community (Bogart et al. 2016; Wittern et al. 2016). JavaScript projects can be classified into two main categories: JavaScript packages, which are used in other applications, and JavaScript applications, which are standalone software. The Node Package Manager (npm) provides tools to manage JavaScript packages.


To perform our study, we gather two datasets from two sources. We obtain JavaScript packages from the npm registry and applications that use npm packages from GitHub.


npm Packages: Since we are interested in examining the impact of ‘trivial packages’, we mined the latest version of all the JavaScript packages from npm as of September 30, 2017.
For each package, we obtained its source code from the npm registry. In total, we mined 549,629 packages.


GitHub JavaScript Applications: We also want to examine the use of npm packages in JavaScript applications. Therefore, we mined all of the JavaScript applications on GitHub. To obtain a list of JavaScript applications, we extracted all the applications identified as JavaScript applications from the GHTorrent dataset (Gousios et al. 2014). Then, to ensure that we were indeed obtaining only JavaScript applications from GitHub, and not npm packages, we compared the URLs of the GitHub repositories from GHTorrent to all of the URLs we obtained from npm for the packages. If a URL from GitHub was also in npm, we flagged it as being an npm package and removed it from the application list.


To determine whether an application uses npm packages, we looked for the ‘package.json’ file, which specifies (among other things) the npm package dependencies used by the application.


Finally, to eliminate dummy applications that may exist on GitHub, we chose non-forked applications with more than 100 commits and more than 2 developers. Similar filtering criteria were used in prior work by Kalliamvakou et al. (2014). In total, we obtained 115,621 JavaScript applications; after removing applications that did not use the npm platform, we were left with 38,807 JavaScript applications.


2.2 Python Package Index (PyPI)


PyPI is the official package management platform for the Python programming language. Python is one of the most popular programming languages today, mainly due to its strong community support and versatility, i.e., Python is used in many different domains, from game development to server-side applications (Vasilescu et al. 2015; Ray et al. 2014). Once again, we distinguish between Python packages, which are used in Python applications, and standalone Python applications, which typically use Python packages.
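The application-selection steps described for the JavaScript dataset (discard repositories whose URLs match registry packages, discard forks, and require more than 100 commits and more than 2 developers) can be sketched as follows. This is an illustrative sketch only: the record fields (`url`, `forked`, `commits`, `developers`) are hypothetical stand-ins for whatever schema a GHTorrent export actually provides.

```python
# Sketch of the application-filtering heuristic; field names are assumptions,
# not the real GHTorrent schema.

def is_candidate_application(repo, package_urls):
    """Keep a repository only if it looks like a real application:
    not a mirror of a published package, not a fork, and with enough
    activity (>100 commits) and collaborators (>2 developers)."""
    if repo["url"] in package_urls:  # repository is actually a registry package
        return False
    if repo["forked"]:               # forks are likely duplicates
        return False
    return repo["commits"] > 100 and repo["developers"] > 2

repos = [
    {"url": "https://github.com/a/app", "forked": False, "commits": 250, "developers": 4},
    {"url": "https://github.com/b/pkg", "forked": False, "commits": 900, "developers": 9},
    {"url": "https://github.com/c/toy", "forked": True,  "commits": 150, "developers": 3},
]
package_urls = {"https://github.com/b/pkg"}  # URLs mined from the registry

apps = [r for r in repos if is_candidate_application(r, package_urls)]
print([a["url"] for a in apps])  # only the first repository survives
```

The same predicate applies unchanged to the Python dataset, since the paper reuses the filtering criteria for both ecosystems.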
Similar to the case of JavaScript, we gather two datasets from two sources to perform our study. We obtain Python packages from the PyPI registry and applications that use PyPI packages from GitHub.

PyPI Packages: We collected the latest versions of the Python packages from PyPI in order to determine which packages are ‘trivial packages’. PyPI contained around 118,324 packages (Libraries.io 2017) as of September 30, 2017. In total, we were able to obtain 116,905 packages from the PyPI registry, since some packages no longer existed.


GitHub Python Applications: To examine the usage of ‘trivial packages’ in Python applications, we mined all of the Python applications hosted on GitHub provided by the GHTorrent dataset (Gousios et al. 2014). We followed the same process used for the JavaScript applications to ensure that we were indeed obtaining only Python applications from GitHub, and not PyPI package repositories. In a nutshell, we compared the URLs of the GitHub repositories to the URLs we obtained from PyPI for the packages. If a URL from GitHub was also in PyPI, we flagged it as being a PyPI package and removed it from the application list. In total, we obtained 14,717 Python applications hosted on GitHub. In addition, to eliminate dummy or immature Python applications that may exist on GitHub, we performed the same filtering steps as for the JavaScript applications: we chose non-forked Python applications with more than 100 commits and more than 2 developers.
----------------------------------------
-------------------------------
Section 217:
3 Defining Trivial Packages


Although what a trivial package is has been loosely defined in the past (e.g., in blogs (Hemanth 2015; Harris 2015)), we want a more precise and objective way to determine trivial packages.
To determine what constitutes a trivial package, we conducted two separate surveys, one for each of the studied package management platforms (npm and PyPI). We mainly asked participants what they considered to be a trivial package and what indicators they used to determine whether a package is trivial. We conducted two different surveys since: 1) the two studied package management platforms serve different programming languages, and 2) developers from the two package management platforms may have different perspectives on what they consider to be ‘trivial packages’.


For each package management platform (npm and PyPI), we devised an online survey that presented the source code of 16 randomly selected packages that range in size between 4 and 250 JavaScript/Python lines of code (LOC). Participants were asked to 1) indicate whether they thought the package was trivial and 2) specify what indicators they use to determine a trivial package. We opted to limit the size of the selected packages in the surveys to a maximum of 250 JavaScript/Python LOC since we did not want to overwhelm the participants with the review of excessive amounts of code.


We asked the survey participants to indicate trivial packages from the list of packages provided. We provided the survey participants with a loose definition of what a trivial package is, i.e., a package that contains code that they could easily write themselves and hence is not worth taking on an extra dependency for. Figure 1 shows an example of a trivial JavaScript package, called is-Positive, which simply checks whether a number is positive. The survey questions were divided into three parts: 1) questions about the participant’s development background, 2) questions about the classification of the provided packages, and 3) questions about what indicators the participant would use to determine a trivial package.

```javascript
module.exports = function (n) {
  return toString.call(n) === '[object Number]' && n > 0;
};
```

Fig. 1 Package is-Positive on npm

For the npm survey, we sent the survey to 22 developers and colleagues who were familiar with JavaScript development and received a total of 12 responses. We also sent the PyPI survey to 18 developers and colleagues who were familiar with Python development and received a total of 13 responses. It is important to note that we sent the two surveys to different groups of developers, to make sure that the participants in one survey were not biased by their experience of participating in the other (i.e., first) survey.


Participants’ Background and Experience: The first four columns of Table 1 show the background of the participants in the npm survey. Of the 12 respondents, 2 are undergraduate students, 8 are graduate students, and 2 are professional developers. Ten of the 12 respondents have at least 2 years of JavaScript experience, and half of the participants have been developing with JavaScript for more than five years.


The last four columns of Table 1 show the background of the participants in the PyPI survey. Of the 13 participants in this survey, 9 identified themselves as graduate students and 4 as professional developers working in industry; 7 participants had more than 5 years of Python development experience, 2 respondents had between 3 and 5 years, 3 others had 2 to 3 years of experience, and one person had less than 1 year of Python experience. We were happy to have the majority of our respondents be well experienced with Python.


Result: We asked the participants of the two surveys to list what indicators they use to determine whether a package is trivial and to indicate all the packages that they considered to be trivial.
Of the 12 participants in the JavaScript survey, 11 (92%) state that the complexity of the code and 9 (75%) state that the size of the code are indicators they use to determine a trivial package. Also, 3 (25%) mentioned that they used code comments and other indicators (e.g., functionality) to decide whether a package is trivial. The results of the Python survey reveal that 9 (69%) of the developers use size of the code and 9 (69%) of them use complexity of the code as the main indicators to determine trivial packages. Also, 7 (54%) of the participants stated that they use source code comments to determine trivial Python packages, and 3 (23%) of the participants mentioned other indicators that they use to identify a trivial package. For example, one participant described a trivial Python package as “If it’s only one function”.


| Experience in JavaScript (years) | # | Developers’ position (npm) | # | Experience in Python (years) | # | Developers’ position (PyPI) | # |
|----------------------------------|---|----------------------------|---|------------------------------|---|-----------------------------|---|
| <1 | 2 | Undergrad Student | 2 | <1 | 1 | Undergrad Student | 0 |
| 2 – 3 | 3 | Graduate Student | 8 | 2 – 3 | 3 | Graduate Student | 9 |
| 3 – 5 | 1 | Professional Developer | 2 | 3 – 5 | 2 | Professional Developer | 4 |
| >5 | 6 | – | – | >5 | 7 | – | – |
| Total | 12 | Total | 12 | Total | 13 | Total | 13 |
Since it is clear that size and complexity are the most common indicators of trivial packages, and since they are universal measures that can be computed for both JavaScript and Python, we use these two measures to determine trivial packages. It should be mentioned that participants could provide more than one indicator; hence the percentages above sum to more than 100%.


Next, we analyze all of the packages that were marked as trivial in the two surveys.
Our main goal in this analysis is to find which values of the size and complexity metrics are indicative of trivial packages.


npm Survey Responses: In total, we received 69 votes for the 16 packages. We ranked the packages in ascending order based on their size and tallied the votes for the most-voted packages. We find that 79% of the votes consider packages with fewer than 35 lines of code to be trivial. We also examine the complexity of the packages using McCabe’s cyclomatic complexity, and find that 84% of the votes marked packages that have a total complexity value of 10 or lower as trivial. It is important to note that although we provided the source code of the packages to the participants, we did not explicitly provide the size or the complexity of the packages, so as not to bias them towards any specific metrics.


PyPI Survey Responses: We received 89 votes for the 16 packages. Similar to the case of npm, we ranked the packages in ascending order based on their size and tallied the votes for the most-voted packages. We find that 76.4% of the votes consider packages with 35 or fewer lines of code to be trivial. We also examine the complexity of the packages using McCabe’s cyclomatic complexity, and find that 79.8% of the votes marked packages that have a total complexity value of 10 or lower as trivial Python packages. As with npm, we did not provide any metric values for the packages, to avoid bias.


Based on the aforementioned findings, we used the two indicators JavaScript/Python LOC $\leq 35$ and complexity $\leq 10$ to determine trivial packages in our dataset. Hence, we define trivial JavaScript/Python packages as $\{X_{LOC} \leq 35 \cap X_{Complexity} \leq 10\}$, where $X_{LOC}$ represents the JavaScript/Python LOC and $X_{Complexity}$ represents McCabe’s cyclomatic complexity of package $X$.
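The resulting definition can be expressed as a small predicate. This is a minimal sketch, not the authors' tooling (they computed LOC and complexity with the Understand tool), and the naive line counter below only approximates comment stripping, for Python-style `#` comments:

```python
# Sketch of the triviality predicate: X is trivial iff X_LOC <= 35 and
# X_Complexity <= 10. The thresholds come from the survey analysis above.

def count_loc(source: str) -> int:
    """Count non-blank lines that are not comment-only (simplified)."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

def is_trivial(loc: int, complexity: int,
               max_loc: int = 35, max_complexity: int = 10) -> bool:
    return loc <= max_loc and complexity <= max_complexity

# Hypothetical one-function package, analogous to the is-Positive npm package.
example = """\
def is_positive(n):
    # trivial check
    return isinstance(n, (int, float)) and n > 0
"""
print(count_loc(example))                            # 2
print(is_trivial(count_loc(example), complexity=2))  # True
```

Under this definition, a package fails the predicate as soon as either threshold is exceeded, which matches the intersection in the formula above.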
Although we use the aforementioned measures to determine trivial packages, we do not consider this to be the only possible way to determine trivial packages.
----------------------------------------
-------------------------------
Section 218:
4 How Prevalent are Trivial Packages?


In this section, we want to know how prevalent trivial packages are. We examine prevalence from two aspects. The first is the package management platform (npm and PyPI) perspective, where we are interested in knowing how many of the packages on these two package management platforms are trivial. The second considers the use of trivial packages in JavaScript and Python applications.


To identify trivial packages in our two datasets, we calculate the LOC and complexity of all the npm and PyPI packages. For the LOC, we calculate the number of lines of source code after removing white space and source code comments. As for the complexity, we use McCabe’s complexity since it is widely used in industry and academia (Ebert and Cain 2016). Then, for each package, we removed test code since we are mostly interested in the actual source code of the packages. To identify and remove the test code, similar to prior work (Gousios et al. 2014; Tsay et al. 2014; Zhu et al. 2014), we looked for the term “test” (and its variants, such as ‘tests’ and/or ‘TEST_code’) in the file names and file paths. To calculate the LOC and the complexity of every package in our datasets, we used the Understand tool by SciTools (https://scitools.com/). Understand is a source code analysis tool that provides various code metrics and has been extensively used in other work (e.g., Rahman et al. 2019; Castelluccio et al. 2019).


4.1 How Many of npm’s & PyPI’s Packages are Trivial?


npm: We now use the two measures, LOC and complexity, to quantify the number of trivial packages in our dataset. Our dataset contained a total of 549,629 npm packages.
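As a rough illustration of this metric-extraction step, the sketch below approximates McCabe's cyclomatic complexity for Python source as 1 plus the number of decision points (via the standard-library `ast` module) and mirrors the "test"-in-path heuristic. The paper itself relies on SciTools Understand, so treat this as an approximation under stated simplifications, not the authors' implementation:

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + decision points
    (if/ifexp/for/while/except; each extra boolean operand adds one path)."""
    tree = ast.parse(source)
    score = 1
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1
    return score

def is_test_path(path: str) -> bool:
    """Mirror the paper's heuristic: 'test' (and variants) in file name/path."""
    return "test" in path.lower()

src = "def f(x):\n    if x > 0 and x < 10:\n        return x\n    return 0\n"
print(cyclomatic_complexity(src))              # 3: base 1 + one 'if' + one 'and'
print(is_test_path("pkg/tests/test_core.py"))  # True, so this file is excluded
```

A straight-line module scores 1; each branch or extra boolean operand raises the count, which is why small utility packages typically stay at or below the complexity-10 threshold.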
For each package, we calculated the number of JavaScript code lines and removed packages that had zero LOC, which eliminated 48,628 packages. We removed npm packages with zero LOC since they represent dummy or empty packages that developers publish for various reasons, such as reserving a unique package name. This left us with a final total of 501,001 packages.

Out of the 501,001 npm packages we mined, 80,232 (16.0%) are trivial packages. In addition, we examined the growth of trivial packages in npm. Figure 2 shows the percentage of trivial to all packages published on npm per month. We see an increasing trend in the number of trivial packages published over time, before the growth became stable around the beginning of 2015. Overall, approximately 14.0% of the packages added every month are trivial packages. We investigated the spike around March 2016 and found that it corresponds to the time when npm disallowed the un-publishing of packages (npm Blog 2016).

In addition, to see the effect of the left-pad incident on the number of published trivial packages, we examined the number of trivial npm packages published before and after the incident. Out of 216,309 npm packages published before the left-pad incident, we found that 34,750 (16.1%) are trivial packages. After the incident, out of the 284,692 packages published, 45,482 (16.0%) are trivial packages.
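The per-month percentages behind Figure 2 can be computed with a simple tally; a sketch assuming hypothetical (month, is_trivial) records rather than the study's actual data:

```python
from collections import defaultdict

def monthly_trivial_share(packages):
    """packages: iterable of (publish_month, is_trivial) pairs,
    e.g. ("2015-03", True). Returns {month: percent trivial}."""
    totals = defaultdict(int)
    trivial = defaultdict(int)
    for month, triv in packages:
        totals[month] += 1
        trivial[month] += triv  # bool counts as 0/1
    return {m: 100.0 * trivial[m] / totals[m] for m in totals}

# Hypothetical sample: one trivial and one non-trivial package in March,
# two non-trivial packages in April.
sample = [("2015-03", True), ("2015-03", False),
          ("2015-04", False), ("2015-04", False)]
print(monthly_trivial_share(sample))  # {'2015-03': 50.0, '2015-04': 0.0}
```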
PyPI: For the PyPI dataset, we are also interested in discerning the trivial packages from the others in terms of LOC and complexity. To do so, we mined the 116,905 packages available in the PyPI registry. However, a package on PyPI can be released/distributed in different formats, and we were not able to process all of them. We found that 42,242 of the PyPI packages are platform exclusive (e.g., Windows .exe or Mac .dmg) or are corrupted compressed .gz files that we could not analyze. This process left us with 74,663 PyPI packages for which we measured LOC and complexity. We then removed packages that had zero LOC, which eliminated another 10,751 packages. We removed packages with zero LOC since we do not want to count empty packages, which exist on PyPI for various reasons, such as learning to publish packages on PyPI.

Our analysis reveals that out of the 63,912 PyPI packages we analyzed, 6,759 (10.6%) are trivial packages. We again examined the growth of trivial packages in PyPI. Figure 3 shows the percentage of trivial to all packages published on PyPI per month for the period between 2011 and 2017. We see a slight increase in the trend of publishing trivial packages on the PyPI platform, and that trend starts to decrease in late 2013. We also found that approximately 11% of the packages added every month are trivial packages.

We also looked at the percentage of trivial to all packages published before and after the left-pad incident. We found that out of 33,335 PyPI packages published prior to the left-pad incident, 3,717 (11.2%) are trivial packages, while 3,042 (10.0%) of all packages published after the incident are trivial.

4.2 How Many Applications Depend on Trivial Packages?

JavaScript Applications: Just because trivial packages exist on npm does not mean that they are actually being used. We therefore also examine the number of applications that use trivial packages. To do so, we examine the package.json file, which lists all the dependencies that an application installs from npm. However, in some cases, an application may install a package but not use it.
To avoid counting such instances, we parse the JavaScript code of all the examined applications and use regular expressions to detect the required dependency statements, which indicate that the application actually uses the package in its code(^2). Finally, we measured the number of packages that are trivial in the set of packages used by the applications. Note that we only consider npm packages, since npm is the most popular package manager for JavaScript packages and other package managers only manage a subset of packages (e.g., Bower (2012) only manages front-end/client-side frameworks, libraries and modules). We find that of the 38,807 applications in our dataset, 10,139 (26.1%) directly depend on at least one trivial package.

Python Applications: Similar to the case of JavaScript, we also analyzed the Python applications that depend on trivial packages. In contrast to JavaScript, where a package.json file is available, analyzing Python applications presents some challenges in fully identifying a given script’s dependency set, for the reasons described previously in Section 4.1. We statically parse the source code for “import”-like clauses, along with other statements that allow us to verify that the packages are effectively being put to use (i.e., the package is both supposed to be installed and its functions/definitions are indeed being called, rather than merely being imported and not used). To facilitate this analysis, we use the popular snakefood (http://furius.ca/snakefood/) tool. The tool generates dependency graphs from Python code by parsing the Abstract Syntax Tree of the Python files. Our analysis showed that out of the 14,717 examined Python applications, 1,024 (6.9%) depend on one or more trivial PyPI packages.
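The dependency-detection step can be sketched with regular expressions; the patterns below are illustrative, not the exact ones used in the study:

```python
import re

# Hypothetical patterns: CommonJS require(...) calls in JavaScript
# sources, and top-level import/from statements in Python sources.
REQUIRE_RE = re.compile(r"""require\(\s*['"]([^'"]+)['"]\s*\)""")
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)

js = "const pad = require('left-pad');\nconst fs = require('fs');"
py = "import requests\nfrom flask import Flask"
print(REQUIRE_RE.findall(js))  # ['left-pad', 'fs']
print(IMPORT_RE.findall(py))   # ['requests', 'flask']
```

Real detection would also need to skip commented-out statements and, as the study notes for Python, confirm that imported names are actually called.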
5 Survey Results

We surveyed developers to understand the reasons for and the drawbacks of using trivial packages. We used a survey because it allows us to obtain first-hand information from the developers who use these trivial packages. In order to select the most relevant participants, we sent out the survey to developers who use trivial packages. We used Git’s pickaxe command on the lines that contain the required dependency statements in the JavaScript and Python applications. Doing so helped us identify the name and email of the developer who introduced the trivial package dependency.

(^2)Note that if a package is required in the application, but does not exist, it will break the application.

Survey Participants: To mitigate the possibility of introducing misunderstood or misleading questions, we initially sent the survey to two developers and incorporated their minor suggestions to improve the survey. For npm participants, we sent the survey to 1,055 JavaScript developers from 1,696 applications. To select the developers, we ranked them based on the number of trivial packages they use. We then took a sample of the 600 developers who use trivial packages the most, and another 600 who use trivial packages the least. The survey was emailed to the 1,200 selected developers; however, since some of the emails bounced for various reasons (e.g., the email account no longer exists), we could only reach 1,055 developers. We also sent the survey to all Python developers after filtering out invalid and duplicate email addresses. We successfully sent the survey to 460 Python developers who introduced trivial PyPI packages into the 1,024 Python applications in our dataset.

We designed the survey using Google Forms. The survey listed the trivial package and the application in which we detected it.
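The pickaxe lookup described above can be sketched as follows; we only assemble the `git log -S` invocation here, and the dependency snippet and output format are illustrative:

```python
def pickaxe_cmd(snippet: str, fmt: str = "%an <%ae>") -> list:
    """Build a git pickaxe command that lists commits which added or
    removed the given snippet, printing author name and email."""
    return ["git", "log", "-S", snippet, "--pretty=format:" + fmt]

# Hypothetical lookup: who introduced a left-pad dependency?
cmd = pickaxe_cmd("require('left-pad')")
print(" ".join(cmd))
```

Running the command (e.g., via `subprocess.run(cmd, cwd=repo_path)`) in an application's repository yields the introducing developer's identity.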
In total, we received 125 developer responses. First, we received 88 responses from the JavaScript developers, which translates to a response rate of 8.3%. Our survey response rate is higher than the typical 5% response rate reported for questionnaire-based software engineering surveys (Singer et al. 2008). The left part of Table 2 shows the JavaScript experience and the position of the developers. The majority (67) of the respondents have more than 5 years of experience, 14 have between 3 and 5 years, and 7 have 1 to 3 years of experience. As for the position of the survey respondents, of the 88 respondents, 83 identified as developers working either in industry (68) or as full-time independent developers (15). The remaining 5 identified as casual developers (2) or other (3), including one student and two developers working in executive positions at npm.

Second, we received 37 survey responses from the Python developers, yielding a response rate of 8.04%, which is again in line with what has been observed in other studies in the software engineering domain (Singer et al. 2008). The right part of Table 2 shows the Python experience and position of the developers. The vast majority of the respondents (92%) identified themselves as having more than five years of Python development experience; only 3 respondents reported between three and five years of experience. Regarding the current position of the survey respondents, 27 identified as developers working in industry and 4 as full-time independent developers. The rest of the respondents identified as casual developers (1) or other (5), including researchers and students.
Table 2 Experience and position of the survey respondents for npm (left) and PyPI (right)

| Experience in JavaScript | # | Developers’ Position | # | Experience in Python | # | Developers’ Position | # |
|--------------------------|---|----------------------|---|----------------------|---|----------------------|---|
| 1 - 3 years | 7 | Industry | 68 | 1 - 3 years | 0 | Industry | 27 |
| > 3 - 5 years | 14 | Independent | 15 | > 3 - 5 years | 3 | Independent | 4 |
| > 5 years | 67 | Casual | 2 | > 5 years | 34 | Casual | 1 |
| – | – | Other | 3 | – | – | Other | 5 |
| Total | 88 | Total | 88 | Total | 37 | Total | 37 |

The fact that most of the respondents are experienced JavaScript and Python developers gives us confidence in our survey responses.

5.1 Do Developers Consider Trivial Packages Harmful?

The first question of our survey is: “Do you consider the use of trivial packages as bad practice?” The reason for asking this question so bluntly is that it allows us to gauge, in a very direct way, how the developers feel about the issue of using trivial packages. We provided three possible replies: Yes, No, or Other, in which case respondents were given a text box to elaborate. Figure 4 shows the distribution of responses from both JavaScript and Python developers. Of the 88 JavaScript participants, 51 (57.9%) stated that they do NOT consider the use of trivial packages a bad practice. Another 21 (23.9%) stated that they indeed think that using trivial packages is a bad practice. The remaining 16 (18.2%) stated that it really depends on the circumstances, such as the time available, how critical a piece of code is, and whether the package used has been thoroughly tested.

Contrary to the case of JavaScript, 26 (70.3%) of the Python developers who responded to our survey generally consider the use of trivial packages a bad practice. Only 3 (8.1%) of the survey participants stated that they do not think that using trivial packages is a bad practice.
The remaining 8 (21.6%) indicate that it really depends on the circumstances. For example, P-PyPI 3 states: “If the language doesn’t provide such common, inherently useful functionality then fixing this oversight by the use of a third-party library is only reasonable. Moreover, little functionality is actually ‘trivial’. It may be short to implement but most likely a mistake in it will introduce a bug into the program as surely as a mistake in something ‘non-trivial’.”

Fig. 4 Developer responses to the question “is using a trivial package bad?” Most JavaScript developers answered no, whereas most Python developers answered yes.

5.2 Why Do Developers Use Trivial Packages?

While we have answered the question of whether developers say using trivial packages is a bad practice, what we are most interested in is why developers resort to using trivial packages and what they view as the drawbacks of doing so. Therefore, the second part of the survey asks participants to list the reasons why they resort to using trivial packages. To ensure that we do not bias the responses of the developers, the answer fields for these questions were free-form text, i.e., no predetermined suggestions were provided. We then analyzed the responses from the two surveys (JavaScript and Python) separately. After gathering all of the responses, we grouped and categorized them in a two-phase iterative process. In the first phase, two of the authors carefully read the participants’ answers and independently came up with a number of categories that the responses fell under. Next, they discussed their groupings and agreed on the extracted categories. Whenever they failed to agree on a category, the third author was asked to help break the tie. Once all of the categories were decided, the same two authors went through all the answers again and independently classified them into their respective categories.
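The inter-rater agreement for the two-rater classification above can be quantified with Cohen's Kappa, as the study does; a minimal self-contained sketch with hypothetical category labels:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters' category labels."""
    assert len(r1) == len(r2) and r1
    n = len(r1)
    # Observed agreement: fraction of items both raters labeled the same.
    po = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical labels for six survey answers from two raters.
a = ["prod", "tested", "prod", "perf", "tested", "prod"]
b = ["prod", "tested", "prod", "tested", "tested", "prod"]
print(round(cohens_kappa(a, b), 2))  # 0.71
```

Values of 0.90/0.83 (reasons) and 0.86/0.91 (drawbacks), as reported below, indicate excellent agreement on this scale.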
For the majority of the cases, the two authors agreed on the categories and the classification of the responses. To measure the agreement between the two authors, we used Cohen’s Kappa coefficient (Cohen 1960). The Cohen’s Kappa coefficient has been used to evaluate inter-rater agreement levels for categorical scales, and provides the proportion of agreement corrected for chance. The resulting coefficient ranges between -1 and 1, where a negative value means less than chance agreement, zero indicates exactly chance agreement, and a positive value indicates better than chance agreement (Fleiss and Cohen 1973). In our categorization, the level of agreement measured between the authors was 0.90 and 0.83 for the npm survey and the PyPI survey, respectively, which is considered excellent inter-rater agreement.

Table 3 shows the reasons for using trivial packages, as reported by respondents from both the JavaScript and Python surveys.

| Reason | Description | npm #Resp. | % | PyPI #Resp. | % |
|-------------------------------|-----------------------------------------------------------------------------|------------|------|-------------|------|
| Well-implemented & tested | Participants state that trivial packages are effectively implemented and tested. | 48 | 54.6% | 20 | 54.1% |
| Increased productivity | Trivial packages reduce the time needed to implement existing source code. | 42 | 47.7% | 12 | 32.4% |
| Well-maintained code | It eases source code maintenance, since other developers maintain the trivial package. | 8 | 9.1% | 2 | 5.4% |
| Improved readability & reduced complexity | Using trivial packages improves source code quality in terms of readability and reduces complexity. | 8 | 9.1% | 5 | 13.5% |
| Better performance | Trivial packages improve the performance of web applications compared to the use of large frameworks. | 3 | 3.4% | 0 | 0.0% |
| No reason | – | 7 | 8.0% | 7 | 18.9% |

As we can see from the table, the two most cited reasons (i.e., well-implemented & tested and increased productivity) are the same for both the npm and PyPI package management platforms. However, when it comes to the 3 less common reasons, there is a slight difference between npm and PyPI; most notably, the claim that trivial packages provide better performance did not appear at all in the PyPI responses.

Next, we discuss each of the reasons presented in Table 3 in more detail:

R1. Well-implemented & tested: The most cited reason for using trivial packages is that they provide well-implemented and tested code. More than half of the responses mentioned this reason, with 54.6% and 54.1% of the responses from JavaScript and Python, respectively. In particular, although it may be easy for developers to code these trivial packages themselves, it is more difficult to make sure that all the details are addressed, e.g., one needs to carefully consider all edge cases. Some example responses that mention these issues are stated by participants P-npm 68, P-npm 4, and P-PyPI 5, who cite their reasons for using trivial packages as follows: P-npm 68: “Tests already written, a lot of edge cases captured [...].”, P-npm 4: “There may be a more elegant/efficient/correct/cross-environment-complatable solution to a trivial problem than yours”, and P-PyPI 5: “They have covered extra cases that I would not do or thought initially.”

R2. Increased productivity: The second most cited reason is the improved productivity that using trivial packages enables, with 47.7% and 32.4% for JavaScript and Python, respectively. Trivial tasks or not, writing code on your own requires time and effort; hence, many developers view the use of trivial packages as a way to boost their productivity. In particular, early on in a project, developers do not want to worry about small details; they would rather focus their efforts on implementing the more difficult tasks.
For example, participants P-npm 13 and P-npm 27 from the JavaScript survey state: P-npm 13: “[...] and it does save time to not have to think about how best to implement even the simple things.” & P-npm 27: “Don’t reinvent the wheel! if the task has been done before.”. Another example from the Python survey, participant P-PyPI 17, states: “Often I do write the code myself. And then package it into a re-usable module so that I don’t have to write it again later. And again. And again... At this point, whether the module is authored by myself or someone else is mostly irrelevant. What’s relevant is that I get to avoid repeatedly implementing the same functionality for each new project.”

The aforementioned are clear examples of how developers would rather not code something, even if it is trivial. Of course, this comes at a cost, which we discuss later.

R3. Well-maintained code: A less common (9.1% and 5.4% of the responses from JavaScript and Python), but still cited, reason for using trivial packages is the fact that the maintenance of the code need not be performed by the developers themselves; in essence, it is outsourced to the community or the contributors of the trivial packages. For example, participants P-npm 45 and P-PyPI 1 state, P-npm 45: “Also, a highly used trivial package is probable to be well maintained.” and P-PyPI 1: “The simple advantages are that they may be trivial AND used by many people and therefore potentially maintained by developers.” Even tasks such as bug fixes are dealt with by the contributors of the trivial packages, which is very attractive to the users of these packages, as reported by participant P-npm 80: “[...], leveraging feedback from a larger community to fix bugs, etc.”

R4. Improved readability & reduced complexity: Participants also reported that using trivial packages improves the readability and reduces the complexity of their code, with 9.1% and 13.5% of responses for the two package management platforms.
For example, P-npm 34 states: “immediate clarity of use and readability for other developers for commonly used packages[...].” & P-npm 47 states: “Simple abstract brings less complexity.” Python developers report the same advantage of using trivial packages. For example, P-PyPI 5 states that “Code clarity. When many two liners become one liners it saves space. Its the whole point of batteries included mentally...”

R5. Better performance: A few of the JavaScript participants (3.4%) stated that using trivial packages improves performance, since it alleviates the need for their application to depend on large frameworks. Notably, the load time of trivial packages compared to larger JavaScript packages is small, which speeds up the overall load time of the applications. For example, P-npm 35 states: “[...] you do not depend on some huge utility library of which you do not need the most part.” While JavaScript developers reported that trivial packages improve performance, the Python developers made no such claim. One explanation is that JavaScript is used to develop front-end applications, which are often sensitive to performance, i.e., load time, whereas Python is used to implement applications in a wide variety of domains.

Overall, the developer responses show that there is a different perception of using trivial packages among developers from the two package management platforms. Only a small percentage (8.0%) of the respondents from JavaScript stated that they do not see a reason to use trivial packages. However, for Python developers, 18.9% of the respondents believe that there are no advantages to using trivial packages.

5.3 Drawbacks of Using Trivial Packages

In addition to knowing the reasons why developers resort to trivial packages, we wanted to understand the other side of the coin - what they perceive to be the drawbacks of their decision to use these packages.
The drawbacks question was part of our survey, and we followed the same aforementioned process to analyze the survey responses. In the case of the drawbacks, the Cohen’s Kappa agreement measure was 0.86 and 0.91 for npm and PyPI, respectively, which is considered excellent agreement.

Table 4 lists the drawbacks mentioned by the survey respondents, along with a brief description and the frequency of each drawback. As we can see from the table, the top two most cited drawbacks (i.e., dependency overhead and breakage of applications) are the same for both npm and PyPI. However, for the less cited drawbacks, npm developers cited performance, development slow down and missed learning opportunities as the next set of drawbacks, whereas in PyPI, the developers consider security, development slow down and decreased performance as the next set of drawbacks. It is worth noting, however, that there is very little difference between the individual drawbacks (e.g., security vs. development slow down) within the two package management platforms (i.e., npm and PyPI).

Table 4 Drawbacks of using trivial packages in npm and PyPI

| Drawback | Description | npm | % | Python | % |
|------------------------|-----------------------------------------------------------------------------|------|------|--------|------|
| Dependency overhead | Using trivial packages results in a dependency mess that is hard to update and maintain. | 49 | 55.7% | 25 | 67.6% |
| Breakage of applications | Depending on a trivial package could cause the application to break if the package becomes unavailable or has a breaking update. | 16 | 18.2% | 12 | 32.4% |
| Decreased performance | Trivial packages decrease the performance of applications, which includes the time to install and build the application. | 14 | 15.9% | 3 | 8.1% |
| Slows development | Finding a relevant and high quality trivial package is a challenging and time consuming task. | 11 | 12.5% | 4 | 10.8% |
| Missed learning opportunities | The practice of using trivial packages leads to developers not learning and experiencing writing code for trivial tasks. | 8 | 9.1% | 0 | 0% |
| Security | Using trivial packages can open a door for security vulnerabilities. | 7 | 8.0% | 5 | 13.5% |
| Licensing issues | Using trivial packages could cause licensing conflicts. | 3 | 3.4% | 2 | 5.4% |
| No drawbacks | – | 7 | 8.0% | 3 | 8.1% |

Next, we discuss each of the drawbacks in more detail:

D1. Dependency overhead: The most cited drawback of using trivial packages is the increased dependency overhead, e.g., keeping all dependencies up to date and dealing with complex dependency chains, that developers need to bear (Bogart et al. 2016; Mirhosseini and Parnin 2017). This situation is often referred to as ‘dependency hell’, especially when the trivial packages themselves have additional dependencies. This drawback came through clearly in many comments, which account for 55.7% of the responses from JavaScript developers. For example, P-npm 41 states: “[...] people who don’t actively manage their dependency versions could [be] exposed to serious problems [...]” & P-npm 40: “Hard to maintain a lot of tiny packages”. For Python developers, the percentage of responses related to dependency overhead is high (67.6%) as well.
Some example responses from Python developers that mention these issues are stated by participants P-PyPI 2, P-PyPI 4 & P-PyPI 13: P-PyPI 2: “...it’s more difficult to distribute something with a dependency that doesn’t come with Python.”, P-PyPI 4: “Lots of brittle dependencies.” & P-PyPI 13: “When your projects consist of a lot trivial modules, it becomes almost impossible to track their update and some time you might forget what even they do.” Hence, while trivial packages may provide well-implemented/tested code and improve productivity, developers are clearly aware that the management of the additional dependencies is something they need to deal with.

D2. Breakage of applications: Developers also worry about the potential breakage of their application due to a specific package or version becoming unavailable. JavaScript developers stated this issue in 18.2% of the responses, while the percentage is 32.4% for Python developers. For example, in the left-pad issue, the main reason for the breakage was the removal of left-pad; P-npm 4 states: “Obviously the whole ‘left-pad crash’ exposed an issue” & P-PyPI 22 states: “potential for breaking (NPM leftpad situation)”. However, since that incident, npm has disabled the possibility of a package being removed (npm Blog 2016). Although disallowing removal solves part of the problem, packages can still be updated, which may break an application. This issue was clear from one of the responses, P-PyPI 7, who stated: “Potential for breaking changes from version to version.” For a non-trivial package, taking this risk may be worthwhile; for a trivial package, it may not be.

D3. Decreased performance: This issue is related to the dependency overhead drawback. Developers mentioned that incurring the additional dependencies slowed down build and run time and increased application installation times (15.9% and 8.1%).
For example, P-npm 64 states: “Too many metadata to download and store than a real code.” & P-npm 34 states: “[...], slow installs; can make project noisy and unintuitive by attempting to cobble together too many disparate pieces instead of more targeted code.” Another Python developer, P-PyPI 1, states: “If the modules are not so ubiquitous, then needing the dependency is a real drag as one will have to install it. Also, the same job done with your own may run much faster and be easier to understand.” As mentioned earlier, in some cases it is not just that the trivial package adds a dependency; the trivial package itself may depend on additional packages, which negatively impacts performance even further.

D4. Slows development: In some cases, the use of trivial packages may actually have the reverse effect and slow down development, with 12.5% & 10.8% of responses from JavaScript and Python developers. For example, as P-npm 23 and P-npm 15 state: P-npm 23: “Can actually slow the team down as, no matter how trivial a package, if a developer hasn’t required it themselves they will have to read the docs in order to double check what it does, rather than just reading a few lines of your own source.” & P-npm 15: “[...], we have the problem of locating packages that are both useful and “trustworthy” [...].” It can be difficult to find a relevant and trustworthy package. Even if others try to build on your code, it is much more difficult to go fetch a package and learn it, rather than read a few lines of your code. Python developers also agree on this issue; for example, P-PyPI 15 states: “If finding, reading, and understanding the documentation of a module takes longer than reading its implementation, the hiding of functionality in third-part trivial modules obscures the source base.”

D5.
Missed learning opportunities: In certain cases, reported only by JavaScript developers (9.1%), the use of these trivial packages is seen as a missed learning opportunity for developers. For example, P-npm 24 states: “Sometimes people forget how to do things and that could lead to a lack of control and knowledge of the language/technology you are using”. This is a clear example of how just using a package, rather than coding the solution yourself, leads to less knowledge about the code base. In contrast to JavaScript developers, Python developers seem not to be worried about this issue, since the use of trivial packages is not as common within the Python developer community as it is among JavaScript developers.

D6. Security: In some cases the trivial packages may have security flaws that make the application more vulnerable. This is an issue pointed out by a few developers (8.0% and 13.5%); for example, as P-npm 15 mentioned earlier, it is difficult to find packages that are trustworthy. Also, P-npm 57 mentions: “If you depend on public trivial packages then you should be very careful when selecting packages for security reasons” & P-PyPI 3 states: “more dependencies, greater likelihood of not knowing of how code actually works at lower level, security issues.” As with any dependency one takes on, there is always a chance that a security vulnerability could be exposed in one of these packages.

D7. Licensing issues: In some cases from both sets of responses (3.4% and 5.4% for JavaScript and Python), developers are concerned about potential licensing conflicts that trivial packages may cause.
For example, P-npm 73 states: “[...], possibly license-issues”, P-npm 62: “[...], there is a risk that the ‘trivial’ package might be licensed under the GPL [and] must be replaced anyway prior to shipping.” P-PyPI 23 also mentions: “Can be licensing hell.”

In general, we observe similar concerns regarding the use of trivial packages in the two package management platforms studied. There were also approximately 8% of the responses in both package management platforms that stated they do not see any drawbacks to using trivial packages.

6 Putting Developer Perceptions Under the Microscope

The developer surveys provided us with valuable insights into why developers use trivial packages and what they perceive to be their drawbacks. Whether there is empirical evidence to support their perceptions, however, remains unexplored. Thus, based on our findings in Section 5, we examine the most commonly cited reason for using trivial packages, i.e., the developers’ belief that trivial packages are well tested, and the most commonly cited drawback, i.e., the impact of additional dependencies.

6.1 Examining the ‘Well Tested’ Perception

As shown in Table 3, more than half of the responses from the studied package management platforms indicate that developers use trivial packages because they believe that these packages are well implemented and tested. However, is this really the case - are trivial packages really well tested? In this section, we examine whether this belief has any grounds.

6.1.1 Node Package Manager (npm)

npm requires that developers provide a test script name with the submission of their packages (listed in the package.json file). In fact, 73.7% (59,110 out of 80,232) of the trivial packages in our dataset have some test script name listed. However, since developers can provide any script name under this field, it is difficult to know if a package is actually tested.
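The test-script check described above can be sketched by reading the `scripts.test` field of a package's manifest; the sample manifest below is hypothetical:

```python
import json

def declared_test_script(package_json_text):
    """Return the declared test command from a package.json string,
    or None if no scripts.test entry is present. Note: any string can
    appear here, so presence does not prove the package is tested."""
    manifest = json.loads(package_json_text)
    return manifest.get("scripts", {}).get("test")

sample = '{"name": "left-pad", "scripts": {"test": "node test"}}'
print(declared_test_script(sample))  # node test
```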
We examine whether an npm package is really well tested and implemented from two aspects: first, we check if a package has tests written for it. Second, since in many cases developers consider packages to be ‘deployment tested’, meaning that the trivial packages are used by many developers, we also consider the usage of a package as an indicator of it being well tested and implemented (Zambonini 2011). To carefully examine whether a package is really well tested and implemented, we use the npm online search tool (known as npms (Cruz and Duarte 2017)) to measure various metrics related to how well the packages are tested, used and valued. To provide its ranking of the packages, npms mines and calculates a number of metrics based on development (e.g., tests) and usage (e.g., no. of downloads) data. We use three metrics measured by npms to validate the ‘well tested and implemented’ perception of developers:³

1) Tests: considers the tests’ size, coverage percentage and build status for a project. We looked into the npms source code and found that the Tests metric is calculated as:
\[ \text{testsSize} \times 0.6 + \text{buildStatus} \times 0.25 + \text{coveragePercentage} \times 0.15. \]
We use the Tests metric to determine if a package is tested and how trivial packages compare to non-trivial packages in terms of how well tested they are. One example that motivated us to investigate how well tested trivial packages are is the response by P-npm 68, who says: “Tests already written, a lot edge cases captured [...].”

2) Community interest: evaluates the community interest in the packages, using the number of stars on GitHub & npm, forks, subscribers and contributors. Once again, we find through the source code of npms that Community interest is simply the sum of the aforementioned metrics, measured as:
\[ \text{starsCount} + \text{forksCount} + \text{subscribersCount} + \text{contributorsCount}. \]
We use this metric to compare how interested the community is in trivial and non-trivial packages. We measure the community interest since developers view the importance of a trivial package as evidence of its quality, as stated by P-npm 56, who says: “[...] Using an isolated module that is well-tested and vetted by a large community helps to mitigate the chance of small bugs creeping in.”

3) Download count: measures the mean downloads for the last three months. Again, the number of downloads of a package is often viewed as an indicator of the package’s quality; as P-npm 61 mentions: “this code is tested and used by many, which makes it more trustful and reliable”.

³It is important to note that the motivation and full derivation of the metrics (e.g., why a weight of 0.15 is put on the test coverage, etc.) is beyond the scope of this paper. We refer interested readers to the npms documentation for more details (Cruz and Duarte 2017). To make our paper self-sufficient, we include how the metrics are calculated here.

As an initial step, we calculate the number of trivial packages that have a Tests value greater than zero, i.e., trivial packages that have some tests. We find that only 28.4% of the trivial packages have tests, i.e., a Tests value > 0. In addition, we compare the values of the Tests, Community interest and Download count metrics for trivial and non-trivial packages. Our focus is on the metric values for trivial packages; however, we also present the results for non-trivial packages to put our results in context.

Figure 5 shows the bean-plots for the Tests, Community interest and Download count metrics. In all cases except for the Tests value, trivial packages have, on median, a smaller Community interest value and Download count compared to non-trivial packages. Figure 5a shows that for the Tests metric, trivial packages have, on median, a similar value to non-trivial packages. We also observe from Fig. 5a that the distribution of the Tests metric is similar for both trivial and non-trivial packages: most packages have a Tests value of zero, and then there are small pockets of packages that have values of approx. 0.30, 0.6, 0.9 and 1.0. In the case of the Community interest and Download count metrics, once again, we see similar distributions, although the median values are clearly lower for trivial packages.

To examine whether the difference in metric values between trivial and non-trivial packages is statistically significant, we performed a Mann-Whitney test to compare the two distributions, with a $p$-value < 0.05. We also use Cliff’s Delta ($d$), which is a non-parametric effect size measure, to interpret the effect size of the difference between trivial and non-trivial packages. As suggested in Grissom and Kim (2005), we interpret the effect size value to be small for $d < 0.33$ (positive as well as negative values), medium for $0.33 \leq d < 0.474$ and large for $d \geq 0.474$.

Table 5 shows the $p$-values and effect size values. We observe that in all cases the differences are statistically significant; however, the effect size is small. The results show that the majority of trivial packages do not have tests written for them, and that trivial packages have statistically lower Community interest and Download count values than non-trivial packages, albeit with a small effect size.

Table 5 Mann-Whitney Test ($p$-value) and Cliff’s Delta ($d$) for trivial vs. non-trivial packages in npm

| Metrics            | $p$-value | $d$              |
|--------------------|-----------|------------------|
| Tests              | 2.2e-16   | $-0.222$ (small) |
| Community interest | 2.2e-16   | $-0.225$ (small) |
| Download count     | 2.2e-16   | $-0.261$ (small) |
----------------------------------------
-------------------------------
Section 221:
6.1.2 Python Package Index (PyPI)

Since PyPI does not collect any metadata to show whether a Python package is tested or not, we use other data sources to examine the well tested perception. We examine whether Python packages are tested in two ways: 1) we use the source code of the packages that are hosted on GitHub; 2) we rely on information about Python packages collected by the open source service libraries.io (https://libraries.io/). libraries.io monitors and collects the metadata of open source packages across 36 different package management platforms. It falls under the CC-BY-SA 4.0 license and has been used in other research work (e.g., Decan et al. 2018a, b). We obtain the extracted metadata related to the PyPI package management platform. Once again, we examine the testing perception in three complementary ways.

1) Tests: we examine if the package has any test code written. Since there is no standard way to determine that a Python application has tests (e.g., there exist more than 100 Python testing tools (https://wiki.python.org/moin/PythonTestingToolsTaxonomy)), we manually investigate whether a PyPI package contains test code. The idea is that if developers write tests, then they will put these tests in the package repository. One example that motivated us to look for the test code of a package is the response by P-PyPI 11, who stated: “Shorter code overall, well-tested code for fundamental tasks helps smooth over language nits”.

Since this is a heavily manual process, we decide to examine a representative sample of the packages. Therefore, we take a statistically significant sample of the 6,759 Python packages that we identify as trivial Python packages (Section 4.1). The sample is selected randomly to attain a 5% confidence interval at a 95% confidence level. This sampling process results in 364 PyPI trivial packages. Then, two of the authors manually examine the code bases of the sampled packages, looking for test code, to identify the packages that have tests.
After that, we measure Cohen’s Kappa coefficient to evaluate the level of agreement between the two annotators (Cohen 1960). As a result of this process, we find the level of agreement between the two authors to be 0.97, which is considered excellent agreement. Finally, the two authors discuss the cases that they do not agree on and come to an agreement.

2) Community interest: evaluates the community interest in the packages, using the number of stars on GitHub, forks, subscribers and contributors. We adopted the same formula defined by npms, which is simply the sum of the aforementioned metrics, measured as: \( \text{starsCount} + \text{forksCount} + \text{subscribersCount} + \text{contributorsCount} \). We use this metric to compare how interested the community is in trivial and non-trivial packages. We measure the community interest since developers view the importance of a trivial package as evidence of its quality.

3) Usage count: represents the number of applications that use a package. The more applications use a Python package, the more popular that package is, which may also indicate that the package is of high quality. For example, P-PyPI 11 indicated: “The simple advantages are that they may be trivial AND used by many people and therefore potentially maintained by developers.” Hence, we use the usage count metric as an indicator of package quality. To calculate the number of Python applications that use PyPI trivial packages, we use the libraries.io dataset, which provides a list of Python applications and the packages they depend on. For each PyPI package in our dataset, we count the number of Python applications that use that package.

We found that out of the 364 sampled trivial Python packages that we manually examined, 185 (50.82%) packages do not have test code in them, while 179 (49.18%) of the examined packages have test code written in them.
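The sample of 364 packages described above can be reproduced with the standard formula for estimating a proportion with a finite-population correction. The sketch below is our own illustration, not the study's code:

```python
import math

def sample_size(population: int, margin: float = 0.05,
                z: float = 1.96, p: float = 0.5) -> int:
    """Minimum sample size for estimating a proportion at a 95% confidence
    level (z = 1.96) within the given confidence interval (margin of error),
    using the usual finite-population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size (384.16)
    n = n0 / (1 + (n0 - 1) / population)        # finite-population correction
    return math.ceil(n)

# 6,759 trivial PyPI packages -> a sample of 364, matching the study
print(sample_size(6759))  # 364
```

With p = 0.5 (the most conservative choice), the uncorrected size is 384.16; the correction for the 6,759-package population brings it down to 364.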
It is important to note that our analysis only examines whether a trivial package has tests or not; whether these tests are actually effective is a completely different issue, and is one of the reasons for examining the other two metrics, Community interest and Usage count.

Figure 6 shows the bean-plots of the Community interest and Usage count values for trivial and non-trivial Python packages in our dataset. The figures show that in both cases trivial Python packages have, on median, a smaller Community interest value and Usage count compared to non-trivial packages. In particular, Fig. 6a shows that for the Community interest metric, the median values are clearly lower for trivial packages. Figure 6b shows that the distribution of the Usage count metric is similar for both trivial and non-trivial packages. Once again, we examine whether the difference in metric values between trivial and non-trivial packages is statistically significant. We performed a Mann-Whitney test to compare the two distributions, and we use Cliff’s Delta ($d$) to measure the effect size between PyPI trivial and non-trivial packages. Table 6 shows the $p$-values and effect size values. We observe that in the cases of Community interest and Usage count, the differences are statistically significant, and the effect size is small and negligible, respectively.

6.2 Examining the ‘Dependency Overhead’ Perception

As discussed in Section 5, the top cited drawback of using trivial packages is that developers need to take on and maintain extra dependencies, i.e., dependency overhead. Examining the impact of dependencies is a complex and well-studied issue (e.g., de Souza and Redmiles 2008; Decan et al. 2016; Abate et al. 2009) that can be examined in a multitude of ways. We choose to examine the issue from both the application and the package perspectives.
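The test/effect-size pair used throughout this section can be sketched in a few lines. The Cliff's Delta computation below follows its textbook definition together with the thresholds quoted from Grissom and Kim (2005); the $p$-value itself would come from a statistics library, e.g., `scipy.stats.mannwhitneyu` (not shown). The sample data are hypothetical:

```python
def cliffs_delta(xs, ys):
    """Cliff's Delta: (#{x > y} - #{x < y}) / (|xs| * |ys|).
    Plain O(n*m) counting form, adequate for a sketch."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def interpret(d):
    """Effect-size thresholds from Grissom and Kim (2005), as in the paper:
    small for |d| < 0.33, medium for 0.33 <= |d| < 0.474, large otherwise."""
    d = abs(d)
    if d >= 0.474:
        return "large"
    if d >= 0.33:
        return "medium"
    return "small"

# Hypothetical metric values for two groups of packages:
trivial, non_trivial = [0, 0, 1, 2, 3], [1, 2, 3, 4, 5]
d = cliffs_delta(trivial, non_trivial)
print(d, interpret(d))  # -0.64 large
```

A negative delta, as in the paper's tables, means the first group (trivial packages) tends to have the smaller metric values.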
Table 6 Mann-Whitney Test ($p$-value) and Cliff’s Delta ($d$) for trivial vs. non-trivial packages in PyPI

| Metrics            | $p$-value | $d$                   |
|--------------------|-----------|-----------------------|
| Community interest | 2.2e-16   | $-0.251$ (small)      |
| Usage count        | 0.004557  | $-0.039$ (negligible) |

6.2.1 Application-level Analysis

When compared to coding trivial tasks themselves, using a trivial package imposes extra dependencies. One of the most problematic aspects of managing dependencies for applications is when these dependencies are updated, potentially breaking the application. Therefore, as a first step, we examined the number of releases for trivial and non-trivial packages. The intuition here is that developers need to put in extra effort to ensure the proper integration of new releases. The bean-plots in Figs. 7 & 8 show the distribution of the number of releases for our studied package management platforms. Figure 7a shows that trivial packages on npm have fewer releases than non-trivial packages (the median is 1 for trivial and 2 for non-trivial packages). However, when we examine the number of different release types, we find that trivial and non-trivial npm packages have similar numbers of minor and major releases (Fig. 7c & b). As for the patch releases, trivial npm packages have fewer patch releases. In Fig. 8a, we also observe that trivial packages on PyPI have fewer releases than non-trivial packages. We again examine the number of releases of PyPI packages based on the release type. Figures 8b, c, and d show the distributions of minor, major, and patch releases for trivial and non-trivial PyPI packages. From Fig. 8b and c, we do not see any difference between trivial and non-trivial packages for the minor and major releases. As for the patch releases, we observe that trivial PyPI packages have a smaller number of patch releases.
That trivial packages are updated less frequently may be attributed to the fact that they ‘perform less functionality’ and hence need fewer updates. In addition, to examine whether the differences in the distribution of the types of releases between trivial and non-trivial packages are statistically significant, we performed a Wilcoxon test. We also use Cliff’s Delta ($d$) to examine the effect size. Table 7 shows the $p$-values and the effect sizes for all the release types for npm and PyPI. It shows that for all the release types the differences are statistically significant, having $p$-values < 0.05. Also, the effect size values are small or negligible.

Next, we examined how developers choose to deal with the updates of trivial packages. One way that application developers reduce the risk of a package impacting their application is to ‘version lock’ the package. For example, in JavaScript applications that use npm packages, version locking a dependency/package means that it is not updated automatically, and that only the specific version mentioned in the package.json file is used. As stated in a few responses to our survey, e.g., P-npm 8: “[...] Also, people who don’t lock down their versions are in for some pain”. In general, there are different types of version locks, i.e., only updating major releases, updating patches only, updating minor releases, or no lock at all, which means the package updates automatically. The version locks are specified in a configuration file, next to every package name; for example, npm defines them in the package.json file. We examined the frequency at which trivial and non-trivial packages are locked. For npm, we find that, on average, trivial packages are locked 26.3% of the time, whereas non-trivial packages are locked 28.2% of the time.

Fig. 7 Distribution of different types of releases for trivial and non-trivial npm packages
The Wilcoxon test also shows that the difference is statistically significant, with a $p$-value < 0.05 ($p$-value $= 9.116e-07$). In PyPI, on the other hand, we find that, on average, trivial packages are locked 31.7% of the time, whereas non-trivial packages are locked 36.2% of the time. Here too, the Wilcoxon test shows that the difference is statistically significant, with a $p$-value $= 9.707e-08$.

Our findings show that trivial packages are locked less often than non-trivial packages in both npm and PyPI. In both cases, however, the difference between the percentages of trivial and non-trivial packages being locked is not large.

Table 7 $p$-values and Cliff’s Delta ($d$) for the release types of trivial vs. non-trivial packages in npm and PyPI

| Release type | npm $p$-value | npm $d$ (small) | PyPI $p$-value | PyPI $d$ (small) |
|--------------|---------------|-----------------|----------------|------------------|
| All          | 2.2e-16       | -0.2016         | 2.2e-16        | -0.2995          |
| Minor        | 2.2e-16       | -0.0823         | 2.2e-16        | -0.2447          |
| Major        | 2.2e-16       | -0.1185         | 2.2e-16        | -0.1276          |
| Patch        | 2.2e-16       | -0.1985         | 2.2e-16        | -0.2729          |
----------------------------------------
-------------------------------
Section 222:
6.2.2 Package-level Analysis

At the package level, we investigate the direct and indirect dependencies of trivial packages. In particular, we would like to determine if trivial packages have their own dependencies, which makes the dependency chain even more complex. For each trivial and non-trivial package on npm, we install it and then count the actual number of (direct and indirect) dependencies that the package requires. Doing so allows us to know the true (direct and indirect) dependencies that each package requires. Note that simply looking into the package.json file and the require statements will provide the direct dependencies, but not the indirect dependencies. Hence, we downloaded all the packages in our npm dataset, mock installed⁴ them and built the dependency graph for the npm platform.
Similarly, for PyPI, we count the actual number of (direct and indirect) dependencies that each package requires. To do so, we leveraged the metadata provided by Valiev et al. (2018), who extracted the list of direct and indirect dependencies of each package on PyPI. We resort to using the data provided by Valiev et al. (2018) since it is recently extracted and covers the history of PyPI for more than six years. We then read the dependencies of each package and build a dependency graph for the PyPI platform.

Figure 9 shows the distribution of dependencies for trivial and non-trivial packages for npm and PyPI. Since most trivial packages have no dependencies, the median is zero. Therefore, we bin the trivial packages based on the number of their dependencies and calculate the percentage of packages in each bin.

Table 8 shows the percentage of packages and their respective number of dependencies for both npm and PyPI. We observe that the majority of npm trivial packages (56.9%) have zero dependencies, 21% have between 1-10 dependencies, 3.8% have between 11-20 dependencies, and 18.4% have more than 20 dependencies. The table also shows that PyPI trivial packages do not have as many dependencies as the npm packages: in fact, 63.2% of PyPI trivial packages have zero dependencies and approx. 34% have between 1-20 dependencies. Only approx. 3% of the PyPI trivial packages have more than 20 dependencies. Interestingly, the table shows that some of the trivial packages in npm have many dependencies, which indicates that trivial packages can indeed introduce significant dependency overhead. It also shows that PyPI trivial packages have a small number of dependencies. One explanation for this difference is that the Python language has a more mature standard API that provides most of the needed utility functionalities.
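Counting direct and indirect dependencies amounts to a reachability walk over a dependency graph like the ones built above. The sketch below is our own illustration, with a hypothetical graph (the package names are made up):

```python
def transitive_dependencies(package, dep_graph):
    """Count direct + indirect dependencies by walking an adjacency dict
    (name -> list of direct dependencies). A seen-set guards against the
    cycles that occur in real dependency data."""
    seen = set()
    stack = list(dep_graph.get(package, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(dep_graph.get(dep, []))
    return len(seen)

# Hypothetical graph: a one-function package can still pull in a chain
# of indirect dependencies.
graph = {
    "tiny-pkg": ["helper-a"],
    "helper-a": ["helper-b"],
    "helper-b": [],
}
print(transitive_dependencies("tiny-pkg", graph))  # 2
```

This is why looking only at a manifest's declared dependencies understates the true overhead: `tiny-pkg` declares one dependency but transitively requires two.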
⁴We modified the npm code to intercept the install call and counted the installations needed for every package.

Table 8 Percentage of packages vs. the number of (direct & indirect) dependencies used in the npm and PyPI package management platforms

| Packages    | npm: 0 | npm: 1-10 | npm: 11-20 | npm: >20 | PyPI: 0 | PyPI: 1-10 | PyPI: 11-20 | PyPI: >20 |
|-------------|--------|-----------|------------|----------|---------|------------|-------------|-----------|
| Trivial     | 56.9%  | 21%       | 3.8%       | 18.4%    | 63.2%   | 29.6%      | 4.3%        | 2.9%      |
| Non-trivial | 37.1%  | 24.1%     | 6.8%       | 32.1%    | 42.5%   | 39.4%      | 10.7%       | 7.4%      |

Trivial packages have fewer releases and are less likely to be version locked than non-trivial packages. That said, developers should be careful when using trivial packages, since in some cases trivial packages can have numerous dependencies. In fact, we find that 43.4% of npm trivial packages have at least one dependency and 18.4% have more than 20 dependencies, while 36.8% of PyPI trivial packages have at least one dependency and 2.9% have more than 20 dependencies.
----------------------------------------
-------------------------------
Section 223:
7 Relevance and Implications

A common question asked of empirical studies is: so what? What are the implications of your findings? Why would practitioners care about them? We discuss the relevance of our study to the developer community based on the responses to our survey, and highlight some of the implications of our study.

7.1 Relevance: Do Practitioners Care?

At the start of the study, we were not sure how practically relevant our study of trivial packages would be. However, we were surprised by the interest of developers in our study.
In fact, one of the developers (P-npm 39) explicitly mentioned the lack of research on this topic, stating: “There has not been enough research on this, but I’ve been taking note of people’s proposed “quick and simple” code to handle the functionality of trivial packages, and it’s surprised me to see the high percentage of times the proposed code is buggy or incomplete.”

Moreover, when we conducted our studies, we asked respondents if they would like to know the outcome of our study and, if so, to provide us with an email address. Of the 125 JavaScript and Python respondents, 81 (approx. 65%) provided their email address so that we could share the outcomes of our study with them. Some of these respondents hold very high-level leadership roles in npm. To us, this is an indicator that our study and its outcomes are of high relevance to the JavaScript and Python development communities.

7.2 Implications of Our Study

Our study has a number of implications for both software engineering practice and research.

7.2.1 Practical Implications

A direct implication of our findings is that trivial packages are commonly used by others, perhaps indicating that developers, especially JavaScript developers, do not view their use as a bad practice. Moreover, developers should not assume that all trivial packages are well implemented and tested, since our findings show otherwise. npm developers should expect more trivial packages to be submitted, making the task of finding the most relevant package even harder. Hence, the issue of how to manage and help developers find the best packages needs to be addressed. For example, P-npm 15 indicated that “... we have the problem of locating packages that are both useful and ‘trustworthy’ in an ever growing sea of packages.” To some extent, npms has recently been adopted by npm to specifically address this issue.
Developers highlighted that the lack of a decent core or standard JavaScript library causes them to resort to trivial packages. Often, they do not want to install large frameworks just to leverage small parts of them, hence they resort to using trivial packages. For example, P-npm 35 stated: “especially in JavaScript relieves you from thinking about cross browser compatibility for special cases/coming up with polyfills and testing all edge cases yourself. Basically it’s a substitute for the missing standard library. And you do not depend on some huge utility library of which you do not need the most part”, and P-PyPI 23: “Usually an indication of the inadequacy of the standard library. This seems particularly so of JavaScript where you might find yourself using many such modules.” Therefore, there is a need for the JavaScript community to create a standard JavaScript API or library in order to reduce the dependence on trivial packages. The issue of creating such a standard JavaScript library is under much debate (Fuchs 2016).

7.2.2 Implications for Future Research

Our study mostly focused on determining the prevalence of, reasons for, and drawbacks of using trivial packages in two large package management platforms, npm and PyPI. Based on our findings, we see a number of implications and motivations for future work. First, our survey respondents indicated that the choice to use trivial packages is not black or white. In many cases, it depends on the team and the application. For example, one survey respondent stated that on his team, less experienced developers are more likely to use trivial packages, whereas the more experienced developers would rather write their own code for trivial tasks. The issue here is that experienced developers are more likely to trust their own code, while less experienced developers are more likely to trust an external package. Another aspect is the maturity of the application.
As some of the survey respondents pointed out, they are much more likely to use trivial packages early in the development life cycle, so that they do not waste time on trivial tasks and can focus on the more fundamental tasks of their application. Once their application matures, they start to look for ways to reduce dependencies, since dependencies pose potential points of failure for their application. Our study motivates future work to examine the relationship between team experience, application maturity and the use of trivial packages.

Second, survey respondents also pointed out that using trivial packages is seen favourably compared to using code from Questions & Answers (Q&A) sites such as StackOverflow or Reddit. For example, P-npm 84 stated: “I’d have to do research on how to solve a particular problem, peruse questions and answers on StackOverflow, Reddit, or Coderanch, and find the most recent and readable solution among everything I’ve found, then write it myself. Why go through all of this work when you can simply ‘require()’ someone else’s solution and continue working towards your goal in a matter of seconds?” Compared to using code from StackOverflow, where the developer does not know who posted the code, who else uses it, or whether the code has tests, using a trivial package that is on npm and/or PyPI is seen as a much better option. In this case, using trivial packages is not seen as the best choice, but it is certainly a better choice. Although there have been many studies that examined how developers use Q&A sites such as StackOverflow (Abdalkareem et al. 2017a, b; Wu et al. 2018; Baltes and Diehl 2018), we are not aware of any studies that compare code reuse from Q&A sites and trivial packages. Our findings indicate the need for such a study.
----------------------------------------
-------------------------------
Section 224:
8 Related Work

In this section, we discuss the work that is related to our study.
We divide the related work into work on code reuse in general and work that studied software ecosystems.

8.1 Studies of Code Reuse

Prior research on code reuse has shown its many benefits, which include improving quality and development speed, and reducing development and maintenance costs (Mockus 2007; Lim 1994; Mohagheghi et al. 2004; Basili et al. 1996). For example, Sojer and Henkel (2010) surveyed 686 open source developers to investigate how they reuse code. Their findings show that more experienced developers reuse more source code, and that 30% of the functionality of open source software (OSS) projects reuses existing components. Developers also revealed that they see code reuse as a quick way to start new projects. Similarly, Haefliger et al. (2008) conducted a study to empirically investigate reuse in open source software and the development practices of OSS developers. They triangulated three sources of data (developer interviews, code inspections and mailing list data) from six OSS projects. Their results showed that developers used tools and relied on standards when reusing components. Mockus (2007) conducted an empirical study to identify large-scale reuse of open source libraries, showing that more than 50% of source files include code from other OSS libraries. On the other hand, the practice of reusing source code has some challenging drawbacks, including the effort and resources required to integrate reused code (Di Cosmo et al. 2011). Furthermore, a bug in a reused component could propagate to the target system (Dogguy et al. 2011). While our study corroborates some of these findings, its main goal is to define and empirically investigate the phenomenon of reusing trivial packages, in particular in JavaScript and Python applications.

8.2 Studies of Software Ecosystems

In recent years, analyzing the characteristics of ecosystems in software engineering has gained momentum (Bavota et al. 2013; Bloemen et al.
2014; Manikas 2016; Decan et al. 2016). For example, in a recent study, Bogart et al. (2015) and Bogart et al. (2016) empirically studied three ecosystems, including npm, and found that developers struggle with changing versions as they might break dependent code. Wittern et al. (2016) investigated the evolution of the npm ecosystem in an extensive study that covers the dependencies between npm packages, download metrics and the usage of npm packages in real applications. One of their main findings is that npm packages and updates of these packages are steadily growing, and that more than 80% of packages have at least one direct dependency.

Other studies examined the size characteristics of packages in an ecosystem. German et al. (2013) studied the evolution of the statistical computing project GNU R, with the aim of analyzing the differences between code characteristics of core and user-contributed packages. They found that user-contributed packages are growing faster than core packages. Additionally, they reported that user-contributed packages are typically smaller than core packages in the R ecosystem. Kabbedijk and Jansen (2011) analyzed the Ruby ecosystem and found that many small and large projects are interconnected. Decan et al. (2018b) investigated the evolution of package dependency networks for seven packaging ecosystems. Their findings reveal that the studied packaging ecosystems grow over time in terms of the number of published and updated packages. They also observed an increasing number of transitive dependencies for some packages.

Other works investigated the challenges of using external packages of a software ecosystem, including: identifying conflicts between JavaScript packages (Patra et al. 2018), examining how pull requests help developers upgrade out-of-date dependencies in their applications (Mirhosseini and Parnin 2017), studying the usage of repository badges in the npm ecosystem (Trockman et al.
2018), and using dependency graphs to discover hidden trends in an ecosystem (Kula et al. 2018).

In many ways, our study complements the previous work since, instead of focusing on all packages in an ecosystem, we specifically focus on trivial packages, and we studied them in two different package management platforms, npm and PyPI. Moreover, we examine the reasons developers use trivial packages and what they view as their drawbacks. We study the reuse of trivial packages, which is a subset of general code reuse; hence, we do expect some overlap with prior work. Like many empirical studies, we confirm some of the prior findings, which is a contribution on its own (Hunter 2001; Seaman 1999). Moreover, our paper adds to the prior findings through, for example, our validation of the developers’ assumptions. Lastly, we believe our study fills a real gap, since 65% of the participants said they wanted to know our study outcomes.
----------------------------------------
-------------------------------
Section 225:
9 Threats to Validity

In this section, we discuss the threats to the validity of our case study.

9.1 Internal Validity

Internal validity concerns factors that may have influenced our results, such as our dataset collection process. To study the reasons for and drawbacks of using trivial packages, we surveyed developers. There is a potential that our survey questions may have influenced the replies from the respondents. To minimize such influence, we made sure to ask for free-form responses, and we publicly share our survey and all of our anonymized survey responses (Abdalkareem et al. 2019). Moreover, the way we asked the survey questions might have affected the responses of our respondents, causing their responses to advocate or not advocate the use of trivial packages. To reduce this bias, we ensured participants’ anonymity.
Also, our study may be impacted by the lack of overlap between the developer groups who participated in the two user studies (i.e., defining trivial packages and understanding developers’ perceptions about the use of trivial packages). We find that the second survey served as a confirmation of the observations made by the first survey’s participants; however, given that these are two different populations, they may have reported different observations.

We removed test code from our dataset to ensure that our analysis only considers production source code. We identified test code by searching for the term ‘test’ (and its variants, e.g., ‘TEST_code’) in the file names and file paths. Even though this technique is widely accepted in the literature (Gousios et al. 2014; Tsay et al. 2014; Zhu et al. 2014), to confirm that our technique is correct, i.e., that files that have the term ‘test’ in their names and paths actually contain test code, we took a statistically significant sample of the packages, to achieve a 95% confidence level and a 5% confidence interval, and examined them manually. We found that all the examined cases contain test code.

In addition, to examine the well-tested perception for the PyPI trivial packages, the first two authors manually examined the source code of the trivial packages to classify whether they have test code written or not. To ensure the validity of our classification, we measured the classification agreement between the two authors and found it to be excellent (Cohen’s Kappa value of 0.97).

9.2 Construct Validity

Construct validity considers the relationship between theory and observation, in case the measured variables do not measure the actual factors. To define trivial packages, we surveyed 12 JavaScript and 13 Python developers. However, we found that there was consensus on what is considered a trivial package.
Although our analysis shows that packages with ≤ 35 LOC and a complexity ≤ 10 are trivial packages, we believe that other definitions are possible for trivial packages. That said, of the 125 survey participants that we emailed about using trivial packages, only 2 mentioned that a flagged package is not a trivial package (even though it fit our criteria). To us, this is a confirmation that our definition applies in the vast majority of cases, although clearly it is not perfect.
+
+
+In addition, to determine what is considered to be a trivial package, we conducted an experiment with JavaScript and Python developers who are mostly students (undergraduate and graduate students) with some professional experience. While this may not represent professional developers per se (Sjoberg et al. 2002), prior work has shown that experiments with students provide similar results to those with professional developers in the software engineering domain (Salman et al. 2015; Höst et al. 2000).
+
+
+To identify the JavaScript and Python applications that we examine in our study, we rely on the metadata provided by the GHTorrent dataset (Gousios et al. 2014). Thus, our selection of JavaScript and Python applications heavily depends on the correctness of the applications’ programming language listed in GHTorrent.
+
+
+We use the LOC and cyclomatic complexity of the code to determine trivial packages. In some cases, these may not be the only measures that need to be considered to determine a trivial package. For example, some of the trivial packages have their own dependencies, which may need to be taken into consideration. Our experience tells us that most developers only look at the package itself and not at its dependencies when determining if it is trivial or not. 
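The LOC-and-complexity criterion above can be expressed as a simple predicate. This is a hedged sketch: the two thresholds come from the definition above, while the function name and the way the input metrics are obtained are illustrative, not the measurement tooling used in the study:

```python
def is_trivial(loc: int, cyclomatic_complexity: int) -> bool:
    """Trivial-package predicate per the definition above:
    at most 35 lines of code AND cyclomatic complexity at most 10,
    both measured over production code (test files excluded)."""
    return loc <= 35 and cyclomatic_complexity <= 10

# A 20-LOC, complexity-3 package is trivial; exceeding either threshold is not.
assert is_trivial(20, 3)
assert not is_trivial(40, 3)    # too many lines of code
assert not is_trivial(20, 11)   # too complex
```

Note that both conditions must hold: a tiny but branch-heavy package, or a simple but long one, is not classified as trivial.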
That said, when we replicated this questionnaire with another set of participants from the Python language community, we found that developers seem to confirm our definition of trivial JavaScript/Python packages (Abdalkareem et al. 2019).
+
+
+Based on our user study, we defined trivial npm packages as packages that have ≤ 35 LOC and a cyclomatic complexity ≤ 10. However, one threat to this definition is that a cyclomatic complexity of 10 may be considered high for a package to be trivial. To examine this concern, we calculated the cyclomatic complexity of all the non-trivial packages in our dataset and found that, on average, non-trivial npm packages have a cyclomatic complexity of 803, which indicates that the cyclomatic complexity value of 10 in our definition is still significantly smaller than that of non-trivial packages.
+
+
+To study trivial packages in the PyPI package management platform, we were able to extract 63,912 packages. Collecting more packages may provide more details about trivial packages on the PyPI package management platform. Also, to identify the Python applications that use PyPI trivial packages, we use the snakefood tool (http://furius.ca/snakefood/) to extract the applications’ dependencies. Hence, we are limited by the accuracy of snakefood in extracting the used packages in Python applications.
+
+
+In our study, to understand why developers use trivial packages, we conducted two user surveys with JavaScript and Python developers. These two surveys were performed on different dates, which, as a consequence, may affect the survey results. However, given that these two package management platforms are independent, we envision that the impact of this date shift is not significant.
+
+
+In our study, to identify developers who used trivial packages in their applications, we use regular expressions to identify these packages. This process may flag packages that the developers did not actually use. 
To mitigate this threat, during our analysis, we make sure that we extract the right packages through several rounds of manual checking of the results. In addition, none of the developers that we contacted indicated that they do not use the identified packages, which serves as a slight confirmation that our methodology is not incorrect.
+
+
+In our study on npm, we used npms to measure various quantitative metrics related to testing, community interest and download counts. Our measurements are only as accurate as npms; however, given that it is the main search tool for npm, we are confident in the npms metrics. We also use libraries.io to calculate the community interest and usage count metrics for PyPI packages, and our measurements are as accurate as libraries.io. We resorted to using the libraries.io data since it has been used in prior work (e.g., Decan et al. 2018a, b). In addition, we use the dataset provided by Valiev et al. (2018) to measure the direct and indirect dependencies of the packages on PyPI.
+
+
+We also use several R packages to perform our analysis, which may be impacted by the accuracy of these packages. To mitigate this threat, we make our dataset and the used tools available online (Abdalkareem et al. 2019).
+
+
+9.3 External Validity
+
+
+External validity considers the generalization of our findings. All of our findings were derived from open source JavaScript applications and npm packages, and from our replication on Python applications and PyPI packages. Even though we believe that the two studied package management platforms are amongst the most commonly used ones, our findings may not generalize to other platforms or ecosystems. That said, historical evidence shows that examples of individual cases contributed significantly to areas such as physics, economics, social sciences and even software engineering (Flyvbjerg 2006). 
We believe that strong empirical evidence is built from both studies on individual cases and studies on large samples.
+
+
+Our list of reasons for and drawbacks of using trivial packages is based on a survey of 88 JavaScript and 37 Python developers. Although this is a large number of developers, our results may not hold for all developers. A different sample of developers may result in a different list or ranking of advantages and disadvantages. To mitigate the risk due to this sampling, we contacted developers from different applications and, as our responses show, most of them are experienced developers.
+
+
+We do not distinguish between the domains of the studied packages, which may impact the findings. However, to help mitigate any bias, we analyzed more than 500,000 npm and 74,663 PyPI packages that cover a wide range of package domains. Lastly, our study is based on open source applications that are hosted on GitHub; therefore, our study may not generalize to other open source or commercial applications.
+----------------------------------------
+-------------------------------
+Section 226:
+10 Conclusion
+
+
+The use of trivial packages is an increasingly popular trend in software development (Abdalkareem et al. 2017; Abdalkareem 2017). Like any development practice, it has its proponents and opponents. The goal of our study is to extend our understanding of the use of trivial packages. We examine the prevalence, reasons, and drawbacks of using trivial packages in different package management platforms. Thus, we consider trivial packages in PyPI in addition to the previously studied npm (Abdalkareem et al. 2017).
+
+
+Our results indicate that trivial packages are commonly and widely used in JavaScript and Python applications. We also find that while the majority of JavaScript developers in our study do not oppose the use of trivial packages, the majority of Python developers believe that using trivial packages could be harmful. 
Additionally, based on the developers’ responses, developers from the two package management platforms stated that the main reason for using trivial packages is that they are considered to be well implemented and tested. They do cite the additional dependencies’ overhead as a drawback of using these trivial packages. Our empirical study showed that considering trivial packages to be well tested is a misconception, since more than half of the studied trivial packages do not even have tests. However, these trivial packages seem to be ‘deployment tested’ and have similar community interest and download/usage count values to non-trivial packages. In addition, we find that some of the trivial packages have their own dependencies. In our studied dataset, 18.4% of the npm and 2.9% of the PyPI trivial packages have more than 20 dependencies. Hence, developers should be careful about which trivial packages they use.
+
+
+Based on our findings, we provide the following practical suggestions for software developers:
+
+
+– Developers should not assume that trivial packages are well tested and implemented, since we found that only 28.4% of npm and 49.2% of PyPI trivial packages have test code.
+– Because trivial packages can have their own dependencies, developers should be aware that using these trivial packages would increase the dependency overhead of their applications.
+
+
+Acknowledgments The authors are grateful to the many survey respondents who dedicated their valuable time to respond to our surveys. Also, the authors would like to thank the anonymous reviewers and the editor for their thoughtful feedback and suggestions that helped us improve our study.
+
+
+References
+
+
+Abate P, Di Cosmo R, Boender J, Zacchiroli S (2009) Strong dependencies between software components. 
In: Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, ESEM ’09, IEEE Computer Society, pp 89–99
+
+
+Abdalkareem R (2017) Reasons and drawbacks of using trivial npm packages: The developers’ perspective. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, ACM, pp 1062–1064
+Abdalkareem R, Nourry O, Wehaibi S, Mujahid S, Shihab E (2017) Why do developers use trivial packages? An empirical case study on npm. In: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE ’17, ACM, pp 385–395
+
+
+Abdalkareem R, Oda V, Mujahid S, Shihab E (2019) On the impact of using trivial packages: An empirical case study on npm and PyPI. https://doi.org/10.5281/zenodo.3095009
+
+
+Abdalkareem R, Shihab E, Rilling J (2017) On code reuse from Stack Overflow: An exploratory study on Android apps. Inf Softw Technol 88(C):148–158
+
+
+Abdalkareem R, Shihab E, Rilling J (2017) What do developers use the crowd for? A study using Stack Overflow. IEEE Softw 34(2):53–60
+
+
+Baltes S, Diehl S (2018) Usage and attribution of Stack Overflow code snippets in GitHub projects. Empirical Software Engineering
+
+
+Basili VR, Briand LC, Melo WL (1996) How reuse influences productivity in object-oriented systems. Commun ACM 39(10):104–116
+
+
+Bavota G, Canfora G, Penta MD, Oliveto R, Panichella S (2013) The evolution of project inter-dependencies in a software ecosystem: The case of Apache. In: Proceedings of the 2013 IEEE International Conference on Software Maintenance, ICSM ’13, IEEE Computer Society, pp 280–289
+
+
+Blais M snakefood: Python Dependency Graphs. http://furius.ca/snakefood/. (accessed on 09/23/2018)
+
+
+Bloemen R, Amrit C, Kuhlmann S, Ordóñez Matamoros G (2014) Gentoo package dependencies over time. 
In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR ’14, ACM, pp 404–407
+
+
+Bogart C, Kästner C, Herbsleb J (2015) When it breaks, it breaks: How ecosystem developers reason about the stability of dependencies. In: Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering Workshop, ASEW ’15, IEEE Computer Society, pp 86–89
+
+
+Bogart C, Kästner C, Herbsleb J, Thung F (2016) How to break an API: Cost negotiation and community values in three software ecosystems. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE ’16, ACM, pp 109–120
+
+
+Bower (2012) Bower: a package manager for the web. https://bower.io/. (accessed on 08/23/2016)
+
+
+Castelluccio M, An L, Khomh F (2019) An empirical study of patch uplift in rapid release development pipelines. Empir Softw Eng 24(5):3008–3044
+
+
+Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46
+
+
+Cruz A, Duarte A (2017) npms. https://npms.io/. (accessed on 02/20/2017)
+
+
+de Souza CRB, Redmiles DF (2008) An empirical study of software developers’ management of dependencies and changes. In: Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, ACM, pp 241–250
+
+
+Decan A, Mens T, Constantinou E (2018a) On the impact of security vulnerabilities in the npm package dependency network. In: International Conference on Mining Software Repositories
+
+
+Decan A, Mens T, Grosjean P (2018b) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empirical Software Engineering
+
+
+Decan A, Mens T, Grosjean P et al (2016) When GitHub meets CRAN: An analysis of inter-repository package dependency problems. 
In: Proceedings of the 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering, volume 1 of SANER ’16, IEEE, pp 493–504
+
+
+Di Cosmo R, Di Ruscio D, Pelliccione P, Pierantonio A, Zacchiroli S (2011) Supporting software evolution in component-based FOSS systems. Sci Comput Program 76(12):1144–1160
+
+
+Dogguy M, Glondu S, Le Gall S, Zacchiroli S (2011) Enforcing type-safe linking using inter-package relationships. Studia Informatica Universalis 9(1):129–157
+
+
+Ebert C, Cain J (2016) Cyclomatic complexity. IEEE Softw 33(6):27–29
+
+
+Fleiss JL, Cohen J (1973) The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 33:613–619
+
+
+Flyvbjerg B (2006) Five misunderstandings about case-study research. Qual Inq 12(2):219–245
+
+
+Fuchs T (2016) What if we had a great standard library in JavaScript? – Medium. https://medium.com/@thomasfuchs/what-if-we-had-a-great-standard-library-in-javascript-52692342ee3f.pw7d4cq8j. (accessed on 02/24/2017)
+
+
+German D, Adams B, Hassan A (2013) Programming language ecosystems: The evolution of R. In: Proceedings of the 17th European Conference on Software Maintenance and Reengineering, CSMR ’13, IEEE, pp 243–252
+
+
+Gousios G, Vasilescu B, Serebrenik A, Zaidman A (2014) Lean GHTorrent: GitHub data on demand. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR ’14, ACM, pp 384–387
+
+
+Grissom RJ, Kim JJ (2005) Effect sizes for research: A broad practical approach. Lawrence Erlbaum Associates Publishers
+Haefliger S, Von Krogh G, Spaeth S (2008) Code reuse in open source software. Manag Sci 54(1):180–193
+
+
+Haney D (2016) Npm & left-pad: Have we forgotten how to program? http://www.haneycodes.net/npm-left-pad-have-we-forgotten-how-to-program/. (accessed on 08/10/2016)
+
+
+Harris R (2015) Small modules: it’s not quite that simple. 
https://medium.com/@Rich_Harris/small-modules-it-s-not-quite-that-simple-3ca532d65de4. (accessed on 08/24/2016)
+
+
+Hemanth HM (2015) One-line node modules - Issue #10 - sindresorhus/ama. https://github.com/sindresorhus/ama/issues/10. (accessed on 08/10/2016)
+
+
+Höst M, Regnell B, Wohlin C (2000) Using students as subjects—a comparative study of students and professionals in lead-time impact assessment. Empir Softw Eng 5(3):201–214
+
+
+Hunter JE (2001) The desperate need for replications. J Consum Res 28(1):149–158
+
+
+Inoue K, Sasaki Y, Xia P, Manabe Y (2012) Where does this code come from and where does it go? - Integrated code history tracker for open source systems. In: Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, IEEE Press, pp 331–341
+
+
+Kabbedijk J, Jansen S (2011) Steering insight: An exploration of the Ruby software ecosystem. In: Proceedings of the Second International Conference of Software Business, ICSOB ’11, Springer, pp 44–55
+
+
+Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining GitHub. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR ’14, ACM, pp 92–101
+
+
+Kula RG, Roover CD, German DM, Ishio T, Inoue K (2018) A generalized model for visualizing library popularity, adoption, and diffusion within a software ecosystem. In: 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering, volume 00 of SANER ’18, pp 288–299
+
+
+Libraries.io. Libraries.io - the open source discovery service. https://libraries.io/. (accessed on 05/20/2018)
+
+
+Libraries.io (2017) PyPI. https://libraries.io/pypi. (accessed on 03/08/2017)
+
+
+Lim WC (1994) Effects of reuse on quality, productivity, and economics. IEEE Softw 11(5):23–30
+
+
+Macdonald F (2016) A programmer almost broke the Internet last week by deleting 11 lines of code. 
http://www.sciencealert.com/how-a-programmer-almost-broke-the-internet-by-deleting-11-lines-of-code. (accessed on 08/24/2016)
+
+
+Manikas K (2016) Revisiting software ecosystems research: a longitudinal literature study. J Syst Softw 117:84–103
+
+
+McCamant S, Ernst MD (2003) Predicting problems caused by component upgrades. In: Proceedings of the 9th European Software Engineering Conference Held Jointly with 11th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE ’03, ACM, pp 287–296
+
+
+Mirhosseini S, Parnin C (2017) Can automated pull requests encourage software developers to upgrade out-of-date dependencies? In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE ’17, IEEE Press, pp 84–94
+
+
+Mockus A (2007) Large-scale code reuse in open source software. In: Proceedings of the First International Workshop on Emerging Trends in FLOSS Research and Development, FLOSS ’07, IEEE Computer Society, p 7
+
+
+Mohagheghi P, Conradi R, Killi OM, Schwarz H (2004) An empirical study of software reuse vs. defect-density and stability. In: Proceedings of the 26th International Conference on Software Engineering, ICSE ’04, IEEE Computer Society, pp 282–292
+
+
+npm (2016) What is npm? — node package management documentation. https://docs.npmjs.com/getting-started/what-is-npm. (accessed on 08/14/2016)
+
+
+npm Blog T (2016) The npm blog: changes to npm’s unpublish policy. http://blog.npmjs.org/post/141905368000/changes-to--unpublish-policy. (accessed on 08/11/2016)
+
+
+Orsila H, Geldenhuys J, Ruokonen A, Hammouda I (2008) Update propagation practices in highly reusable open source components. In: Proceedings of the 4th IFIP WG 2.13 International Conference on Open Source Systems, OSS ’08, pp 159–170
+
+
+Patra J, Dixit PN, Pradel M (2018) ConflictJS: Finding and understanding conflicts between JavaScript libraries. 
In: Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, ACM, pp 741–751
+
+
+Python Python testing tools taxonomy - Python wiki. https://wiki.python.org/moin/PythonTestingToolsTaxonomy. (accessed on 05/16/2018)
+
+
+Rahman MT, Rigby PC, Shihab E (2019) The modular and feature toggle architectures of Google Chrome. Empir Softw Eng 24(2):826–853
+
+
+Ray B, Posnett D, Filkov V, Devanbu P (2014) A large scale study of programming languages and code quality in GitHub. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE ’14, ACM, pp 155–165
+Salman I, Misirli AT, Juristo N (2015) Are students representatives of professionals in software engineering experiments? In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, volume 1 of ICSE ’15, IEEE, pp 666–676
+
+
+SciTools Understand tool. https://scitools.com/. (accessed on 04/16/2019)
+
+
+Seaman CB (1999) Qualitative methods in empirical studies of software engineering. IEEE Trans Softw Eng 25(4):557–572
+
+
+Singer J, Sim SE, Lethbridge TC (2008) Software engineering data collection for field studies. In: Guide to Advanced Empirical Software Engineering. Springer, London, pp 9–34
+
+
+Sjoberg DIK, Anda B, Arisholm E, Dyba T, Jorgensen M, Karahasanovic A, Koren EF, Vokac M (2002) Conducting realistic experiments in software engineering. In: Proceedings International Symposium on Empirical Software Engineering, IEEE, pp 17–26
+
+
+Sojer M, Henkel J (2010) Code reuse in open source software development: Quantitative evidence, drivers, and impediments. J Assoc Inf Syst 11(12):868–901
+
+
+Trockman A, Zhou S, Kästner C, Vasilescu B (2018) Adding sparkle to social coding: an empirical study of repository badges in the npm ecosystem. 
In: Proceedings of the International Conference on Software Engineering, ICSE ’18, ACM
+
+
+Tsay J, Dabbish L, Herbsleb J (2014) Influence of social and technical factors for evaluating contribution in GitHub. In: Proceedings of the 36th International Conference on Software Engineering, ICSE ’14, ACM, pp 356–366
+
+
+Valiev M, Vasilescu B, Herbsleb J (2018) Ecosystem-level determinants of sustained activity in open-source projects: A case study of the PyPI ecosystem. In: Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE ’18, ACM
+
+
+Vasilescu B, Yu Y, Wang H, Devanbu P, Filkov V (2015) Quality and productivity outcomes relating to continuous integration in GitHub. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE ’15, ACM, pp 805–816
+
+
+Williams C (2016) How one developer just broke Node, Babel and thousands of projects in 11 lines of JavaScript. http://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos. (accessed on 08/24/2016)
+
+
+Wittern E, Suter P, Rajagopalan S (2016) A look at the dynamics of the JavaScript package ecosystem. In: Proceedings of the 13th International Conference on Mining Software Repositories, MSR ’16, ACM, pp 351–361
+
+
+Wu Y, Wang S, Bezemer C-P, Inoue K (2018) How do developers utilize source code from Stack Overflow? Empirical Software Engineering
+
+
+Zambonini D (2011) A Practical Guide to Web App Success, chapter 20. In: Gregory O (ed) Five Simple Steps. (accessed on 02/23/2017)
+
+
+Zhu J, Zhou M, Mockus A (2014) Patterns of folder use and project popularity: A case study of GitHub repositories. In: Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM ’14, ACM, pp 30:1–30:4
+
+
+Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. 
+
+
+Rabe Abdalkareem is a postdoctoral fellow in the Software Analysis and Intelligence Lab (SAIL) at Queen’s University, Canada. He received his Ph.D. in Computer Science and Software Engineering from Concordia University, Montreal, Canada. His research investigates how the adoption of crowdsourced knowledge affects software development and maintenance. Abdalkareem received his master’s in Applied Computer Science from Concordia University. His work has been published at premier venues such as FSE, ICSME and MobileSoft, as well as in major journals such as TSE, IEEE Software, EMSE and IST. Contact him at rab_abdu@encs.concordia.ca; http://users.encs.concordia.ca/rababdu.
+Vinicius Oda is an MASc student in the Department of Computer Science and Software Engineering at Concordia University, Montreal. His research interests include Software Engineering, Software Ecosystems, and Mining Software Repositories, among others.
+
+
+Suhaib Mujahid is a Ph.D. student in the Department of Computer Science and Software Engineering at Concordia University. He received his master’s in Software Engineering from Concordia University (Canada) in 2017. He obtained his bachelor’s in Information Systems at Palestine Polytechnic University. His research interests include wearable applications, software quality assurance, mining software repositories and empirical software engineering. You can find more about him at http://users.encs.concordia.ca/smujahi.
+
+
+Emad Shihab is an Associate Professor and Concordia University Research Chair in the Department of Computer Science and Software Engineering at Concordia University. His research interests are in Software Engineering, Mining Software Repositories, and Software Analytics. His work has been published in some of the most prestigious SE venues, including ICSE, ESEC/FSE, MSR, ICSME, EMSE, and TSE. He serves on the steering committees of PROMISE, SANER and MSR, three of the leading conferences in the software analytics areas. 
His work has been done in collaboration with and adopted by some of the biggest software companies, such as Microsoft, Avaya, BlackBerry, Ericsson and National Bank. He is a senior member of the IEEE. His homepage is: http://das.encs.concordia.ca.
+Affiliations
+
+
+Rabe Abdalkareem1 · Vinicius Oda1 · Suhaib Mujahid1 · Emad Shihab1
+
+
+Vinicius Oda
+v_oda@encs.concordia.ca
+
+
+Suhaib Mujahid
+s_mujahi@encs.concordia.ca
+
+
+Emad Shihab
+eshihab@encs.concordia.ca
+
+
+1 Data-Driven Analysis of Software (DAS) Lab, Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada
+----------------------------------------
+-------------------------------
+Section 227:
+Understanding the Usage, Impact, and Adoption of Non-OSI Approved Licenses
+
+
+Rômulo Meloca1, Gustavo Pinto2, Leonardo Baiser1, Marco Mattos1, Ivanilton Polato1, Igor Scaliante Wiese1, Daniel M German3
+
+
+1Federal University of Technology – Paraná (UTFPR), 2University of Pará (UFPA), 3University of Victoria
+
+
+ABSTRACT
+
+
+The software license is one of the most important non-executable pieces of any software system. However, due to its non-technical nature, developers often misuse or misunderstand software licenses. Although previous studies reported problems related to license clashes and inconsistencies, in this paper we shed light on an important yet overlooked issue: the use of non-approved open-source licenses. Such licenses claim to be open-source, but have not been formally approved by the Open Source Initiative (OSI). When a developer releases software under a non-approved license, even if the intent is to make it open-source, the original author might not be granting the rights required by those who use the software. 
To uncover the reasons behind the use of non-approved licenses, we conducted a mixed-methods study, mining data from 657K open-source projects and their 4,367K versions, and surveying 76 developers who published some of these projects. Although 1,058,554 of the project versions employ at least one non-approved license, non-approved licenses account for 21.51% of license usage. We also observed that it is not uncommon for developers to change from a non-approved to an approved license. When asked, some developers mentioned that this transition was due to a better understanding of the disadvantages of using a non-approved license. This perspective is particularly important since developers often rely on package managers to easily and quickly get their dependencies working.
+
+
+CCS CONCEPTS
+
+
+• Software and its engineering → Open source model;
+
+
+KEYWORDS
+
+
+Open Source Software, Software license, OSI approved
+
+
+ACM Reference Format:
+
+
+Rômulo Meloca1, Gustavo Pinto2, Leonardo Baiser1, Marco Mattos1, Ivanilton Polato1, Igor Scaliante Wiese1, Daniel M German3. 2018. Understanding the Usage, Impact, and Adoption of Non-OSI Approved Licenses. In Proceedings of MSR ’18: 15th International Conference on Mining Software Repositories, Gothenburg, Sweden, May 28–29, 2018 (MSR ’18), 11 pages. https://doi.org/10.1145/3196398.3196427
+----------------------------------------
+-------------------------------
+Section 228:
+1 INTRODUCTION
+
+
+Software licenses are among the most important non-executable parts of any software system [5]. Particularly relevant to open-source software (OSS), open-source licenses not only drive how one can use an OSS but also ensure to what extent others can reuse it [19]. Similarly to software code, software licenses change [27] and evolve [25]. Software relicensing is, indeed, commonplace in the open-source software world [7]. As an example, Facebook recently relicensed four key open-source projects from BSD + Patents to the MIT license1. 
According to them, this change was motivated by an unhappy community looking for alternatives under permissive licenses. This concern, however, pertains not only to large software companies that maintain open-source software, since the software license is a common good of any open-source project. Therefore, it is no surprise that software licensing is an active research field [1, 4, 16, 23].
+
+
+Despite its importance, developers do not fully understand problems related to license usage [1], such as the lack of licenses or license inconsistencies. The way developers develop software only exacerbates this problem, since simple actions such as copying a code snippet from the web have the potential of infringing a software license [12, 13]. This issue becomes even more relevant in the open-source era, where a constant flow of new open-source software is born on a regular basis [10]. That is, developers have a myriad of codebases to refer to, but the way they do so might infringe a software license (and consequently the whole chain of software that depends on it).
+
+
+Another relevant yet not fully understood problem is the use of open-source licenses that have not been approved by OSI, the Open Source Initiative (see Section 2 for details). Such software licenses were not formally approved by an open-source regulator and, therefore, have not been vetted as open-source. Currently, OSI maintains a list of 83 approved open-source software licenses2. All these licenses went through a rigorous review process, and not all licenses submitted are approved (e.g., the CC0 license3 has been submitted but was not approved). According to their website, the purpose of the OSI’s license review process is to “(1) Ensure approved licenses conform to the Open Source Definition (OSD), (2) Identify appropriate License Proliferation Category, (3) Discourage vanity and duplicative Licenses”4. 
Furthermore, because OSI defined what open source is (the Open Source Definition), it claims that “only software licensed under an OSI-approved Open Source license should be labeled ‘Open Source’ software.”5
+
+
+1https://code.facebook.com/posts/300798627056246
+2https://opensource.org/licenses/alphabetical
+3https://opensource.org/faq#cc-zero
+4https://opensource.org/approval
+5https://opensource.org/faq
+In this study, we investigate to what extent software licenses that do not provide open-source guarantees (or “non-approved licenses” for short) are used in open-source projects published on package managers. Package managers are particularly relevant to license usage for at least two reasons: (1) they are growing fast in terms of the number of libraries available and packages published [3, 28], and (2) since packages obey a standardized architecture [22], installing and reusing a third-party package is essentially painless. Therefore, packages published in package managers might have a higher number of dependencies than those that do not rely on a package manager. As we shall see in Section 4, on average, a package at NPM has 4.80 dependencies (3rd Quartile: 5, Max: 792).
+
+
+In this paper we study three well-known package managers: NPM (Node Package Manager), RubyGems and CRAN (The Comprehensive R Archive Network). For each of these package managers, we downloaded and investigated all available packages. After this process, we ended up with a comprehensive list of 657,811 software packages scattered across three well-known, long-lived package managers. Specifically, we investigated 510,964 NPM packages, 11,366 CRAN packages, and 135,481 RubyGems packages. Further, in order to provide an evolutionary perspective of license usage on these packages, we studied 4,367,440 different package versions (3,539,494 on NPM, 816,580 on RubyGems, and 11,366 on CRAN). We manually analyzed each license employed in each one of these package versions. 
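To make the per-package dependency counts mentioned above concrete, runtime dependencies can be read from NPM-style package.json manifests, as in this sketch (the manifests below are toy examples, not packages from the studied dataset):

```python
import json
from statistics import mean

def dependency_count(manifest_text: str) -> int:
    """Number of runtime dependencies declared in an NPM package.json manifest."""
    manifest = json.loads(manifest_text)
    return len(manifest.get("dependencies", {}))

# Toy manifests standing in for real NPM packages.
manifests = [
    '{"name": "a", "dependencies": {"left-pad": "^1.3.0"}}',
    '{"name": "b", "dependencies": {"lodash": "^4.17.0", "express": "^4.16.0"}}',
    '{"name": "c"}',  # no dependencies field at all
]
counts = [dependency_count(m) for m in manifests]
print(counts)        # [1, 2, 0]
print(mean(counts))
```

Aggregating such counts over all packages yields summary statistics like the average and quartiles quoted above.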
+
+
+This paper makes the following contributions:
+
+
+We conducted the largest study on license usage and evolution, targeting ∼660k packages (and their 4.3 million versions) published in three well-known package managers (NPM, RubyGems, and CRAN).
+
+
+We studied the impact of the use of non-approved licenses considering the whole dependency chain.
+
+
+We deployed a survey with 76 package publishers (package developers, owners, or authors) to understand how and why they use non-approved licenses.
----------------------------------------
-------------------------------
Section 229:
2 BACKGROUND ON OPEN-SOURCE LICENSES
+
+
+The Open Source Definition [17], published by OSI, defines 10 properties that a software license must satisfy to be called Open Source. OSI has also established an approval process, through which a license can be approved as Open Source. As of today, only 83 licenses have been approved (although many others have been submitted). Other organizations also approve licenses as open source, such as the Free Software Foundation (FSF) and the Debian Foundation (these two call them Free Software Licenses—with one exception, the NASA Open Source Agreement 1.3, all OSI-approved licenses are considered free software by the FSF7).
+
+
+In the scope of this paper, we consider licenses approved by OSI only. This decision was motivated by the fact that, differently from the FSF, which can both develop and approve licenses, OSI does not develop — only approves — licenses. Since a license can be submitted by anyone interested in contributing to open source, community participation, a crucial aspect of modern open-source software [2, 18], is much stronger on the OSI side.
+
+
+To better understand the approval process and the implications of not using an OSI-approved license, we conducted a semi-structured interview with an OSI board member. According to him, anybody can submit a license for OSI approval.
During the certification process, everyone is invited to participate in the review and discussion of the license. The goal of the certification process is to make sure that the submitted license meets all criteria stated in the Open Source Definition. If the license satisfies these requirements, it is approved.
+
+
+One of the main benefits of using an OSI-approved license is the guarantee that OSI—and the open-source community at large—has vetted the license and that the license is widely known. Therefore, the community can understand, trust, and use the license. Otherwise, if there were no OSI, anyone could develop a new license and claim that it was open-source; this would require those using the software to hire lawyers to understand such a license.
+
+
+This means that, even if some license is very popular in other domains, such as the Creative Commons Zero (CC0) license, software released under CC0 is not open-source software. More importantly, according to the board member, this threat applies recursively: "if project 'A' (which uses an OSI approved license) depends on project 'B' (which does not use an OSI approved license), this would be as dangerous as project 'A' not using an OSI approved license". Nevertheless, if one is interested in publishing software assets only (such as data or images), such open-source data can be safely released under CC0 (the requirements of the OSD do not apply to assets). A similar issue occurs when one does not state any license. In this case, the original author has not granted any rights to the recipient of the software. That is, without permission from the original author, no one can use, redistribute, or create derivative works, which is clearly the opposite of the open-source concepts.
+
----------------------------------------
-------------------------------
Section 230:
3 METHOD
+
+
+In this section we present our research questions and method, the data gathered, and our ground definitions.
+
+
+3.1 Research Questions
+
+
+The main goal of this study is to gain an in-depth understanding of non-approved open-source licenses. We designed the following three research questions to guide our research:
+
+
+RQ1: How common are non-approved licenses on software packages?
+
+
+RQ2: What is the impact of non-approved licenses on the package manager ecosystem?
+
+
+RQ3: Why do developers adopt non-approved licenses?
+
+
+To answer these questions, we conducted a two-phase study, adopting a sequential mixed-method approach. First, we collected data about license usage and evolution on a corpus of ∼660k software packages (Section 3.2). After that, we conducted a survey targeting 76 package publishers (Section 3.3).
+3.2 First study: mining license usage
+
+
+3.2.1 Package and Package Managers. In our first study, we mined license information from software packages hosted in three well-known, long-lived package managers: NPM, RubyGems, and CRAN. The package managers studied have the following characteristics:
+
+
+NPM manages and indexes Node.js packages. Node.js is a JavaScript runtime environment. The NPM package manager was launched in 2009 and, as of October 2017, it contains over 521K packages. Although it offers support for maintaining packages in-site (it has a version control system), most of the packages available on it are maintained elsewhere (e.g., GitHub). To submit a package to NPM, a user must create an account and push the package using the NPM software utility.
+
+
+RubyGems manages and indexes Ruby packages. RubyGems was launched in 2009 and, as of October 2017, it contains over 192K packages.
It also offers support for maintaining packages in-site, but most of the packages published are maintained elsewhere (e.g., GitHub). RubyGems distributes binaries (i.e., a gem file) through its web interface. Anyone interested in submitting a package to RubyGems must create an account and push the package using the gem software utility.
+
+
+CRAN manages and indexes R packages. Differently from NPM and RubyGems, CRAN distributes both the source and binary code of the packages published on it. CRAN was launched in 1998 and, as of October 2017, it contains over 11K packages. Anyone interested in submitting a package to CRAN needs to create an account and submit the package through the CRAN web interface.
+
+
+These package managers host several well-known and non-trivial software packages, including React on NPM, Rails on RubyGems, and ggplot2 on CRAN. Packages in these package managers are downloaded millions of times per month. For instance, in September 2017 alone, the NPM packages BlueBird11, React12, and Lodash13 were, in total, downloaded more than 69 million times (18M, 6M, and 45M, respectively). Package managers also make package releases (i.e., new versions) available. Table 1 presents the distribution of versions per package. As we can see, 56% of the packages published on NPM have up to three versions (58% on RubyGems, and 75% on CRAN). Packages with 10 or more versions are also common (17% on NPM, 16% on RubyGems, but 0.5% on CRAN). Generally speaking, CRAN has fewer package versions than NPM and RubyGems.
+
+
+3.2.2 Data Collection. We created an infrastructure to download, extract data, and match dependencies between package versions. Our infrastructure downloaded metadata for all packages available on the three package managers. Both NPM and RubyGems provide an API to collect relevant data14. Our infrastructure gathers CRAN metadata by navigating through its public HTML files.
For CRAN and NPM, we collected our data on September 7th, 2017. We collected RubyGems metadata on September 15th, 2017. Table 2 depicts the metadata downloaded for each package version in each package manager.
+
+
| # of Versions | CRAN | NPM | RubyGems |
|---------------|------|-----|----------|
| 1 | 8,848| 150,546| 42,668 |
| 2 | 1,942| 80,243| 22,720 |
| 3 | 360 | 55,028| 15,089 |
| 4 | 140 | 39,890| 10,743 |
| 5 | 67 | 30,192| 7,688 |
| 6 | 38 | 22,886| 5,814 |
| 7 | 30 | 18,190| 4,549 |
| 8 | 12 | 15,105| 3,550 |
| 9 | 17 | 12,000| 2,870 |
| ≥10 | 67 | 86,884| 19,790 |
+
+
+After downloading the metadata, our infrastructure validated whether a (downloaded) package X depends on an (also downloaded) package Y. We validated dependencies using the version number stated in package X and the version number defined in package Y. The three package managers use the notion of delimiters to express a range of possible versions that are compatible with a given package. Examples of delimiters include the characters ">", "<", "~", and "^". For example, a package X that depends on the 'react' package can declare a dependency as "react@~15.0.0", which indicates that package X depends on any version compatible with react@15.0.0. In addition, in NPM and RubyGems, package publishers can use the "x" character to specify a small range of versions (e.g., 1.1.x or 1.x). To match dependencies, we selected the first available version that matched the pattern. As an example, the NPM package 'gulp', version '2.6.0' (gulp@2.6.0 for short) depends on package event-stream@3.0.x. As a result, our infrastructure successfully matched the gulp@2.6.0 to event-stream@3.0.0 dependency. This matching procedure is important for the impact analysis (RQ2).
+
+
+We downloaded data using three Google Cloud Platform VMs.
We used one dual-core VM with 7.5Gb of main memory and 20Gb of SSD, and two single-core VMs with 3.5Gb of main memory and 10Gb of hard disk. After downloading, our dataset occupied 1.2Gb of disk space (1.1Gb of NPM data, 4.6Mb of CRAN data, and 182Mb of RubyGems data). The infrastructure used as well as the data collected can be found at the companion website15.
+
+
+Table 3 shows the distribution of the number of licenses per package version.
+
+
| # of Licenses | CRAN | NPM | RubyGems |
|---------------|----------|----------|----------|
| 0 | 0 | 369,914 | 394,582 |
| 1 | 5,346 | 3,158,391| 419,095 |
| 2 | 5,881 | 10,287 | 2,411 |
| 3 | 130 | 669 | 355 |
| 4 | 6 | 222 | 29 |
| 5 | 1 | 11 | 61 |
| 6 | 2 | 0 | 46 |
| 10 | 0 | 0 | 1 |
+
+
+As we can see, the majority of package versions have a single license. Interestingly, no package without a license could be found on CRAN. This happens because CRAN does not publish packages without the selection of a license16. Still, package versions with two or more licenses are common. For instance, the package sixarm_ruby_unaccent@1.1.2, published on RubyGems, was released with 10 licenses (namely: apache-2.0, artistic-2.0, bsd-3-clause, cc-by-nc-sa-4.0, agpl-3.0, gpl-3.0, lgpl-3.0, mit, mpl-2.0, and ruby).
+
+
+Table 4 presents the number of dependencies per package version. Approximately 29% of NPM package versions have no dependencies (39% for CRAN and 30% for RubyGems, respectively).
+
+
| # of Dependencies | CRAN | NPM | RubyGems |
|-------------------|----------|----------|----------|
| 0 | 6,435 | 1,047,089| 258,810 |
| 1 | 1,782 | 537,283 | 194,312 |
| 2 | 1,701 | 412,121 | 143,616 |
| 3 | 1,517 | 322,234 | 84,679 |
| 4 | 1,183 | 241,449 | 51,338 |
| 5 | 978 | 180,349 | 31,424 |
| 6 | 733 | 139,429 | 22,698 |
| 7 | 521 | 111,070 | 13,720 |
| 8 | 436 | 85,631 | 11,302 |
| 9 | 323 | 69,024 | 8,699 |
| ≥10 | 1,060 | 472,466 | 32,879 |
+
+
+Although the average number of dependencies per package version is 3.8, outliers were found. For instance, the CRAN package seurat@2.0.1 has 41 dependencies, the RubyGems package aws-sdk-resources@3.1.0 has 105 dependencies, and the NPM package primeng-custom@4.0.0-beta.1 has 500 dependencies.
+
+
+3.2.3 License Groups. As mentioned above, we downloaded metadata for 657,811 software packages (510,964 NPM packages, 11,366 CRAN packages, and 135,481 RubyGems packages), spanning 4,367,440 versions (3,539,494 on NPM, 816,580 on RubyGems, and 11,366 on CRAN). When analyzing the licenses with which each version was released, we found that some of them included typos or wrong names. This happened because NPM and RubyGems allow one to fill the license field with any information. We then manually normalized each license found.
+
+
+The normalization process was conducted in pairs, followed by conflict resolution meetings. For each license, two authors checked whether it (1) was approved by OSI, (2) was not approved but was defined somewhere else, e.g., in the Software Package Data Exchange17, or (3) was neither approved nor defined anywhere else. Licenses found neither in the OSI list nor in SPDX were allocated to the Other category. To check whether a license was already defined, we searched for its specification on blog posts, Q&A websites, and mailing lists. If the formal specification of a license was not found, the license was included in the non-approved license group.
After this process, we ended up with six license groups, namely:
+
+
+OSI licenses: Any license approved by OSI. For this case, we also fixed small issues, such as trivial typos. As an example, we successfully normalized the "apache 2" license to its correct form, "apache-2.0".
+
+
+Incomplete licenses: Licenses that are probably approved, but with issues we could not fix. For instance, package publishers often omit the version number (e.g., "bsd" or "lgpl"), so we could not be sure about which license version was used.
+
+
+SPDX (but not OSI) licenses: Licenses listed in the SPDX License List18 that were not formally approved by OSI. This group includes popular and well-defined licenses, such as the "Do What the Fuck You Want to Public License" (WTFPL) or the "Creative Commons Zero" (CC0) license.
+
+
+Missing or absent license: We aggregated in this group package versions without any license at all (i.e., when package publishers left the license field empty), or when publishers explicitly filled the license field with the word NONE. This is a sub-category of copyright licenses because, as discussed in Section 2, when no license is declared, the original authors retain all rights.
+
+
+Other: Licenses with unrecognizable typos, wrong names, or even curse words. Examples include the "d" license and the "Not specified" license. Additionally, we included in this group licenses for which the package publisher put an external link in the license information. We did not inspect each such file individually, and this data was not included in any of the analyses we conducted because it represents less than 0.5%.
+
+
+Copyright licenses: This occurs when package publishers explicitly mention that they retain the copyright. Examples include the "my own" license, the "(c) Copyright" license, and the "all rights reserved" license.
+
+
+At the end of this normalization process, we ended up with 973 distinct licenses (758 on NPM, 46 on CRAN, and 336 on RubyGems).
+
+
+15https://github.com/rmeloca/EcosystemsAnalysis
+
+
+16https://cran.r-project.org/web/packages/policies.html
+
+
+17https://spdx.org/
+
+
+18https://spdx.org/licenses/
+Non-approved licenses comprise all license groups except OSI licenses and Incomplete licenses.
----------------------------------------
-------------------------------
Section 231:
3.3 Second study: a survey with package publishers
+
+
+In our second study, we deployed a survey with package publishers of the NPM package manager. We focused on this package manager because (1) the email addresses of the package publishers could be recovered and (2) packages in this package manager exhibit the greatest number of dependencies, and are thus more likely to affect or be affected if a license inconsistency is found. We used the following criterion to identify our population: we selected the publishers of package versions released under a non-approved license with at least one dependency. This ensures that the irregularity propagates to other packages. After applying this criterion, we obtained 385 package publishers from different projects.
+
+
+Our survey was based on the recommendations of Smith et al. [21], employing principles for increasing survey participation, such as sending personalized invitations, allowing participants to remain completely anonymous, and asking closed and direct questions as much as possible. Our survey had 14 questions (three of which were open), grouped into three broad interests: demographics (e.g., what is your gender? what is your profession?), understanding non-approved license adoption (e.g., why did you choose it? are you aware of the implications?), and usage frequency (e.g., how often do you use non-approved licenses? how often do you not declare a license?).
The open questions were analyzed in pairs, followed by conflict resolution meetings. Participation was voluntary and the estimated time to complete the survey was 5-10 minutes. When sending our invitation email, 8 messages were not delivered due to technical reasons. We received 76 responses, a 20% response rate. The survey is available at: https://goo.gl/Jiuwzp.
----------------------------------------
-------------------------------
Section 232:
4 RESULTS
+
+
+In this section, we report the results of our study grouped by each research question.
----------------------------------------
-------------------------------
Section 233:
4.1 RQ1. How common are non-approved licenses on software packages?
+
+
+After the normalization process, we found a total of 973 distinct licenses. These licenses were declared a total of 4,369,024 times. The number of license declarations is higher than the number of package versions because one package often employs more than one license (as shown in Table 3). Table 5 shows the distribution of each license group.
+
+
+As we can see, non-approved licenses (all groups defined in Section 3.2.3 except OSI licenses and Incomplete licenses) were used 858,311 times, which corresponds to roughly 20% of the overall license usage. Most of these uses, nevertheless, are related to the absence of a license. We found 764,496 package versions without any license declaration (which accounts for 89% of the non-approved license usage). In particular, on RubyGems, missing licenses correspond to 48% of the total licenses used (10.41% on NPM).
+
+
+We also studied license usage from an evolutionary perspective. In order to provide a general overview, Table 6 groups the evolution patterns of license changes. We analyzed all available versions pairwise in order to verify how many times a license changed from one group to another.
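This pairwise counting can be sketched as follows. The snippet is a minimal illustration under an assumed data shape, not the authors' code: `histories` maps each package to the license-group label of its versions in release order.

```python
from collections import Counter

def transition_counts(histories: dict[str, list[str]]) -> Counter:
    """Count (from_group, to_group) pairs over consecutive versions;
    the diagonal entries correspond to unchanged license groups."""
    counts: Counter = Counter()
    for groups in histories.values():
        for prev, nxt in zip(groups, groups[1:]):
            counts[(prev, nxt)] += 1
    return counts
```

For example, a package whose versions are labeled `["OSI", "MISSING", "MISSING"]` contributes one OSI→MISSING and one MISSING→MISSING transition to the totals.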
The results show that package versions, regardless of the package manager, tend to keep the same license across versions. Therefore, the main diagonal always holds the highest values. For instance, on NPM we found 311,455 package versions without any associated license whose next version still carried this non-approved license.
----------------------------------------
-------------------------------
Section 234:
Table 5: License Groups on Package Versions
+
+
| Group | CRAN | NPM | RubyGems | TOTAL |
|---------------------|------|-------|----------|--------|
| OSI | 15,724 | 3,009,782 | 403,693 | 3,429,199 |
| INCOMPLETE | 34 | 73,647 | 7,833 | 81,514 |
| SPDX but not OSI | 162 | 30,688 | 6,215 | 37,065 |
| MISSING | 8 | 400,618 | 396,178 | 796,804 |
| OTHER | 220 | 10,978 | 4,953 | 16,151 |
| COPYRIGHT | 0 | 7,106 | 1,185 | 8,291 |
----------------------------------------
-------------------------------
Section 235:
Table 6: Patterns of license evolution
+
+
| NPM From\To | OSI | INC | SPDX | MISS | OTH | COP |
|-------------|-----|-----|------|------|-----|-----|
| OSI | 2,576,692 | 3,012 | 2,060 | 2,125 | 423 | 116 |
| INC | 4,573 | 61,535 | 44 | 144 | 363 | 205 |
| SPDX | 2,153 | 26 | 25,489 | 182 | 78 | 56 |
| MISS | 8,911 | 321 | 256 | 337,711 | 87 | 23 |
| OTH | 502 | 345 | 99 | 51 | 9,231 | 241 |
| COP | 200 | 212 | 58 | 19 | 267 | 6,424 |
+
+
| RubyGems From\To | OSI | INC | SPDX | MISS | OTH | COP |
|------------------|-----|-----|------|------|-----|-----|
| OSI | 336,639 | 505 | 574 | 380 | 553 | 37 |
| INC | 854 | 6,575 | 99 | 5 | 116 | 0 |
| SPDX | 618 | 82 | 5,095 | 51 | 270 | 1 |
| MISS | 8,112 | 329 | 279 | 324,153 | 185 | 10 |
| OTH | 808 | 119 | 272 | 9 | 4,197 | 14 |
| COP | 50 | 1 | 1 | 5 | 15 | 1,029 |
+
+
+Since changes from approved to non-approved licenses are the most relevant ones to our study, we counted how many times a package version changed from an OSI-approved license to a non-approved license,
and vice-versa. We identified these changes in 12,491 packages on RubyGems and 24,075 packages on NPM. Among these packages, on RubyGems, 10,442 package versions changed from a non-approved to an approved license. In these cases, the publishers corrected their wrong license, as presented in Table 8.
+
+
+Interestingly, the number of changes from an approved to a non-approved license was much smaller. On RubyGems, we found only 2,049 package versions that changed from an approved license to a non-approved license. A similar behavior occurred on NPM. The number of changes from a non-approved license is much greater than the opposite: 16,339 package versions changed from a non-approved license to an approved one, whereas 7,736 package versions changed from an approved to a non-approved one. As an example, when upgrading from zorg@0.0.1 to zorg@0.0.10, the NPM package changed from the known "ISC" license to no license at all. We did not perform this analysis on CRAN because it does not provide such information.
+
+
+To provide a more fine-grained perspective on the evolution patterns, we analyzed the top 10 most common changes from an approved license to a non-approved license, and vice-versa. Table 7 presents the evolution patterns, focusing on changes from an approved to a non-approved license. The majority of the observed changes were from the MIT license to no license at all (1,286 instances found on NPM, and 248 on RubyGems). The effects of a missing license are exactly the opposite of what a developer might think: copyright applies instead of the source code being opened. Therefore, the migration from a missing license to the MIT license can be explained as a correction of this effect, especially given the permissive characteristics of that license. This evidence is supported by Almeida [1] and by our finding that developers might not fully understand the software licensing process.
+
+
+Table 7: The 10 Most Common License Evolution Patterns: From Approved to Non-Approved
+
+
| NPM | RubyGems |
|-----|----------|
| Evolution Patterns | # | Evolution Patterns | # |
| mit → missing | 1,286 | mit → missing | 248 |
| isc → missing | 604 | apache-2.0 → missing | 85 |
| apache-2.0 → missing | 116 | bsd-3-clause → missing | 33 |
| bsd-2-clause → missing | 37 | lgpl-2.0 → missing | 4 |
| gpl-3.0 → missing | 20 | gpl-3.0 → missing | 4 |
| bsd-3-clause → missing | 19 | bsd-2-clause → missing | 2 |
| gpl-2.0 → missing | 12 | gpl-2.0 → missing | 2 |
| lgpl-3.0 → missing | 9 | lgpl-3.0 → missing | 1 |
| fair → missing | 9 | ms-pl → missing | 1 |
| mpl-2.0 → missing | 7 | — | — |
+
+
+RQ1 Summary. We found 1,058,554 package versions (24.23%) released under non-approved licenses. Packages published on RubyGems are the most affected (55% of them employed a non-approved license). The missing license (i.e., the lack of a license) is widespread. When licenses change, most package versions keep the same license group, although changes from a non-approved to an approved license, and vice-versa, are common.
+
+
+Table 8: The 10 Most Common License Evolution Patterns: From Non-Approved to Approved
+
+
| NPM | RubyGems |
|-----|----------|
| Evolution Patterns | # | Evolution Patterns | # |
| missing → mit | 6,667 | missing → mit | 6,556 |
| missing → isc | 831 | missing → apache-2.0 | 614 |
| missing → apache-2.0 | 633 | missing → gpl-3.0 | 239 |
| missing → bsd-3-clause | 262 | missing → gpl-2.0 | 153 |
| missing → gpl-3.0 | 137 | missing → bsd-3-clause | 133 |
| missing → bsd-2-clause | 91 | missing → lgpl-3.0 | 86 |
| missing → gpl-2.0 | 85 | missing → bsd-2-clause | 81 |
| missing → lgpl-3.0 | 61 | missing → artistic-2.0 | 73 |
| missing → mpl-2.0 | 49 | missing → agpl-3.0 | 33 |
| missing → agpl-3.0 | 35 | missing → lgpl-2.1 | 31 |
+
+
+4.2 RQ2. What is the impact of non-approved licenses on the package managers ecosystem?
+
+
+To understand the impact of a non-approved license, we calculated two types of metrics (irregular and affected) at three different granularities (graph orders: packages, versions, and dependencies).
+
+
+Irregular. A package is called irregular if at least one of its versions has a direct dependency on a package released under a non-approved license. If a package is irregular, it can affect other packages that depend on it.
+
+
+Affected. A package is affected if at least one of its versions has a direct or indirect dependency on a package that is irregular. A direct dependency is when a parent package (affected) depends on its child (irregular). An indirect dependency is when there is more than one level between the affected and the irregular package.
+
+
+With these metrics, we analyzed the whole dependency graph of all package versions. Table 9 shows the impact of non-approved licenses in terms of packages, versions, and dependencies. In terms of packages, although NPM has more irregular and affected packages, RubyGems presents a higher proportion of irregular (46% vs 18%) and affected (55% vs 38%) packages than NPM, which suggests that almost half of all package versions on RubyGems are irregular. The low number of affected packages, versions, and dependencies on CRAN is because CRAN prevents the absence of licenses by requiring package publishers to choose at least one from its license selection. Again, when we projected the impact including the indirect dependencies of each package version, the impact on NPM is higher than on RubyGems, because NPM packages have more versions.
+
+
+To provide a more detailed example, Figure 1 shows a fragment of the dependency graph of the package request@0.8.1.0. This particular package has 23,205 direct dependents (6,840 of which are irregular) and 42,938 indirect dependents (parents). Moreover, we omitted the regular direct dependents from Figure 1.
In the figure, solid-line edges are regular dependencies and dotted-line edges are irregular dependencies. Double-border vertices are regular package versions, whereas single solid border ones are irregular.
+Table 9: Impact caused by non-approved licenses in each package manager
+
+
| Graph Order | Metric | CRAN | NPM | RubyGems |
|-------------|--------|------|-----|----------|
| Packages | # | 11,366 | 510,964 | 135,481 |
| | Irregular | 1,082 | 78,224 | 62,967 |
| | Proportion | 0.095 | 0.153 | 0.464 |
| | Affected | 1,455 | 194,741 | 75,475 |
| | Proportion | 0.128 | 0.381 | 0.557 |
| Versions | # | 11,366 | 3,539,494 | 816,580 |
| | Irregular | 35 | 690,703 | 440,443 |
| | Proportion | 0.003 | 0.195 | 0.539 |
| | Affected | 36 | 1,619,248 | 520,967 |
| | Proportion | 0.003 | 0.457 | 0.637 |
| Dependencies | # | 1,086 | 15,521,508 | 1,765,288 |
| | Irregular | 59 | 1,364,281 | 1,088,298 |
| | Proportion | 0.054 | 0.087 | 0.616 |
+
+
+Dotted-border vertices represent affected packages. Notice that a package might be irregular and affected at the same time.
+
+
+We also observed that in this fragment of the graph, three packages have a non-approved missing license associated with them: "assert-plus", "verror", and "extsprintf". It is worth mentioning that the packages "assert-plus" and "extsprintf" are considered regular packages because they do not have a dependency on any package version released under a non-approved license.
+
+
+Figure 1: Example of an affected package version dependency tree
+
+
+Another example occurs on the RubyGems package manager: the package activesupport, currently at version 4.2.6, was downloaded 174,538,434 times over its entire life cycle; in version 4.0.0, released on June 25th, 2013, this package depended on the unlicensed packages minitest@4.2.0, multi_json@1.3.3, thread_safe@0.1.0, and tzinfo@0.3.37 (activesupport also depended on the MIT-licensed package i18n@0.6.4).
This particular version was downloaded 3,107,216 times and was used directly by 1,093 other published packages, and by 16,526 packages when taking into account both direct and indirect dependencies. The package activesupport is a toolkit extracted from the Rails framework's core.
+
+
+To provide an extra perspective on the impact of non-approved licenses, we compared the irregular and affected values of non-approved licenses with those of incomplete licenses. We chose incomplete licenses because they can be interpreted as wrong licenses, since they do not have a correct license name or version.
+
+
+Table 10 presents the most common incomplete licenses per package manager. Among the most common incomplete licenses, we observed that package publishers use a number of licenses while omitting their version.
+
+
+Table 10: Top 10 Incomplete Licenses
+
+
| CRAN | NPM | RubyGems |
|------|-----|----------|
| License # | License # | License # |
| agpl 12 | bsd 59,132 | bsd 4,280 |
| bsd 11 | gpl 7,904 | gpl 1,783 |
| cecill 6 | lgpl 2,747 | lgpl 1,067 |
| mpl 2 | epl 1,173 | agpl 304 |
| epl 2 | mpl 854 | artistic 166 |
| bsl 1 | agpl 832 | epl 71 |
| —— | free 218 | mpl 50 |
| —— | ibm 216 | free 36 |
| —— | apl 194 | osl 26 |
| —— | cecill 179 | afl 16 |
+
+
+In this sense, Table 11 presents the impact of Incomplete licenses. It is worth mentioning that, even if we consider incomplete licenses as inconsistent licenses, non-approved licenses (Table 9) presented a higher impact than Incomplete licenses; for instance, the number of irregular packages caused by non-approved licenses is 62,154 against 63,329 irregular packages caused by Incomplete licenses on RubyGems (the ratio of the difference, 813/362, is almost 2.5 times higher). If we compare the affected versions on RubyGems, the impact of non-approved licenses is almost 69 times higher than that of Incomplete licenses. Generally speaking, we also found that NPM is more affected by Incomplete licenses than RubyGems.
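The two metrics used throughout this section can be sketched over a version-level dependency graph as follows. This is a minimal illustration with assumed data shapes, not the authors' implementation: `deps` maps each version to the versions it depends on, and `non_approved` is the set of versions released under a non-approved license.

```python
def irregular(deps: dict[str, list[str]], non_approved: set[str]) -> set[str]:
    """Versions with a direct dependency on a non-approved-licensed version."""
    return {v for v, ds in deps.items() if any(d in non_approved for d in ds)}

def affected(deps: dict[str, list[str]], non_approved: set[str]) -> set[str]:
    """Versions reaching an irregular version via one or more dependency hops."""
    irr = irregular(deps, non_approved)
    result: set[str] = set()

    def reaches(v: str, seen: set[str]) -> bool:
        # depth-first search through the dependency edges, avoiding cycles
        for d in deps.get(v, []):
            if d in seen:
                continue
            seen.add(d)
            if d in irr or reaches(d, seen):
                return True
        return False

    for v in deps:
        if reaches(v, set()):
            result.add(v)
    return result
```

For a small chain `a -> b -> c` where only "c" carries a non-approved license, "b" is irregular (it directly depends on "c") and "a" is affected (it reaches the irregular "b").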
+
+
+Finally, CRAN packages were highly impacted by Incomplete licenses, which is mostly due to the lack of a license version. This behavior makes ∼11% of CRAN packages irregular, which affects almost 15% of the published packages.
+
+
+We recognize that non-approved licenses are dangerous both to package authors (publishers on package managers) and to users (those who create, but do not explicitly publish, a package with direct dependencies on published packages), because of the uncertainty about whether the dependencies of the to-be-published package are regular or not. In fact, package publishers should look at the whole dependency chain. However, a few factors might explain the presence of such irregularities in package managers, such as the height of the package dependency tree and the presence of newcomers in
+Table 11: Impact caused by Incomplete licenses in each package manager
+
+
| Graph Order | Metric | CRAN | NPM | RubyGems |
|-------------|--------|------|-----|----------|
| Packages | # | 11,366 | 510,964 | 135,481 |
| | Irregular | 1,256 | 94,515 | 63,329 |
| | Proportion | 0.110 | 0.184 | 0.467 |
| | Affected | 1,480 | 197,626 | 75,455 |
| | Proportion | 0.130 | 0.386 | 0.556 |
| Versions | # | 11,366 | 3,539,494 | 816,580 |
| | Irregular | 38 | 825,520 | 443,072 |
| | Proportion | 0.003 | 0.233 | 0.542 |
| | Affected | 38 | 1,639,430 | 520,836 |
| | Proportion | 0.003 | 0.463 | 0.637 |
| Dependencies | # | 1,086 | 15,521,508 | 1,765,288 |
| | Irregular | 62 | 1,759,643 | 1,098,489 |
| | Proportion | 0.057 | 0.113 | 0.622 |
+
+
+the open-source community, who might not be completely aware of license constraints.
+
+
+RQ2 Summary: Non-approved licenses impact packages from NPM and RubyGems, making packages irregular and affecting both their direct and indirect dependents.
Non-approved licenses can be considered more harmful than incomplete licenses, since their impact is higher in terms of the number of irregular and affected packages and versions in each license group.
+----------------------------------------
+-------------------------------
+Section 236:
+4.3 RQ3. Why do developers adopt non-approved licenses?
+ + +
To answer this question, we report the results of our survey with 76 package publishers. Our target population is 94% male, and 96% work in the software development industry. About 53% of them have created or contributed to up to 30 open-source projects (18% of them have created or contributed to more than 100 open-source projects). Still, 48% of the respondents believe that about 20% of these created/contributed open-source projects use a non-approved license. More interesting, however, is the fact that 27% of the respondents have no idea how many of the projects they contribute to use a non-approved license. Similarly, in Section 4.1, we showed evidence that about 18% of the package versions studied use a non-approved license.
+ + +
When we asked why they use a non-approved license, we found that 26 of the respondents do not care about the specific license terms. Along this line, one respondent mentioned: "I chose WTFPL license because I really don’t care about who and how use my modules. I share my code with people and it’s a pleasure for me to just know if someone finds it useful. Maybe if I wrote something really great like Facebook’s React I would think about fame". Also, 17 respondents acknowledged that using a non-approved license was a naive decision: "I thought it was appropriate". Still, small projects seem to be more prone to being licensed under a non-approved license. Yet, 5 respondents are aware that a non-approved license makes sense when licensing non-software projects, for instance: "Because it fits the content of the repository best (it is not a source code repository, but contains only data)".
Finally, some developers adopt non-approved licenses because they claim they are simpler (6 occurrences) or more open (4 occurrences); for instance, one respondent said that she likes "the idea of WTFPL. Makes everything pretty clear. You just do what you want."
+ + +
Right afterwards, we asked whether they are aware of the implications of using a non-approved license; 43% of the respondents mentioned a lack of awareness. For those who claimed to be aware of the implications, we asked them to cite one example of an implication. Among the answers, we found that developers believe that a non-approved license might limit the adoption of their software (12 occurrences). As an example, one respondent said that "If you use a license others have never heard of, others are less likely to contribute and/or may be wary of using you software." Code theft was also a recurring implication, mentioned by 7 respondents. Finally, one respondent raised the fact that the main implication of using a non-approved license is that "it can’t be automatically recognized by machines to categorize software under any license which may exclude the software from search results". This is particularly interesting, since GitHub helps project owners choose a correct license for their repositories. However, the GitHub help documentation also highlights that developers are responsible for defining the correct license, as we can see in this paragraph: "GitHub provides the information on an as-is basis and makes no warranties regarding any information or licenses provided on or through it, and disclaims liability for damages resulting from using the license information."
+ + +
In the following five questions, we asked how often they (Q9) investigate whether the license they chose conforms with the licenses their project depends on, (Q10) do not declare a license, (Q11) use a non-approved license, (Q12) use a copyright license in an open-source software, and (Q13) use more than one license (either approved or not). Figure 2 shows the results.
+ + +
This figure reveals several interesting findings. First, we can see that 46% of respondents "Never" or "Rarely" take into account the license used in the software’s dependencies. We believe this is an important result because, as we discussed in Section 2, license inconsistencies directly impact any project that depends upon them. With similar implications, 11% of the respondents “Always” or “Very Often” do not declare a license. One respondent even mentioned that she “Frequently forget to declare any license and it seems unimportant.” Similarly, 25% of the respondents “Always” or “Very Often” use a non-approved license. Finally, 94% mentioned that they “Never” or “Rarely” use more than one license (either approved or not). One respondent mentioned that the reason she uses more than one license is related to the fork-based model: “TypoPRO is a collection(!) of fonts and each font already has its distinct Open Source software license from their upstream vendor. So, TypoPRO stays under (this union) set of upstream licenses.”
+ + +
RQ3 Summary. 26 respondents do not care about the license used. Some respondents believe that non-approved licenses are more open and simpler to use. Among the implications, 12 respondents believe that non-approved licenses can limit the adoption of their software. 46% of the respondents do not take licenses into account when choosing a package dependency.
+----------------------------------------
+-------------------------------
+Section 237:
+5 IMPLICATIONS
+ + +
This research has implications for different kinds of stakeholders.
Three of these possible groups are discussed below.
+ + +
Package managers. Since we observed that neither NPM nor RubyGems requires developers to inform a license, many packages published on these package managers either (1) do not use any license or (2) state a wrong or incomplete license name (RQ1). This problem not only hinders researchers from conducting in-depth studies on license usage, but also has the potential to confuse software developers interested in using the software package. Package managers, therefore, might introduce mechanisms to prevent the introduction of wrong (or even non-existing) license names.
+ + +
Researchers. Although software licensing is an established research topic, our notion of non-approved licenses had not yet been fully explored (RQ1) and its implications were unclear (RQ2). Researchers can expand our comprehension of non-approved licenses in many ways. First, researchers could introduce mechanisms to automatically detect the use of non-approved licenses. Moreover, since packages tend to propagate their licenses over releases (RQ1), researchers can create techniques to avoid non-approved license propagation.
+ + +
CS Professors. Educators can also benefit from the findings of this study. Since software licensing is a commonly misunderstood topic among software developers [1], software engineering professors could bring problems related to license usage into the classroom and invite students to discuss possible solutions or to compare their perceptions with those of professional software developers (RQ3). Similarly, in order to make software licenses more appealing to aspiring software engineers, professors can use our license inconsistency graph (RQ2) in advanced data-structure classes and invite students to explore license inconsistencies in complex and deep graphs.
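The publish-time mechanism suggested for package managers above could be as simple as rejecting license fields that are empty or not on an approved list. The sketch below assumes a tiny illustrative subset of OSI-approved SPDX identifiers; the `validate_license` function is hypothetical, not an existing package-manager API:

```python
# Sketch of a publish-time license check a package manager could run.
# APPROVED is a small illustrative subset, not the full OSI catalogue.
from typing import Optional, Tuple

APPROVED = {"MIT", "Apache-2.0", "GPL-3.0", "BSD-3-Clause", "MPL-2.0"}

def validate_license(field: Optional[str]) -> Tuple[bool, str]:
    """Reject missing, empty, or unrecognized license declarations."""
    if field is None or not field.strip():
        return False, "no license declared"
    name = field.strip()
    if name not in APPROVED:
        return False, f"'{name}' is not an approved license identifier"
    return True, "ok"

print(validate_license("MIT"))    # (True, 'ok')
print(validate_license("WTFPL"))  # rejected: non-approved identifier
print(validate_license(None))     # rejected: missing license
```

Such a check would catch both failure modes the paragraph above names: absent licenses and wrong or incomplete license names.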
+
+----------------------------------------
+-------------------------------
+Section 238:
+6 THREATS TO VALIDITY
+ + +
In a study of such proportions, there are always many limitations and threats to validity. First, we could not retrieve data from 2,140 packages (1,079 NPM packages, 1,052 RubyGems packages, and 9 CRAN packages) because their metadata could not be located. However, these packages represent only 0.04% of the whole universe of packages in our study.
+ + +
Second, the normalization process was manual and, therefore, error-prone. We mitigated this threat through pair-review work: each author independently analyzed the same set of licenses, with subsequent conflict-resolution meetings. Both the original and normalized license sets are available for future analysis. We chose not to analyze the external FILE licenses because most of these package versions are hosted on GitHub, which would require manually searching for the license file in the repositories. On CRAN, 1,391 package versions have a FILE license declared; on NPM, 19,010; and RubyGems has more than 20,000 package versions using the FILE license.
+ + +
Third, one might argue that the packages we studied might be full of simple, trivial software projects. However, packages available on package managers are often more mature when compared to software projects hosted on other coding websites such as GitHub, which are often personal projects and class projects [9].
+ + +
Fourth, we rely on the licenses approved by OSI. Even if a license is commonplace — for instance, we found 4,927 package versions using the Creative Commons Zero (CC0) license (104 on CRAN, 3,022 on NPM, and 1,801 on RubyGems) — we still consider such licenses as non-approved.
Although we are aware that many other institutions, such as the Free Software Foundation (FSF) and the Debian Foundation, also approve licenses, we decided to stick to OSI approval because: (1) licenses can be submitted by anyone interested in obtaining OSI approval, and (2) licenses approved by OSI are commonly used — as shown in Table 5, only a few licenses found in our dataset were not recognized by OSI.
+ + +
Finally, we did not double-check whether the license informed at the package manager was, indeed, the same as the one declared on the official package website. We chose not to validate the licenses used for two reasons. First, the package publisher (who is often a core member of the project) is in charge of declaring the license used in a given published version; that is, no one would be better positioned than the package publisher to state the correct license used. Second, our study covers hundreds of thousands of software packages, which are often hosted on third-party coding websites (e.g., GitHub or BitBucket) that store license information in distinct ways: GitHub shows the license name on the project’s front page (if its algorithm succeeds at inferring the license, which is not always the case); BitBucket, on the other hand, does not explicitly demand any license when creating a repository, although if the project has a proper license file, it displays the license on the project’s cover page. This problem is only exacerbated when considering license information per version release. Therefore, due to the lack of standards and our substantial sample size, performing such a manual process would be prohibitive.
+----------------------------------------
+-------------------------------
+Section 239:
+7 RELATED WORK
+ + +
Recent studies investigated license inconsistencies, a concept similar to our notion of non-approved licenses.
Since non-approved licenses also introduce inconsistencies, one can see non-approved licenses as a subset of license inconsistencies. However, we believe that the implications of non-approved licenses are greater than the known problems related to license inconsistencies.
+ + +
To the best of our knowledge, our work is the first to analyze the usage and adoption of non-approved licenses. We also discussed the impact of non-approved licenses compared to incomplete licenses in the package manager context, which has attracted increasing attention from practitioners and researchers, since NPM, CRAN, and RubyGems are growing fast and becoming increasingly popular. We summarize the related work in terms of license maintenance and evolution and license inconsistencies.
+ + +
Di Penta et al. [4] proposed a method to track the evolution of software licensing and investigated its relevance on six open source projects. Most of the inconsistencies found were related to files without a license. Vendome et al. [24, 27] conducted a large empirical study investigating when and why developers adopt or change software licenses. Recently, Vendome et al. [26] performed another large-scale empirical study on the change history of over 51K FOSS systems to investigate the prevalence of known license exceptions, presenting a categorization and a machine-learning-based detection algorithm to identify license exceptions. Santos [20] analyzed a set of 756 projects from the FLOSSmole repository of SourceForge.net data that had changed their source code distribution allowances. The author found 88 projects with a “none” license – which might leave projects exposed and legally unattended – and 55 cases where projects changed from having a license to having none.
+ + +
German et al. [8] investigated how the licenses declared in packages are consistent with the source code files in the Fedora ecosystem. Manabe et al.
[15] extended this work by proposing a graph visualization to understand those relationships. They found that GPL licenses are more likely to include other licenses, while Apache licenses tend to contain files only under the same license. The authors reported changes from a valid license to none, and some cases where a non-valid license was changed to a valid one.
+ + +
Wu et al. [30, 31] investigated license inconsistencies caused by re-distributors that removed or modified the license header in the source code. The authors described and categorized different types of license inconsistencies, proposing a method to detect them in the Debian ecosystem. They found that, on average, more than 24% of package relationships have a “none” license between them; however, this effect was not discussed. Wu et al. [29] also studied whether the issues of license inconsistencies are properly solved, analyzing two versions of Debian to investigate the evolution patterns of license inconsistencies, which disappear when the downstream projects get synchronized.
+ + +
Lee et al. [14] compared machine-based algorithms to identify potential license violations and guide non-experts to manually inspect violations. The authors reported that the accuracy of crowds is comparable to that of experts and to the machine learning algorithm. It is interesting to note that approximately 25% of files from 227 projects (79.4% of the projects analyzed) did not have any license.
+ + +
Almeida et al. [1] conducted a survey with 375 developers to understand whether they understand violations and assumptions of three popular open source licenses (GNU GPL 3.0, GNU LGPL 3.0, and MPL 2.0), both alone and in combination. The authors confronted the answers with experts’ opinions and found that the answers were consistent in 62% of 42 cases. Although previous work on understanding software licenses pointed to “None” as frequently chosen for files and packages, neither scenario involved this aspect.
+ + +
Van der Burg et al. [23] proposed an approach to construct and analyze the Concrete Build Dependency Graph (CBDG) of a software system by tracing system calls at build time. Through a case study of seven open source systems, the authors showed that the constructed CBDGs can accurately classify sources as included in or excluded from deliverables with 88%-100% precision and 98%-100% recall, and can uncover license compliance inconsistencies in real software systems. German and Di Penta [6] presented a method for open source license compliance of Java applications. The authors implemented a tool, called Kenen, to mitigate any potential legal risk for developers who reuse open source components. Kapitsaki et al. [11] compared tools that are used to detect licenses of software components and avoid license violations, classifying them into three types: license information identification from source code and binaries, software metadata stored in code repositories, and license modeling and associated reasoning actions.
+----------------------------------------
+-------------------------------
+Section 240:
+8 CONCLUSION
+ + +
In this paper, we conducted a large-scale study on non-approved licenses in terms of usage, impact, and adoption. Non-approved licenses are any licenses not approved by the Open Source Initiative (OSI). Software released under a non-approved license cannot be claimed to be open source (the original author retains all rights). Non-approved licenses include licenses with typos, wrong names, or even curses, as well as missing licenses (e.g., when package publishers do not fill in the license information).
+ + +
When mining data from ~657k open-source projects, we observed that hundreds of non-approved licenses exist. About 24% of the released packages used at least one of these non-approved licenses. The majority of non-approved licenses found are, in fact, the absence of a license.
Still, we found that package publishers tend to propagate the same license through package versions. Non-approved licenses impact packages from NPM and RubyGems more than incomplete licenses when compared in terms of the number of irregular and affected packages and versions. Finally, when we asked package publishers about non-approved licenses, we found that 46% of the respondents do not take licenses into account when choosing a package dependency, and some respondents believe that non-approved licenses are more open and simpler to use. On the other hand, 12 respondents believe that non-approved licenses may limit the adoption of their software.
+ + +
For future work, we plan to investigate the evolution of non-approved licenses in a fine-grained way (e.g., through commits instead of version releases). This would deepen our understanding of why non-approved licenses are adopted. Moreover, since CRAN developers might have a more diverse background (e.g., biologists, mathematicians, among others), we plan to get in touch with them to understand their motivations behind the usage of non-approved licenses.
+ + +
ACKNOWLEDGMENTS
+ + +
This work is supported by Fundação Araucária; CNPq (406308/2016-0 and 430642/2016-4); PROPESP/UFPA; and FAPESP (2015/24527-3).
+REFERENCES
+ + +
[1] D. A. Almeida, G. C. Murphy, G. Wilson, and M. Hoye. 2017. Do Software Developers Understand Open Source Licenses?. In 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC). 1–11. https://doi.org/10.1109/ICPC.
+ + +
[2] Jailton Coelho and Marco Tulio Valente. 2017. Why Modern Open Source Projects Fail. In 25th International Symposium on the Foundations of Software Engineering (FSE). 186–186.
+ + +
[3] Eirini Kalliamvakou and Tom Mens. 2017. An Empirical Comparison of Developer Retention in the RubyGems and Npm Software Ecosystems. Innov. Syst. Softw. Eng. 13, 2-3 (Sept. 2017), 101–115.
https://doi.org/10.1007/s11334-017-0030-4
+ + +
[4] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, and Daniela Damian. 2016. An in-depth study of the promises and perils of mining GitHub. Empirical Software Engineering 21, 5 (2016), 2035–2071. https://doi.org/10.1007/s10664-015-9393-5
+ + +
[5] Karl Fogel. 2017. Producing Open Source Software: How to Run a Successful Free Software Project (second ed.). O'Reilly Media. http://www.producingoss.com/.
+ + +
[6] D. German and M. Di Penta. 2012. A Method for Open Source License Compliance of Java Applications. IEEE Software 29, 3 (May 2012), 58–63. https://doi.org/10.1109/MS.2012.50
+ + +
[7] Daniel M. German and Jesús M. González-Barahona. 2009. An Empirical Study of the Reuse of Software Licensed under the GNU General Public License. Springer Berlin Heidelberg, Berlin, Heidelberg, 185–198. https://doi.org/10.1007/978-3-642-02032-2_17
+ + +
[8] D. M. German, M. Di Penta, and J. Davies. 2010. Understanding and Auditing the Licensing of Open Source Software Distributions. In 2010 IEEE 18th International Conference on Program Comprehension. 84–93. https://doi.org/10.1109/ICPC.2010.48
+ + +
[9] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, and Daniela Damian. 2016. An in-depth study of the promises and perils of mining GitHub. Empirical Software Engineering 21, 5 (2016), 2035–2071. https://doi.org/10.1007/s10664-015-9393-5
+ + +
[10] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, and Daniela Damian. 2014. The Promises and Perils of Mining GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014). 92–101.
+ + +
[11] Georgia M. Kapitsaki, Nikolaos D. Tselikas, and Ioannis E. Foukarakis. 2015. An insight into license tools for open source software systems. Journal of Systems and Software 102 (2015), 72–87.
https://doi.org/10.1016/j.jss.2014.12.050
+ + +
[12] Cory Kapser and Michael W. Godfrey. 2008. “Cloning considered harmful” considered harmful: patterns of cloning in software. Empirical Software Engineering 13, 6 (2008), 645–692.
+ + +
[13] Miryung Kim, L. Bergman, T. Lau, and D. Notkin. 2004. An ethnographic study of copy and paste programming practices in OOPL. In Proceedings of the 2004 International Symposium on Empirical Software Engineering (ISESE ’04). 83–92.
+ + +
[14] Sanghoon Lee, Daniel M. German, Seung-won Hwang, and Sunghun Kim. 2015. Crowdsourcing Identification of License Violations. Journal of Computing Science and Engineering 9, 4 (2015), 190–203.
+ + +
[15] Yuki Manabe, Daniel M. German, and Katsuro Inoue. 2014. Analyzing the Relationship between the License of Packages and Their Files in Free and Open Source Software. Springer Berlin Heidelberg, Berlin, Heidelberg, 51–60. https://doi.org/10.1007/978-3-642-55129-4_6
+ + +
[16] Trevor Maryka, Daniel M. German, and Germán Poo-Caamaño. 2015. On the Variability of the BSD and MIT Licenses. Springer International Publishing, Cham, 146–156. https://doi.org/10.1007/978-3-319-17837-0_14
+ + +
[17] OSD. 2018. The Open Source Definition (Annotated). (2018). https://opensource.org/osd-annotated
+ + +
[18] Gustavo Pinto, Igor Steinmacher, and Marco Aurélio Gerosa. 2016. More Common Than You Think: An In-depth Study of Casual Contributors. In IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016, Suita, Osaka, Japan, March 14-18, 2016 - Volume 1. 112–123. https://doi.org/10.1109/ICPC.
+ + +
[19] Lawrence Rosen. 2004. Open Source Licensing: Software Freedom and Intellectual Property Law. Prentice Hall PTR, Upper Saddle River, NJ, USA.
+ + +
[20] Carlos Denner dos Santos. 2017. Changes in free and open source software licenses: managerial interventions and variations on project attractiveness. Journal of Internet Services and Applications 8, 1 (07 Aug 2017), 11.
https://doi.org/10.1186/s13174-017-0062-3 + + +[21] E. Smith, R. Loftin, E. Murphy-Hill, C. Bird, and T. Zimmermann. 2013. Improving developer participation rates in surveys. In 2013 6th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE). 89–92. https://doi.org/10.1109/CHASE.2013.6614738 + + +[22] Diomidis Spinellis. 2012. Package Management Systems. IEEE Software 29, 2 (2012), 84–86. + + +[23] Sander van der Burg, Eelco Dolstra, Shane McIntosh, Julius Davies, Daniel M. German, and Armijn Hemel. 2014. Tracing Software Build Processes to Uncover License Compliance Inconsistencies. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE ’14). ACM, New York, NY, USA, 731–742. https://doi.org/10.1145/2642937.2643013 + + +[24] Christopher Vendome, Gabriele Bavota, Massimiliano Di Penta, Mario Linares-Vásquez, Daniel M. German, and Denys Poshyvanyk. 2017. License usage and changes: a large-scale study on gitHub. Empirical Software Engineering 22, 3 (01 Jun 2017), 1537–1577. https://doi.org/10.1007/s10664-016-9438-4 + + +[25] Christopher Vendome, Gabriele Bavota, Massimiliano Di Penta, Mario Linares-Vásquez, Daniel M. Germán, and Denys Poshyvanyk. 2017. License usage and changes: a large-scale study on gitHub. Empirical Software Engineering 22, 3 (2017), 1537–1577. + + +[26] Christopher Vendome, Mario Linares-Vasquez, Gabriele Bavota, Massimiliano Di Penta, Daniel M. German, and Denys Poshyvanyk. 2017. Machine Learning-based Detection of Open Source License Exceptions. In Proceedings of the 39th International Conference on Software Engineering (ICSE ’17). IEEE Press, Piscataway, NJ, USA, 118–129. https://doi.org/10.1109/ICSE.2017.19 + + +[27] Christopher Vendome, Mario Linares-Vasquez, Gabriele Bavota, Massimiliano Di Penta, Daniel M. German, and Denys Poshyvanyk. 2015. When and Why Developers Adopt and Change Software Licenses. 
In Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME) (ICSME ’15). IEEE Computer Society, Washington, DC, USA, 31–40. https://doi.org/10.1109/ICSM.2015.7332449 + + +[28] Erik Wittern, Philippe Suter, and Shriram Rajagopalan. 2016. A Look at the Dynamics of the JavaScript Package Ecosystem. In Proceedings of the 13th International Conference on Mining Software Repositories (MSR ’16). ACM, New York, NY, USA, 351–361. https://doi.org/10.1145/2901739.2901743 + + +[29] Yuhao Wu, Yuki Manabe, Daniel M. German, and Katsuro Inoue. 2017. How are Developers Treating License Inconsistency Issues? A Case Study on License Inconsistency Evolution in FOSS Projects. Springer International Publishing, Cham, 69–79. https://doi.org/10.1007/978-3-319-57735-7_8 + + +[30] Y. Wu, Y. Manabe, T. Kanda, D. M. German, and K. Inoue. 2015. A Method to Detect License Inconsistencies in Large-Scale Open Source Projects. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. 324–333. https://doi.org/10.1109/MSR.2015.37 + + +[31] Yuhao Wu, Yuki Manabe, Tetsuya Kanda, Daniel M. German, and Katsuro Inoue. 2017. Analysis of license inconsistency in large collections of open source projects. Empirical Software Engineering 22, 3 (01 Jun 2017), 1194–1222. https://doi.org/10.1007/s10664-016-9487-8 +---------------------------------------- +------------------------------- +Section 241: +Beyond Dependencies: The Role of Copy-Based Reuse in Open Source Software Development + + +MAHMOUD JAHANSHAHI, DAVID REID, and AUDRIS MOCKUS, University of Tennessee, USA + + +In Open Source Software, resources of any project are open for reuse by introducing dependencies or copying the resource itself. In contrast to dependency-based reuse, the infrastructure to systematically support copy-based reuse appears to be entirely missing. Our aim is to enable future research and tool development to increase efficiency and reduce the risks of copy-based reuse. 
We seek a better understanding of such reuse by measuring its prevalence and identifying factors affecting the propensity to reuse. To identify reused artifacts and trace their origins, our method exploits World of Code infrastructure. We begin with a set of theory-derived factors related to the propensity to reuse, sample instances of different reuse types, and survey developers to better understand their intentions. Our results indicate that copy-based reuse is common, with many developers being aware of it when writing code. The propensity for a file to be reused varies greatly among languages and between source code and binary files, consistently decreasing over time. Files introduced by popular projects are more likely to be reused, but at least half of reused resources originate from “small” and “medium” projects. Developers had various reasons for reuse but were generally positive about using a package manager. + + +CCS Concepts: • Software and its engineering → Software creation and management; • General and reference → Empirical studies. + + +Additional Key Words and Phrases: Reuse, Open Source Software, Software Development, Copy-based Reuse, Software Supply Chain, World of Code +---------------------------------------- +------------------------------- +Section 242: +1 INTRODUCTION + + +Software reuse refers to the practice of developing software systems from existing software rather than creating them from scratch [55]. Starting from scratch may demand more time and effort than reusing pre-existing, high-quality code that fits the required task. Developers, therefore, opportunistically and frequently reuse code [48]. Programming for clearly defined problems often starts with a search in code repositories, typically followed by careful copying and pasting of the relevant code [85]. + + +The fundamental principle of Open Source Software (OSS) lies in its “openness”, which enables anyone to access, inspect, and reuse any artifact of a project. 
This could significantly enhance the efficiency of the software development process. Platforms such as GitHub increase reuse opportunities by enabling the community of developers to curate software projects and by promoting and improving the process of opportunistic discovery and reuse of artifacts [46]. A significant portion of OSS is intentionally built to be reused, offering resources or functionality to other software projects [39], thus such reuse can be categorized as one of the building blocks of OSS. Indeed, developers in the open source community not only seek opportunities to reuse existing high-quality code, but also actively promote their own well-crafted artifacts for others to utilize [33]. Being widely reused + + +Authors’ address: Mahmoud Jahanshahi, mjahansh@vols.utk.edu; David Reid, dreid6@vols.utk.edu; Audris Mockus, audris@utk.edu, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA. + + +Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. + + +© 2025 Copyright held by the owner/author(s). +ACM 1557-7392/2025/1-ART +https://doi.org/10.1145/3715907 + + +ACM Trans. Softw. Eng. Methodol. +not only increases the popularity of the software project and its maintainers while providing them with job prospects [79], but also may bring new maintainers as well as corporate support [46]. 
+ + +Most commonly, code reuse refers to the introduction of explicit dependencies on the functionality provided by ready-made packages, libraries, frameworks, or platforms maintained by other projects (referred to as dependency-based or black-box reuse). Such external code is not modified by the developer and, generally, not committed into the project’s repository but relied upon via a package manager. Copy-based reuse (or white-box reuse), on the other hand, refers to the case where source code (or other reusable artifacts) is reused by copying the original code and committing the duplicate code into a new repository. It may remain the same or be modified by the developer after reuse. We specifically focus on copy-based reuse in this study. + + +While it is generally accepted that programs should be modular [75], with internal implementation details not exposed outside the module, copy-based reuse does exactly the opposite. OSS’s copy-based reuse, where any source code file or even a code snippet can be reused in another project, may result in multiple, possibly modified instances of the same source code replicated across various files and repositories. These copies may undergo further changes during maintenance, leading to multiple different versions of the originally identical code existing in the latest releases of corresponding projects. Unifying such multiplicity of versions in copy-based reuse to refactor it into a single package that all these projects could depend upon may not always be a tractable problem. + + +Moreover, as this reuse process continues across various projects, possibly with some modifications, data related to the initial design, authorship, copyright status, and licensing could be lost [76]. This loss could impede future enhancements and bug-fixing efforts. It might also diminish the motivation for original authors who seek recognition for their work and lead to legal complications for downstream users. 
These issues impact not only those who reuse the code but also the software dependent on at least one package that involves reused code [20]. + + +As the landscape of Open Source Software (OSS) expands, tracing the origins of source code, identifying high-quality code suitable for reuse, and deciphering the simultaneous progression of code across numerous projects become increasingly challenging. This can pose risks, such as the spread of potentially low-quality or vulnerable code [46] (e.g., orphan vulnerabilities [78]). + + +Despite the sustained attention and potential benefits and risks associated with reuse, the exact scale, prevalent practices, and possible negative impacts related to OSS-wide reuse have not been thoroughly explored. This is primarily due to the formidable task of tracking code throughout the entirety of OSS [46]. + + +Gaining a more comprehensive understanding of reuse practices could guide future research towards developing methods or tools that enhance productivity while mitigating the inherent risks associated with reuse. Specifically, we aim to quantify several aspects concerning the extent and nature of reuse in OSS, providing information necessary to investigate approaches that support this common activity, making it more efficient and safer. + + +We use a measurement framework created by Jahanshahi and Mockus [46] that tracks all versions of project artifacts, referred to as blobs(^1), across all repositories. In this approach, the first time each blob is committed to a repository is identified. The (repository, blob) tuples are then sorted based on the commit time of the first appearance of that unique blob in the repository. The repository with the earliest commit time is identified as the originating repository, and the person who made that commit is recognized as the creator of the blob. Reuse instances are then identified by pairing the originating repository with any subsequent repositories that commit the same blob. 
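The origin-identification step described above amounts to a group-by and argmin over (blob, repository, first-commit-time) records. The sketch below is an illustrative reimplementation of that logic under the assumption that such records are already available; it is not the actual WoC tooling, and the record layout and function name are ours:

```python
from collections import defaultdict

def find_reuse_instances(records):
    """records: iterable of (blob, repo, first_commit_time) tuples, where
    first_commit_time is t_b(P), the time blob b first appeared in repo P.
    Returns {blob: (originating_repo, [(origin, destination), ...])}."""
    by_blob = defaultdict(list)
    for blob, repo, t in records:
        by_blob[blob].append((t, repo))
    result = {}
    for blob, appearances in by_blob.items():
        appearances.sort()  # earliest first-commit time identifies the origin
        origin = appearances[0][1]
        # pair the origin with every later repository committing the same blob
        pairs = [(origin, repo) for _, repo in appearances[1:]]
        result[blob] = (origin, pairs)
    return result
```

For instance, the records ("b1", "P1", 10) and ("b1", "P2", 20) yield "P1" as the originating repository of blob "b1" and the single reuse instance ("P1", "P2"); a blob committed to only one repository yields no reuse instances.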
Our work investigates how much and what kind of whole-file reuse happens at the scale of OSS, with findings that could help guide future research and tool development to support this common but potentially risky activity. First, we show how existing studies, by ignoring “small” and inactive projects, miss almost half of the code reused even by the “largest” and most active projects. A more in-depth study is necessary to fully comprehend how these abundant yet unseen “dark matter” projects contribute to reuse activity. Second, we theorize about and investigate empirically the properties of artifacts and originating projects that influence the likelihood of file reuse, addressing a key question that previous work, which has predominantly focused on copy detection techniques, has missed. To investigate historic reuse trends, we also introduce a time-limited measure of reuse. Our findings reveal several surprising patterns showing how copying varies with the programming language, properties of a blob, and originating projects. These insights could help prioritize and articulate further research and tool development that supports the most common reuse patterns. Third, we obtain responses from 374 developers about the code they have reused or originated. Most respondents write code with an explicit expectation that it will be reused. Developers reuse code for several reasons, are not concerned with bugs in the reused code, and would be willing to use package managers for reused code if such tools were provided. Overall, we find that despite its questionable reputation due to inherent risks, code copying is common and useful, and many developers keep it in mind when writing code.

(^1)In alignment with the terminology used in the Git version control system, we use the term “blob” to refer to a single version of a file.


In summary, we ask the following research questions:


RQ1 How much copy-based reuse occurs?
What factors affect the propensity to reuse?
 (a) How extensive is copying in the entire OSS landscape?
 (b) Is copy-based reuse limited to a particular group of projects?
 (c) Do characteristics of the blob affect the probability of reuse?
 (d) Do characteristics of the originating project affect the probability of reuse?


RQ2 How do developers perceive and engage with copy-based reuse?


To foster reproducibility, we have made the replication package for this study, including dataset creation scripts and analysis notebooks, publicly available at https://zenodo.org/records/14743941.

2 BACKGROUND


This section provides the context and foundation for our research. It begins with an exploration of the types of reuse in software supply chains. Following this, we delve into the associated risks, discussing potential vulnerabilities, legal issues, and other challenges that can arise from software reuse. The third subsection introduces social contagion theory (SCT), which helps select factors likely to affect the diffusion and adoption of reuse practices within the open source software development community.


2.1 Reuse in Software Supply Chains


A software supply chain comprises various components, libraries, tools, and processes used to develop, build, and publish software artifacts. It covers all stages from initial development to final deployment, including proprietary and open source code, configurations, binaries, plugins, container dependencies, and the infrastructure required to integrate these elements. The software supply chain ensures that the right components are delivered to the right places at the right times to create functioning software products.
Software reuse is one form of the software supply chain that enhances efficiency, reduces costs, and mitigates the risks associated with developing new software from scratch.


In the context of open source software, reuse in software supply chains can be categorized based on how the open source components are integrated and utilized within software projects [69–71].


2.1.1 Dependency-based Reuse. Dependency-based reuse involves using open source libraries and packages as dependencies in a project. These dependencies are typically managed through package managers such as npm for JavaScript, pip for Python, or Maven for Java. The reliance on these dependencies can introduce vulnerabilities and risks if not properly managed [98]. An example of reuse in this kind of supply chain is a web application using the React library, which in turn depends on numerous other libraries.

2.1.2 Copy-based Reuse. Copy-based reuse is the type of reuse investigated in this work. In copy-based reuse, code from open source projects is copied directly into a project. For example, a developer might copy a utility function from an open source repository and integrate it into their own project. While this approach is quick, it can lead to challenges in maintaining and updating the copied code. It is essential to track and manage these copies to ensure they are secure and up-to-date [56].


2.1.3 Knowledge-based Reuse. Knowledge-based reuse involves using knowledge and practices derived from open source projects without directly copying code or using dependencies. It includes the adoption of development methodologies, architectural patterns, and best practices from open source communities. An example is implementing a microservices architecture inspired by successful open source projects.
While not explicitly detailed by many researchers, the concept of knowledge-based supply chains is inferred from broader discussions of open source influence on software development practices [100].


2.2 Associated Risks


While reuse can potentially reduce development costs, it is not always beneficial. It could introduce certain risks that might eventually escalate the overall costs of a project. These risks include, but are not limited to, security vulnerabilities, compliance issues, and the spread of bugs or low-quality code [31, 46].


2.2.1 Security. The relationship between security and reuse has a dual nature: a system can become more secure by leveraging mature dependencies, but it can also become more vulnerable by creating a larger attack surface through exploitable dependencies [35].


In the context of copy-based reuse, extensive code copying can lead to the widespread dissemination of potentially vulnerable code. These artifacts may reside not only in inactive projects (that are still publicly available for others to reuse and potentially spread the vulnerability further), but also in highly popular and active projects [78].


Understanding the copy-based supply chain helps in identifying potential security risks and implementing appropriate safeguards [73]. Therefore, detecting reused code aids in identifying and consistently patching these vulnerabilities across all affected systems [56].


2.2.2 Compliance. Many open source licenses come with specific requirements that must be met. Unintentional reuse of code that is subject to intellectual property (IP) rights or licensing restrictions can lead to legal complications. Understanding the supply chain and detecting reused artifacts ensures compliance with licensing agreements and protects against IP infringements [59, 100].


As software systems evolve, their licenses evolve as well.
This evolution can be driven by various factors such as changes in the legal environment, commercial code being licensed as free and open source, or code being reused from other open source systems. The evolution of licensing can impact how a system or its parts can be subsequently reused [46]. Therefore, monitoring this evolution is important [19]. However, keeping track of the vast amount of data across the entire OSS landscape is a challenging task, and as a result, many developers fail to adhere to licensing requirements [2, 32].


For example, an investigation of a subset of code reused in the Stack Overflow environment revealed an extensive number of potential license violations [2]. Even when all license requirements are known, the challenge of combining software components with different and possibly incompatible licenses to create a software application that complies with all licenses, while potentially having its own, persists and is of great importance [32]. When individual files are reused, licensing information may be lost, and the findings of our study might suggest approaches to identify and remediate such problems.

2.2.3 Quality. Ensuring that all components of the supply chain meet quality standards is essential for the reliability and performance of the final product [9]. Copied code that has not been thoroughly vetted and tested can introduce bugs and defects. By identifying and evaluating such reused code, organizations can ensure that it meets their quality standards [69].


Code reuse is not only assumed to escalate maintenance costs under specific conditions, but is also seen as prone to defects. This is because inconsistent modifications to duplicated code can result in unpredictable behavior [48]. Additionally, failure to consistently modify identifiers (such as variables, functions, and types)
throughout the reused code can lead to errors that often bypass compile-time checks and transform into hidden bugs that are extremely challenging to detect [58].


Apart from the bugs introduced through code reuse, the source code itself could have inherent bugs or be of low quality. These issues can propagate similarly to how security vulnerabilities spread. The patterns of reuse identified in this study could potentially suggest strategies to leverage information gathered from multiple projects with reused code, thereby reducing such risks.


2.3 Social Contagion Theory


Reusing code is an instance of technology adoption. One of the key questions we want to ask is what may affect the propensity of adopting (copying) a blob. Social Contagion Theory (SCT) [14] is a widely used theory for examining dynamic social networks and human behavior in the context of technology adoption [3, 84]. In the field of software engineering, it has been used to explain how developers select software packages [64].


We use SCT to theorize about the dynamics of code reuse by conceptualizing it in terms of exposure, infectiousness, and susceptibility. SCT helps us frame our research questions by providing a structured way to analyze how code reuse spreads within the open source community. Specifically, we explore how developers become aware of reusable code, the inherent qualities of the code that make it more likely to be reused, and the characteristics of projects or developers that make them more likely to adopt reusable code. These dimensions guide the formation of our research questions, enabling us to systematically investigate the factors influencing reuse activity in open source software. The key value of SCT in our case is to help articulate factors affecting copy propensity via three dimensions:


Exposure. Exposure is an intuitive notion that in order to copy an artifact, one first has to learn about and find it.

Infectiousness. Infectiousness is the property of the artifact that affects its propensity to be reused.

Susceptibility. Susceptibility is the property of the destination project or developer that reflects how much benefit they would (or believe they would) derive by reusing the artifact.


First, for a blob (infectious agent) to be reused, a developer needs to become aware of it. In other words, it needs to be exposed to the open source community (population). Social coding platforms such as GitHub provide various crowd-sourced signals of project popularity. Developers may consider these characteristics of project popularity or health when choosing what resource to use [23, 61]. These considerations suggest that developers are more likely to be exposed to code in more popular or active projects. Therefore, we use project properties as a proxy for the likelihood of awareness. This primarily addresses RQ1-b and RQ1-d in our study.


The second concept of SCT, infectiousness, means that a highly virulent infectious agent is more likely to spread. In our context, this can be measured by the characteristics of the blob itself, corresponding to RQ1-c. Most of the literature on reuse has primarily focused on this aspect of the reused resource.


The final concept in our theory is susceptibility, which refers to the vulnerability of the target population to the infectious agent. In our case, this can be approximated by the characteristics of the target project (or author) that reuses the blob. For example, the use value, or how much the blob is needed in the project that copies it. These characteristics are, by definition, highly specific to the target project, making them more challenging to measure. We aim to shed more light on this aspect in RQ2.
3 RELATED WORK AND CONTRIBUTIONS


While the benefits and risks associated with code reuse seem tangible, the extent and types of reuse across the entirety of OSS remain unclear. To prioritize these risks and benefits, and to explore methods to minimize or maximize them respectively, we employ the approach introduced in our previous work [46]. This method allows us to track copy-based reuse on a scale commensurate with the vast size of OSS. The scope of copying activity is not fully encompassed by previous studies based on convenience samples, as we will illustrate in the results section.


We are not aware of any other curation system that operates at the level of a blob or finer granularity, nor is there an easy way to determine the extent of OSS-wide copy-based reuse at that level. Methods for identifying reuse, such as the one introduced by Kawamitsu et al. [50], are designed to find reuse between specific input projects and do not easily scale to detect reuse across all OSS repositories [46]. The methods we use to identify and characterize reuse could, therefore, serve as a foundation for tools that expose this difficult-to-obtain yet potentially important phenomenon [46]. We acknowledge that the actual extent of reuse is most likely much higher than what we find at blob-level granularity. Nevertheless, we believe the results we present will still be insightful, especially as a lower bound for the extent of copy-based reuse activity in the entirety of OSS.


We first differentiate copy-based reuse from related fields and then discuss our contributions.


3.1 Related Research Areas


To comprehensively understand copy-based reuse, it is essential to discuss two closely related fields: clone detection and the clone-and-own practice.
The following discussion focuses on differentiating copy-based reuse from dependency-based reuse, clone detection, and clone-and-own practices, situating these within the broader context of code reuse literature.


3.1.1 Code Reuse Analysis. Code reuse analysis encompasses techniques and practices that aim to maximize the efficiency and reliability of software development by leveraging existing code. Techniques such as static analysis, dependency analysis, and repository mining help identify reusable components within a codebase [52]. Through these methods, code reuse analysis seeks to reduce redundancy and enhance maintainability. Frakes and Kang [25] show that systematic code reuse can significantly reduce development time and costs while improving software quality.


3.1.2 Clone Detection. Clone detection is a technique within code reuse analysis for identifying similar or identical code fragments in a codebase. This process involves using tools to detect exact or slightly modified duplicates, which can then be refactored into reusable components. Techniques range from textual and token-based methods to more advanced semantic and abstract syntax tree (AST) analyses [80, 91]. These methods focus on identifying code clones within constrained contexts, often limited to small code snippets within a few projects [92]. Clone detection helps in managing redundancy and maintaining code quality by highlighting areas where code can be simplified and reused [80]. The effectiveness of clone detection tools has been validated in various studies, showing significant improvements in software maintainability [49].


3.1.3 Clone and Own. Clone and own is a practice where existing software components are copied and modified to meet new requirements. This approach is often utilized in product line engineering and in situations where rapid development is important.
Clone-and-own allows developers to quickly adapt existing solutions but can lead to maintenance challenges due to the proliferation of similar, independently maintained code fragments [54, 82]. The clone-and-own practice, common in open source development, involves significant modifications and independent maintenance, often leading to divergent development paths [7, 30].


While clone detection focuses on technical identification of code snippets, the clone-and-own practice highlights the importance of customization and independent management of forked projects. As the clone-and-own practice involves both technical customization and significant social factors, such as community engagement and governance models, understanding these aspects is important for managing forked projects [7, 30]. Although clone-and-own supports the purpose of code reuse by facilitating quick adaptation, it often results in code duplication, complicating long-term maintenance. Research has shown that clone-and-own is prevalent in practice due to its simplicity and effectiveness in the short term [4].


3.1.4 Copy-based Reuse. Copy-based reuse, a form of code reuse, involves copying existing code and potentially modifying it for use in new contexts. This method allows for rapid development but shares the maintenance challenges associated with clone-and-own, as duplicated code must be managed across different parts of the software. In summary, code reuse analysis encompasses techniques like clone detection to manage redundancy and practices like clone-and-own to adapt existing code for new purposes. While clone detection and code reuse analysis share the goal of improving code quality and maintainability by identifying and managing redundancy, clone-and-own focuses on rapid adaptation rather than efficient redundancy management, despite serving a similar purpose in promoting reuse.
Both copy-based reuse and clone detection address code duplication but differ significantly in their methodologies and scopes. Copy-based reuse research, as exemplified by our work, provides a broader, ecosystem-level perspective, incorporating social aspects and the characteristics of entire projects. In contrast, clone detection focuses on the technical identification of code snippets within specific contexts, while the clone-and-own practice emphasizes customization and independent maintenance of forked projects.


3.2 Contributions


Our contribution in this work has three aspects, as follows.


3.2.1 Accuracy. Our study leverages the World of Code (WoC) infrastructure to analyze reuse across nearly the entire open source software landscape. This allows us to capture instances of copying that would be missed if only a subset of public repositories were analyzed. In contrast, previous studies often focused on samples of mostly “popular” repositories drawn from specific communities or subsets of programming languages. They have either concentrated on a specific community (e.g., the Java language, Android apps, etc.) [21, 39, 40, 43, 68, 86] or only sampled from a single hosting platform (e.g., GitHub) [33, 34]. This, consequently, prevented identification of all inter-community or out-of-sample copies.


Even research with more comprehensive programming language coverage, such as the study by Lopes et al. [60] or the studies by Hata et al. [41, 42], analyzes only a subset of programming languages and additionally uses convenience sampling by excluding less active or “unimportant” repositories. As our results demonstrate, even inactive and “small” projects appear to provide many of the artifacts reused in OSS, even by the “largest” and most active projects.


Existing literature on code cloning primarily focuses on empirical studies, case studies, and tool evaluations.
Empirical studies typically analyze code clones within specific projects or samples of open source software repositories. These datasets are large but not exhaustive of the entire OSS ecosystem. For example, the studies by Juergens et al. [48] and Roy et al. [81] examine hundreds to thousands of files or repositories, providing valuable but partial insights. Case studies offer in-depth analysis of cloning practices within individual projects or organizations, giving detailed context but limiting the scale to the specific cases under study. Tool evaluations involve benchmark studies of clone detection tools, evaluating their performance on curated datasets. While these studies contribute important information about tool effectiveness, they do not cover the entire OSS ecosystem.

Unlike studies that rely on selective sampling, our analysis encompasses nearly the entire open source software ecosystem, providing a broad and necessary foundation for understanding code reuse. This breadth is a fundamental requirement for accurately tracking the origin of files within the entirety of OSS: it helps uncover trends and patterns that would be biased in analyses based on samples of such data, offering a more accurate understanding of reuse practices.


3.2.2 Methodology and Focus. Copy-based reuse has not been explored as thoroughly as dependency-based reuse (e.g., [15, 26, 74]). For example, Mili et al. [66] have shown that dependency-based reuse can lead to more sustainable software architectures by promoting component-based design and reducing redundancy. Additionally, Brown and Wallnau [11] demonstrated that, by leveraging well-defined interfaces and reusable libraries, dependency-based reuse can significantly improve software maintainability and scalability. Nevertheless, very few, if any, similar analyses exist regarding copy-based reuse. Copy-based reuse is potentially no less important, but it is a much less understood form of reuse [46].
Most studies in the copy-based reuse domain focus on clone detection tools and techniques [1, 40, 47, 81, 97] rather than on the characteristics of entire source code files that may make reuse more or less likely.


Furthermore, almost all studies we reviewed focus solely on source code reuse, whereas we track all artifacts, whether they are code or other reusable development resources [46]. By using the World of Code research infrastructure, which encompasses nearly the entire OSS ecosystem, we identified and analyzed copying activity at this scale for the first time.


In contrast to clone detection, which primarily involves identifying similar code snippets within specific directories or domains [45, 90], our research addresses the broader context of entire files and diverse artifacts across the OSS ecosystem, providing a more comprehensive understanding of reuse. Our method bridges the clone detection and clone-and-own approaches by detecting all instances of reuse, whether the artifacts are kept without any changes or modified after reuse, thereby encompassing both the technical and managerial aspects of code reuse.


In the existing clone detection literature, several methods are employed to identify code clones: text-based, token-based, tree-based, and graph-based techniques. Text-based methods detect clones by comparing raw text, which is straightforward but can be less accurate due to variations in formatting. Token-based methods improve on this by converting code into tokens and detecting similarities at this more abstract level, enhancing accuracy but still being susceptible to variations in code structure. Tree-based methods parse the code into abstract syntax trees (ASTs) and identify clones by comparing these trees, providing a more structured and semantically meaningful detection.
Graph-based methods further abstract code into control flow or data flow graphs, allowing for the detection of more complex and semantic clones [81].


The clone-and-own literature primarily employs these detection methods to understand the broader landscape of code cloning. For example, Juergens et al. [48] utilized a combination of these techniques to analyze cloning practices in software projects. These methods are effective in identifying different types of clones, such as exact, parameterized, and semantic clones, but they often focus on similarities and patterns rather than exact matches.


In contrast, our research employs a method focused on identifying reuse at the blob level, specifically detecting whether exact versions of code have been copied. While it misses instances where a single code snippet has been copied, this approach does not rely on abstractions or patterns. It involves obtaining hashes for all versions of files in the entire open source software ecosystem to detect identical code, ensuring that every version of code is tracked to its origin. This exhaustive and detailed approach allows for a comprehensive analysis of copy-based supply chains at the OSS level. Since software supply chains form a network over the entire OSS, it is not feasible to study them by sampling projects: representative samples from large graphs are notoriously difficult to obtain (see, e.g., [57]).

In addition to ensuring that the entire file has been copied and committed, our method easily scales to the entire OSS ecosystem, as it avoids the need to look for similarities among tens of billions of versions by utilizing hashes. Traditional clone detection techniques would need to be substantially modified to work at this scale. We discuss some of the potential approaches in Section 8.1.


3.2.3 Influencing Factors and Social Aspects.
Our study explores how the characteristics of OSS projects influence the propensity for their artifacts to be reused, examining their social aspects. Previously, the focus has been primarily on the desired functionality and the code itself [29, 87], but we also investigate the social aspects of this phenomenon in the open source community. + + +The literature on clone detection and our research both explore the social aspects of code reuse, but they do so from different perspectives and with varying emphases on social and technical factors. Existing literature on clone detection primarily focuses on the technical aspects of identifying code clones and understanding their impact on software maintenance and quality. For instance, studies by Juergens et al. [48], Roy and Cordy [80] delve into the reasons for code cloning, such as improving productivity, learning, and avoiding reimplementation of similar functionalities. These studies often highlight the technical motivations behind code cloning, such as reusability and rapid prototyping, but they also touch upon social aspects like collaborative development and knowledge sharing within teams. However, the primary emphasis remains on the technical detection and management of code clones. + + +In contrast, our research takes a broader view by examining how the characteristics of open source software projects influence the propensity for their artifacts to be reused. This includes a detailed analysis of both social and technical factors. Our study explores the diverse motivations and implications of reuse in the OSS community, considering aspects such as project size, community engagement, and the collaborative nature of OSS development. By doing so, we highlight the importance of social dynamics in code reuse, including factors like community contributions, the reputation of projects, and the collaborative environment that fosters code sharing and reuse. 
By examining these social and technical factors, our study provides a more comprehensive understanding of the motivations behind code reuse in the OSS community. We draw parallels to other factors influencing copy-based reuse, such as the ease of access to code, the open and collaborative nature of OSS projects, and the role of community support and documentation. This broader perspective allows us to highlight the diverse and sometimes conflicting motivations for code reuse, ranging from technical efficiency to social recognition and collaborative learning.

4 METHODOLOGY


We begin by briefly describing the World of Code infrastructure utilized in our study, followed by presenting the methods introduced in our previous work [46] to identify instances of copying. Next, we explain the time complexity of our method and discuss the rationale behind our choice. In the second and third subsections, we discuss methods used to answer each research question in more detail.


To make the subsequent discussion precise, we first introduce a few definitions. The time when each unique blob $b$ was first committed to each project $P$ is denoted as $t_b(P)$. The first repository $P_o(b) = \text{ArgMin}_P\, t_b(P)$ is referred to as the originating repository for $b$ (and the first author as the creator). Then project pairs consisting of a project with the originating commit and the destination project with one of the subsequent commits producing the same blob, $(P_o(b), P_d(b))$, are identified as reuse instances. The reuse propensity (the likelihood that a blob will be copied to at least one other project) is then modeled based on the type of the file represented by the blob and the activity and popularity characteristics of the originating projects.

4.1 Identification of Reused Blobs


4.1.1 World of Code Infrastructure.
Finding duplicate pieces of code and tracking all revisions of that code across all open source projects is a data- and computation-intensive task due to the vast number of OSS projects hosted on numerous platforms [46]. Previous studies on reuse have consequently often focused on a relatively small subset of open source software, potentially missing the full extent of reuse that could only be obtained with a nearly complete collection [46]. World of Code (WoC) [62, 63] infrastructure aims to address these challenges by regularly discovering, retrieving, indexing, and cross-referencing information from new and updated version control repositories that are publicly available. + + +WoC operationalizes copy-based reuse by mapping blobs, which are versions of the source code, to all commits and projects where they have been created. This means that copy-based reuse is detected only if an entire file is duplicated without any alterations [46]. If the reuser commits the reused blob before making any modifications, this method will find it; however, if they commit only after making alterations to the original file, it will not be identified. Given this, our study focuses solely on whole-file copying activity. Consequently, different versions of what was originally the same file will be treated as distinct entities since they are different blobs. + + +4.1.2 Project Deforking. To understand reuse across the entirety of open source software, it is important to identify distinct software projects. Git commits are based on a Merkle Tree structure, uniquely identifying modified blobs, and therefore, shared commits between repositories typically indicate forked repositories. As a distributed version control system (VCS), Git facilitates cloning (via git clone or the GitHub fork button), resulting in numerous repositories that serve as distributed copies of the same project. 
While this feature enables distributed collaboration, it also leads to many clones of the original repository [72]. + + +To differentiate copy-based reuse from forking, we use project deforking map $p2P$ provided in WoC [72]. Using community detection algorithms, this map provides a clearer picture of distinct projects by linking forked repositories $p$ to a single deforked project $P$ based on shared commits. + + +An advantage of this map over using the fork data from platforms like GitHub is that WoC’s p2P map is based on shared commits, providing higher recall by not missing forks that did not occur through GitHub’s forking option but rather through cloning the repository. Additionally, forks and clones hosted on different platforms cannot be traced easily, but the WoC map is platform-independent and does not have this constraint. Moreover, some forks may diverge significantly from the original repository but are still considered forks by hosting platforms. WoC’s deforking algorithms use community detection via shared commits. If forks diverge substantially via maintenance after forking, the community detection algorithm would recognize them as distinct projects, which reduces false positives and increases precision. + + +Whenever we mention “project” in our paper, we are actually referring to a “deforked project” as defined here. This ensures that our discussions about reuse are based on unique instances of software development projects rather than duplicated efforts through forks. + + +4.1.3 Dataset Creation. To understand the identification of reused blobs, it is important to explain how the dataset we used [46] was created. Despite the key relationships WoC offers, several obstacles had to be resolved. The initial step was to pinpoint the first instance, denoted as $t_b(P)$, when each of the approximately 16 billion blobs appeared in each of the almost 108 million projects. 
To this end, the c2fbb map (the output of diff for each commit, listing the commit, file, new blob, and old blob, i.e., all blobs created by that commit) was first joined with the c2dat map (full commit data) to obtain the date and time of each commit. The result was then joined with the c2P map (commit to project) to identify all projects containing that commit. The result is a new c2btP map (commit to blob, time, and Project). To create the timeline for each blob, all that data was sorted by blob, time, and project, resulting in the b2tP map $(b, t, P)$, which contains only the blob, time, and deforked project, i.e., our desired timeline $C_1$. + + +(^2)See https://github.com/woc-hack/tutorial for more information about the WoC map naming convention. + + +Finally, the blob timelines(^3) were used to identify instances of reuse $(C_1(P_1), C_1(P_2))$, or the Ptb2Pt map, where the first project is the originating project(^4) and the second project is the destination project of the reused blob, meaning the blob was created at a later time in this project. The resulting Ptb2Pt map contains all instances of blob reuse. The data flow of reuse identification is shown in Figure 1. +---------------------------------------- +------------------------------- +Section 246: +4.1.4 Time Complexity Analysis. + + +To evaluate the complexity and time requirements of our methodology for identifying reuse, we analyze the time complexity of each step and provide a benchmark for execution time on a typical computer setup. The overall time complexity is dominated by the sorting operations involved in processing the large maps. Data preparation and joining involve merging the precalculated maps in WoC, namely the c2fbb, c2P, and c2dat maps. Since these maps are already sorted and split into 128 partitions, we can join them with a complexity of $128 \times O(l + m + n)$, where $l$, $m$, and $n$ are the numbers of rows in the respective maps. 
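The identification step described in Section 4.1.3 — a single linear pass over the sorted b2tP map — can be sketched as follows. This is a minimal illustration on toy tuples; the function and variable names are ours, and WoC's real maps are far larger, partitioned, and stored on disk:

```python
from itertools import groupby

def reuse_instances(rows):
    """Scan (blob, time, project) rows sorted by blob then time and
    emit one reuse instance per (originating, destination) pair.
    The first row of each blob group marks the originating project."""
    for blob, group in groupby(rows, key=lambda r: r[0]):
        _, t0, origin = next(group)          # earliest commit wins
        for _, t, dest in group:
            if dest != origin:               # same-project re-commits already dropped
                yield (blob, origin, t0, dest, t)

rows = [            # toy b2tP fragment, already sorted by (blob, time)
    ("b1", 100, "P1"),
    ("b1", 150, "P2"),
    ("b1", 200, "P3"),
    ("b2", 120, "P4"),
]
print(list(reuse_instances(rows)))
# -> [('b1', 'P1', 100, 'P2', 150), ('b1', 'P1', 100, 'P3', 200)]
```

Because the input is already sorted by blob and time, each row is touched once, matching the $O(n)$ cost stated for this step.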
+ + +We then drop the commit hashes and sort the joined b2tP map based on blob, time, and project, which is the most computationally intensive step, with a complexity of $O(n \log n)$, where $n$ is the total number of rows in the b2tP map. Identifying reuse instances, given that the data is already sorted by blob, has a complexity of $O(n)$, where $n$ is the total number of copy instances. + + +Using a high-performance workstation as a benchmark (8-core processor at 3.5 GHz, 128 GB RAM, 2 TB SSD), we calculate the execution time for each step. Data preparation and joining, with a linear-time merge, primarily involve reading and writing large files. With a sequential read/write speed of approximately 500 MB/s for SSDs, joining the maps (total size around 128 billion rows) is expected to take roughly 1-2 hours. Sorting the created b2tP map, which requires external sorting of about 74 billion rows, necessitates multiple passes over the data. Based on empirical data, a modern external sorting algorithm with 8 cores can handle around 0.5 billion rows per hour. Hence, sorting this map would take approximately 148 hours. Identifying reuse instances, involving efficient I/O operations, is estimated to take 4-6 hours. In total, the entire process is estimated to take approximately 153-156 hours, or about 6.5 days. + + +Detecting code reuse at a finer granularity than the blob level, such as through syntax tree parsing or text similarity techniques, would offer a more comprehensive view of code reuse. However, these methods involve several computational challenges and resource constraints, making them impractical for our study. + + +Parsing the abstract syntax tree (AST) for each file to detect structural similarities involves several computational steps. First, each file must be parsed into its AST representation, which is itself an $O(n)$ operation, where $n$ is the total number of unique blobs. 
For our dataset of 16 billion blobs, this parsing step alone would be extremely resource-intensive. Following parsing, comparing the ASTs to identify potential reuse instances would require pairwise comparisons, with a complexity of $O(n^2)$, resulting in an infeasible $O((16 \times 10^9)^2)$ number of comparisons. + + +Text similarity measures, on the other hand, such as Levenshtein distance or cosine similarity, involve comparing each blob’s contents with every other blob. These methods likewise require $O(n^2)$ pairwise comparisons, again resulting in an infeasible $O((16 \times 10^9)^2)$ complexity. Even with optimizations like locality-sensitive hashing or other approximation techniques, the scale of the data renders this approach impractical. + + +Given the significant computational complexity and resource requirements, detecting code reuse at a finer granularity than blob-level is not feasible for our study. Instead, we have chosen to focus on blob-level reuse detection, which provides a practical and scalable solution. While this approach is limited to detecting exact file copies, it ensures that the analysis remains within the bounds of available computational resources and time constraints, thereby enabling a thorough and efficient examination of code reuse in the OSS landscape. + + +(^3)All but the first commit time creating the blob for each project were dropped, as a blob is often reused within a repository. + + +(^4)See Section 7 for the limitations in identifying the originating project. + + +Fig. 1. Reuse Identification Data Flow Diagram + + +4.2 RQ1: How much copy-based reuse occurs? What factors affect the propensity to reuse? + + +4.2.1 RQ1-a: How extensive is copying in the entire OSS landscape? To investigate how widespread whole-file copying in OSS actually is, we first want to establish a baseline: what fraction of blobs were ever reused, and if reused, to how many downstream projects? 
Specifically, in RQ1-a, we are showing the number of blobs, originating as well as destination projects (deforked), and copy instances across the entire OSS ecosystem. These numbers are not estimates but the actual numbers calculated over the complete dataset. + + +4.2.2 RQ1-b: Is copy-based reuse limited to a particular group of projects? One may argue that the results in RQ1-a are not necessarily important, as only “small” projects may reuse code in a copy-based manner. To see if this is actually the case, we randomly sampled 5 million reuse instances from each of the 128 files into which the data was divided, based on the first two bytes of the hash of blobs. This resulted in a total of 640 million instances for the analysis. This approach ensured that our sample was distributed across the entire dataset, capturing a diverse range of copy instances. The sample size of 640 million instances constitutes approximately 2.67% of the entire dataset. Although this is a small fraction of the data, it is sufficiently large to ensure the statistical reliability and representativeness of our analysis, as the large absolute size of the sample guarantees its statistical reliability according to the Central Limit Theorem. + + +Before going further, we need to define the qualitative and, more importantly, subjective terms of “small” and “big” projects with quantitative and justified measures. Crowston and Howison [17] and Koch and Schneider [51] have shown that project activity, as measured by commit frequency, is a strong indicator of project health and sustainability. Additionally, the use of stars as a metric is well-supported in the literature, as they represent a form of user endorsement and are correlated with project visibility and perceived quality [77]. We choose these two metrics because both the number of commits and the number of stars are indicators of a project’s activity and popularity. 
Commits reflect the ongoing development and maintenance efforts, which are important for the sustainability and evolution of a project. Stars, on the other hand, reflect the community’s interest and endorsement, indicating the project’s visibility and influence. These metrics are widely used in empirical software engineering research to evaluate the health and impact of open source projects [8, 47]. + + +We define projects with over 100 commits and at least 10 stars as “big” projects. The mean and third-quartile values for the number of commits in our dataset are 46 and 12, respectively. This aligns with established practices in the literature, where thresholds are often set significantly above average to isolate highly active projects. By setting the threshold at more than double the mean, we ensure that only the top-performing projects are classified as big. Similarly, the threshold of 10 stars is set based on the mean of 2.33 and third-quartile value of 0 for stars. This indicates that the majority of projects receive few or no stars, reflecting their low popularity and community engagement. By selecting projects with at least 10 stars, we focus on those with significant community recognition, capturing less than 1% of the dataset but representing the most influential projects. + + +The thresholds chosen for the “small” group, on the other hand, are projects with no stars and fewer than 10 commits, ensuring the projects are indeed small and inactive. This approach ensures that the small group, comprising 62% of projects, includes those with minimal activity and engagement, consistent with findings by Gousios and Spinellis [37] that a large proportion of open source projects are relatively inactive. We consider all other projects that do not fall into either the big or small categories as the “medium” group. The medium group captures the middle ground, excluding only the extremes, thus providing a balanced representation of the majority of active projects. 
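The taxonomy above reduces to a small classification rule. A minimal sketch, assuming strict "over 100 commits" and "at least 10 stars" threshold semantics as stated in the text (the function name is ours):

```python
def project_group(commits: int, stars: int) -> str:
    """Classify a project by the activity/popularity thresholds of RQ1-b."""
    if commits > 100 and stars >= 10:
        return "big"        # top-performing, community-recognized projects
    if commits < 10 and stars == 0:
        return "small"      # minimal activity and engagement
    return "medium"         # everything between the extremes

print(project_group(500, 42))  # big
print(project_group(3, 0))     # small
print(project_group(50, 2))    # medium
```

Note that a project must clear both thresholds to be "big", and must miss both to be "small"; everything else falls in the medium group.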
+ + +Using this taxonomy, we counted the number of unique blobs involved in these copy instances between groups. It should be mentioned that a blob can have several downstream projects that do not necessarily fall into the same group. Therefore, we considered the biggest downstream project for our analysis purposes. For example, if a blob originated in a medium project and was reused by both a big and a small project, we count it in the “medium to big” category. Considering the biggest downstream project for each unique blob ensures that the most significant reuse instances are captured. This approach is supported by research indicating that the impact of code reuse is often determined by the size and activity of the downstream projects utilizing the code [68, 95]. By focusing on the largest downstream project, we ensure that our analysis reflects the most substantial and influential reuse cases of a particular blob. + + +4.2.3 RQ1-c: Do characteristics of the blob affect the probability of reuse? The third part of our research question (RQ1) focuses on the properties of reused artifacts. To address this, we obtained a large random sample of blobs comprising 1/128 of all blobs. We have to point out that unlike RQ1-b, where we randomly sampled copy instances (meaning all the blobs involved were reused at least once), here we are sampling from the b2tP map that includes all blobs, whether they have been reused or not. Our dataset is divided into 128 files based on the first two bytes of the blob hash. Hash functions, by design, distribute input data evenly across the output space. The use of hash functions to divide data ensures a uniform distribution across the resultant files [67]. By using one of these 128 files as our sample, and given the vast size of the dataset, we ensure that it is an unbiased representation of the entire dataset and that this sample size is sufficient to achieve high statistical power and accuracy in our analyses. 
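The biggest-downstream-group attribution used in RQ1-b can be sketched as follows (the group ordering and all names are ours, for illustration only):

```python
GROUP_RANK = {"small": 0, "medium": 1, "big": 2}

def reuse_category(origin_group: str, dest_groups: list) -> str:
    """Attribute a blob's reuse to its biggest downstream project group,
    so each unique blob is counted in exactly one origin-to-destination cell."""
    biggest = max(dest_groups, key=GROUP_RANK.__getitem__)
    return f"{origin_group} to {biggest}"

# A blob originating in a medium project and reused by both a big and a
# small project is counted once, in the "medium to big" category.
print(reuse_category("medium", ["big", "small"]))  # medium to big
```

Ranking destinations rather than counting each pair separately is what keeps the per-blob counts disjoint across categories.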
+ + +We then employed a logistic regression model with the response variable being one for reused blobs and zero for non-reused blobs. Logistic regression is a robust statistical method used to model the probability of a binary outcome based on one or more predictor variables. It is widely used in empirical software engineering to understand factors influencing software development practices [44]. By using logistic regression, we can quantify the effect of various predictors on the likelihood of a blob being reused. + + +In this research question, we are concerned with infectiousness based on our Social Contagion Theory. Specifically, we are looking for properties of artifacts that affect their propensity to be reused. The first predictor in our model is the programming language of the blob. Different programming languages are associated with distinct package managers, development environments, and community cultures, which can influence reuse practices [6]. For example, the ease of dependency management in languages like Python (via pip) or JavaScript (via NPM) might facilitate reuse more than in languages with less mature package management systems. Thus, including the programming language as a predictor helps capture these contextual differences. We anticipate that source code for programming languages such as C, which lack package managers, is likely to be copied more frequently than source code for languages with sophisticated package managers, such as JavaScript. + + +The second predictor is the time of blob creation. This factor helps account for temporal dynamics by indicating the period during which a blob was created, reflecting different reuse practices over time. We hypothesize that older blobs were more likely to be reused due to fewer available reusable artifacts in the OSS landscape at the time. 
However, the time of creation inherently includes the effect of a blob’s availability duration $(t_b(P_d) - t_b(P_o))$: older blobs have had more time to be discovered and reused. Previous research by Weiss and Lai [95] indicates that the age and visibility of code artifacts influence their reuse. + + +To isolate and examine the influence of the creation period without the confounding effect of longer availability, we introduce the concept of time-limited reuse. By focusing on copies occurring within specific time intervals after the blob’s creation, we remove the advantage of longer visibility and can better assess how the creation period itself influences reuse(^5). We evaluated both one-year and two-year intervals and found similar results, which strengthens the robustness of our conclusions; to maintain conciseness and avoid repetition, we report the findings for the two-year interval, which balances sufficient observation time for reuse events against the practical need for concise reporting. Consequently, we excluded blobs created after May 1, 2020, ensuring that all blobs had at least two years to be potentially reused, providing a consistent time frame for analysis [96]. This approach ensures that our findings are not skewed by varying availability periods. + + +(^5)This definition is used solely for the purposes of our regression model and subsequent analysis. It is not applied in RQ1-a, RQ1-b, or RQ2. + + +The third predictor is whether the blob is source code or binary. We hypothesize that binaries, identified by their git treatment or file extensions like tar, jpeg, or zip, may exhibit different reuse patterns compared to source code. We expect that binary files, such as images, might be copied more often because they are easy to understand and reuse but difficult to recreate. 
Unlike other types of files, developers cannot easily extract specific parts or functionalities from binary files. That is, source code blobs are directly reusable and modifiable, whereas binaries might be reused as-is without modification. This distinction is important as it affects the ease or necessity of reuse [27]. Therefore, when it comes to whole-file reuse, which is our definition of reuse in this work, we anticipated that binary blobs are more likely to be copied. + + +The last factor we hypothesize might affect the propensity of a blob to be reused is its size. The size of a blob can influence its reuse for several reasons. Larger blobs may contain more functionality, making them more attractive for reuse. Conversely, smaller blobs may be simpler to integrate into existing projects. Previous research by Capiluppi et al. [12] and Mockus [68] has indicated that the size of code artifacts can impact their maintainability, comprehensibility, and ultimately their reuse. + + +To investigate whether a difference exists between the sizes of copied and non-copied blobs, we exclude binary blobs from the analysis. The size of binary blobs is not comparable to the size of source code blobs due to their fundamentally different nature. Binary blobs often include compiled code, media files, or compressed archives, which do not provide a meaningful comparison to plain text source code in terms of size. Because of these differences, we did not incorporate blob size as a predictor in our logistic regression model. Including binary blobs could skew the results and lead to misleading conclusions. Instead, we perform a t-test to compare the sizes of copied blobs and non-copied blobs. The t-test is a robust statistical method used to determine whether there is a significant difference between the means of two groups [88]. By applying the t-test, we can rigorously assess whether blob size influences the likelihood of reuse. 
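The two analyses above can be illustrated on toy data. This is a minimal, self-contained sketch, not the paper's actual pipeline: the predictor set is reduced to the binary flag (the real model also includes language and creation period), the fitting routine is a hand-rolled gradient ascent rather than a statistics package, and all numbers are invented:

```python
from math import exp, sqrt
from statistics import mean, variance

def fit_logit(xs, ys, lr=0.5, steps=5000):
    """One-predictor logistic regression fit by gradient ascent.
    Returns (intercept, slope); slope > 0 means the predictor
    raises the odds of a blob being reused."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / len(xs)
        b1 += lr * g1 / len(xs)
    return b0, b1

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# toy data: x = 1 if the blob is binary, y = 1 if it was ever reused
b0, b1 = fit_logit([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 0, 1, 1, 1])
print(b1 > 0)  # binaries are reused more often in this toy sample

# toy blob sizes (bytes) for copied vs. non-copied source blobs
print(welch_t([120, 340, 95, 210, 180, 400],
              [800, 950, 700, 1200, 640, 880]) < 0)
```

The fitted slope is the log-odds change associated with the predictor, which is exactly what "quantify the effect of various predictors on the likelihood of a blob being reused" refers to; the t statistic's sign indicates which group has the larger mean size.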
+ + +4.2.4 RQ1-d: Do characteristics of the originating project affect the probability of reuse? The fourth part of RQ1 concerns the chances of finding or being aware of a blob, approximated by signals at the project level. This is the exposure factor in the Social Contagion Theory. To conduct this study, we use WoC’s MongoDB project database to randomly sample one million projects, comprising nearly 1% of all projects indexed by WoC, to achieve a balance between statistical validity and computational feasibility. A sample size of one million is large enough to provide a representative snapshot of the entire population. + + +We then search the reuse instances $(C_1(P_1), C_1(P_2))$ in our Ptb2Pt map to determine if the project originated at least one reused blob. A logistic regression model with the response variable being one if the project has introduced at least one reused blob (and zero otherwise) is then constructed. The predictors in the project-level model include the number of commits, blobs, authors, forks, earliest commit time, the activity duration of the project (the time between the first and the last commit in that project), the binary ratio (the ratio of binary blobs to total blobs), and the programming language. We also use the number of GitHub stars for each project as a predictor. This data in WoC (number of stars) is sourced from GHTorrent [36]. + + +The choice of these predictors for our model is based on the current literature on relevant project properties. + + +• Number of Commits. The number of commits is a strong indicator of project activity and maintenance. Koch and Schneider [51] show that projects with higher commit frequencies tend to have more active development and are more likely to be reused due to their perceived reliability and continuous improvement. + + +• Number of Blobs. The number of blobs represents the volume of content and potential reusable components. 
Larger projects with more blobs are likely to offer more opportunities for reuse [68]. It can also indicate the project’s complexity and modularity: projects with more files may be more modular and provide more reusable components. + + +• Number of Authors. The number of authors reflects the collaborative nature of a project. Projects with more contributors tend to have diverse expertise, which supports innovation and decentralized communication, improving the development process [17] and potentially increasing the likelihood of reuse. + + +• Number of Forks. The number of forks is a proxy for the project’s popularity and community engagement. Projects with more forks are often viewed as valuable and trustworthy [93], increasing their reuse potential. + + +• Earliest Commit Time and Activity Duration. The earliest commit time and the activity duration provide insights into the project’s maturity and stability. Older and long-active projects are more likely to be well-established and reused [28]. + + +• GitHub Stars. GitHub stars are a form of social endorsement, indicating community approval and interest. Projects with more stars are likely to be considered high-quality and reliable, making them more attractive for reuse [8]. + + +• Binary Ratio. The binary ratio, defined as the ratio of binary blobs to total blobs, can impact the reuse potential of a project. Binary blobs, such as compiled code or media files, often indicate pre-packaged functionalities or resources that are ready for use. A higher binary ratio may suggest that a project provides ready-to-use components, which can facilitate reuse [68]. + + +Regarding language assignment, at the blob level, WoC’s b2sl map was used for blob language detection based on file extensions. This method is straightforward and effective for identifying the programming languages of individual blobs. 
Nevertheless, assigning a primary language to a project is more complex due to the use of multiple languages in most projects. WoC’s MongoDB project database provides counts of files with each language extension, allowing us to pick the most frequent extension as the project’s main language. For our study, we considered only a subset of blobs, specifically originating blobs (blobs first seen in OSS within the project), and assumed the most common language among these blobs as the project’s primary language. This approach aligns with the practice of determining the dominant language based on primary contributions [94]. + + +4.3 RQ2: How do developers perceive and engage with copy-based reuse? + + +The second research question in our study aims to triangulate the quantitative results and understand how developers perceive and engage with copy-based reuse. While quantitative research often focuses on metrics such as frequency, intensity, or duration of behavior, qualitative methods are better suited to explore the beliefs, values, and motives underlying these behaviors [13]. + + +Using a questionnaire for triangulation allows us to obtain self-reported data, which can confirm or challenge the quantitative findings. This method helps identify any discrepancies and provides a deeper understanding of participant behavior [18]. In our study, the questionnaire included a direct question (“Did you create or copy this file?”) to gather self-reported data on whether participants copied the blob, offering a direct measure to compare against the quantitative results. 
Motivations for reuse can vary widely based on individual needs, project requirements, and perceived benefits from the reused code [24, 68]. Our primary focus was to understand these motivations to categorize different types of reuse, potentially providing more insight into measuring susceptibility for future research. By categorizing motivations, we aim to identify distinct patterns and factors influencing reuse behavior, facilitating the development of targeted strategies to enhance code reuse practices. This approach aligns with qualitative research methods that seek to explore complex phenomena through detailed, contextualized analysis [16]. + + +To gain insights into the motivations behind copy-based reuse, we conducted an online survey targeting both the authors of commits introducing reused blobs and the authors of commits in the originating repositories. The survey aimed to capture a range of experiences and perceptions related to copy-based reuse. + + +4.3.1 Survey Content and Questions. The survey included questions about the nature of the file, why it was needed, how it was chosen, and whether developers would use tools to manage reused files. General questions about the repositories and developers’ expertise were also included. Notably, the question about the reason for needing the file was open-ended to capture unbiased and detailed responses about the motivations for reuse. All the questions were optional, except for the very first one, which asked if the respondent had created or reused the file. We chose not to ask directly why developers chose to copy, to avoid provoking legal and ethical concerns about copy-based reuse. Instead, we asked: “Why was this file needed? How did it help your project?” + + +Furthermore, we asked developers if the project in which the file resides was intended to be used by other people. 
Understanding whether creators intend for their resources to be reused helps assess the cultural and strategic aspects of OSS development. If a significant portion of creators design their code with reuse in mind, it indicates a collaborative ecosystem where resources are shared and built upon. + + +We also asked a series of Likert scale (on a scale from 1 to 5) questions as follows. + + + + +“To what extent did this file help you?” - Gauging how helpful creators and reusers find the reused blobs provides quantitative data on the perceived value of the reused code. Comparing the ratings between creators and reusers highlights any discrepancies or alignment in perceived usefulness. + + +“To what extent were you concerned about potential bugs in this file?” - Investigating reusers’ concerns about bugs in reused code sheds light on the perceived risks associated with this practice. Understanding the level of concern can indicate how much trust reusers place in the original code’s quality. + + +“How important is it for you to know if the original file has been changed?” - Understanding reusers’ concerns about changes in the original files helps identify potential issues related to the stability and continuity of reused code. Frequent changes can disrupt the functionality of dependent projects. + + +“How likely would you use a package manager which could handle changes to this file if there was one?” - Understanding the likelihood of reusers adopting a package manager if available provides insights into the demand for tools that can streamline and manage code reuse. + + + + +4.3.2 Sampling Strategy. To ensure a representative and comprehensive sample, we stratified the data along several dimensions. Stratified sampling ensures that all relevant subgroups are adequately represented in the survey, enhancing the generalizability of the findings [16]. 
By considering multiple dimensions such as productivity, popularity, copying patterns, file types, and temporal aspects, we ensure a comprehensive analysis that captures the diversity of reuse behaviors in the OSS community: + + +• Productivity and Popularity: Based on the number of commits and stars, we differentiated between high and low productivity/popularity projects (similar to RQ1-b). + + +• Copying Patterns: We distinguished between instances where only a few files were copied versus multiple files, as these might indicate different reuse behaviors. + + +• File Extension: We included various file types and programming languages to capture a diverse range of reuse scenarios. + + +• Temporal Dimensions: We considered the blob creation time and the delay from creation to reuse to understand temporal patterns in reuse behavior. + + +(^6)The survey and its procedure were approved by our institutional review board, ensuring that it adhered to ethical guidelines for research involving human subjects. + + +(^7)See the online appendix for survey questions. + + +4.3.3 Survey Design. For each copy instance, we targeted the author of the commit introducing the blob into the destination repository and the author of the commit in the originating repository. This dual perspective allowed us to capture both the originator’s and the reuser’s viewpoints, offering a more comprehensive understanding of the reuse dynamics. + + +We conducted three rounds of surveys, progressively expanding the sample size and refining the questions based on feedback and preliminary results. We chose to conduct our survey in three steps to ensure a thorough and iterative approach to understanding developer motivations behind copy-based reuse. + + +We handpicked 24 developers (12 creators and 12 reusers) for an initial survey with open-ended questions. This round aimed to gather in-depth qualitative data and identify key themes. 
This small, purposive sample size allows for deep, exploratory insights, which are important for the initial stages of qualitative research [38].

In the second round, the survey was sent to 724 subjects (329 creators and 395 reusers) with a mix of open-ended and multiple-choice questions. This round helped validate and refine the themes identified in the first round. The increased sample size provides more data to ensure that the observed themes and patterns are not idiosyncratic but rather indicative of broader trends. This intermediate sample size balances the need for more extensive data while still allowing for qualitative depth [65].

In the third round, the survey was expanded to 8,734 subjects (2,803 creators and 5,931 reusers), with most questions being multiple-choice to facilitate quantitative analysis, except for the open-ended question about the reason for needing the file. The large sample size in this final round ensures that the findings are statistically significant and generalizable across the broader population of developers involved in copy-based reuse. This sample size aligns with recommendations for achieving sufficient statistical power in survey research [53].

The seemingly irregular numbers of survey subjects across the three rounds arise because, after sampling our data, we had to perform data cleansing and preparation to reach the survey target audience, a process that typically removed some samples. Initially, we chose sample sizes of 30, 1,000, and 10,000 respondents for the three rounds, respectively, but after data cleansing the actual numbers were lower.

4.3.4 Thematic Analysis. Thematic analysis allows us to systematically identify patterns and themes within qualitative data, providing deep insights into the reasons behind copy-based reuse [10].
To analyze the survey responses, we followed a structured thematic analysis process as outlined by Yin [99]:

• Compiling: The first author compiled all responses.

• Disassembling: Each author individually analyzed and coded the responses to identify ideas, concepts, similarities, and differences [5, 89].

• Reassembling: The coded responses were organized into meaningful themes by each author independently, focusing on identifying different types of reuse [10].

• Interpreting and Concluding: The authors discussed and compared the themes, clarifying and organizing them to ensure a coherent and comprehensive understanding. The final themes were then used to reclassify and interpret all survey responses.

5 RESULTS & DISCUSSIONS

The numbers presented in this section are derived from version U of WoC, which was the most recent version available at the time of this analysis.

8 Only if they had explicitly disclosed their email address on their public profile.

9 https://bitbucket.com/swsc/overview

5.1 RQ1: How much copy-based reuse occurs? What factors affect the propensity to reuse?

5.1.1 RQ1-a: How extensive is copying in the entire OSS landscape? We identified nearly 24 billion copy instances (unique tuples containing the blob and the originating and destination projects) encompassing more than 1 billion distinct blobs. With approximately 16 billion blobs in the entire OSS landscape (as approximated by WoC), 6.9% of the blobs have been reused at least once, and each reused blob is copied to an average of 24 other projects (see Table 1).
| | Count | Total | % |
|----------------------|----------------|----------------|------|
| Reuse instances | 23,914,332,270 | - | - |
| Blobs | 1,084,211,945 | 15,698,467,337 | 6.9% |
| Originating projects | 31,706,416 | 107,936,842 | 29.4% |
| Destination projects | 86,483,266 | 107,936,842 | 80.1% |

Nearly 32 million projects (about 30% of the nearly 108 million deforked OSS projects indexed by WoC) originated at least one reused blob. Over 86 million projects have copied these blobs, meaning 80% of OSS projects have reused blobs from another project at least once.

RQ1-a Key Findings:
1. We identified nearly 24 billion copy instances encompassing more than 1 billion distinct blobs.
2. 6.9% of all the blobs in the entire OSS landscape have been reused at least once.
3. About 30% of all OSS projects originated at least one reused blob, and 80% of projects have reused blobs at least once.

The extensive reuse observed highlights the efficiency gains in OSS development, as projects benefit from existing code to accelerate development cycles and reduce costs. The widespread reuse also raises security concerns, as vulnerabilities in copied code can propagate across numerous projects. This necessitates improved vulnerability detection and management practices to ensure the integrity of reused code. Additionally, license violations due to improper code reuse can lead to legal challenges and compliance issues, underscoring the importance of clear licensing and adherence to open source policies. Furthermore, our identification of blob-level reuse, which only accounts for exact matches and not slight modifications, suggests that the actual extent of code reuse might be even higher. These findings advocate for the development of better tools and infrastructure to manage copy-based reuse, including automated detection of security and legal risks and tools for maintaining code quality in reused components.

5.1.2 RQ1-b: Is copy-based reuse limited to a particular group of projects?
The numbers already demonstrate the prevalence of copy-based reuse in the OSS community. To understand how this reuse activity is distributed across different groups of projects, we constructed a contingency table as explained in the methods section. Each blob's originating project is unique and falls into one of three categories (big, medium, and small). However, downstream projects are not unique, so we consider the largest downstream project for each blob.

Our analysis revealed nearly 112 million unique blobs reused in our 640 million sampled copy instances, with nearly 13 million of these blobs reused by at least one big project (see Table 2). This indicates that more than 11% of blobs are reused at least once by at least one big project, showing that copy-based reuse is not limited to small projects but is a widespread phenomenon in the OSS community.

Table 2. Blob Counts in Reuse Sample

| Upstream \ Biggest Downstream | Big | Medium | Small | Total |
|-------------------------------|-----|--------|-------|-------|
| Big | 6,748,621 | 22,273,811 | 6,515,122 | 35,537,554 (31.8%) |
| Medium | 5,348,651 | 36,434,732 | 14,552,148 | 56,335,531 (50.3%) |
| Small | 691,644 | 10,151,838 | 9,231,618 | 20,075,100 (17.9%) |
| Total | 12,788,916 (11.4%) | 68,860,381 (61.5%) | 30,298,888 (27.1%) | 111,948,185 |

However, it is still unclear whether these reused blobs are predominantly introduced by big projects. If this were the case, one could presume that these blobs are mostly of good quality and not error-prone, making the costs of managing and tracking code propagation through such reuse potentially outweigh the benefits. Sampling copy instances revealed that big projects are responsible for only about 30% of reused blobs, while the remaining 70% are introduced by medium and small projects. Specifically, nearly 18% of these blobs are introduced by small projects, with the remaining 50% coming from medium projects.
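As a sanity check, the originating-project shares reported above can be re-derived from Table 2's row totals; a minimal Python sketch with counts copied from the table:

```python
# Row totals of Table 2: unique reused blobs by originating-project size.
row_totals = {
    "big":    6_748_621 + 22_273_811 + 6_515_122,    # 35,537,554
    "medium": 5_348_651 + 36_434_732 + 14_552_148,   # 56,335,531
    "small":  691_644 + 10_151_838 + 9_231_618,      # 20,075,100
}
grand_total = sum(row_totals.values())  # 111,948,185 unique reused blobs

# Share of reused blobs originating from each project-size category.
shares = {size: 100 * count / grand_total for size, count in row_totals.items()}
for size, pct in shares.items():
    print(f"{size}: {pct:.1f}%")
```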
Furthermore, even for big projects, almost 50% of the blobs they reuse originate from medium and small projects (see Table 2). Therefore, it is evident that big projects are not the only upstream sources for copy-based reuse. Indeed, many blobs introduced by medium and small projects are being widely reused.

Even if all widely reused blobs were exclusively introduced by big projects, copy-based reuse would still require management for several reasons. For example, security vulnerabilities may continue to spread even after the main project has fixed the issue [78].

RQ1-b Key Findings:

• 32% of reused blobs originate from big projects, which comprise 1% of the total projects.

• 18% of reused blobs originate from small projects, which make up 62% of the total projects.

• 50% of reused blobs originate from medium projects, which represent 37% of the total projects.

• Nearly 50% of blobs reused by big projects originate from medium and small projects, highlighting significant cross-category reuse.

Our findings demonstrate that a non-negligible portion of reused code in the OSS community comes from medium and small projects, challenging the assumption that high-quality code predominantly originates from large projects. This implies a diverse quality spectrum in reused code and underscores the importance of ensuring quality and security across all project sizes, as vulnerabilities in smaller projects can propagate widely. Tools that can track the origin and usage of blobs are essential to ensure timely updates and fixes across the OSS ecosystem, mitigating risks associated with vulnerabilities and outdated code. The widespread nature of code reuse across projects of all sizes emphasizes the need for quality assurance, effective management, and community collaboration to maintain the health and sustainability of the OSS landscape.

5.1.3 RQ1-c: Do characteristics of the blob affect the probability of reuse?
In this section, we first demonstrate the reuse trends, followed by the logistic regression model predicting the probability of a blob being reused. Additionally, we present the reuse propensity per language and show the difference in blob size between reused and non-reused blobs. Finally, we discuss a case study using JavaScript as an example.

10 (5,348,651 + 691,644)/12,788,916 ≈ 47%.

Reuse Trends. As explained in the methods section, we use a 2-year-limited copying definition in the RQ1-c and RQ1-d models and results. This means that we consider a blob reused only if it has been reused within 2 years of its creation. With this definition, 7.5% of blobs have been reused. Figure 2a shows the total counts of new blobs and copied blobs for each quarter since the year 2000¹¹. Both counts exhibit rapid growth, although the growth in new blob creation appears to outpace that of copying. To investigate this difference, Figure 2b shows the reuse propensity measured via the reuse ratio (reused blobs divided by total blobs), confirming that new blob creation has outpaced copying since 2006, when the ratio began to decline.

Fig. 2. Quarterly Reuse Trends

Logistic Regression Model. We expect the nature of the blob to affect its propensity to be reused. To test this hypothesis, we use a logistic regression model where the response variable is set to one if the blob has been copied at least once (i.e., has been committed in at least two projects) within two years of its creation, and zero otherwise. We used the WoC definition of the programming language associated with each blob and categorized less common programming languages in the sample as "other". The descriptive statistics of the variables are presented in Table 3.
Table 3. Blob-level Model - Descriptive Statistics

| Variable | Statistics |
|----------------|-------------------------------------------------|
| Reused | Yes: 6,419,388 (7.5%); No: 78,136,705 (92.5%) |
| Language (counts) | JavaScript: 11,122,849; Java: 4,579,458; C: 3,460,733; Other: 65,393,053 |
| Creation Time (date) | 5%: 7/29/2012; Median: 2/7/2018; Mean: 5/28/2017 |
| Binary | Yes: 18,516,721 (21.8%); No: 66,039,372 (78.2%) |

11 The number of projects and blobs was much smaller before 2000.

The sample dataset is predominantly composed of blobs written in JavaScript, with significant counts also in Java and C. The distribution of blob creation time shows a median date of February 7, 2018. Furthermore, a notable proportion of the blobs, 21.8%, are binary.

The results of our logistic regression model are shown in Tables 4 and 5. The model shows that the coefficients for all predictors are statistically significant, with p-values less than 0.0001, meaning they impact the probability of a blob being reused (see Table 4).

Table 4. Blob-level Model - Coefficients

| | Estimate | Std. Error | z value | Pr(>|z|) |
|------------|----------|------------|---------|----------|
| (Intercept) | -18.0293 | 0.0186 | -967.07 | < 2 × 10^{-16} |
| Binary | 0.4775 | 0.0010 | 460.16 | < 2 × 10^{-16} |
| Creation Time | 0.8108 | 0.0010 | 828.34 | < 2 × 10^{-16} |
| C | 0.7142 | 0.0017 | 426.32 | < 2 × 10^{-16} |
| C# | -0.1277 | 0.0033 | -38.15 | < 2 × 10^{-16} |
| Go | 0.3095 | 0.0065 | 47.74 | < 2 × 10^{-16} |
| JavaScript | -0.0832 | 0.0015 | -56.21 | < 2 × 10^{-16} |
| Kotlin | -0.5606 | 0.0133 | -42.02 | < 2 × 10^{-16} |
| ObjectiveC | 0.0810 | 0.0066 | 12.30 | < 2 × 10^{-16} |
| Python | -0.0327 | 0.0030 | -10.97 | < 2 × 10^{-16} |
| R | 0.4070 | 0.0083 | 49.22 | < 2 × 10^{-16} |
| Rust | 0.0879 | 0.0095 | 9.30 | < 2 × 10^{-16} |
| Scala | -0.6168 | 0.0123 | -50.21 | < 2 × 10^{-16} |
| TypeScript | 0.1827 | 0.0046 | 39.38 | < 2 × 10^{-16} |
| Java | 0.0794 | 0.0019 | 42.37 | < 2 × 10^{-16} |
| PHP | 0.3561 | 0.0024 | 151.14 | < 2 × 10^{-16} |
| Perl | 0.7664 | 0.0082 | 92.95 | < 2 × 10^{-16} |
| Ruby | -0.4782 | 0.0044 | -108.58 | < 2 × 10^{-16} |

The ANOVA table (Table 5) provides insights into the significance of the different variables. All the predictors have p-values effectively equal to zero, meaning that the null hypothesis¹² can be rejected. The null deviance is 45,438,151, which represents the deviance of a model with only the intercept. Adding the Binary variable reduces the deviance by 124,114, indicating its strong influence on reuse likelihood. The Creation Time variable further reduces the deviance by 830,322, highlighting its importance in predicting reuse. The Language variable also reduces the deviance by 230,614. Although these reductions might seem small relative to the null deviance, they are statistically significant given the large sample size and the high degrees of freedom involved.

To assess the direction and the size of the predictor effects, we need to go further.
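The coefficient-to-odds-ratio transformation used in this assessment is a direct exponentiation; a minimal Python sketch using a few values copied from Table 4:

```python
import math

# Selected blob-level coefficients from Table 4 (log-odds scale).
coefficients = {
    "Binary": 0.4775,
    "Creation Time": 0.8108,
    "Perl": 0.7664,
    "Kotlin": -0.5606,
}

# exp(coefficient) gives the odds ratio for a one-unit increase in the
# predictor: OR > 1 raises the odds of reuse, OR < 1 lowers them.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.2f}")
```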
In a logistic regression model, a positive coefficient estimate indicates that as the predictor variable increases, the odds of the outcome occurring increase, while a negative coefficient estimate indicates the opposite. Since the coefficients represent the change in the log-odds of the outcome for a one-unit increase in the predictor, we exponentiate them to obtain odds ratios, which convey the actual impact of each predictor. The odds ratio indicates how the odds of the outcome change with a one-unit increase in the predictor. The results are shown in Figure 3. This graph displays the odds ratios for the various predictors in the logistic regression model at the blob level. An odds ratio greater than 1 indicates an increase in the likelihood of reuse, while an odds ratio less than 1 indicates a decrease.

12 H0: The reduced model (without the predictor) provides a fit to the data that is not significantly worse than the full model (with the predictor). This suggests that the predictor does not significantly improve the model's fit.

Table 5. Blob-level Model - ANOVA Table

| | Df | Deviance | Resid. Df | Resid. Dev | p.value |
|------------------|----|------------|-----------|--------------|---------------|
| NULL | | | 84,556,092 | 45,438,151.00 | |
| Binary | 1 | 124,114.20 | 84,556,091 | 45,314,036.80 | < 2 × 10^{-16} |
| Creation Time | 1 | 830,322.63 | 84,556,090 | 44,483,714.17 | < 2 × 10^{-16} |
| Language | 15 | 230,614.17 | 84,556,075 | 44,253,100.00 | < 2 × 10^{-16} |

Fig. 3. Blob-level Model - Logistic Regression Odds Ratios

The creation time has the highest positive coefficient. The time variable in the model represents the time elapsed from the blob's creation until the current time, meaning that older blobs have higher time values.
The positive coefficient indicates that newer blobs (with smaller time values) are less likely to be reused. This is not because they have been visible for a shorter duration (we controlled for this with the time-bound definition of reuse), but likely due to other factors we hypothesized, such as fewer artifacts being available for reuse at the time of their creation.

Binary blobs show a significant increase in reuse likelihood, with an odds ratio of 1.63. Given this confirmed effect, we calculated the reuse propensity for binary and non-binary blobs separately. The results showed that 9.5% of binary blobs were reused, compared to 7.0% of non-binary blobs in our sample.

Different programming languages show varied impacts on reuse likelihood. Blobs written in Perl, C, R, PHP, Go, TypeScript, Objective-C, Java, and Rust are more likely to be reused, with Perl showing the highest odds ratio. In contrast, blobs written in Kotlin, Scala, Ruby, C#, JavaScript, and Python are less likely to be reused, with Kotlin and Scala showing the largest negative coefficients. This variability suggests that certain languages, perhaps due to their prevalence or specific use cases, are more conducive to code reuse.

Per-Language Propensity. Following our logistic regression results, which demonstrated that programming language is a statistically significant factor in the reuse probability of a blob, we calculated the propensity to copy for each programming language, measured as the percentage of reused blobs within that language (see Table 6). The results show that blobs written in Perl have the highest propensity to be reused at 18.5%, indicating a strong tendency for code reuse among Perl developers. Conversely, Kotlin has the lowest propensity at 3.0%, suggesting minimal code reuse in this language. Languages such as C (15.2%) and PHP (9.9%) also show high reuse rates, while Python (6.4%), JavaScript (5.5%), and TypeScript (6.3%) have lower rates.
Other languages like Java (7.8%), Go (7.9%), and R (9.8%) fall in the middle range, with moderate reuse rates.

| Language | Ratio | Language | Ratio | Language | Ratio |
|------------|-------|------------|-------|------------|-------|
| C | 15.2% | ObjectiveC | 8.4% | TypeScript | 6.3% |
| C# | 6.0% | Python | 6.4% | Java | 7.8% |
| Go | 7.9% | R | 9.8% | PHP | 9.9% |
| JavaScript | 5.5% | Rust | 6.7% | Perl | 18.5% |
| Kotlin | 3.0% | Scala | 3.8% | Ruby | 5.1% |

JavaScript Example. The role of programming language in reuse activity might have several underlying reasons, as previously discussed. One such reason is the presence of a reliable package manager. If so, improvements in a package manager should reduce the propensity to reuse an artifact. To examine this, we analyzed the timeline of the reuse ratio for JavaScript, shown in Figure 4. The figure indicates a sharper decrease in the slope around 2010, the year the NPM package manager was introduced. This downward trend continues until mid-2013, when the copying activity rate drops to around 7% and then levels off. This pattern supports the hypothesis that the introduction and adoption of NPM significantly reduced code reuse through copying.

However, it is important to note that this is just an illustration, and further research is needed to understand this phenomenon fully. Our current study was not focused on this aspect, so we did not conduct an in-depth analysis. Additional investigations with more data points and comparisons with other languages that have introduced similar improvements in their package management systems are necessary to confirm that the observed effect is not coincidental or specific to JavaScript alone.

Blob Size. The final predictor we hypothesized to affect the reuse probability of a blob was its size. To investigate whether there is a significant difference between the sizes of copied and non-copied blobs, we conducted a t-test comparing these sizes.
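Such a two-sample comparison can be sketched with Welch's t-test; the blob sizes below are synthetic placeholders (not the study's data), used only to illustrate the procedure:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Illustrative lognormal "file sizes" in bytes; real sizes would come from WoC.
copied_sizes = rng.lognormal(mean=6.0, sigma=1.0, size=10_000)      # smaller on average
non_copied_sizes = rng.lognormal(mean=6.5, sigma=1.0, size=10_000)  # larger on average

# Welch's t-test: does not assume equal variances in the two groups.
t_stat, p_value = stats.ttest_ind(copied_sizes, non_copied_sizes, equal_var=False)
print(f"t = {t_stat:.1f}, p = {p_value:.3g}")  # negative t => copied blobs smaller
```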
Our analysis revealed a significant difference (p-value < 2.2 × 10^{-16}), indicating that, on average, copied blobs are smaller than non-copied blobs.

However, the effect varies by language. Specifically, per-language t-tests reveal that copied blobs are smaller in languages like JavaScript and TypeScript, larger in languages such as C and Java, and unchanged in Objective-C, as detailed in Table 7. For example, in JavaScript, the t-value is -59.9, suggesting that copied blobs are significantly smaller, while in C, the t-value is 195.9, indicating that copied blobs are larger. Similar patterns are observed in other languages, with TypeScript showing a t-value of -35.9 (smaller copied blobs) and Python a t-value of -5.8 (also smaller copied blobs). Conversely, languages like Java (t-value 120.7) and PHP (t-value 28.6) show that copied blobs tend to be larger.

Table 7. Size Difference between Reused and Non-Reused Blobs
(A positive t value means larger reused blobs.)

| Language | t value | p-value | Language | t value | p-value |
|------------|---------|----------------|------------|---------|----------------|
| C | 195.9 | < 2 × 10^{-16} | Rust | -7.8 | < 2 × 10^{-16} |
| C# | 12.5 | < 2 × 10^{-16} | Scala | 9.1 | < 2 × 10^{-16} |
| Go | 15.5 | < 2 × 10^{-16} | TypeScript | -35.9 | < 2 × 10^{-16} |
| JavaScript | -59.9 | < 2 × 10^{-16} | Java | 120.7 | < 2 × 10^{-16} |
| Kotlin | -14.5 | < 2 × 10^{-16} | PHP | 28.6 | < 2 × 10^{-16} |
| ObjectiveC | 0.7 | 0.430298 | Perl | 5.8 | < 2 × 10^{-16} |
| Python | -5.8 | < 2 × 10^{-16} | Ruby | -24.9 | < 2 × 10^{-16} |
| R | -7.6 | < 2 × 10^{-16} | Other | -364.9 | < 2 × 10^{-16} |

This variation highlights that the relationship between blob size and reuse propensity is complex and influenced by language-specific factors.
While our findings demonstrate a general trend of smaller copied blobs, the differing patterns across languages suggest that other underlying factors may be at play.

RQ1-c Key Findings:

• The reuse ratio is decreasing over time.

• 7.5% of blobs have been reused within two years of creation.

• Older blobs, when controlling for the confounding effect of increased visibility, are more likely to be reused.

• Binary blobs are 63% more likely to be reused.

• Programming languages significantly impact reuse likelihood. Blobs written in languages like Perl, C, R, PHP, Go, TypeScript, Objective-C, Java, and Rust are more likely to be reused, while those written in Kotlin, Scala, Ruby, C#, JavaScript, and Python are less likely to be reused.

• The reuse ratio timeline for JavaScript shows a notable decrease in slope around the year the NPM package manager was introduced.

• Copied blobs are generally smaller than non-copied blobs, but this is not consistent across languages. Reused blobs in C, Java, PHP, Go, C#, Scala, and Perl are larger than non-reused blobs, while in JavaScript, TypeScript, Ruby, Kotlin, Rust, R, and Python, the reused blobs are smaller; Objective-C shows no significant difference.

The higher reuse propensity among binary blobs suggests that binaries are inherently more reusable, likely due to their compiled nature, which allows easy integration across projects. The lower reuse likelihood of newer blobs indicates a potential issue with the integration and acceptance of recent contributions, possibly due to rapid technological advancements and shifts in development practices. The significant impact of programming languages on reuse likelihood highlights the importance of language-specific tools and ecosystems.
Languages with higher reuse rates, such as Perl and C, benefit from mature ecosystems, while newer or niche languages like Kotlin and Scala show lower reuse rates, potentially due to smaller communities. The decline in JavaScript code reuse after NPM's introduction suggests that improved package management can reduce the need for direct code copying, promoting more modular and maintainable codebases.

Regarding blob size, the general trend indicates that smaller code artifacts are more reusable, likely due to their simplicity and ease of integration. However, this trend varies significantly across programming languages. For example, in languages like JavaScript and TypeScript, copied blobs tend to be smaller, supporting the idea of writing concise and modular code to enhance reusability. In contrast, in languages like C and Java, copied blobs are often larger, suggesting that the nature and use cases of these languages might necessitate larger reusable components. This variation underscores the importance of understanding language-specific factors when considering code reuse management strategies.

5.1.4 RQ1-d: Do characteristics of the originating project affect the probability of reuse?

In this section, we first present the logistic regression model. We then demonstrate the per-language reuse propensity and compare it to the blob-level results. Finally, we analyze binary blob reuse.

Logistic Regression Model. We applied a logistic regression model to determine the likelihood of a project introducing at least one reused blob. The response variable is binary: 1 if the project has introduced a reused blob, 0 otherwise. Descriptive statistics for the model variables are presented in Table 8. Consistent with the blob-level data, the most frequent languages in our sample are JavaScript and Java.

Table 8.
Project-level Model - Descriptive Statistics

| Variable | Description | Statistics |
|----------|------------------------------------|---------------------|
| Reused | Project has at least 1 reused blob | Yes: 205,140 (33.7%); No: 403,195 (66.3%) |
| Blobs | Number of generated blobs | 5%: 1; Median: 15; Mean: 162.7; 95%: 397 |
| Binary | Binary blobs to total blobs ratio | 5%: 0; Median: 0; Mean: 0.1; 95%: 0.6 |
| Commits | Number of commits | 5%: 1; Median: 5; Mean: 57.0; 95%: 84 |
| Authors | Number of authors | 5%: 1; Median: 1; Mean: 2.5; 95%: 3 |
| Forks | Number of forks | 5%: 0; Median: 0; Mean: 1.5; 95%: 1 |
| Stars | Number of GitHub stars | 5%: 0; Median: 0; Mean: 3.4; 95%: 2 |
| Time | Earliest commit time | 5%: 7/18/2013; Median: 3/26/2018; Mean: 9/15/2017; 95%: 3/3/2020 |
| Activity | Total months project was active | 5%: 1; Median: 1; Mean: 2.5; 95%: 8 |
| Language | Dominant language (counts) | JavaScript: 86,065; Java: 43,172; Python: 40,503; PHP: 24,659; C: 22,258; Other: 391,678 |

Spearman's correlation analysis, suitable for the observed heavily skewed distributions, is presented in Table 9. The number of commits shows a high correlation with two other predictors: activity time (0.68) and the number of blobs (0.67). These high correlations indicate redundancy, as the number of commits does not add significant information beyond what is already captured by activity time and the number of blobs. This redundancy can lead to multicollinearity, potentially distorting the model's coefficients and reducing interpretability. Consequently, we removed the number of commits from the model, simplifying it without sacrificing explanatory power. All other correlations are below 0.52, which is not concerning.

Table 9.
Project-level Model - Spearman's Correlations Between Predictors

| | Blobs | Binary | Commits | Authors | Forks | Stars | Time | Activity |
|--------|-------|--------|---------|---------|-------|-------|------|----------|
| Blobs | 1.00 | 0.46 | 0.67 | 0.34 | 0.22 | 0.22 | 0.09 | 0.52 |
| Binary | - | 1.00 | 0.18 | 0.12 | 0.06 | 0.05 | 0.02 | 0.14 |
| Commits | - | - | 1.00 | 0.45 | 0.27 | 0.26 | 0.05 | 0.68 |
| Authors | - | - | - | 1.00 | 0.32 | 0.22 | 0.05 | 0.38 |
| Forks | - | - | - | - | 1.00 | 0.48 | 0.14 | 0.28 |
| Stars | - | - | - | - | - | 1.00 | 0.13 | 0.28 |
| Time | - | - | - | - | - | - | 1.00 | 0.05 |
| Activity | - | - | - | - | - | - | - | 1.00 |

The results for the project-level logistic regression model are shown in Tables 10 and 11. All the variables in the model have p-values less than 0.05, indicating that they are statistically significant in predicting the likelihood of a project introducing reused blobs (see Table 10). This demonstrates strong evidence against the null hypothesis, suggesting that these variables do have an effect on reuse.

Examining the ANOVA results (Table 11) provides further insight into the impact and significance of these predictors. All the predictors have p-values effectively equal to zero, meaning that the null hypothesis can be rejected. The deviance values in the ANOVA table indicate the reduction in model deviance when each predictor is included. For example, adding the number of blobs to the model reduces the deviance by 131,219.53, a substantial reduction that underscores its important role in the model. These results confirm the importance of these predictors in explaining the variability in the likelihood of reuse.

Table 10. Project-level Model - Coefficients

| | Estimate | Std. Error | z value | Pr(>|z|) |
|------------|----------|------------|---------|----------|
| (Intercept) | -4.79 | 0.16 | -30.01 | < 2 × 10^{-16} |
| Blobs | 0.61 | 0.00 | 228.94 | < 2 × 10^{-16} |
| Binary | 0.77 | 0.02 | 40.09 | < 2 × 10^{-16} |
| Authors | 0.09 | 0.01 | 8.24 | < 2 × 10^{-16} |
| Forks | 0.31 | 0.01 | 27.72 | < 2 × 10^{-16} |
| Stars | 0.06 | 0.01 | 7.19 | 6.61 × 10^{-13} |
| Time | 0.10 | 0.01 | 12.00 | < 2 × 10^{-16} |
| Activity | 0.07 | 0.01 | 10.48 | < 2 × 10^{-16} |
| C | -0.33 | 0.02 | -19.60 | < 2 × 10^{-16} |
| C# | -0.30 | 0.02 | -15.74 | < 2 × 10^{-16} |
| Go | -0.29 | 0.04 | -7.70 | 1.33 × 10^{-14} |
| JavaScript | 0.21 | 0.01 | 22.58 | < 2 × 10^{-16} |
| Kotlin | -0.23 | 0.05 | -4.30 | 1.75 × 10^{-5} |
| ObjectiveC | -0.13 | 0.03 | -3.63 | 0.000288 |
| Python | -0.19 | 0.01 | -14.78 | < 2 × 10^{-16} |
| R | -0.27 | 0.05 | -5.93 | 3.04 × 10^{-9} |
| Rust | -0.48 | 0.07 | -6.65 | 2.87 × 10^{-11} |
| Scala | -0.27 | 0.07 | -3.79 | 0.000153 |
| TypeScript | 0.88 | 0.03 | 34.57 | < 2 × 10^{-16} |
| Java | -0.25 | 0.01 | -20.90 | < 2 × 10^{-16} |
| PHP | 0.29 | 0.01 | 19.59 | < 2 × 10^{-16} |
| Perl | -0.31 | 0.10 | -3.20 | 0.001395 |
| Ruby | 0.63 | 0.02 | 33.18 | < 2 × 10^{-16} |

Table 11. Project-level Model - ANOVA Table

| | Df | Deviance | Resid. Df | Resid. Dev | p.value |
|----------|----|------------|-----------|------------|---------|
| NULL | | | 608,334 | 777,660.48 | |
| Blobs | 1 | 131,219.53 | 608,333 | 646,440.95 | < 2 × 10^{-16} |
| Binary | 1 | 662.94 | 608,332 | 645,778.01 | < 2 × 10^{-16} |
| Authors | 1 | 926.69 | 608,331 | 644,851.32 | < 2 × 10^{-16} |
| Forks | 1 | 2,084.02 | 608,330 | 642,767.30 | < 2 × 10^{-16} |
| Stars | 1 | 63.77 | 608,329 | 642,703.53 | 1.44 × 10^{-15} |
| Time | 1 | 156.98 | 608,328 | 642,546.54 | < 2 × 10^{-16} |
| Activity | 1 | 139.31 | 608,327 | 642,407.24 | < 2 × 10^{-16} |
| Language | 15 | 5,178.20 | 608,312 | 637,229.03 | < 2 × 10^{-16} |

To understand the size and direction of the impacts, we look at the odds ratios inferred from the logistic regression coefficients. The odds ratio is calculated as the exponential of the coefficient. An odds ratio greater than 1 indicates a positive impact, while an odds ratio less than 1 indicates a negative impact. The results are shown in Figure 5.

The logistic regression analysis shows that several predictors significantly impact the likelihood of a project having a reused blob. TypeScript, Binary, Ruby, and Blobs have the strongest positive effects, indicating that increases in these variables substantially raise the odds of a project introducing a reused blob. Other positive predictors include Forks, PHP, JavaScript, Time, Authors, Activity, and Stars, which also increase the likelihood, though to a lesser extent. Conversely, predictors like Rust, C, Perl, C#, Go, Scala, R, Java, Kotlin, Python, and Objective-C negatively impact the odds, suggesting that increases in these variables decrease the likelihood of a project introducing a reused blob.

When interpreting the time variable, it is important to note that since the earliest commit timestamp is represented as a number, we calculated the time elapsed from the earliest commit to the current date for better interpretability.
A larger time value indicates an older earliest commit. The model shows that time has a positive coefficient, suggesting that the older the earliest commit, the higher the probability of introducing reused blobs. This result could be influenced by two factors. First, at the blob-level model, we already observed that older blobs have a higher probability of being reused. Second, while the time-bound definition of reuse controls for the confounding effect of longer visibility at the blob level, it does not account for the longer visibility of the project itself. Therefore, the observed result might also be affected by the project’s age, which implies longer visibility, even though the blob is reused within two years of its creation.

Per-Language Propensity. The project-level model highlights the significance of programming languages in the likelihood of a project introducing a reused blob. To explore this further, we calculated the percentage of projects in each language that have introduced reused blobs. From our previous analysis (RQ1-a), we know that approximately 29% of projects introduced at least one reused blob. When using the time-bound definition of copying, this ratio increased to 33% in our sample. The results for each language are shown in Table 12.


| Language | Ratio | Language | Ratio | Language | Ratio |
|-----------|--------|----------|--------|----------|--------|
| C | 33.2% | ObjectiveC | 40.0% | TypeScript | 62.3% |
| C# | 37.0% | Python | 30.5% | Java | 36.2% |
| Go | 31.3% | R | 28.5% | PHP | 46.4% |
| JavaScript| 41.2% | Rust | 31.5% | Perl | 29.9% |
| Kotlin | 40.0% | Scala | 36.0% | Ruby | 51.2% |


The ratio of projects that have introduced reused blobs varies significantly across different programming languages, offering new insights compared to the blob-level analysis. For example, projects dominated by TypeScript have the highest probability (62%) of introducing at least one reused blob.
This finding is particularly interesting because, at the blob level, the propensity to copy in TypeScript was lower than average. This discrepancy suggests that TypeScript projects, acting as upstream in the language’s supply chain, are less centralized. Developers in this language seem more inclined to incorporate code from various, possibly unknown, projects. + + +Other languages also show distinct patterns. For instance, Ruby projects have a high probability (51%) of reusing blobs, whereas Python projects have a lower probability (30.5%). This variation indicates that the likelihood of code reuse is strongly influenced by the primary language of the project, reflecting different practices and community norms across languages. These insights emphasize the importance of considering programming language when studying code reuse patterns in software projects. + + +To ensure these results are comparable to blob-level analysis, we calculated the copied blob ratio (copied blobs to total blobs) for each project and took the average of this ratio for projects in each language. An important difference here with the blob-level propensity is that at the blob level, language assignment was based on the file extension of each blob, with binary blobs categorized as “Other”. In this project-level analysis, the language of a blob is determined by the predominant language of the project it belongs to. For example, a Python-written blob in a C-dominated project is counted as a C blob. Similarly, binary blobs are assigned the language of the dominant language in their respective projects. The results of this new definition are shown in Table 13. 
| Language | Ratio | Language | Ratio | Language | Ratio |
|----------|--------|----------|--------|----------|--------|
| C | 15.4% | ObjectiveC | 9.5% | TypeScript | 5.6% |
| C# | 4.7% | Python | 7.3% | Java | 5.8% |
| Go | 6.7% | R | 7.2% | PHP | 9.5% |
| JavaScript| 8.8% | Rust | 5.1% | Perl | 21.2% |
| Kotlin | 3.4% | Scala | 3.5% | Ruby | 5.3% |


The propensity to copy varies when using this project-level definition compared to the blob-level definition (see Table 6). For example, the propensity to copy in JavaScript-dominated projects is higher than for JavaScript blobs in general (8.8% vs. 5.5%). This indicates a greater likelihood of reuse within JavaScript projects compared to individual JavaScript blobs from various projects. This could be attributed to the modularity and strong reuse culture in the JavaScript ecosystem, where libraries and frameworks are frequently shared and integrated. JavaScript projects often incorporate multiple languages, such as HTML and CSS for web development or server-side languages for backend functionality, enhancing reuse through shared components. The evolution of JavaScript projects, involving various tools and libraries, also contributes to the higher reuse rate within the project context.


In Perl-dominated projects, the propensity to reuse is higher than for Perl blobs in general (21.2% vs. 18.5%). This suggests that blobs within Perl projects are more likely to be reused compared to individual Perl blobs from different projects. Perl’s strong culture of code reuse and sharing, exemplified by the Comprehensive Perl Archive Network (CPAN), encourages the use and distribution of reusable code modules. Perl projects often include a wide range of scripts and utilities shared across different applications, enhancing reuse.
Furthermore, Perl’s use in scripting, text processing, and system administration often requires the reuse of common patterns and libraries, contributing to the higher reuse rate within projects. + + +Conversely, R-dominated projects show a lower propensity to reuse compared to R blobs in general (7.2% vs. 9.8%). This implies that individual R blobs are more likely to be reused than blobs within R-dominated projects. R is primarily used for statistical computing and data analysis, where specific scripts and functions are reused across different analyses. However, R projects are often tailored to specific datasets and analyses, resulting in lower overall reuse within the project context. The specialized nature of many R projects, with unique data processing and analysis pipelines, limits reuse compared to individual reusable components like functions and libraries. + + +Java-dominated projects exhibit a lower propensity to reuse compared to Java blobs in general (5.8% vs. 7.8%). This indicates that individual Java blobs are more likely to be reused than blobs within Java-dominated projects. Java is widely used across various domains, and reusable components like libraries and frameworks are common across different projects. However, Java projects tend to be large and complex, with specific architectures and dependencies that may limit cross-project reuse. The high degree of customization and specificity in Java enterprise applications reduces the reuse rate within the project context compared to the reuse of individual Java blobs or libraries. + + +These analyses reflect the differing dynamics of code reuse in various programming ecosystems. Understanding these differences can help improve strategies for fostering code reuse and optimizing software development practices across different languages and project contexts. + + +Binary Blob Analysis. 
Although previous analyses indicated that binary blobs are more likely to be reused, we aimed to investigate whether this propensity varies across projects dominated by different programming languages. At the blob level, it was not feasible to ascertain the programming language of a binary blob. However, at the project level, such analysis becomes possible. Therefore, we examined the reused binary blob ratio (the percentage of reused binary blobs to total reused blobs) within each language and compared it to the binary blob ratio (the percentage of binary blobs to total blobs) within the same language, utilizing a t-test to identify any significant differences.


Consistent with the blob-level analysis, the reused binary blob ratio exceeds the general binary blob ratio across all programming languages, indicating a higher likelihood of reuse for binary blobs. This observation raises questions about language-specific differences in binary blob reuse. Specifically, we hypothesize that binary blobs are more frequently reused in certain languages compared to others. In other words, we want to know if identifying a reused binary blob allows us to infer that it is more likely to originate from projects written in particular languages.


Our findings confirm this hypothesis, as the proportion of reused binary blobs varies significantly among different programming languages. Nevertheless, we hypothesize that at least some of this difference stems from the general difference in binary blob ratios in different languages and is not limited to reuse. Our statistical tests reveal that the binary blob ratios indeed differ significantly across languages. Consequently, the ratio of reused binary blobs also exhibits significant variation among different languages, suggesting that this difference does not necessarily mean varying binary reuse practices among them.
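The t-test comparison described above can be sketched as a paired test on per-project ratios. The pairs below are invented for illustration (the actual values come from the study's sample), and the paired t statistic is computed directly from the differences:

```python
import statistics

# Illustrative, invented data: for each hypothetical project in one
# language, the pair (reused-binary ratio, overall binary ratio).
pairs = [
    (0.30, 0.08), (0.22, 0.05), (0.41, 0.12), (0.18, 0.06),
    (0.35, 0.09), (0.27, 0.07), (0.33, 0.10), (0.25, 0.04),
]

# Paired t-test on the differences d_i = reused_ratio_i - overall_ratio_i.
diffs = [reused - overall for reused, overall in pairs]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)          # sample standard deviation
t_stat = mean_d / (sd_d / n ** 0.5)

# Two-sided critical value for df = n - 1 = 7 at alpha = 0.05
# (from a standard t-table).
T_CRIT = 2.365
print(f"t = {t_stat:.2f}, significant: {abs(t_stat) > T_CRIT}")
```

In practice one would use a library routine (e.g. a paired t-test from a statistics package) that also reports the exact p-value; the manual computation here just makes the mechanics of the comparison explicit.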
We want to determine if the higher number of reused binary blobs in a certain language is solely due to the general prevalence of binary blobs in that language, or if some languages tend to reuse more binary blobs. To control for this confounding effect, we normalize the binary blob reuse ratio by the total binary blob ratio. Given the binary ratio $br$ of a project (binary blobs over total blobs) and the reused binary ratio $cbr$ (reused binary blobs over total reused blobs), we defined the metric as the ratio $cbr/br$. This metric averaged 4.104 over all the projects in our sample. Using a linear regression with the project’s primary language as a predictor, we obtained the results shown in Table 14.


$$m = \frac{cbr}{br} = \frac{cbc/cc}{bc/c}$$


$m$: normalized binary reuse metric

$cbr$: copied binary ratio

$br$: binary ratio

$cbc$: copied binary count

$cc$: copied count

$bc$: binary count

$c$: total count


| Language | Metric | p-value | Language | Metric | p-value |
|----------|--------|---------|----------|--------|---------|
| C | 3.33 | 0.810722 | Rust | 6.06 | 0.422024 |
| C# | 4.92 | 0.025270 | Scala | 5.38 | 0.545028 |
| Go | 5.73 | 0.173372 | TypeScript | 5.17 | 0.063922 |
| JavaScript | 7.04 | $< 2 \times 10^{-16}$ | Java | 4.91 | 0.000497 |
| Kotlin | 5.42 | 0.306698 | PHP | 4.49 | 0.035326 |
| ObjectiveC | 2.17 | 0.217673 | Perl | 3.32 | 0.975449 |
| Python | 2.19 | 0.005547 | Ruby | 3.51 | 0.951277 |
| R | 2.65 | 0.614773 | | | |


Our analysis reveals that the reused-binary-blobs-to-binary-blobs metric varies across programming languages. Notably, C#, JavaScript, Python, Java, and PHP exhibit statistically significant differences (p-value < 0.05). In particular, JavaScript projects demonstrate a higher tendency to reuse binary blobs, while Python projects show a lower tendency. This suggests that in JavaScript-dominated projects, reusing binary blobs is likely more efficient and cost-effective than reusing code.
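A worked example of the normalized metric $m$ defined above, using invented counts for a single hypothetical project (not drawn from the study's data):

```python
# Worked example of m = cbr / br = (cbc / cc) / (bc / c)
# for one hypothetical project; all counts are invented for illustration.
c = 1000     # total blobs in the project
bc = 50      # binary blobs
cc = 120     # reused (copied) blobs
cbc = 24     # reused blobs that are binary

br = bc / c          # binary ratio
cbr = cbc / cc       # reused-binary ratio
m = cbr / br         # normalized binary reuse metric

# m > 1 means binary blobs are over-represented among this project's
# reused blobs relative to their overall share, consistent with the
# sample-wide average of about 4.1 reported in the text.
print(f"br={br:.3f}, cbr={cbr:.3f}, m={m:.2f}")
```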
Conversely, Python projects might benefit more from reusing code rather than binary blobs.


The complete coefficients and regression ANOVA tables are available in the online appendix.

RQ1-d Key Findings:


Project properties significantly impact the probability of their blobs being reused, with binary ratio, number of blobs, forks, authors, activity duration, and stars having a positive impact.


Older projects are more likely to have introduced reused blobs.


Blobs residing in projects dominated by different programming languages have varying probabilities of reuse, with TypeScript, Ruby, PHP, and JavaScript having higher probabilities, and Rust, C, Perl, C#, Go, Scala, R, Java, Kotlin, Python, and Objective-C having lower probabilities.


On average, 33.7% of projects have introduced at least one reused blob, but this percentage varies significantly between languages, with TypeScript (62.3%) and Ruby (51.2%) having the highest propensity, and R (28.5%) and Perl (29.9%) the lowest.


The tendency to reuse binary blobs is much higher in JavaScript projects, while Python projects show a lower tendency.


The project-level analysis reveals that various factors significantly influence the likelihood of code reuse in open source software projects. Projects with more blobs, a higher binary blob ratio, and a longer activity period tend to exhibit higher reuse rates. This aligns with our hypothesis that project health, activity, and popularity signals play an important role in promoting reuse.


The variation in reuse likelihood across different programming languages underscores the influence of language-specific ecosystems and practices, consistent with the blob-level results. For instance, TypeScript and Ruby projects show the highest propensity for reuse, which may be due to their robust ecosystems and strong community practices that encourage code sharing and reuse.
Conversely, languages like Python and Perl have lower reuse rates, suggesting different reuse dynamics and possibly a need for improved tools and practices to foster reuse. However, the effect of a blob’s own language differs from that of the language of the project it resides in. This suggests that the underlying factors behind these differences are not just technical aspects of the languages and their tools, but also their community culture and practices.


The significant reuse of binary blobs, particularly in languages like JavaScript, indicates that binary artifacts are valuable assets in software projects. This might be due to the efficiency and ease of integrating precompiled binaries compared to source code. However, the lower reuse rate of binary blobs in Python suggests that this language’s ecosystem favors source code reuse, which could be due to its dynamic nature and the extensive use of interpreted scripts. These findings have important implications for the development and support of tools that facilitate reuse in different programming languages. For languages like JavaScript, where binary blob reuse is prevalent, enhancing asset libraries could be beneficial. In contrast, for languages like Python, where code reuse is more advantageous, improving code package managers would be more appropriate. This differentiation underscores the necessity for tailored support tools to optimize reuse practices in various programming environments.


These findings highlight the impact of project context on reuse patterns and suggest that different definitions and granularity levels can yield varying insights into code reuse behaviors.


5.2 RQ2: How do developers perceive and engage with copy-based reuse?


Across three rounds, we received 247 complete responses from reusers and 127 from creators. There were also 360 and 178 partial responses, bringing the totals to 607 and 305 responses from reusers and creators, respectively. The results are shown in Table 15.
As will be discussed in Section 7.1.2, the identified originating repository might not always be the true creator of the blob. 39% of developers identified as creators reported reusing the blob from another source. Additionally, reusers might have obtained the blob from another reuser and not the original creator (see Section 7.1.3). Among the reusers who confirmed reusing the blob, 43% acknowledged the originating project as the source, 48% reported copying it from elsewhere, and 9% did not answer the question.


Table 15. Survey Participation


| | Total | Started | Completed | Response Rate | Completion Rate |
|--------|-------|---------|-----------|---------------|-----------------|
| Creator| 3,144 | 305 | 127 | 9.70% | 4.04% |
| Reuser | 6,338 | 607 | 247 | 9.58% | 3.90% |
| Total | 9,482 | 912 | 374 | 9.62% | 3.94% |


These findings provide important estimates: the fraction of reuse within open source software (OSS) is at least 61%, and the fraction of reuse from originating projects is at least 43%. This data is essential for understanding the dynamics of code reuse within OSS, highlighting the significance of both direct reuse from original projects and secondary reuse through intermediate projects.


Furthermore, only 60% of those identified as reusers confirmed reusing the blob, while the remaining 40% claimed to have created it (see Table 16). This discrepancy can be attributed to several factors. First, some individuals might indeed be the original authors of the blob in the originating project, implying they have reused their own resources. Second, this gap could be explained by activities in private repositories (e.g., Developer A creates a file in a private repository, Developer B copies it to a public repository, and then Developer A reuses it in another public repository).
Third, as mentioned in Section 4.3, concerns about potential licensing violations might have made many reusers uncomfortable admitting the reuse explicitly. Additionally, developers’ faulty memory could play a role, especially for reuse instances that occurred a long time ago.


One potential area for further investigation could be examining the project owners and commit authors for each copy instance to gain a better understanding of this gap. However, this was not pursued further in this study as it was not the main focus. Exploring these factors in future research could provide deeper insights into the complexities of code reuse and attribution within open source software projects.


Table 16. Identified vs. Claimed Creators & Reusers


| Claimed | Identified Creators | Identified Reusers | Total |
|---------|---------------------|--------------------|-------|
| Creator | 77 (61%) | 99 (40%) | 176 |
| Reuser | 50 (39%) | 148 (60%) | 198 |
| Total | 127 | 247 | 374 |


Another dimension of the survey explored the intentions of creators for others to reuse their artifacts. Sixty-two percent of creators indicated that their resources were intended for reuse by others. When asked about the helpfulness of the particular blob on a scale from 1 to 5 (with 5 being the most helpful), reusers rated the average helpfulness at 3.81, while creators rated it at 4.24. This suggests that developers are well aware of the reuse potential of their artifacts, even if the blob may be essential primarily for their own projects.


In the background sections, we discussed the risks associated with this type of reuse. We asked reusers if they were concerned about these risks as well. On a scale from 1 to 5 (with 5 being the most concerned), the average concern about bugs in the reused file was 1.83, and the average concern about changes in the original file was 2.35.
Several factors might contribute to the low level of concern among developers, including trust in the original code’s quality or confidence in their own testing processes. However, this lack of concern could facilitate the spread of potentially harmful code, even if the creator fixes the original code. The fact that reusers are not significantly worried about these risks amplifies the potential risk at the OSS supply chain level.

Next, we asked participants how likely they would be to use a package manager if one were available for the particular blob. On a scale from 1 to 5 (with 5 being the most likely), the average likelihood of using a package manager was 2.93. This indicates that although developers may not be very concerned about bugs or changes (potential improvements), many would still use such a tool if it were available. This suggests that “package-manager” type tools for refactoring or at least maintaining reused code might gain traction if developed. These results are shown in Table 17.


| Question (audience) | Responses | Average | Median | StdDev |
|--------------------------------------|-----------|---------|--------|--------|
| How helpful? (creators) | 156 | 4.25 | 5 | 1.15 |
| How helpful? (reusers) | 185 | 3.82 | 4 | 1.32 |
| Concern about bugs? (reusers) | 185 | 1.85 | 1 | 1.33 |
| Concern about changes in the original file? (reusers) | 187 | 2.33 | 2 | 1.56 |
| Likelihood of using a package manager? (reusers) | 184 | 2.89 | 3 | 1.64 |


Finally, the thematic analysis of reasons for reuse, specifically responses to the question “why”, revealed eight themes from the 162 responses we received (see Table 18). This analysis provides a nuanced understanding of the motivations behind code reuse, highlighting several key themes.
| Theme | Description | Frequency |
|-------|------------------------------|-----------|
| Demo | demonstration, test, prototype | 14 |
| Dependency | part of a library | 11 |
| Education | learning purposes | 16 |
| Functionality | specific functionality | 39 |
| Own | own reuse | 2 |
| Resource | image, style, dataset, license | 30 |
| Template | template, starting point, framework | 14 |
| Tool | parser, plugin, SDK, configuration | 23 |


As expected, one of the main reasons for reuse was to provide specific functionality. This indicates that developers often reuse code to incorporate existing functionalities into their projects, saving time and effort in development, a practice well-documented in the literature [48]. This underscores the importance of reusable components in efficient software development.


Another observed theme was the reuse of various resources, including datasets, instructions, license files, and graphical or design objects (e.g., PNG, JPEG, fonts, styles). This aligns with the significant reuse of binary blobs identified in RQ1. The inclusion of diverse resources indicates that developers often depend on readily available materials to enhance their projects’ visual or functional aspects. While the literature acknowledges this practice, our findings suggest a slightly higher emphasis on resource reuse. This indicates that resource management might be more important for developers than previously thought.


[14] Since survey participants were chosen through stratified sampling, these frequencies do not represent the actual data distribution.

Reusing tools such as parsers, plugins, SDKs, and configuration files was mentioned 23 times. This practice is noted for its practicality and efficiency in setting up development environments and ensuring consistency across projects.
This highlights the role of auxiliary software components in streamlining development processes and providing necessary infrastructure or functionality. + + +Assignments, school projects, learning objectives, and similar concepts were another prominent theme. This emphasizes the role of code reuse in the software development knowledge supply chain, as developers reuse existing code to understand and learn new concepts. + + +Code reuse for demonstration, testing, and prototyping purposes was identified 14 times. This theme suggests that developers often reuse code to quickly create prototypes or test scenarios without focusing on the quality, security, or licensing of the reused code. The priority in these cases is to achieve rapid results. This aligns with the findings by Juergens et al. [48], that developers often clone code to create prototypes and perform tests. Some of these quick prototypes, however, may end up as active projects. + + +Templates, starting points, and frameworks were mentioned 14 times. Developers often clone templates or frameworks to have a solid foundation for their projects, a practice supported by findings of Roy and Cordy [80]. This approach leverages existing structures to expedite development and ensure consistency. + + +Part of a library or dependency management was cited 11 times. This practice is highlighted in studies that emphasize the importance of managing dependencies within the development process, such as the study by Roy and Cordy [80]. Although checking in library files is not considered best practice, many developers do so to maintain specific versions and avoid potential issues with updates or changes. This conscious decision highlights a trade-off between best practices and practical needs. + + +Reusing one’s own code was mentioned twice. The theme of “own reuse” where developers clone their own code for reuse in new projects, is less prominently featured in the literature compared to other reasons for code cloning. 
Developers clone their own code to ensure consistency, save time, and leverage previously written and tested code. This practice is practical and efficient, especially when developers are familiar with the code and its functionality. However, the literature does not emphasize this reason as strongly. While studies acknowledge the broader concept of code reuse, their focus is more on reusing code from external sources, libraries, or for educational purposes [48, 80]. This discrepancy suggests that “own reuse” might be an underexplored area in existing research. It indicates that while developers recognize and practice it frequently, it may not be as thoroughly documented or emphasized in the academic literature. This gap highlights an opportunity for further investigation into how and why developers engage in “own reuse” and its impact on software development processes.


There were also 13 instances where responses were either incomprehensible or the respondent did not remember the file or the reason for reuse.

RQ2 Key Findings:


39% of identified creators stated they reused the blob from another source.


Among reusers, 43% acknowledged the originating project (direct reuse), while 48% copied from elsewhere (indirect reuse).


Reuse within the OSS landscape is at least 61%.


60% of reusers confirmed reuse; 40% claimed creation.


62% of creators intended their resources for reuse.


Reusers are not very concerned about potential bugs or changes in the original file.


Reusers are willing to use a package manager if available.


Main reuse themes are: functionality, resources, tools, education, demo/testing/prototyping, templates, dependencies, and own reuse.


The findings reveal that a non-negligible portion of developers engage in copy-based reuse within the OSS community. This practice is common, with many reusers sourcing code not directly from the original creators but through intermediaries.
Understanding these dynamics is important for improving the transparency and traceability of reused code, which could potentially enhance code quality and security. + + +The discrepancies between identified and claimed creators highlight complexities in attribution and ownership. Additionally, survey respondents’ replies are not always accurate or true, which further complicates understanding the true origins of code. This gap underscores the need for better tracking mechanisms within repositories to accurately reflect code origins. Future research could delve deeper into these factors, offering insights that could inform policy and tooling improvements in OSS development. + + +Creators often intend their code to be reused, and both creators and reusers recognize the utility of such artifacts. This positive perception suggests that promoting reuse can be beneficial for the community, fostering collaboration and innovation. However, the difference in helpfulness ratings indicates that there might be room for improving the clarity and documentation of reusable code to better meet reusers’ needs. + + +Despite the low concern about potential risks like bugs and changes, the moderate interest in package management tools suggests an opportunity for developing solutions that can help maintain and refactor reused code. Such tools could mitigate risks by providing updates and improvements in a managed manner, enhancing the overall reliability of reused code. + + +The thematic analysis of reuse motivations provides a comprehensive view of why developers opt for copy-based reuse. Reusing for specific functionality underscores the importance of modular and reusable code in software development. It also highlights the potential benefits of well-documented and easily integrable code components that can be readily reused by others. 
This practice of including library files suggests a deliberate effort to maintain stability and avoid the uncertainties that might come with updates or changes. However, it also highlights a potential area for improvement in developer education and best practices, as well as the importance of tools that can help manage dependencies more effectively. These insights contribute to our understanding of the motivations behind code reuse and the practical considerations developers face in maintaining their projects.


While reusing for demo and testing can accelerate development and innovation, it also raises potential risks. Developers may inadvertently propagate vulnerabilities or violate licenses, leading to broader issues within the software supply chain. Highlighting the importance of balancing speed and security during testing phases can inform best practices and educational efforts.


Educational use underscores the value of code reuse for learning. Reusing existing code allows learners to understand real-world applications and coding practices, fostering skill development. However, it also emphasizes the need for proper guidance and resources to ensure that educational reuse is done ethically and effectively. Encouraging educators to integrate lessons on best practices in code reuse can enhance the quality of learning and adherence to legal and ethical standards.


The proportion of responses that gave no meaningful answer or could not recall the file indicates that not all reuse instances are well-documented or remembered by developers. This lack of clarity can hinder the understanding and traceability of reuse practices. It highlights the need for better documentation and tracking mechanisms to ensure that the reasons and contexts for reuse are transparent and well-understood. Implementing such measures can improve the management of reused code and resources, reducing potential risks associated with undocumented reuse.
6 IMPLICATIONS


6.1 For Developers


Copy-based reuse enables developers to save time and effort by leveraging existing code. However, it introduces risks such as maintenance fragmentation, security vulnerabilities, and outdated dependencies. To address these challenges, developers should adopt tools and practices to track reused code, ensure compliance with licensing requirements, and mitigate risks associated with unverified code quality.


Fostering a practice of systematically reviewing and documenting reused code not only enhances its reliability and maintainability, but also contributes to the overall sustainability of software projects. Additionally, staying informed about updates to reused code and integrating these updates promptly can further reduce risks associated with outdated or insecure components.


6.2 For Businesses


Businesses that rely on open source software must proactively address the inherent risks of copy-based reuse, including security vulnerabilities and potential non-compliance with licensing terms. Investing in robust tools for tracking and maintaining reused code is critical to safeguarding the software supply chain. This effort should encompass implementing workflows for regularly updating and reviewing reused components.


Moreover, businesses should actively support smaller open source projects that provide valuable code contributions. Such support not only enhances the quality and reliability of business-critical software, but also fosters goodwill and collaboration within the open source community. By taking these steps, businesses can effectively mitigate risks while strengthening the ecosystem upon which they rely.


6.3 For the Open Source Community


The open source community plays an important role in ensuring the safe and effective reuse of code.
By promoting best practices for ethical and secure reuse, such as adopting standardized licensing and improving quality benchmarks, the community can minimize risks and build trust in shared resources. Equally important is supporting small and medium-sized projects that contribute significantly to the reusable code base. Providing mentorship, funding, and collaboration opportunities can bolster the overall open source ecosystem, fostering innovation and cooperation across projects.
+
+
+Additionally, establishing centralized repositories or resources that facilitate traceability and offer detailed metadata on provenance, authorship, and licensing can streamline the reuse process and mitigate associated risks. These efforts collectively enhance the reliability, sustainability, and scalability of open source software.
+
+
+6.4 For Researchers and Educators
+
+
+Researchers have a unique opportunity to investigate finer-grained reuse patterns, such as instances involving slight modifications or partial reuse, to better understand the factors influencing reuse and its long-term impact on software quality and security. Such insights can guide the development of tools and methodologies that promote safe and effective reuse practices.
+
+
+Educators should integrate lessons on ethical reuse practices, licensing compliance, and dependency management into software engineering curricula. By leveraging real-world case studies and addressing practical challenges, such as balancing development speed with security concerns, educators can equip future developers to navigate the complexities of software reuse responsibly. This approach will help ensure that the next generation of software professionals actively supports the sustainability and growth of open source ecosystems.
+
+
+6.5 For OSS Platform Maintainers
+
+
+Platforms like GitHub and GitLab are well-positioned to enhance practices surrounding copy-based reuse.
Improving traceability mechanisms to preserve provenance, authorship, and licensing metadata is essential for minimizing risks such as unintentional license violations and outdated dependencies. Integrating features for automated detection of license conflicts, dependency vulnerabilities, and changes in reused code can further empower developers to manage their projects efficiently and securely.
+
+
+Additionally, platforms can offer educational resources and in-platform guidance to encourage best practices for reuse and compliance. By fostering a culture of informed and collaborative reuse, platform maintainers can contribute significantly to the long-term sustainability and resilience of the open source ecosystem.
+----------------------------------------
+-------------------------------
+Section 251:
+7 LIMITATIONS
+
+
+7.1 Internal Validity
+
+
+7.1.1 Commit Time. The identification of a blob’s first occurrence, and consequently the construction of its reuse timeline, is based on the commit timestamp. This time is not necessarily accurate as it depends on the user’s system time. The dataset we utilized followed suggestions by Flint et al. [22] and other methods to eliminate incorrect or questionable timestamps. This increases the reliability of our reuse timeline. We also used version history information to ensure the time of parent commits does not postdate that of child commits [46]. This adds an extra layer of consistency and validation, further enhancing the accuracy of our data.
+
+
+7.1.2 Originating Project. The accuracy of origination estimates is highly reliant on the completeness of data. Even if we assume that the World of Code (WoC) collection is exhaustive, it is possible that some blobs may have originated in a private repository before being copied into a public one. This means that the originating repository in WoC may not be the actual creator of the blob.
This scenario suggests that even with a comprehensive dataset, there could be instances of code reuse that remain undetected, adding another layer of complexity to understanding the full extent of reuse across open source projects. For example, a 3D cannon pack asset\textsuperscript{15} was committed by 38 projects indexed by WoC. However, that asset was originally created earlier in the Unity Asset Store [46].
+
+
+\textsuperscript{15}https://assetstore.unity.com/packages/3d/props/weapons/stylish-cannon-pack-174145
+
+
+By utilizing the extensive WoC collection, we provide a broad and detailed analysis of code reuse, capturing a significant portion of open source activity even if some instances of private-to-public transitions are missed. Additionally, the examples we identified, such as the 3D cannon pack asset, highlight the practical implications and real-world relevance of our findings, demonstrating the robustness of our analysis despite potential data gaps. Our approach addresses the inherent challenges of tracking code origination and reuse, offering a framework that can be refined and expanded in future research to further improve accuracy and comprehensiveness.
+
+
+7.1.3 Copy Instance. A unique combination of blob, originating project, and destination project might not always accurately represent the actual pattern of reuse. This is because some destination projects could potentially reuse the blob from a different source other than the originating project. For instance, if we have three projects—A, B, and C—in order of blob creation, project C might copy from either project A or B. Additionally, certain blobs are not reused but are created independently in each repository, such as an empty string or a standard template automatically generated by a common tool [46]. These blobs are excluded by using the list provided by WoC [62].
+
+
+Despite this limitation, our results remain significant.
By recognizing the potential for indirect reuse and independently created blobs, we provide a more nuanced understanding of the reuse landscape, accounting for the complexity of code propagation across projects. Excluding independently created blobs and utilizing WoC’s comprehensive list ensures that our analysis focuses on genuine reuse instances, enhancing the reliability of our findings. + + +7.2 External Validity + + +7.2.1 Blob-level Reuse. Our work focuses solely on the reuse of entire blobs, deliberately excluding the reuse of partial code segments within files. While blob-level reuse is common, it only covers a subset of the broader code reuse landscape. Blob-level reuse is more relevant to scenarios where larger code blocks, consisting of entire files or even groups of files, are reused compared to statement or function-level reuse. This means that our results might have an implicit bias towards programming languages or ecosystems that rely more heavily on complete files, potentially overlooking reuse practices prevalent in languages that favor modular or snippet-based reuse. + + +This limitation also implies that different versions of the same file, even if they differ by just one character, generate different blobs due to distinct file hashes. Consequently, blob reuse does not equate to file reuse. Defining file reuse is challenging because it is difficult to determine what constitutes equivalence between files in different projects [46]. This could be a potential reason for the higher level of reuse in binary blobs, as they are relatively harder to modify. + + +Despite these limitations, our results remain significant for several reasons: + + + + +Prevalent Pattern +: By concentrating on entire blob reuse, we address a prevalent and impactful pattern in software development. This allows us to provide valuable insights into a substantial portion of code reuse practices. 
+ + +Clarity and Precision +: Analyzing entire blobs offers a clear and precise method for identifying reuse, avoiding the ambiguity and complexity associated with defining partial file reuse. This clarity enhances the reliability of our findings. + + +Efficiency and Scalability +: Blob-level analysis is computationally efficient and scalable, enabling us to process large datasets and draw meaningful conclusions from extensive data. This scalability is important for comprehensive empirical studies. + + +Foundation for Future Research +: Our work lays the groundwork for future studies that can build on our findings to explore partial file reuse and other nuanced aspects of code reuse. By addressing a well-defined scope, we provide a solid foundation for subsequent research. + + + + +In summary, while our focus on blob reuse introduces certain limitations, it also provides clear, scalable, and impactful insights into code reuse practices. This targeted approach enables us to contribute valuable findings to the field, despite the inherent complexities of defining and analyzing file reuse. Although blob-level reuse is less granular than statement or method-level reuse, findings at the blob level would also apply to sub-blob-level analysis, which should adjust for blob-level reuse. Future studies are needed to investigate the extent to which different levels and types of code reuse overlap or differ. + + +7.2.2 Survey Response Rate. The relatively low response rate to our survey may have been due to the perception of the respondents that copying code is a sensitive subject. These concerns may have impacted the responses even in cases when developers chose to participate. It suggests that further work may be needed to design surveys that do not create such impressions. +Additionally, since many of these reuse instances happened a long time ago, developers might have forgotten about them. 
Therefore, it is important to conduct regular surveys to capture the experiences while developers still remember their practices.
+----------------------------------------
+-------------------------------
+Section 252:
+8 FUTURE WORK
+
+
+8.1 Code-Snippet Granularity
+
+
+We discussed in the methodology section that going to a finer granularity than blob-level to detect code reuse is not practically feasible. Nevertheless, there are approaches that can make this a relatively more tractable problem. Specifically, hashing the abstract syntax tree (AST) for each code snippet (such as classes or functions) in a blob and mapping blobs to these hashes could potentially make finer-grained code reuse detection more feasible.
+
+
+Assuming an average of $k$ code snippets for each of the 16 billion blobs, the parsing and hashing operation is linear in the number of snippets, resulting in $O(16 \times 10^9 \times k)$ operations. We can then perform a self-join on the created map of blob to syntax tree hash (b2AST) using the AST hash as the key. The self-join complexity depends on the number of unique hashes and their distribution. In the worst case, if every blob had unique hashes, the join operation would approach $O((16 \times 10^9 \times k)^2)$. However, the join complexity would typically be significantly less if there are many common hashes. A more realistic estimate assumes that the number of unique AST hashes $h$ is much smaller than the total number of entries in the b2AST map, making the join complexity closer to $O(h \times 16 \times 10^9 \times k)$. This join, although potentially large, can be more feasible than pairwise comparisons of entire blobs due to the more efficient handling of common hashes.
+
+
+By examining code reuse at the granularity of code snippets, we could potentially uncover a far more intricate network of reuse. This approach might reveal patterns and practices that are not noticeable when looking solely at whole-file or blob-level reuse.
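The b2AST idea can be illustrated in miniature. The sketch below is a toy, in-memory version only (names such as `build_b2ast` are invented here, and WoC-scale data would require a distributed join rather than a Python dictionary): it hashes each top-level function or class of a blob and groups blobs that share a snippet hash.

```python
import ast
import hashlib
from collections import defaultdict

def snippet_hashes(source):
    """Yield a stable hash for each top-level function/class in a blob."""
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.dump normalizes away formatting and comments, so identical
            # snippets hash identically even in differently formatted blobs.
            yield hashlib.sha1(ast.dump(node).encode()).hexdigest()

def build_b2ast(blobs):
    """Group blobs by shared AST-snippet hash (the self-join reduces to grouping)."""
    h2blobs = defaultdict(set)
    for blob_id, source in blobs.items():
        for h in snippet_hashes(source):
            h2blobs[h].add(blob_id)
    return {h: ids for h, ids in h2blobs.items() if len(ids) > 1}

blobs = {
    "blob_a": "def greet(name):\n    return 'hi ' + name\n",
    "blob_b": "def greet(name):\n    return 'hi ' + name\n\ndef other():\n    pass\n",
}
shared = build_b2ast(blobs)
print(shared)  # one shared hash, mapped to the set {'blob_a', 'blob_b'}
```

Because the join key is the snippet hash rather than the blob content, common snippets cluster cheaply, which is what makes the more realistic complexity estimate plausible.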
Although this increased complexity is challenging to manage, it offers valuable opportunities for a more comprehensive analysis of reuse [46].
+
+
+8.2 Dependency-Based Reuse
+
+
+In this work, we aimed to demonstrate the prevalence and importance of copy-based reuse. To gain a comprehensive understanding of code reuse, it is important to analyze both copy-based and dependency-based reuse. Each type of reuse reveals different aspects of how software developers leverage existing code in their projects. By studying them side by side, we can paint a more complete picture of the extent and nuances of reuse in software development. Ignoring one in favor of the other would provide an incomplete narrative [46].
+
+
+8.3 Upstream Repository
+
+
+As highlighted in the limitations section, we currently lack precise knowledge about the source from which a repository reuses a file. We tend to assume it is from the originating repository in all instances of copying. However, this assumption may not capture the real-world complexity of reuse. To enhance our understanding of how developers identify suitable repositories for reuse, we could potentially leverage meta-heuristic algorithms or artificial intelligence techniques. These advanced methods might enable us to predict the actual source of reused artifacts in each instance of copying with greater accuracy [46].
+
+
+8.4 Open Source Software Supply Chain Network
+
+
+Directed Acyclic Graphs (DAGs) have been instrumental in clone detection and reuse literature due to their ability to model and analyze complex relationships and dependencies between various software components. In the context of copy-based reuse, the dataset created using the World of Code (WoC)\textsuperscript{16} infrastructure can be leveraged to construct DAGs that represent the flow of reused code across different repositories.
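As a toy illustration of such a graph, the following sketch (with invented copy-instance records and project names) builds edges from originating to destination projects, labeled by the blobs that flowed along them, and counts how many distinct copies each origin supplied:

```python
from collections import defaultdict

# Invented copy-instance records: (blob, originating project, destination project).
copies = [
    ("b1", "projA", "projB"),
    ("b1", "projA", "projC"),
    ("b2", "projB", "projC"),
]

# Edge (origin -> destination), labeled with the set of blobs copied along it.
edges = defaultdict(set)
for blob, origin, dest in copies:
    edges[(origin, dest)].add(blob)

# Count distinct copies supplied by each origin to spot central source projects.
supplied = defaultdict(int)
for (origin, _dest), blob_set in edges.items():
    supplied[origin] += len(blob_set)

print(dict(supplied))  # {'projA': 2, 'projB': 1}
```

Since every copy is dated by commit time, edges point forward in time, which is what keeps such a reuse graph acyclic.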
+ + +The dataset’s detailed tracking of blob copies, including their origins and destinations, provides a rich source of data to map these relationships accurately. By drawing DAGs, researchers can visualize and analyze the propagation of reused blobs, identifying critical nodes (projects or blobs) that play a central role in the reuse network. This visualization helps in understanding the structure and dynamics of reuse, highlighting patterns such as the most reused blobs, the central projects in the reuse network, and potential vulnerabilities or licensing issues propagating through these reused blobs. + + +DAGs can reveal how reuse spreads across projects, helping to identify which projects are the primary sources of reusable blobs and how code flows between different projects. By mapping out the reuse network, it is possible to pinpoint critical points where vulnerabilities or licensing issues could propagate, allowing for targeted interventions to mitigate these risks. Understanding the reuse network also aids in developing better tools and practices for managing code quality and ensuring that reused code is maintained and updated consistently across all projects that use it. + + +Studies on large-scale clone detection such as Sajnani et al. [83] and Koschke [52] provide foundational methodologies for leveraging DAGs in these contexts. These methodologies can be adapted and extended using our dataset to enhance the understanding of copy-based reuse in open source software development. + + +8.5 Tool Development + + +As discussed in the background section, different types of code reuse can have impacts on several critical areas, including security, licensing, and code quality. Understanding these implications and addressing them is important for advancing software development practices. + + +Security. + Reused code can propagate vulnerabilities across multiple projects [78]. 
For instance, if a security flaw exists in a reused blob, it can potentially affect all projects that include this blob. Analyzing the reuse patterns can help identify critical points where vulnerabilities might spread and allow for proactive mitigation measures. There have been notable incidents where widespread code reuse led to security breaches. For example, the Heartbleed bug in OpenSSL had far-reaching impacts due to the extensive reuse of the affected code across numerous projects. Future research can focus on developing automated tools that scan reused code for known vulnerabilities and suggest patches. This proactive approach can enhance the security posture of software systems.
+
+
+Compliance.
+ Reused code may carry licensing obligations that need to be respected. Failure to comply with these obligations can lead to legal disputes and financial penalties. By understanding reuse patterns, organizations can ensure they meet licensing requirements. There have been instances where companies faced legal challenges due to improper reuse of code with restrictive licenses. For example, using GPL-licensed code in proprietary software without complying with GPL terms has led to lawsuits. Developing tools that automatically check for license compliance when code is reused can help organizations avoid legal pitfalls. These tools can flag potential issues and provide guidance on how to resolve them.
+
+
+Code Quality.
+ Reused code may not always meet the quality standards of the adopting project. Ensuring that reused code adheres to best practices and coding standards is essential for maintaining overall code quality. Poorly written code can lead to maintenance challenges and degraded performance in adopting projects. Future work can focus on creating tools that assess the quality of reused code and suggest improvements. These tools can analyze code for adherence to coding standards, detect code smells, and recommend refactoring.
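One plausible building block for such scanning tools is exact blob-hash matching against an advisory list. The sketch below is illustrative only (the advisory set and file contents are invented): it computes the git blob identifier, the same kind of hash WoC uses to identify blobs, and flags reused files that match.

```python
import hashlib

def git_blob_sha1(content):
    """Git's blob hash: sha1 over b'blob <size>\\0' followed by the content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()

# Invented advisory database: blob hashes of files with known vulnerabilities.
VULNERABLE_BLOBS = {git_blob_sha1(b"insecure_legacy_code()\n")}

def scan(files):
    """Flag files whose blob hash matches a known-vulnerable blob."""
    return [path for path, content in files.items()
            if git_blob_sha1(content) in VULNERABLE_BLOBS]

flagged = scan({
    "vendored/legacy.py": b"insecure_legacy_code()\n",
    "src/main.py": b"print('ok')\n",
})
print(flagged)  # ['vendored/legacy.py']
```

Exact hash matching only catches byte-identical reuse; detecting modified copies would require the finer-grained techniques discussed in Section 8.1.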
+ + +\textsuperscript{16}For more information about how to access this data, please visit: https://github.com/woc-hack/tutorial. +Package Managers. Developing package managers tailored for different programming languages and communities can be highly beneficial. These managers can offer more relevant and effective support for managing code reuse in specific environments. Additionally, enhancing existing package managers with features such as reuse tracking, version control, and automated updates can improve development efficiency and reduce the associated risks of code reuse. + + +Community Engagement. Engaging with open source communities to develop tools and practices that address the unique needs of different ecosystems, and collaborating with these communities, can ensure widespread adoption and effectiveness. Continuously gathering user feedback and iterating on the tools to enhance their functionality and usability is also important. This iterative process helps create robust and reliable tools that meet the evolving needs of software developers. +---------------------------------------- +------------------------------- +Section 253: +9 CONCLUSIONS + + +In conclusion, our study highlights the non-negligible role of copy-based reuse in open source software development. By leveraging the extensive World of Code (WoC) dataset, we provided a comprehensive analysis of code reuse, revealing that a substantial portion of open source projects engage in this practice. Our findings indicate that 6.9% of all blobs in OSS have been reused at least once, and 80% of projects have reused blobs from another project. This widespread reuse emphasizes the efficiency gains in OSS development but also raises concerns about security and legal compliance. + + +The variation in reuse patterns across programming languages underscores the influence of language-specific ecosystems and practices. 
Moreover, the higher propensity for binary blob reuse suggests a need for tailored tools to support different types of reuse. Future research should focus on improving the accuracy and comprehensiveness of reuse detection and exploring the impact of partial file reuse. + + +The survey results further enrich our understanding of reuse practices. We found that many creators intended their resources for reuse, indicating a collaborative mindset among developers. Reusers generally found the reused blobs helpful. Despite these positive perceptions, reusers showed relatively low concern about potential bugs and changes in the original files. This low level of concern could suggest either a high level of trust in the quality of the reused code or a lack of awareness of the associated risks. Additionally, the survey revealed a moderate interest in using package managers to handle changes to reused files. This indicates potential demand for tools that can streamline and manage code reuse more effectively. + + +Overall, our work provides insights into the patterns and factors affecting code reuse, advocating for better management and support tools to enhance the sustainability and security of OSS. By addressing the identified risks and leveraging the collaborative nature of the OSS community, we can improve code reuse practices and outcomes. + + +ACKNOWLEDGMENTS + + +This work was supported in part by the National Science Foundation under Award Numbers 1901102 and 2120429. The authors additionally thank Dr. James Herbsleb and Dr. Bogdan Vasilescu for their valuable advice and insightful comments, which helped improve this work. The authors also thank the reviewers for their constructive feedback and suggestions, which helped enhance the quality of this paper. + + +REFERENCES + + +[1] Qurat Ul Ain, Wasi Haider Butt, Muhammad Waseem Anwar, Farooque Azam, and Bilal Maqbool. 2019. A systematic review on code clone detection. IEEE access 7 (2019), 86121–86144. 
+ + +[2] Le An, Ons Mlouki, Foutse Khomh, and Giuliano Antoniol. 2017. Stack overflow: a code laundering platform?. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 283–293. +[3] Corey M Angst, Ritu Agarwal, Vallabh Sambamurthy, and Ken Kelley. 2010. Social contagion and information technology diffusion: The adoption of electronic medical records in US hospitals. +Management Science + 56, 8 (2010), 1219–1241. + + +[4] Giuliano Antoniol, Massimiliano Di Penta, and Ettore Merlo. 2004. An automatic approach to identify class evolution discontinuities. In +Proceedings. 7th International Workshop on Principles of Software Evolution +, 2004. IEEE, 31–40. + + +[5] Zubin Austin and Jane Sutton. 2014. Qualitative research: Getting started. +The Canadian journal of hospital pharmacy + 67, 6 (2014), 436. + + +[6] Tegawendé F Bissyandé, Ferdian Thung, David Lo, Lingxiao Jiang, and Laurent Réveillere. 2013. Popularity, interoperability, and impact of programming languages in 100,000 open source projects. In +2013 IEEE 37th annual computer software and applications conference +. IEEE, 303–312. + + +[7] Kelly Blincoe, Jyoti Sheoran, Sean Goggins, Eva Petakovic, and Daniela Damian. 2016. Understanding the popular users: Following, affiliation influence and leadership on GitHub. +Information and Software Technology + 70 (2016), 30–39. + + +[8] Hudson Borges, Andre Hora, and Marco Tulio Valente. 2016. Predicting the popularity of github repositories. In +Proceedings of the The 12th international conference on predictive models and data analytics in software engineering +. 1–10. + + +[9] Lina Boughton, Courtney Miller, Yasemin Acar, Dominik Wermke, and Christian Kästner. 2024. Decomposing and Measuring Trust in Open-Source Software Supply Chains. In +Proceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results +. 57–61. + + +[10] Virginia Braun and Victoria Clarke. 
2006. Using thematic analysis in psychology.
+Qualitative research in psychology
+ 3, 2 (2006), 77–101.
+
+
+[11] Alan W Brown and Kurt C Wallnau. 1998. The current state of CBSE.
+IEEE software
+ 15, 5 (1998), 37–46.
+
+
+[12] Andrea Capiluppi, Patricia Lago, and Maurizio Morisio. 2003. Characteristics of open source projects. In
+Seventh European Conference on Software Maintenance and Reengineering, 2003. Proceedings
+. IEEE, 317–327.
+
+
+[13] Ashley Castleberry and Amanda Nolen. 2018. Thematic analysis of qualitative research data: Is it as easy as it sounds?
+Currents in pharmacy teaching and learning
+ 10, 6 (2018), 807–815.
+
+
+[14] Nicholas A Christakis and James H Fowler. 2013. Social contagion theory: examining dynamic social networks and human behavior.
+Statistics in Medicine
+ 32 (2013), 556–577. Issue 4. https://doi.org/10.1002/sim.5408
+
+
+[15] Russ Cox. 2019. Surviving Software Dependencies: Software reuse is finally here but comes with risks.
+Queue
+ 17, 2 (2019), 24–47.
+
+
+[16] John W Creswell and J David Creswell. 2017.
+Research design: Qualitative, quantitative, and mixed methods approaches
+. Sage publications.
+
+
+[17] Kevin Crowston and James Howison. 2005. The social structure of free and open source software development.
+
+
+[18] Norman K Denzin. 2017.
+The research act: A theoretical introduction to sociological methods
+. Routledge.
+
+
+[19] Massimiliano Di Penta, Daniel M German, Yann-Gaël Guéhéneuc, and Giuliano Antoniol. 2010. An exploratory study of the evolution of software licensing. In
+2010 ACM/IEEE 32nd International Conference on Software Engineering
+, Vol. 1. IEEE, 145–154.
+
+
+[20] Muyue Feng, Weixuan Mao, Zimu Yuan, Yang Xiao, Gu Ban, Wei Wang, Shiyang Wang, Qian Tang, Jiahuan Xu, He Su, et al. 2019. Open-source license violations of binary software at large scale. In
+2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)
+. IEEE, 564–568.
+ + +[21] Felix Fischer, Konstantin Böttinger, Huang Xiao, Christian Stransky, Yasemin Acar, Michael Backes, and Sascha Fahl. 2017. Stack Overflow Considered Harmful? The Impact of Copy&Paste on Android Application Security. In +2017 IEEE Symposium on Security and Privacy (SP) +. 121–136. https://doi.org/10.1109/SP.2017.31 + + +[22] Samuel W Flint, Jigyasa Chauhan, and Robert Dyer. 2021. Escaping the time pit: Pitfalls and guidelines for using time-based git data. In +2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) +. IEEE, 85–96. + + +[23] William Frakes and Carol Terry. 1996. Software reuse: metrics and models. +ACM Computing Surveys (CSUR) + 28, 2 (1996), 415–435. + + +[24] William B Frakes and Christopher J Fox. 1995. Sixteen questions about software reuse. +Commun. ACM + 38, 6 (1995), 75–ff. + + +[25] William B Frakes and Kyo Kang. 2005. Software reuse research: Status and future. +IEEE transactions on Software Engineering + 31, 7 (2005), 529–536. + + +[26] William B Frakes and Giancarlo Succi. 2001. An industrial study of reuse, quality, and productivity. +Journal of Systems and Software + 57, 2 (2001), 99–106. + + +[27] Mark Gabel and Zhendong Su. 2010. A study of the uniqueness of source code. In +Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering +. 147–156. + + +[28] Jonas Gamalielsson and Björn Lundell. 2014. Sustainability of Open Source software communities beyond a fork: How and why has the LibreOffice project evolved? +Journal of systems and Software + 89 (2014), 128–145. + + +[29] CJ Michael Geisterfer and Sudipto Ghosh. 2006. Software component specification: a study in perspective of component selection and reuse. In +Fifth International Conference on Commercial-off-the-Shelf (COTS)-Based Software Systems (ICCBSS’05) +. IEEE, 9–pp. + + +[30] Daniel M German. 2002. The evolution of the GNOME Project. 
In +Proceedings of the 2nd Workshop on Open Source Software Engineering +. 20–24. + + +[31] Daniel M German, Massimiliano Di Penta, Yann-Gael Gueheneuc, and Giuliano Antoniol. 2009. Code siblings: Technical and legal implications of copying code between applications. In +2009 6th IEEE International Working Conference on Mining Software Repositories +. IEEE, 81–90. +[32] Daniel M German and Ahmed E Hassan. 2009. License integration patterns: Addressing license mismatches in component-based development. In 2009 IEEE 31st international conference on software engineering. IEEE, 188–198. + + +[33] Mohammad Gharehyazie, Baishakhi Ray, and Vladimir Filkov. 2017. Some from here, some from there: Cross-project code reuse in github. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). IEEE, 291–301. + + +[34] Mohammad Gharehyazie, Baishakhi Ray, Mehdi Keshani, Masoumeh Soleimani Zavosht, Abbas Heydarnoori, and Vladimir Filkov. 2019. Cross-project code clones in GitHub. Empirical Software Engineering 24, 3 (2019), 1558–1573. + + +[35] Antonios Gkortzis, Daniel Feitosa, and Diomidis Spinellis. 2021. Software reuse cuts both ways: An empirical analysis of its relationship with security vulnerabilities. Journal of Systems and Software 172 (2021), 110653. + + +[36] Georgios Gousios. 2013. The GHTorrent dataset and tool suite. In 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 233–236. + + +[37] Georgios Gousios and Diomidis Spinellis. 2012. GHTorrent’s data from a firehose. In 2012 9th IEEE Working Conference on Mining Software Repositories (MSR). IEEE, 12–21. + + +[38] Greg Guest, Arwen Bunce, and Laura Johnson. 2006. How many interviews are enough? An experiment with data saturation and variability. Field methods 18, 1 (2006), 59–82. + + +[39] Stefan Haefliger, Georg Von Krogh, and Sebastian Spaeth. 2008. Code reuse in open source software. Management science 54, 1 (2008), 180–193. 
+---------------------------------------- +------------------------------- +Section 254: +How Has Forking Changed in the Last 20 Years? A Study of Hard Forks on GitHub + + +Shurui Zhou + +Carnegie Mellon University, USA + + +Bogdan Vasilescu + +Carnegie Mellon University, USA + + +Christian Kästner + +Carnegie Mellon University, USA + + +ABSTRACT + + +The notion of forking has changed with the rise of distributed version control systems and social coding environments, like GitHub. Traditionally, forking refers to splitting off an independent development branch (which we call hard forks); research on hard forks, conducted mostly in pre-GitHub days, showed that hard forks were often viewed critically, as they may fragment a community.
Today, in social coding environments, open-source developers are encouraged to fork a project in order to contribute to the community (which we call social forks), which may have also influenced perceptions and practices around hard forks. To revisit hard forks, we identify, study, and classify 15,306 hard forks on GitHub and interview 18 owners of hard forks or forked repositories. We find that, among other things, hard forks often evolve out of social forks rather than being planned deliberately, and that perceptions of hard forks have indeed changed dramatically: they are now often seen as a positive, non-competitive alternative to the original project. + + +ACM Reference Format: +Shurui Zhou, Bogdan Vasilescu, and Christian Kästner. 2020. How Has Forking Changed in the Last 20 Years? A Study of Hard Forks on GitHub. In 42nd International Conference on Software Engineering (ICSE ’20), May 23–29, 2020, Seoul, Republic of Korea. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3377811.3380412 +---------------------------------------- +------------------------------- +Section 255: +1 INTRODUCTION + + +The notion of forking in open-source has evolved: Traditionally, forking was the practice of copying a repository and splitting off new independent development, often under a new name; forking was rare and was typically intended to compete with or supersede the original project [15, 30, 32]. Nowadays, forks in distributed version control systems are public copies of repositories in which developers can make changes, potentially, but not necessarily, with the intention of integrating those changes back into the original repository. + + +With the rise of social coding and explicit support in (distributed) version control systems, forking of repositories has been explicitly promoted by sites like GitHub, Bitbucket, and GitLab, and has indeed become very popular [19, 34].
For example, we identified over 114,120 GitHub projects with more than 50 forks, and over 9,164 projects with more than 500 forks as of June 2019, with numbers rising quickly. However, most of these modern forks are not forks in the traditional sense. As in our prior work [53], we distinguish between social forks, referring to creating a public copy of a repository on a social coding site like GitHub, often with the goal of contributing to the original project, and hard forks, referring to the traditional notion of splitting off a new development branch. + + +Hard forks have been discussed controversially throughout the history of free and open-source software: On the one hand, free and open-source licenses codified the right to create hard forks, which was seen as essential for guaranteeing flexibility and fostering disruptive innovations [15, 30, 32] and useful for encouraging a survival-of-the-fittest model [48]. On the other hand, hard forks were frequently considered risky to projects, since they could fragment a community and lead to confusion for both developers and users [15, 26, 30, 36], and there was a strong norm against forking; many well-known hard forks exist (e.g., LibreOffice, Jenkins, io.js; see Fig. 1), but there are few well-known cases where both communities survived and remained healthy after a hard fork, with a prominent exception being the BSD variants. + + +Prior research into forking of free and open-source projects focused on the motivations behind hard forks [8, 12, 13, 26, 31, 39, 47], the controversial perceptions around hard forks [6, 15, 26, 30, 36, 49], and the outcomes of hard forks (including studying factors that influence such outcomes) [39, 49]. However, essentially all that research was conducted before the rise of social coding, much of it on SourceForge (GitHub was launched in 2008 and became the dominant open-source hosting site around 2012; cf. Fig. 1).
+ + +In this paper, we argue that perceptions and practices around forking could have changed significantly since SourceForge’s heyday. In contrast to the strong norm against forking back then, we conjecture that the promotion of social forks on sites like GitHub, and the often blurry line between social and hard forks, may have encouraged forking and also lowered the bar for hard forks. At the same time, advances in tooling, especially distributed version control systems like Git [7] and transparency mechanisms on social coding sites [10], may have enabled new opportunities and changed common practices and perceptions. The professionalization of open-source development and the increasing involvement of corporations or even corporate ownership of open-source projects may have further tilted perceptions. + + +Figure 1: Timeline of some popular open-source forking events; popularity approximated with Google Trends. + + +Therefore, we argue that it is time to revisit, replicate, and extend research on hard forks, asking the central question of this work: How have perceptions and practices around hard forks changed? Updating and deepening our understanding regarding practices and perceptions around hard forks can inform the design of better tools and management strategies to facilitate efficient collaboration. Furthermore, we attempt to automate the process of identifying hard forks among social forks and to quantify how frequent hard forks are across GitHub, which previous research did not cover. + + +Using a mixed-methods empirical design, combining repository mining with 18 developer interviews, we investigate: + + +Frequency of hard forks: We attempt to quantify the frequency of hard forks among all the (mostly social) forks on GitHub. Specifically, we design and refine a classifier to automatically detect hard forks. We find 15,306 instances, showing that hard forks are a significant concern, even though their relative numbers are low.
+ + +Common evolution patterns of hard forks: We classify the evolution of hard forks and their corresponding upstream repository to observe outcomes, including whether the fork and upstream repositories both sustain their activities and whether they synchronize their development. We develop our classification by visualizing and qualitatively analyzing evolution patterns (using card sorting) and subsequently automate the classification process to analyze all detected hard forks. We find that many hard forks are sustained for extended periods and that a substantial number of hard forks still at least occasionally exchange commits with the upstream repository. + + +Perceptions of hard forks: In interviews with 18 open-source maintainers of forks and corresponding upstream repositories, we solicit practices and perceptions regarding hard forks and analyze whether those align with ones reported in pre-social-coding research. We find that the ‘stigma’ often reported around hard forks is largely gone; indeed, forks, including hard forks, are generally seen as something positive, with many hard forks complementing rather than competing with the upstream repository. Furthermore, with social forking encouraging forks as a contribution mechanism, we find that many hard forks are not deliberately planned but evolve slowly from social forks. + + +Overall, we contribute (1) a method to identify hard forks, (2) a dataset of 15,306 hard forks on GitHub, (3) a classification and analysis of evolution patterns of hard forks, and (4) results from interviews with 18 open-source developers about the reasons for hard forks, interactions across forks, and perceptions of hard forks. + + +Our research focuses on development practices on GitHub, which is by far the dominant open-source hosting platform (cf. Fig. 1) and has been key in establishing the social forking phenomenon.
Even large projects primarily hosted on other sites often have a public mirror on GitHub, allowing us to gather a fairly representative picture of the entire open-source community. Our main research instruments are semi-structured interviews with open-ended questions and repository mining with the GitHub API. While our research is not planned as an exact replication of prior work and exceeds the scope of prior studies by comparing social and hard forks, many facets seek to replicate prior findings (e.g., regarding motivations and outcomes of hard forks) and can be considered a conceptual replication [24, 43]. +---------------------------------------- +------------------------------- +Section 256: +2 PAST RESEARCH ON FORKING +---------------------------------------- +------------------------------- +Section 257: +2.1 Types of Forking + + +What is popularly understood by ‘forking a project’ has changed over the last decades; in line with our prior work [53], we distinguish hard forks and social forks: + + +Hard forks: Traditionally, forking refers to copying a project in order to continue a separate, often competing line of development; the name and the direction of the project also typically change. Developers might fork a project, e.g., when they are unhappy with the direction or governance, deciding to create a divergent version more in line with their own vision [15]. In pre-GitHub days, ways to contribute to an open-source project varied widely, but rather than using public forks one would typically create local copies to make changes and then send those as patch files. + + +Social forks: Popularized through GitHub, ‘forking’ now also refers to public copies of open-source repositories that are often created for short-term feature implementation, often with the intention of contributing back to the upstream repository.
A fork on GitHub is thus typically not intended to start an independent development line, but serves as a uniform mechanism for distributed development and third-party contribution (i.e., pull requests) [10, 19]. In fact, the forking function on GitHub is frequently used even just as a bookmarking mechanism to keep a copy of a project without any intention of making changes [25]. + + +On GitHub, nowadays, both forms of forking exist, and we conjecture that the vast majority of forks are social forks; however, it is not obvious how to distinguish the two kinds without a closer analysis. + + +At a technical level, forks can be created by cloning repositories in distributed version control systems, in which case the fork maintains the history of the upstream project, or simply by copying files over and starting a new history (the latter was more common in pre-GitHub days). If forks are created directly on GitHub, a clone is automatically created, and GitHub tracks and visually shows the relationship between fork and upstream projects. + + +There is significant research on both hard forks and social forks. The hard-forking research is typically older, conducted almost exclusively before GitHub and social forking. Research on social forking is more recent, but focuses much more on the contribution process and issues around managing contributions in a single project. +---------------------------------------- +------------------------------- +Section 258: +2.2 Motivations for Forking + + +Reasons why developers might create a hard fork of an existing open-source project vary widely. Motivations for such forks have been studied primarily on SourceForge, before the advent of social coding environments [8, 12, 13, 26, 31, 39, 47]. As per Robles and González-Barahona [39], the most common motivations for hard forks were: + + +Technical.
Variants targeting specific needs or user segments that are not accommodated by the upstream project are the most common motivation [31]. As a project grows and matures, the contributors’ goals or perspectives may diverge, and some may want to take the project in a different direction. If taken to the extreme, hard forks can be used for variant management, in which multiple related but different projects originating from the same source are maintained separately [3, 13, 14, 45]. + + +Governance disputes. Some contributors create hard forks when they feel that their feedback is not heard or that maintainers accept patches too slowly in the original project. A hard fork, or even just the threat of creating one, can help developers negotiate in governance disputes [17]; recent examples of hard forks caused by governance disputes include Node.js [42, 50] and Docker [51]. Other common forms of disputes occur when companies are involved and try to influence the direction of the project or try to close-source or monetize future versions of the project, as with Hudson and OpenOffice. + + +Discontinuation of the original project. A hard fork can revive a project when the original developers have ceased to work on it. For example, back in the 1990s, the Apache web server project took over for the abandoned NCSA HTTPd project. + + +Commercial forks. Companies sometimes fork open-source projects to create their own branded version of the project, sometimes enhanced with closed-source features. An example is Apple’s fork of KDE’s KHTML rendering engine as WebKit. + + +Legal reasons. A project might consider different licenses, a trademark dispute may arise, or changes in laws (e.g., regarding encryption) may require technical changes. Hard forks can be used to split development for different jurisdictions. + + +Personal reasons.
+ Interpersonal disputes and irreconcilable differences of a non-technical nature lead to a rift between various parties, so the project forks. OpenBSD is a classic example. + + + + + + +In contrast to the older work on hard forks, more recent work has also investigated the motivation and practices behind social forks. For example, Fung et al. [16] report that only 14 percent of all active forks of nine popular JavaScript projects integrated back any changes. Subsequently, researchers studied social forks at larger scale and reported that around 50 percent of forks on GitHub never integrate code changes back [23, 53]. In addition, Jiang et al. [23] reported that 10 percent of their study participants used forks for backup purposes. + + +In our study, we revisit the question about the motivation for hard forks and explore whether they have changed with the rise of social coding. +---------------------------------------- +------------------------------- +Section 259: +2.3 Outcomes of Hard Forks + + +Wheeler [49] and Robles and González-Barahona [39] distinguish five possible outcomes of hard forks: + + + + + + +Successful branching, typically with differentiation. + Both the original project and the fork succeed and remain active for a prolonged period of time, fragmenting the community into smaller subcommunities. The BSD variants are notable examples. + + + + + + +Fork merges back into the upstream project. + The fork does not sustain independence but merges changes back into the upstream project, e.g., after resolving a dispute that triggered the hard fork in the first place, as in the io.js fork of Node.js [50]. + + + + + + +Discontinuation of the fork. + The fork is initially active, but does not sustain its activity. For example, when libc split off from glibc, the glibc maintainers invested in improvements to win back users and the fork failed. + + + + + + +Discontinuation of the upstream project. 
+ The fork outperforms the upstream project such that the upstream project is discontinued (or the fork revives an already dead upstream project). For example, XFree86 moved away from a GPL-compatible license, so the project forked and created X.org, which was quickly adopted by most developers and users; soon after, the XFree86 core team disbanded and development ceased on the project. + + + + + + +Both fail. + Both projects fail (or the fork fails to revive a dead project). + + + + + + +Wheeler [49] conjectured that it is rare for both the fork and the upstream project to sustain activities. Robles and González-Barahona [39] quantified the frequency of each outcome in a sample of 220 forked open-source projects referenced from Wikipedia in 2011 (i.e., selection biased toward well-known projects that have achieved a certain level of success) and found that successful branching was most common (43.6%), followed by discontinuation of the fork (29.8%) and discontinuation of the upstream project (13.8%); failure of both and merges were relatively rare (8.7% and 3.2%). +---------------------------------------- +------------------------------- +Section 260: +2.4 Pros and Cons of Hard Forks + + +Hard forks have long been discussed controversially. In the 90s and 2000s, forking was seen as an important right but also as something to avoid if at all possible, unless it is a last resort. There was a strong norm against forking, as it fragments communities and can cause hard feelings for the people involved. The free software movement has traditionally seen forking as something to avoid: forks split the community, introduce duplicate effort, reduce communication, and may produce incompatibilities [39]. Specifically, it can tear a community apart, meaning people in the community have to pick sides [6, 15, 26, 30, 36, 49]. 
Such fragmentation can also threaten the sustainability of open-source projects, as scarce resources are scattered further and changes need to be performed redundantly across multiple projects; e.g., the 3D printer firmware Marlin fixed an issue (PR #10119) two years after the same problem was fixed in its hard fork Ultimaker (PR #118). At the same time, the right to fork is also seen as an important political tool of the community: The mere threat of a fork can push project leaders to pay attention to issues they might otherwise ignore, provided those issues are actually important, and can thereby improve current practices [49]. + + +In contrast, social forks are seen as something almost exclusively positive and are actively encouraged [4]. They are a mechanism to contribute to a project, and most open-source projects actively embrace external contributors [19, 46]. Although some maintainers complain about the burden of dealing with so many third-party contributions [21, 46] and some researchers warn about inefficiencies regarding lost contributions or duplicate work [38, 52, 53], we are not aware of any calls to constrain social forking. +Importantly though, as our study will show, the distinction between social and hard forks is fluid. Social coding platforms contain both kinds of forks, and they are not always easy to distinguish. Diffusion of effort and fragmentation of communities, long feared in discussions of hard forks, can also be observed on GitHub. Many secondary forks (i.e., forks of forks) contribute to other forks, but not to the original repository, and forks slowly drift apart [16, 45]. A key question is, thus, whether the popularity of social forking also encourages hard forks and causes the fragmentation and sustainability challenges feared in the past. + + +We believe it is necessary to revisit hard forking after the rise of social coding and GitHub.
Specifically, we aim to understand the hard-fork phenomenon in a current social-forking environment, and understand how perceptions and practices may have changed. +---------------------------------------- +------------------------------- +Section 261: +3 RESEARCH QUESTIONS AND METHODS + + +As described in Sec. 2, the conventional use of the term forking as well as corresponding tooling have changed with the rise of distributed version control and social coding platforms, and we conjecture that this also influenced hard forks. Hence, our overall research question is How have perceptions and practices around hard forks changed? + + +We explore different facets of hard forks, including motivations, outcomes, and perceived stigma (cf. Sec. 2). We also attempt to identify how frequent hard forks are across GitHub, and discuss how developers navigate the tension and often blurry line between social and hard forks. We adopt a concurrent mixed-method exploratory research strategy [9], in which we combine repository mining – to identify hard forks and their outcomes – with interviews of maintainers of both forks and upstream projects – to explore motivations and perceptions. Mixing multiple methods allows us to explore the research question simultaneously from multiple facets and to triangulate some results. In addition, we use some results of repository mining to guide the selection of interviewees. + + +We explicitly decided against an exact replication [24, 43] of prior work, because contexts have changed significantly. Instead, we guide our research by previously explored facets of hard forks, revisit those as part of our repository mining and interviews, and contrast our findings with those reported in pre-GitHub studies. In addition, we do not limit our research to previously explored facets, but explicitly explore new facets, such as the tension between social and hard forks, that have emerged from technology changes or that we discovered in our interviews. 
+ + +3.1 Instrument for Visualizing Fork Activities + + +We created commit history graphs, a custom visualization of commit activities in forks, as illustrated in Figure 2, to help develop and debug our classifiers (Sec. 3.2 and 3.3), but also to prepare for interviews. Given a fork and its corresponding upstream repository, we clone both and analyze the joint commit graph between the two, assigning every commit to one of five states: (1) created before the forking point, (2) only upstream (not synchronized), (3) only in fork (unmerged), (4) created upstream but synchronized to the fork, and (5) created in the fork but merged into upstream. Technically, in a nutshell, we build on our prior commit graph analysis [53], where merge edges are assigned weight 1 and all other edges weight 0, and the shortest path from the commit to any branch in either the fork or the upstream repository identifies where the commit originates and whether it has been merged (and in which direction).¹ + + +We subsequently plot activities in the two repositories over time, aggregated in three-month intervals; larger dots indicate more commits. In these plots, we include additional arrows for synchronization (from upstream into the fork) and merge (from fork to upstream) activities. With these plots, we can quickly visually inspect development activities before and after the forking point as well as whether the fork and the upstream repository interact. + + +3.2 Identifying Hard Forks + + +Identifying hard forks reliably is challenging. Pre-GitHub work often used keyword searches in project descriptions, e.g., ‘software fork’, or relied on external curated sources (e.g., Wikipedia) [39]. Today, on sites like GitHub, hard forks use the same mechanisms as social forks without any explicit distinction. + + +Classifier development. For this work, we want to gather a large set of hard forks and even approximate the frequency of hard forks among all 47 million forks on GitHub.
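The commit history graphs that support this classifier work rest on the five-way state assignment of Sec. 3.1. As a rough sketch of that idea (not the authors' released tool; the parent-edge representation and all function names here are our own), the weighted shortest paths can be computed with a 0-1 BFS in which merge edges cost 1 and ordinary parent edges cost 0:

```python
# Sketch of the Sec. 3.1 commit-state assignment: merge edges weigh 1, other
# parent edges weigh 0; the cheapest path from each branch head down to a
# commit tells us on which side it originated and whether it crossed a merge.
from collections import deque

def zero_one_bfs(head, parents):
    """0-1 BFS from a branch head over parent edges.
    `parents` maps commit -> list of (parent_commit, is_merge_edge).
    Returns commit -> merge-edge count on the cheapest path from `head`."""
    dist = {head: 0}
    dq = deque([head])
    while dq:
        c = dq.popleft()
        for p, is_merge in parents.get(c, []):
            d = dist[c] + (1 if is_merge else 0)
            if d < dist.get(p, float("inf")):
                dist[p] = d
                if is_merge:
                    dq.append(p)      # weight-1 edge goes to the back
                else:
                    dq.appendleft(p)  # weight-0 edge goes to the front
    return dist

def classify(commit, fork_point_time, ctime, d_up, d_fork):
    """Assign one of the five states of Sec. 3.1 to a commit."""
    if ctime[commit] <= fork_point_time:
        return "pre-fork"              # (1) created before the forking point
    in_up, in_fork = commit in d_up, commit in d_fork
    if in_up and not in_fork:
        return "upstream-only"         # (2) only upstream, not synchronized
    if in_fork and not in_up:
        return "fork-only"             # (3) only in fork, unmerged
    # Reachable from both heads: fewer merge crossings marks the origin side.
    if d_up[commit] <= d_fork[commit]:
        return "upstream-synced"       # (4) created upstream, synced to fork
    return "fork-merged"               # (5) created in fork, merged upstream

# Toy history: "a" pre-fork; "b" upstream; "c" in the fork; "m" a fork-side
# merge commit pulling in upstream's "b" via a merge edge.
parents = {"m": [("c", False), ("b", True)],
           "c": [("a", False)], "b": [("a", False)]}
ctime = {"a": 0, "b": 1, "c": 1, "m": 2}
d_up, d_fork = zero_one_bfs("b", parents), zero_one_bfs("m", parents)
assert classify("b", 0, ctime, d_up, d_fork) == "upstream-synced"
```

Commits reachable from both heads are attributed to the side reachable with fewer merge-edge crossings, which mirrors the weight-1-for-merge-edges rule described above.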
To that end, we need a scalable, automated classifier. We are not aware of any existing classifier except our own prior work [53], in which we classified forks as hard forks if they have at least two own pull requests or at least 100 own, unmerged commits and the project’s name has been changed. Unfortunately, we found that this classifier missed many actual hard forks (false negatives), thus we went back to the drawing board to develop a new one. + + +We proceeded iteratively, repeatedly trying, validating, and combining various heuristics. That is, we would try a heuristic to detect hard forks and manually sample a significant number of classified forks to identify false positives and false negatives, revising the heuristic or combining it with other steps. Commit history graphs (cf. Sec. 3.1) and our qualitative analysis of forks (Sec. 3.3 below) were useful debugging devices in the process. We iterated until we reached confidence in the results and a low rate of false positives. + + +1There are a few nuances in the process due to technicalities of Git and GitHub. For example, if the upstream repository deletes a branch after forking, the joint commit graph would identify the code as exclusive to the fork; to that end, we discard commits that are older than the forking timestamp on GitHub. Such details are available in our open-source implementation (https://github.com/shuiblue/VisualHardFork). +Our final classifier proceeds in two steps: first, we use multiple simple heuristics to identify candidate hard forks; second, we use a more detailed and more expensive analysis to decide which of those candidates are actual hard forks. + + +In the first step, we identify as candidate hard forks, among all repositories labeled as forks on GitHub, those that: + + + + + + +Contain the phrase “fork of” in their description (H1).
We use GitHub’s search API to find all repositories that contain the phrase “fork of” in their project description and are a fork of another project. The idea, inspired by prior work [31], is to look for projects that explicitly label themselves as forks (defined as “self-proclaimed forks”), i.e., developers explicitly change their description after cloning the upstream repository. To work around GitHub’s API search limit of 1000 results per query, we partitioned the query based on different time ranges in which the repository was created. Next, we compare the description of the fork and its upstream project to make sure the description is not copied from the upstream, i.e., that the upstream project is not already a self-proclaimed fork. + + + + + + +Received external pull requests (H2). Using the June 2019 GHTorrent dataset [18], we identified all GitHub repositories that are labeled as forks and have received at least three pull requests (excluding pull requests issued by the fork’s owner to avoid counting developers who use a process with feature branches). We consider external contributions to a fork as a signal that the fork may have attracted its own community. + + + + + + +Have substantial unmerged changes (H3). Using the same GHTorrent dataset, we identify all forks that have at least 100 own commits, indicating significant development activities beyond what is typical for social forks. + + + + + + +Have at least 1 year of development activity (H4). Similar to the previous heuristic, we look for prolonged development activities beyond what is common for social forks. Specifically, we identify those forks as candidates in which the time between the first and the last commit spans more than one year. + + + + + + +Have changed their name (H5). We check if the fork’s name on GitHub has been changed from the upstream repository’s name (with Levenshtein distance $\geq 3$).
This heuristic comes from the observation that most social forks do not change names, but that forks intending to go in a different direction and create a separate community tend to change names more commonly (e.g., Jenkins forked Hudson). + + + + + + +Each repository that meets at least one of these criteria is considered a candidate. We show how many candidates each heuristic identified in the second column of Fig. 3b. Note, for all heuristics that use GHTorrent, we additionally validated the results by checking whether the fork and upstream pair still exist on GitHub and whether the measures align with those reported by the GitHub API. + + +In line with prior work [25, 53], we remove repositories using GitHub for document storage or course project submission – some of which are among the most forked projects on GitHub. Specifically, after manual review, we discard repositories containing the keywords ‘homework’, ‘assignments’, ‘course’, ‘codecamp’, or ‘documents’ in their description; we discard repositories whose name starts with ‘awesome-’ (usually document collections); and we discard repositories with no programming-language-specific files (as per GitHub’s language classification queried through the API). + + + + + + +We discard candidates with fewer than three stars on GitHub. Stars are a lightweight mechanism for developers to indicate their interest in a project and a common measure of popularity. A threshold of three stars is very low, but still requires a minimum amount of public interest. According to GHTorrent data, of the 125 million GitHub repositories, 2 million repositories (1.6%) have three or more stars. + + + + + + +We discard candidates without any own commits after the fork, typically projects that only performed a name change as the single post-fork action.
+ + + + + +We discard candidates in which 30% or more of all commits in the fork have been merged upstream, which indicates social forks with active contributions to the upstream project. + + + + + + +For candidates identified with 100 commits or 1 year of activity, we discard those where the thresholds are not met when considering only unmerged commits exclusive to the fork. + + + + + + +We discard candidates owned by developers who contributed more than 30% of the commits or pull requests of the upstream repository, which typically indicates core team members of the upstream project using social forks for feature development. + + + + + + +We discard candidates in which the fork was created right after the upstream stopped updating, the fork is owned by an organization account, and the upstream is owned by a user account. This is a common pattern we observed, indicating an ownership transfer. Our classifier identifies a total of 15,306 hard forks across GitHub. In Fig. 3b, we show which heuristics identified the hard forks, and in Fig. 3a the overlap between the different heuristics. + + + + + + +Classifier validation. + To validate the precision of our classifier, we manually inspected a random sample of 300 detected hard forks. By manually analyzing the fork’s and the upstream repository’s history and commit messages, we classified 14 detected hard forks as likely false positives, suggesting an acceptable precision of about 95%. Note that manual labeling is a best-effort approach as well, as the distinction between social and hard fork is not always clear (see also our discussion of interview results in Sec. 4.4). + + +| Rule | Candidates | Actual | +|------|------------|--------| +| H1 | 10,609 | 551 | +| H2 | 23,109 | 7,043 | +| H3 | 14,956 | 810 | +| H4 | 33,073 | 11,268 | +| H5 | 20,358 | 5,568 | +| Total| 63,314 | 15,306 |
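As a rough sketch (not the authors’ released implementation), the two-step classifier described above can be expressed as a candidate stage (H1–H5) followed by discard filters. The `Fork` record and all of its field names here are hypothetical stand-ins for precomputed GHTorrent/API measures, and only a subset of the discard filters is shown:

```python
# Hypothetical sketch of the two-step hard-fork classifier described in the
# text: any of H1-H5 nominates a candidate, then discard filters remove
# likely social forks. The Fork record and its fields are stand-ins for
# precomputed per-repository measures, not a real API.
from dataclasses import dataclass

@dataclass
class Fork:
    description: str            # fork's own project description
    upstream_description: str   # upstream project's description
    external_prs: int           # PRs received, excluding the owner's own
    own_commits: int            # commits made after the forking point
    activity_span_days: int     # days between first and last commit
    name_levenshtein: int       # distance between fork and upstream names
    stars: int
    merged_commit_ratio: float  # share of the fork's commits merged upstream
    owner_upstream_share: float # owner's share of upstream commits/PRs

def is_candidate(f: Fork) -> bool:
    """Step 1: heuristics H1-H5; one match suffices."""
    h1 = ("fork of" in f.description.lower()
          and "fork of" not in f.upstream_description.lower())
    h2 = f.external_prs >= 3        # external pull requests
    h3 = f.own_commits >= 100       # substantial unmerged changes
    h4 = f.activity_span_days > 365 # at least 1 year of activity
    h5 = f.name_levenshtein >= 3    # name changed
    return any([h1, h2, h3, h4, h5])

def passes_filters(f: Fork) -> bool:
    """Step 2: discard filters (subset of those listed in the text)."""
    if f.stars < 3:                    # minimum public interest
        return False
    if f.own_commits == 0:             # name change was the only action
        return False
    if f.merged_commit_ratio >= 0.30:  # social fork contributing upstream
        return False
    if f.owner_upstream_share > 0.30:  # upstream core-team member
        return False
    return True

def is_hard_fork(f: Fork) -> bool:
    return is_candidate(f) and passes_filters(f)
```

A fork is reported as a hard fork only if at least one heuristic nominates it and no filter discards it, mirroring the two-step structure described in the text.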
+ + +Analyzing false negatives (recall) is challenging, because hard forks are rare, projects listed in previous papers are too old to detect in our GitHub dataset, and we are not aware of any other labeled dataset. We have manually curated a list of known hard forks from mentions in web resources and from mentions during our interviews. Of the 3 hard forks of which both the fork and the upstream repository are on GitHub, we detect all with our classifier, but the size of our labeled dataset is too small to make meaningful inferences about recall. +---------------------------------------- +------------------------------- +Section 262: +3.3 Classifying Evolution Patterns + + +We identified different evolution patterns among the analyzed forks using an iterative approach inspired by card sorting [44]. Evolution patterns describe how a hard fork and the corresponding upstream project coevolve and can help to characterize forking outcomes. In addition, we used evolution patterns to diversify interviewees. + + +Specifically, we printed cards with commit history graphs of 100 randomly selected hard forks (see Sec. 3.2), then all three authors jointly grouped the cards and identified a few common patterns. Our card-sorting was open, meaning we had no predefined groups; the groups emerged and evolved during the analysis process. Afterward, we manually built a classifier that detects the forks for each identified pattern. We then applied this classifier to the entire dataset and inspected whether the automatically classified forks actually fit the patterns as intended (refining the classifier and its thresholds if needed). We then picked another 100 hard forks that fit none of the previously defined patterns and sorted those again, looking for additional patterns. We similarly proceeded within each pattern, looking at 100 hard forks to see whether we could further split the pattern. We repeated this process until we could not identify any further patterns.
+ + +After several iterations, we arrived at a stable list of 15 patterns with which we could classify 97.7% of all hard forks. We list all patterns with a corresponding example commit history graph in Tab. 2. The patterns use characteristics that relate to previously found outcomes, such as fork or upstream being discontinued, but also consider additional characteristics corresponding to features that were not available or easily observable before distributed version control, e.g., whether the fork and upstream merge or synchronize. We present the patterns in a hierarchical form, because our process revealed a classification with a fairly obvious tree structure, not because we were specifically looking for a hierarchical structure. +---------------------------------------- +------------------------------- +Section 263: +3.4 Interviews + + +To solicit views and perceptions, we conducted 18 semi-structured interviews with developers, typically lasting 20 to 40 minutes. Although interviews reach fewer developers than surveys, we opted for them due to the exploratory nature of our research: interviews allow more in-depth exploration of emerging themes. + + +Interview protocol. + We designed a protocol [2] that covers the relevant dimensions from earlier research and touches on expected changes, including reasons for forking, perceived stigma of forking, and the distinction and possible tensions between social and hard forks. We asked fork owners about the decision process that led to the hard fork, their practices afterward (e.g., why they renamed the project), their current relationship to the upstream project (e.g., whether they still monitor or even synchronize), and their future plans. In contrast, we asked owners of upstream projects to what extent they are aware of, interact with, or monitor hard forks; and to what degree they are concerned about such forks or even take steps to avoid them.
In addition, we asked all participants with a long history of open-source activity if they observed any changes in their practices or perceptions, and those of others, over time. + + +All interviews were semi-structured, allowing for exploration of topics that were brought up by the participants. Our interview protocol evolved with each interview, as we reacted to confusion about questions and to insights from earlier interviews. That is, we refined and added questions to explore new insights in more detail in subsequent interviews – for example, after the first few interviews we added questions about the tradeoff between being inclusive to changes versus risking hard forks and questions regarding practices and tooling to coordinate across repositories. To ground each interview in concrete experience rather than vague generalizations, we focused each interview on a single repository in which the interviewee was involved, bringing questions back to that specific repository if the discussion became too generic. + + +Participant recruitment. + We selected potential interviewees among the maintainers of the 15,306 identified hard forks and corresponding upstream repositories. We only considered maintainers with a public email address on their GitHub profile who were active in the analyzed repositories within the last 2 years (to reduce the risk of misremembering). We sampled candidates from all evolution patterns (Sec. 3.3) and sent out 242 invitation emails. + + +Overall, 18 maintainers volunteered to participate in our study (7% response rate). Ten opted to be interviewed over email, one through a chat app, and all others over phone or teleconferencing. In Table 2, we map our interviewees to the evolution pattern for the primary fork discussed (though interviewees may have multiple roles in different projects). Naturally, our interviewees are biased toward hard forks that are still active. Our response rate was also lower among maintainers of upstream repositories, who were perhaps less invested in talking about forking. In Table 1, we list information about our interviewees and the primary hard fork we discussed. +---------------------------------------- +------------------------------- +Section 264: +Table 1: Background information of participants. + + +| Par. | Domain | #Stars(U) | #Stars(F) | LOC | Role | Exp.(yr) | +|------|-----------------|-----------|-----------|-----|------|----------| +| P1 | Blockchain | <20 | <10 | 10K | F | 19 | +| P2 | Reinforcement learning | 10K | 1K | 30K | F | 3 | +| P3 | Mobile processing | - | 70 | 20K | F | 6 | +| P4 | Video recording | - | 100 | 300K| F | 18 | +| P5 | Helpdesk system | 2K | <10 | 800K| F | 5 | +| P6 | CRM system | 30 | 200 | 800K| F | 10 | +| P7 | Physics engine | - | 300 | 100K| F | 15 | +| P8 | Social platform | 500 | 230 | 500K| F | 20 | +| P9 | Reinforcement learning | <20 | <20 | 30K | 2nd-F| 3 | +| P10 | Game engine | 500 | <10 | 200K| 2nd-F| 21 | +| P11 | Networking | 300 | 100 | 500K| F | 10 | +| P12 | Email library | - | 10K | 20K | F/U | 32 | +| P13 | Game engine | 3K | 70 | 20K | F | 11 | +| P14 | Machine learning| 30K | 50 | 60K | F | 8 | +| P15 | Image editing | 70 | <10 | 20K | F | 20 | +| P16 | Image editing | 70 | <10 | 20K | U | 10 | +| P17 | Microcontrollers| 9K | 1K | 300K| U | 6 | +| P18 | Maps | 400 | <10 | 100K| U | 9 | + + +F: Hard Fork Owner; U: Upstream Maintainer; 2nd-F: Fork of the Hard Fork + + +*Some of the upstream projects are not on GitHub, so the number of stars is unknown. Numbers rounded to one significant digit.
All interviewees are experienced open-source developers; many have more than 10 years of experience participating in open-source communities, meaning they have also interacted with earlier open-source platforms such as SourceForge. Our interviews reached saturation, in that the last interviews provided only marginal additional insights. + + +Analysis. + We analyzed the interviews using standard qualitative research methods [41]. After transcribing all interviews, two authors coded the interviews independently; then all authors discussed emerging topics and trends. Questions and disagreements were discussed and resolved together, asking follow-up questions to some interviewees if needed. +---------------------------------------- +------------------------------- +Section 265: +3.5 Threats to Validity and Credibility + + +Our study exhibits the threats to validity and credibility that are typical and expected of this kind of exploratory interview study and analysis of archival GitHub data. + + +Distinguishing between social and hard forks is difficult, even for human raters, as the distinction is primarily one of intention. In our experience, we can make a judgment call with high inter-rater reliability for most forks, but there are always some repositories that cannot be accurately classified without additional information. We build and evaluate our classifiers based on a best-effort strategy, as discussed. + + +While we check later steps with data from the GitHub API, early steps to identify candidate hard forks may be affected by missing or incorrect data in the GHTorrent dataset. In addition, the history of Git repositories is not reliable, as timestamps may be incorrect and users can rewrite histories after the fact. Moreover, merges are difficult to track if code changes are merged as a new commit or through ‘squashing’ rather than through a traditional merge commit.
As a consequence, despite best efforts, there will be inaccuracies in our classification of hard forks and individual commits, which we expect will lead to some underreporting of hard forks and of merged code. + + +We analyze right-censored time series data, in which we can detect that projects have ceased activity in the past, but cannot predict the future, thus seeing a larger chance for older forks to be discontinued. + + +Our study is limited to hard forks of which both fork and upstream repository are hosted on GitHub and of which the forking relationship is tracked by GitHub. While GitHub is by far the most dominant hosting service for open source, our study does not cover forks created of (typically older) projects hosted elsewhere and forks created by manually cloning or copying source code to a new repository. In addition, our interviews, as is typical for interview studies in our field, are biased toward answers from developers who chose to make their email public and to answer our interview request, which underrepresents maintainers of upstream repositories in our sample. +---------------------------------------- +------------------------------- +Section 266: +4 RESULTS + + +We explore practices and perceptions around hard forks along four facets that emerged from our interviews and data. +---------------------------------------- +------------------------------- +Section 267: +4.1 Frequency of Hard Forks + + +Our classifier identified 15,306 hard forks, confirming that hard forks are generally a rare phenomenon. As of June 2019, GitHub tracks 47 million repositories that are marked as forks of over 5 million distinct upstream repositories, among GitHub’s over 125 million repositories. + + +Among those, the vast majority of forks have no activity after the forking point and no stars. Most active forks have only very limited activity, indicative of social forks.
Only 0.2% of GitHub’s 47 million forks have 3 or more stars. + + +As our analysis of evolution patterns (Tab. 2) reveals, cases where both the upstream repository and the hard fork remain active for extended periods of time are not common (patterns 1, 2, and 4–7; 1355 hard forks, 8.9%). Most hard forks actually survive the upstream project, if the upstream project was active when the fork was created (patterns 8–11; 7280 hard forks, 47.6%), but many also run out of steam eventually (patterns 3 and 12–15; 6671 hard forks, 43.6%). + + +While most hard forks are created as forks of active projects (patterns 4–15; 14254 hard forks, 93%), there are a substantial number of cases where hard forks are created to revive a dead project (patterns 1–3; 1052 hard forks, 6.8%), in some cases even triggering or coinciding with a revival of the upstream project (pattern 2; 56 hard forks, 0.36%), but also here not all hard forks sustain activity (pattern 3; 420 hard forks, 2.7%). + + +Discussion and implications. + Even though the percentage of hard forks is low, the total number of attempted and sustained hard forks is not. Considering the significant cost a hard fork can put on a community through fragmentation, but also the potential power a community has through hard forks, we argue that hard forks are an important phenomenon to study even when they are comparably rare. + + +Whereas previous work typically looked at only a small number of hard forks, and research on tooling around hard-fork issues typically focuses on a few well-known projects, such as the variants of BSD [35] or Marlin [28] or artificial or academic variants [14, 22], we have detected a significant number of hard forks, many of them recent, using many different languages, that are a rich pool for future research. We release the dataset of all hard forks with corresponding visualizations as a dataset with this paper [2].
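As a consistency check, the shares quoted in this section can be recomputed directly from the per-pattern counts listed in Tab. 2:

```python
# Recompute the Sec. 4.1 percentages from the per-pattern counts of Tab. 2.
counts = {1: 576, 2: 56, 3: 420, 4: 26, 5: 107, 6: 28, 7: 562,
          8: 174, 9: 686, 10: 107, 11: 6313, 12: 388, 13: 762,
          14: 199, 15: 4902}
total = sum(counts.values())  # 15,306 detected hard forks

def share(patterns):
    """Absolute count and rounded percentage of hard forks in the patterns."""
    n = sum(counts[p] for p in patterns)
    return n, round(100 * n / total, 1)

print(share([1, 2, 4, 5, 6, 7]))         # both sides stay active: (1355, 8.9)
print(share(range(8, 12)))               # fork outlives upstream: (7280, 47.6)
print(share([3] + list(range(12, 16))))  # fork runs out of steam: (6671, 43.6)
```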
+---------------------------------------- +------------------------------- +Section 268: +4.2 Why Hard Forks Are Created (And How to Avoid Them) + + +At first glance, the interviewees give reasons for creating hard forks that align well with prior findings, including especially continuing discontinued projects or projects with unresponsive maintainers (P1, P2, P8), disagreements around project governance (P2, P12), and diverging technical goals or target populations (P3, P5, P6, P11, P13, P14, P17). As discussed, we identified 1052 hard forks (Tab. 2, patterns 1–3, 6.8%) that forked an inactive project. + + +Table 2: Evolution patterns of hard forks (example commit history graphs not reproduced here) + + +| Id | Category | Total | Sub-category | Count | Interviewees | +|----|----------|-------|--------------|-------|--------------| +| 1 | Revive Dead Project: Success (fork active > 2 Qt.) | 632 | Upstream remains inactive | 576 | P12 | +| 2 | | | Upstream active again | 56 | | +| 3 | Revive Dead Project: Not success (fork active <= 2 Qt.) | 420 | | 420 | | +| 4 | Forking Active Project: Both alive | 723 | only merge | 26 | P10 | +| 5 | | | only sync | 107 | P2, P13, P15 | +| 6 | | | merge & sync | 28 | P9 | +| 7 | | | no interaction | 562 | P1, P3, P4, P5, P7, P14 | +| 8 | Forking Active Project: Fork lived longer | 7280 | only merge | 174 | | +| 9 | | | only sync | 686 | | +| 10 | | | merge & sync | 107 | | +| 11 | | | no interaction | 6313 | P6, P8, P11 | +| 12 | Forking Active Project: Fork does not outlive upstream | 6251 | only merge | 388 | | +| 13 | | | only sync | 762 | | +| 14 | | | merge & sync | 199 | | +| 15 | | | no interaction | 4902 | | + + +An interesting common theme that emerged in our interviews, though, was that many hard forks were not deliberately created as hard forks initially.
More than half of our interviewees described that they initially created a fork with the intention of contributing to the upstream repository (social fork), but when they faced obstacles they decided to continue on their own. Common obstacles were unresponsive maintainers (P1, P2, P8) and rejected pull requests (P11, P13, P14), typically because the change was considered beyond the scope of the project. For example, P2 described that “before forking, we started by opening issues and pull requests, but there was a lack of response from their part. [We] got some news only 2 months after, when our fork was getting some interest from others.” Similarly, some maintainers reported that a fork initially created for minor personal changes evolved into a hard fork as changes became more elaborate and others found them useful (P2, P14, P17); for example, P14 described that the upstream project had been constantly evolving and the code base quickly became incompatible with some libraries, so he decided to fix this issue while also adding functionality, after which more and more people found his fork and started to migrate. + + +Several maintainers also had explicit thoughts about how to avoid hard forks (both maintainers of projects that have been forked and fork owners who themselves may be forked), and they largely mirror common reasons for forking, i.e., transparent governance, being responsive, and being inclusive to feature requests. For example, P2 suggests that their project is responsive to the community, and thus he considers it unlikely to be forked; similarly, P16 decided to generally “respond to issues in a timely manner and make a good
Beyond these, P2 also mentioned that they created a contributing guide and issue templates to coordinate with contributors more efficiently; P14 suggested to “credit the contributors” explicitly in release notes in order to keep contributors stay in the community. + + +Discussion and Implications. Whereas forking was typically seen as a deliberate decision in pre-GitHub days that required explicit steps to set up a repository for the fork and find a new name, nowadays many hard forks seem to happen without much initial deliberation. Social coding environments actively encourage forking as a contribution mechanism, which significantly lowers the bar to create a fork in the first place without having to think about a new name or potential consequences like fragmenting communities. Once the fork exists (initially created as social fork), there seems to be often a gradual development until developers explicitly consider their fork a separate development line. In fact, many hard forks seem to be triggered by rather small initial changes. These interview results align with the observation that only about 36% of the detected hard forks on GitHub have changed the project’s name (cf. Fig. 3a). + + +More importantly, a theme emerged throughout our interviews that hard forks are not likely to be avoidable in general, because of a project’s tension between being specific and begin general. On the one hand, projects that are more inclusive to all community contributions risk becoming so large and broad that they become expensive to maintain (e.g., as P17 suggests, the project maintainers need to take over maintenance of third-party contributions for niche use cases) and difficult to use (e.g., lots of configuration options and too much complexity). 
On the other hand, projects staying close to their original vision and keeping a narrow scope may remain more focused with a smaller and easier-to-maintain code base, but they risk alienating users who do not fit that original vision, who then may create hard forks. One could argue that hard forks are a good test bed for contributions that diverge from the original project despite their costs on the community: If the fork dies, it might suggest a lack of support and that it may have been a good decision not to integrate those contributions into the main project. + + +In this context, a family of related projects that serve slightly different needs or target populations but still coordinate may be a way to overcome this specificity-generality dilemma, by supporting multiple projects that are each specific to a mission but together target a significant number of use cases. However, current technology does not support coordination across multiple hard forks well, as we discuss next. + + +4.3 Interactions between Fork and Upstream Repository + + +Many interviewees indicate that they are interested in coordinating across repositories, either to eventually merge some or all changes back upstream or to monitor activity in the upstream repository to incorporate select or all changes. Some hard fork owners did not see themselves competing with the upstream project, but rather being part of a larger project. For instance, although fork owner P13 is over 1500 commits ahead of the upstream project, he still said that “I would not consider it independent because I am relying on what they (upstream) are doing. I could make it independent and stop getting their improvements, but it’s to their credit they make it very easy for their many hundreds of developers to contribute patches and accept patches from each other.
They regulate what goes into their project very well, and that makes [merging changes] into my fork much easier.” Some (P4 and P11) indicate that they would like to merge once the reason for the hard fork disappears (typically governance practices or personal disputes). Upstream maintainers also tend to be interested in what happens in their forks; for example, P17, a maintainer of a project with thousands of (mostly social) forks, said “I try to be aware of the important forks and try to get to know the person who did the fork. I will follow their activities to some extent.” + + +However, even though many interviewees expressed intentions, we see little evidence of actual synchronization or merging across forks in the repositories: For example, P1, P4, P8, and P11 mention that they are interested in eventually merging back with the upstream repository, but they have not done so yet and do not have any concrete plans at this point. Similarly, P2, P6, and P10 indicate that they are interested in changes in upstream projects, but do not actually monitor them and have not synchronized in a long time. Our evolution patterns similarly show that synchronization (from upstream to fork) and merging (from fork to upstream) are rare. Only 16.18% of all hard forks with active upstream repositories ever synchronize or merge (Tab. 2, patterns 4–6, 8–10, and 12–14). + + +What might explain this difference between intentions and observed actions is that synchronization and merging become difficult once two repositories diverge substantially, and that monitoring repositories can become overwhelming with current tools. For example, P2 reports only occasionally synchronizing minor improvements, because the fork has diverged too much to synchronize larger changes; P10 has experienced problems from synchronizing too frequently and thus being faced with incomplete implementations, and now only selectively synchronizes features of interest.
In line with prior observations on monitoring change feeds [5, 10, 33, 52], interviewees report that systematically monitoring changes from other repositories is onerous and that current tools like GitHub’s network graph are difficult to use and do not scale (P11, P16). + + +Discussion and Implications. Tooling has changed significantly since the pre-GitHub days of prior studies on hard forks, which may allow new forms of collaboration across forks: Git specifically supports merges across distributed version histories, as well as selectively integrating changes through a ‘cherry picking’ feature. GitHub and similar social coding sites track forks, allowing developers to subscribe to changes in select repositories, and generally make changes in forks transparent [10, 11, 52]. Essentially all interviewees were familiar with GitHub’s network view [1] that visually shows contributions over time across forks and branches. + + +Even though advances in tooling provide new opportunities for coordination across multiple forks, and project maintainers are interested in coordinating and considering multiple forked projects as part of a larger community, current tools do not support this use case well. Current tools work well for short-term social forks but tend to work less well for coordinating changes across repositories that have diverged more significantly. + + +This provides opportunities for researchers to explore tooling concepts that can monitor, manage, and integrate changes across a family of hard forks. Recent academic tools for improved monitoring [33, 52] or cross-fork change migration [35, 37] are potentially promising but not yet easily accessible to practitioners.
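The Git mechanics mentioned above (tracking an upstream remote, merging across forks, and selective cherry-picking) can be illustrated with a small self-contained sketch; it uses throwaway local repositories and hypothetical names rather than real projects:

```shell
# Self-contained sketch of cross-fork coordination in plain Git, using
# throwaway local repositories (no network); all names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# An 'upstream' project with one commit before the forking point.
git init -q -b main "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "base"

# The hard fork starts as a clone; afterwards both sides diverge.
git clone -q "$work/upstream" "$work/fork"
git -C "$work/upstream" commit -q --allow-empty -m "upstream change"
git -C "$work/fork" commit -q --allow-empty -m "fork change"

# Synchronize (upstream -> fork): fetch and merge everything new upstream;
# 'git cherry-pick <sha>' would instead integrate one upstream commit.
git -C "$work/fork" fetch -q origin
git -C "$work/fork" merge -q --no-edit origin/main

# Inspect what remains exclusive to the fork (unmerged commits).
git -C "$work/fork" log --format=%s origin/main..main
```

With a real project, the same flow applies after `git remote add upstream <url>`; cherry-picking replaces the merge when only a single upstream change is wanted.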
More experimental ideas about virtual product-line platforms that unify development of multiple variants of a project [3, 14, 29, 40, 45] may also provide inspiration for maintaining and coordinating hard forks, though they typically do not currently support the distributed nature of development with competing hard forks. A technical solution could solve the specificity-generality dilemma (cf. Sec. 4.2), allowing subcommunities to handle more specific features without overloading the upstream project and without fragmenting the overall community. We believe that our dataset of 15,306 hard forks can be useful to develop and evaluate such tools in a realistic setting. + + +4.4 Perceptions of Hard Forking + + +Our discussion with maintainers confirmed that the line between hard forks and social forks is somewhat subjective, but, when prompted, they could draw distinctions that largely mirror our definition (long-term focus, extensive changes, fork with own community). For example, P2 agrees that his fork is independent from the upstream project because they have different goals, and suggests the fork has better code quality and better community management practices; the only remaining connection is the upstream bug fixes that he incorporates from time to time. Also, P6 considers his fork independent, given a quicker release cycle and significant refactoring of the code base. + + +For most interviewees, the dominant meaning of a fork is that of a social fork. When asked about perceptions of forks, most interviewees initially thought of social forks and had strong positive associations, e.g., others contributing to a project, onboarding newcomers and finding collaborators, and generally fostering innovation.
For instance, P6 described the advantages of social forking as "it encourages developers to go in a direction that the original project may not have gone," and similarly P9 thought that "it could boost the creative ideas of the communities." One interviewee also mentioned that for young projects that primarily focus on growth, having been forked is a positive signal, meaning the project is useful to other people. Social forks were so dominant in the interviewees' minds as a default that we had to frequently refocus the interview on hard forks. When asked specifically about hard forks, several interviewees raised concerns about potential community fragmentation (P4, P6, P17), worried about incompatibilities and especially confusing end users (P3, P9, P14, P17), and would have preferred to see hard-fork owners contribute to the upstream project instead (P3, P8, P12). However, concerns were mostly phrased as hypotheticals and contrasted with positive aspects.


Many interviewed owners of hard forks do not see themselves as competing with the upstream repository, as they consider that they address a different problem or target a different user population. For example, P10 described his fork as a "light version" of the upstream project targeting a different group of users.


While it is understandable that hard-fork owners see their forks as justified, some interviewed owners of upstream projects also had positive opinions about such forks. For example, P17 expressed that forks are good if there is a reason (such as a focus on a different target population, in this case beginners), and that those forks may benefit the larger community by bringing in more users to the project; P18 even suggested that he would support forks of his own project by occasionally contributing to them, as long as it benefits the larger community.


Discussion and Implications.
Overall, we see that the perception of forking has significantly changed compared to perceptions reported in earlier work. Forking used to have a rather negative connotation in pre-GitHub days and was largely regarded as a last resort, to be avoided so as not to fragment the community and confuse users. With GitHub's rebranding of the word forking, the stigma around hard forking seems to have mostly disappeared; the word has mostly positive connotations for developers, associated with external contributors and community. While there is still some concern about community fragmentation, it is rarely a concrete concern if there are actual reasons behind a hard fork. Transparent tooling seems to help with acceptance and with considering multiple hard forks as part of a larger community whose members can mutually benefit from each other.


We expect that a more favorable view, combined with lower technical barriers (Sec. 4.2) and higher expectations of coordination (Sec. 4.3), makes hard forks a phenomenon we should expect to see more of. However, positive expectations can turn into frustration (and disengagement of contributors valuable for sustaining open source) if fragmentation leads to competition, confusion, and coordination breakdowns due to insufficient tooling.


With the right tooling for coordination and merging, we think hard forks can be a powerful tool for exploring new and larger ideas or testing whether there is sufficient support for features and ports for niche requirements or new target audiences (e.g., solving the specificity-generality dilemma discussed in Sec. 4.2 with a deliberate process). To that end, though, it is necessary to explicitly understand (some) hard forks as part of a larger community around a project, and possibly even explicitly encourage hard forks for specific explorations beyond the usual scope of social forks.
We believe that there are many ways to support development with hard forks and to coordinate distributed developers beyond what social coding sites offer at small scale today. Examples include (1) an early warning system that alerts upstream maintainers of emerging hard forks (e.g., external bots), which maintainers could use to encourage collaboration over competition and fragmentation if desired, (2) a way to declare the intention behind a fork (e.g., explicit GitHub support) and a dashboard to show how multiple projects and important hard forks interrelate (e.g., pointing to hard forks that provide ports for specific operating systems), and (3) means to identify the essence of the novel contributions in forks (e.g., history slicing [27] or code summarization [52]).


5 CONCLUSION


With the rise of social coding and explicit support in distributed version control systems, forking of repositories has been actively promoted by sites like GitHub and has become very popular.
However, most of these modern forks are not hard forks in the traditional sense. In this paper, we automatically detected hard forks and their evolution patterns and interviewed open-source developers of forks and upstream repositories to study perceptions and practices. We found that perceptions and practices have indeed changed significantly: among others, hard forks often evolve out of social forks rather than being planned deliberately, and developers are less concerned about community fragmentation but frequently perceive hard forks as positive, noncompetitive alternatives to the original projects. We also outlined challenges and suggested directions for future work.


Acknowledgements. Zhou and Kästner have been supported in part by the NSF (awards 1552944, 1717022, and 1813598) and AFRL and DARPA (FA8750-16-2-0042).
Vasilescu has been supported in part by the NSF (awards 1717415 and 1901311) and the Alfred P. Sloan Foundation.


REFERENCES


[1] 2008. GitHub Network View. https://help.github.com/en/articles/viewing-a-repositorys-network.


[2] 2020. Appendix. https://github.com/shuiblue/ICSE20-hardfork-appendix.


[3] Michal Antkiewicz, Wenbin Ji, Thorsten Berger, Krzysztof Czarnecki, Thomas Schmorleiz, Ralf Lämmel, Ştefan Stănciulescu, Andrzej Wąsowski, and Ina Schaefer. 2014. Flexible Product Line Engineering with a Virtual Platform. In Proc. Int'l Conf. Software Engineering (ICSE). ACM, 532–535.


[4] Matt Asay. 2014. Why you should fork your next open-source project. Blog Post. https://www.techrepublic.com/article/why-you-should-fork-your-next-open-source-project/


[5] Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. 2016. How to Break an API: Cost Negotiation and Community Values in Three Software Ecosystems. In Proc. Int'l Symposium Foundations of Software Engineering (FSE). ACM, 109–120.


[6] Pete Bratach. 2017. Why Do Open Source Projects Fork? Blog Post. https://thenewstack.io/open-source-projects-fork/


[7] Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, and Danny Dig. 2014. How Do Centralized and Distributed Version Control Systems Impact Software Changes? In Proc. Int'l Conf. Software Engineering (ICSE). ACM, 323–333.


[8] Bee Bee Chua. 2017. A Survey Paper on Open Source Forking Motivation Reasons and Challenges. In 21st Pacific Asia Conference on Information Systems (PACIS). 75.


[9] John W Creswell and J David Creswell. 2017. Research design: Qualitative, quantitative, and mixed methods approaches. Sage Publications.


[10] Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. 2012. Social coding in GitHub: transparency and collaboration in an open software repository. In Proc. Conf. Computer Supported Cooperative Work (CSCW). ACM, 1277–1286.
[11] Laura Dabbish, Colleen Stuart, Jason Tsay, and James Herbsleb. 2013. Leveraging transparency. IEEE Software 30, 1 (2013), 37–43.


[12] James Dixon. 2009. Forking Protocol: Why, When, and How to Fork an Open Source Project. Blog Post. https://jamesdixon.wordpress.com/2009/05/13/different-kinds-of-open-source-forks-salad-dinner-and-fish/


[13] Neil A Ernst, Steve Easterbrook, and John Mylopoulos. 2010. Code forking in open-source software: a requirements perspective. arXiv preprint arXiv:1004.2889 (2010).


[14] Stefan Fischer, Lukas Linsbauer, Roberto Erick Lopez-Herrejon, and Alexander Egyed. 2014. Enhancing clone-and-own with systematic reuse for developing software variants. In Proc. Int'l Conf. Software Maintenance (ICSM). IEEE, 391–400.


[15] Karl Fogel. 2005. Producing open source software: How to run a successful free software project. O'Reilly Media, Inc.


[16] Kam Hay Fung, Aybüke Aurum, and David Tang. 2012. Social Forking in Open Source Software: An Empirical Study. In Proc. Int'l Conf. Advanced Information Systems Engineering (CAiSE) Forum. Citeseer, 50–57.


[17] Jonas Gamalielsson and Björn Lundell. 2014. Sustainability of Open Source Software Communities beyond a Fork: How and Why has the LibreOffice Project Evolved? Journal of Systems and Software 89 (2014), 128–145.


[18] Georgios Gousios. 2013. The GHTorrent dataset and tool suite. In Proc. Working Conf. Mining Software Repositories (MSR). IEEE Press, 233–236.


[19] Georgios Gousios, Martin Pinzger, and Arie van Deursen. 2014. An exploratory study of the pull-based software development model. In Proc. Int'l Conf. Software Engineering (ICSE). ACM, 345–355.


[20] Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik, and Andy Zaidman. 2014. Lean GHTorrent: GitHub data on demand. In Proc. Working Conf. Mining Software Repositories (MSR). ACM, 384–387.


[21] Georgios Gousios, Andy Zaidman, Margaret-Anne Storey, and Arie Van Deursen. 2015.
Work Practices and Challenges in Pull-Based Development: The Integrator's Perspective. In Proc. Int'l Conf. Software Engineering (ICSE). Vol. 1. 358–368.


[22] Wenbin Ji, Thorsten Berger, Michal Antkiewicz, and Krzysztof Czarnecki. 2015. Maintaining Feature Traceability with Embedded Annotations. In Proc. Int'l Software Product Line Conf. (SPLC). ACM, 61–70.


[23] Jing Jiang, David Lo, Jiahuan He, Xin Xia, Pavneet Singh Kochhar, and Li Zhang. 2017. Why and how developers fork what from whom in GitHub. Empirical Software Engineering 22, 1 (2017), 547–578.


[24] Natalia Juristo and Omar S Gómez. 2010. Replication of software engineering experiments. In Empirical software engineering and verification. Springer, 60–88.


[25] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M German, and Daniela Damian. 2016. An in-depth study of the promises and perils of mining GitHub. Empirical Software Engineering 21, 5 (2016), 2035–2071.


[26] Andrew M St Laurent. 2004. Understanding Open Source and Free Software Licensing: Guide to Navigating Licensing Issues in Existing & New Software. O'Reilly Media, Inc.


[27] Yi Li, Chenguang Zhu, Julia Rubin, and Marsha Chechik. 2017. Semantic slicing of software version histories. IEEE Trans. Softw. Eng. (TSE) 44, 2 (2017), 182–201.


[28] Max Lillack, Ştefan Stănciulescu, Wilhelm Hedman, Thorsten Berger, and Andrzej Wąsowski. 2019. Intention-based Integration of Software Variants. In Proceedings of the 41st International Conference on Software Engineering (ICSE '19). IEEE Press, Piscataway, NJ, USA, 831–842.


[29] Leticia Montalvillo and Oscar Díaz. 2015. Tuning GitHub for SPL development: branching models & repository operations for product engineers. In Proceedings of the 19th International Conference on Software Product Line. ACM, 111–120.


[30] Linus Nyman. 2014. Hackers on forking. In Proc. Int'l Symposium on Open Collaboration (OpenSym). ACM, 6.
[31] Linus Nyman and Tommi Mikkonen. 2011. To Fork or not to Fork: Fork Motivations in SourceForge Projects. In Proc. IFIP Int'l Conf. on Open Source Systems. Springer, 259–268.


[32] Linus Nyman, Tommi Mikkonen, Juho Lindman, and Martin Fougère. 2012. Perspectives on Code Forking and Sustainability in Open Source Software. Open Source Systems: Long-Term Sustainability (2012), 274–279.


[33] Rohan Padhye, Senthil Mani, and Vibha Singhal Sinha. 2014. NeedFeed: Taming Change Notifications by Modeling Code Relevance. In Proc. Int'l Conf. Automated Software Engineering (ASE). ACM, 665–676.


[34] Ayushi Rastogi and Nachiappan Nagappan. 2016. Forking and the Sustainability of the Developer Community Participation—An Empirical Investigation on Outcomes and Reasons. In Proc. Int'l Conf. Software Analysis, Evolution, and Reengineering (SANER). Vol. 1. IEEE, 102–111.


[35] Baishakhi Ray, Miryung Kim, Suzette Person, and Neha Rungta. 2013. Detecting and characterizing semantic inconsistencies in ported code. In Proc. Int'l Conf. Automated Software Engineering (ASE). IEEE, 367–377.


[36] Eric S Raymond. 2001. The Cathedral & the Bazaar: Musings on Linux and open source by an accidental revolutionary. O'Reilly Media, Inc.


[37] Luyao Ren. 2019. Automated Patch Porting Across Forked Projects. In Proc. Int'l Symposium Foundations of Software Engineering (FSE). ACM, New York, NY, USA, 1199–1201.


[38] Luyao Ren, Shurui Zhou, Christian Kästner, and Andrzej Wąsowski. 2019. Identifying Redundancies in Fork-based Development. In Proc. Int'l Conf. Software Analysis, Evolution, and Reengineering (SANER). IEEE, 230–241.


[39] Gregorio Robles and Jesús M. González-Barahona. 2012. A Comprehensive Study of Software Forks: Dates, Reasons and Outcomes. In Proc. IFIP Int'l Conf. on Open Source Systems. 1–14.


[40] Julia Rubin and Marsha Chechik. 2013. A framework for managing cloned product variants.
In Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 1233–1236.


[41] Johnny Saldana. 2015. The coding manual for qualitative researchers. Sage.


[42] Anand Mani Sankar. 2015. Node.js vs io.js: Why the fork?!? Blog Post. http://anandmanisankar.com/posts/nodejs-iojs-why-the-fork/


[43] Stefan Schmidt. 2009. Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology 13, 2 (2009), 90–100.


[44] Donna Spencer. 2009. Card sorting: Designing usable categories. Rosenfeld Media.


[45] Ştefan Stănciulescu, Thorsten Berger, Eric Walkingshaw, and Andrzej Wąsowski. 2016. Concepts, operations, and feasibility of a projection-based variation control system. In Proc. Int'l Conf. Software Maintenance and Evolution (ICSME). IEEE, 323–333.


[46] Igor Steinmacher, Gustavo Pinto, Igor Scaliante Wiese, and Marco Aurélio Gerosa. 2018. Almost there: A study on quasi-contributors in open-source software projects. In Proc. Int'l Conf. Software Engineering (ICSE). IEEE, 256–266.


[47] Robert Viseur. 2012. Forks impacts and motivations in free and open source projects. International Journal of Advanced Computer Science and Applications 3, 2 (2012), 117–122.


[48] Steve Weber. 2004. The success of open source. Harvard University Press.


[49] David A. Wheeler. 2015. Why Open Source Software/Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers! Blog Post. https://dwheeler.com/ossfswhy.html


[50] Owen Williams. 2015. Node.js and io.js are settling their differences, merging back together. Blog Post. https://thenextweb.com/dd/2015/06/16/node-js-and-io-js-are-settling-their-differences-merging-back-together/


[51] Alex Williams and Joab Jackson. 2016. A Docker Fork: Talk of a Split Is Now on the Table. Blog Post.
https://thenewstack.io/docker-fork-talk-split-now-table/


[52] Shurui Zhou, Ştefan Stănciulescu, Olaf Leßenich, Yingfei Xiong, Andrzej Wąsowski, and Christian Kästner. 2018. Identifying Features in Forks. In Proc. Int'l Conf. Software Engineering (ICSE). ACM Press, 105–116.


[53] Shurui Zhou, Bogdan Vasilescu, and Christian Kästner. 2019. What the Fork: A Study of Inefficient and Efficient Forking Practices in Social Coding. In Proc. Europ. Software Engineering Conf./Foundations of Software Engineering (ESEC/FSE). ACM Press, New York, NY, 350–361.


An empirical study of integration activities in distributions of open source software


Bram Adams · Ryan Kavanagh · Ahmed E. Hassan · Daniel M. German


Published online: 31 March 2015
© Springer Science+Business Media New York 2015


Abstract Reuse of software components, either closed or open source, is considered to be one of the most important best practices in software engineering, since it reduces development cost and improves software quality. However, since reused components are (by definition) generic, they need to be customized and integrated into a specific system before they can be useful. Since this integration is system-specific, the integration effort is non-negligible and increases maintenance costs, especially if more than one component needs to be integrated. This paper performs an empirical study of multi-component integration in the context of three successful open source distributions (Debian, Ubuntu and FreeBSD). Such distributions integrate thousands of open source components with an operating system kernel to deliver a coherent software product to millions of users worldwide.
We empirically identified seven major integration activities performed by the maintainers of these distributions, documented how these activities are carried out, then evaluated and refined the identified activities with input from six maintainers of the three studied distributions. The documented activities provide a common vocabulary for component integration in open source distributions and outline a roadmap for future research on software integration.


Communicated by: Filippo Lanubile


B. Adams (✉)
MCIS, Polytechnique Montréal, Montréal, Canada
e-mail: bram.adams@polymtl.ca


R. Kavanagh · A. E. Hassan
SAIL, Queen's University, Kingston, Canada
R. Kavanagh
e-mail: ryan@cs.queensu.ca
A. E. Hassan
e-mail: ahmed@cs.queensu.ca


D. M. German
University of Victoria, Victoria, Canada
e-mail: dmg@uvic.ca
Keywords Software integration · Software reuse · Open source distributions · Debian · Ubuntu · FreeBSD


1 Introduction


Software reuse is "the use of existing software or software knowledge to construct new software" (Frakes and Kang 2005). Reuse roughly consists of two major steps (Basili et al. 1996): 1. identifying a suitable component to reuse, and 2. integrating it into the target system. For example, vendors of mobile phones typically reuse an "upstream" (i.e., externally developed) operating system component in their device, customized with proprietary device drivers, control panels and utilities (Jaaksi 2007). Reuse is very commonplace, as shown in studies on software projects of different sizes in China, Finland, Germany, Italy and Norway (Chen et al. 2008; Hauge et al. 2008, 2010; Jaaksi 2007; Li et al. 2008, 2009). For example, almost half of the Norwegian software companies reuse "Open Source Software" (OSS) in their products (Hauge et al.
2008), while 30 % of the functionality of OSS projects in general reuses existing components (Sojer and Henkel 2010).


Although reuse speeds up development, leverages the expertise of the upstream project and, in general, improves the quality and cost of a product (Basili et al. 1996; Gaffney and Durek 1989; Szyperski 1998), it is not entirely risk- and cost-free. In particular, the integration step of reuse consumes a large amount of effort and resources (Boehm and Abts 1999; Brownsword et al. 2000; Di Cosmo et al. 2011; Morisio et al. 2002), for various reasons. "Glue code" (Yakimovich et al. 1999) needs to be developed and maintained to make a component fit into the target system, and developers need to continuously assess the impact of new versions of the component on this glue code (such a new version can bring an unpredictable set of bug fixes and features). Furthermore, the component might depend on other components, whose bugs could propagate to the target system in undocumented ways (Dogguy et al. 2010; McCamant and Ernst 2003; Orsila et al. 2008; Trezentos et al. 2010).


The ability to make local changes to the source code of a reused component introduces even more challenges, since an integrator typically is not familiar with the reused component's code base and hence can easily introduce bugs in such local changes (Hauge et al. 2010; Li et al. 2005; Merilinna and Matinlassi 2006; Stol et al. 2011; Tiangco et al. 2005; Ven and Mannaert 2008). Worse, if the local changes are not contributed back to the owner of the reused component, the organization that made the changes will need to maintain them and possibly re-apply them to future versions of the component (Spinellis et al. 2004; Ven and Mannaert 2008).


Thus far, most of the empirical studies on integration of components (Brownsword et al. 2000; Hauge et al. 2010; Li et al. 2005; Merilinna and Matinlassi 2006; Morisio et al. 2002; Stol et al.
2011; Ven and Mannaert 2008) concentrated on the base case of integrating one component into a target system. In practice, however, organizations tend to integrate not one, but two or more components, which brings along a set of unique challenges (Morisio et al. 2002; Van Der Linden 2009; Ven and Mannaert 2008), especially given the popularity of open source development: in the timespan of one release, an organization needs to coordinate the integration of updates by multiple vendors, typically with totally independent release dates (Boehm and Abts 1999; Brownsword et al. 2000). For example (Jaaksi 2007), Nokia's N800 tablet platform reused 428 OSS components, 25 % of which were reused as is (e.g., bzip2 and GNU Chess), 50 % were changed locally (e.g., the graphics subsystem), and 25 % were developed in-house using open source practices ("inner source", ISS). It is unclear for organizations like Nokia how to keep their system stable and secure amidst the integration of so many different components (Hauge et al. 2010). Furthermore, there is a clear need (Boehm and Abts 1999; Crnkovic and Larssom 2002; Merilinna and Matinlassi 2006) for dedicated training and education of developers and organizations on integration, since in a world of open source they now need to collaborate with the providers of 3rd party components and other external contributors to benefit from external contributions and to avoid having to maintain bug fixes and other customizations themselves.


This paper aims to improve the understanding of multi-component integration by empirically studying and documenting the major integration activities performed by OSS distributions (Gonzalez-Barahona et al. 2009). An OSS distribution basically is a "packaging organization" (Ruffin and Ebert 2004; Merilinna and Matinlassi 2006), i.e., an organization that integrates upstream components into a common platform (similar to product lines Meyer and Lehnerd (1997) and Pohl et al.
(2005)), ironing out bugs and intellectual property issues, and providing extensive documentation and training on the integrated components. Reusing an OSS component through an established distribution provides more confidence in the quality of the component (Tiangco et al. 2005), and hence many companies use OSS distributions as the basis for products like routers, mobile phones or storage devices (Koshy 2013). Examples of established OSS distributions are Eclipse, GNOME and operating system distributions like Debian or Ubuntu. + + +Here, we focus on operating system distributions (henceforth called “OSS distribution”), which bundle and customize OSS operating system kernels (e.g., Linux or BSD), system utilities (e.g., compilers and file management tools) and end-user software (e.g., text processors, games and browsers) with a dependency-aware package system. There are almost 400 active OSS distributions, and each year 26 new ones are born (Lundqvist 2013). Given the growing competition, distributions need to release new features and versions in an ever shorter time frame (Hertzog 2011; Remnant 2011; Shuttleworth 2008) to millions of desktop users and server installations. To achieve this, they rely on hundreds of volunteers to integrate the latest versions and bug fixes of the tens of thousands of integrated upstream components. + + +We empirically studied the major integration activities of three of the most popular and successful OSS distributions, i.e., Debian, Ubuntu and FreeBSD, using qualitative analysis on an accumulated 29 years of historical change and bug data. We document these activities and the steps used to perform them in a structured format, distilling the state-of-the-practice tools and processes followed by the actors involved in the activity, providing concrete examples, and comparing our findings to prior research and integration outside the context of OSS. 
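To make the notion of a "dependency-aware package system" mentioned above concrete: its core task is computing an installation order in which every package is installed after its dependencies, i.e., a topological sort of the dependency graph. A minimal sketch (the package names and dependency graph are invented for illustration):

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Invented dependency metadata, in the spirit of a distribution's
# package index: each package maps to the set of packages it needs.
DEPS = {
    "browser": {"libssl", "libpng"},
    "libpng": {"zlib"},
    "libssl": {"zlib"},
    "zlib": set(),
}

def install_order(deps):
    """Return an installation order in which every dependency
    precedes the packages that depend on it."""
    return list(TopologicalSorter(deps).static_order())
```

Real package systems additionally handle version constraints, conflicts, and alternatives, which makes resolution considerably harder than plain topological ordering.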
Six members of the maintenance community of the analyzed distributions discussed and refined the documented activities, and provided feedback on the usefulness and completeness of the activities. Similar to the concept of design patterns (Gamma et al. 1995) or reference architectures (Bowman et al. 1999), the documented activities can be used by (1) organizations as a common terminology for discussing and improving integration activities for components, and (2) researchers to set up a road map for research on integration, since integration remains a largely unexplored research area (Goode 2005; Hauge et al. 2010; Stol et al. 2011).


The main contributions of this paper are:


– Identification and documentation of seven major integration activities and the processes that they follow in three major OSS distributions.
– Identification of major challenges for tool support and research for integration activities.
– Evaluation of and feedback on the identified activities and challenges by six integration maintainers and release managers of the analyzed distributions.


This paper is structured as follows. First, Section 2 discusses background and related work on software integration and OSS distributions, after which Section 3 presents the design of our qualitative analysis. Section 4 documents the seven integration activities that we identified during our analysis, followed by a discussion of the open challenges that we identified (Section 5) and the evaluation of our findings by six practitioners (Section 6). We conclude with threats to validity (Section 7) and the conclusion (Section 8) of our study.


2 Background and Related Work


This section discusses background and related work on integration and open source distributions. Table 1 summarizes key technical terms that will be used throughout the paper.
2.1 Software Integration


Table 1 Overview of the technical terms used throughout this paper

| term | meaning |
|-----------------------|--------------------------------------------------------------------------|
| reuse | identification and integration of a component (e.g., class or library) into a system |
| OSS reuse | reuse of Open Source Software |
| COTS reuse | black box reuse based on Commercial Off The Shelf components |
| ISS reuse | reuse of Inner Source Software, i.e., OSS developed in-house |
| integrator | organization that integrates a third party component into its product |
| maintainer | individual or team doing physical integration on behalf of integrator |
| downstream project | synonym for "integrator" |
| upstream project | organization (open source project or company) whose components are being integrated by another project |
| upstream component | component developed by upstream project that is being reused |
| multi-component integration | integration of more than one upstream component |
| packaging organization | integrator whose business goal is to package upstream components into a coherent platform that is offered for sale or reuse |
| package | upstream component that has been integrated into an OSS distribution using the distribution's packaging format (e.g., "rpm") |
| binary distribution | distribution providing compiled code in its packages |
| source-based distribution | distribution providing source code in its packages, for compilation on the end-user's machine |
| derived distribution | "child" distribution that customizes packages of an existing "parent" distribution and adds additional packages to it |

Reuse can be black box or white box (Frakes and Terry 1996). Black box reuse refers to "Commercial Off The Shelf" (COTS) components (Boehm and Abts 1999), for which source code typically is not available. Hence, such components can only be configured and plugged into a target system.
White box reuse provides access to the component's source code to customize it to the needs of the target system, either because the component is OSS (Spinellis et al. 2004) or because it is developed in-house following open source principles ("inner source", ISS), a practice that is increasingly common in large companies like Alcatel-Lucent, HP, Nokia, Philips and SAP (Stol et al. 2011). OSS and ISS reuse are also very common in the base platform of software product lines (van der Linden et al. 2007; Pohl et al. 2005; Van Der Linden 2009), since up to 95 % of such a platform consists of "commoditized" features readily available from upstream projects.


In general, software reuse creates a win-win situation for the reusing organization and the upstream project whose software is reused. The former benefits from the features provided by the component in terms of productivity and product quality (Frakes and Kang 2005; Szyperski 1998), while the upstream project benefits financially (through licensing) and/or qualitatively from various forms of feedback, such as defect reports, code contributions and user experiences. However, despite the differences between COTS and OSS/ISS, all forms of reuse introduce a dependency on an upstream project (COTS/OSS) (Di Giacomo 2005; Hauge et al. 2010; Lewis et al. 2000; Mistrík et al. 2010; Morisio et al. 2002) or another division inside the organization (ISS) (Van Der Linden 2009), which can lead to hidden maintenance costs.


Software reuse has been studied extensively from the perspective of how to make a software system reusable (Coplien et al. 1998; DeLine 1999; Frakes and Kang 2005; Mattsson et al. 1999; Parnas 1976; Pohl et al. 2005), how to select components for reuse (Bhuta et al. 2007; Chen et al. 2008; Li et al. 2009), how to resolve legal issues regarding software reuse (German et al.
2010), and what factors can impact collaboration between the component provider and integrators (Brooks 1995; Curtis et al. 1988; Herbsleb and Grinter 1999; Herbsleb et al. 2001; Seaman 1996). In particular, Curtis et al. (1988) found, based on interviews, that the need to communicate outside the team, department or even company boundaries opens a can of worms (e.g., finger-pointing, silos of domain knowledge, limited communication channels, lack of contact persons and misunderstandings due to different contexts) that can negatively impact the integration process. Herbsleb and Grinter (1999) and Herbsleb et al. (2001) empirically showed that the need to involve more people indeed relates to the time necessary to resolve bugs and integration issues.


In contrast, the concrete activities involved with the integration of reused components, as well as their costs, have been studied in substantially less detail. Especially for multi-component integration, where not one but a potentially large number of (typically open source) components are being reused by an organization at the same time, empirical evidence is currently lacking (Morisio et al. 2002; Van Der Linden 2009; Ven and Mannaert 2008). Lewis et al. (2000) note that "The greater the number of components, the greater the number of version releases, each potentially coming out at different times." Hence, what kind of activities does such integration imply, and how do those activities relate to known activities for single-component integration? Before explaining how this study addresses these questions, we first discuss prior work on COTS, OSS and ISS reuse.


2.1.1 COTS Reuse and Integration


Brownsword et al. (2000) studied over 30 medium-to-large commercial projects to analyze the hidden integration activities of COTS reuse.
They found that it is important for an organization to be informed about (new versions of) promising COTS components and to continuously monitor the impact of the components on the organization’s code base. They also point out the maintenance issues of glue code and configuration of a COTS component, and the fact that projects do not control the upstream project. However, their findings are rather high-level, and do not explain how the projects coped with multi-component integration.

Lewis et al. (2000) report on their experience with COTS reuse in 16 government organizations. They especially stress the loss of control as soon as a contract for COTS reuse is signed: any clause or adaptation that was not negotiated will result in additional costs down the line. Changing one’s own system or looking for another COTS component is preferable to requesting (and having to pay) the component vendor to adapt her component. The main question on the studied organizations’ minds was “How do we upgrade an operational system without a great deal of disruption?”. There was no consensus on whether one should always update to the latest version of a reused component, wait until a new major version, or incorporate only the most pressing changes (e.g., security fixes). These questions were only aggravated for organizations reusing dozens of components, which caused additional coordination issues.

A similar study was performed by Morisio et al. (2002) at NASA. Again, integration was the most costly aspect of COTS reuse, yet the integration activities varied widely across projects. Glue code was the main means of integration, and the authors note that most successful projects had to stay in contact with the COTS component provider throughout the lifecycle of the system to avoid surprises in the next version of the COTS.
2.1.2 OSS Reuse and Integration

Merilinna and Matinlassi (2006) performed a literature survey and structured interviews with nine small-to-medium Finnish companies that reuse OSS components. They found that integration problems are primarily due to the heterogeneous environments that components need to support as well as the lack of documentation, forcing companies to rely primarily on their own experience. Merilinna and Matinlassi identified three ways to deal with integration problems: using the OSS component as a COTS component (i.e., making no changes to the code), contributing changes back upstream, or using a packaging organization like an OSS distribution as mediator. Not upgrading to a new version of a reused component can also help. In any case, a thorough analysis of the OSS component to be reused can avoid many problems.

Ven and Mannaert (2008) performed interviews with members of a commercial project reusing OSS components, and examined in detail the trade-off between changing the code and contributing the changes back. Even though a project wants to avoid maintaining local changes (since this is costly), the alternative of contributing changes to the upstream project also requires an investment of time and resources, for example to get to know the contribution procedures and to keep track of the future evolution of the upstream project. Even if a patch is accepted by the upstream project, the organization developing the patch might still be required to maintain it, since only it has the necessary insight. Ven and Mannaert recommend contributing patches if the local changes are sufficiently generic, maintaining patches oneself if they are too specific, or (in the worst case) forking the upstream project, even though such a fork has only a small chance of success.
While Merilinna and Matinlassi (2006) and Ven and Mannaert (2008) identified two integration activities that we also identified in our study (i.e., Upstream Sync and Local Patch), we approached those activities from the perspective of a packaging organization (and multi-component integration) and documented them in a structured way.

2.1.3 ISS Reuse and Integration

Stol et al. (2011) studied the emerging practice of developing and reusing code in-house using open source practices (ISS). ISS is a popular phenomenon in large companies, since it provides the benefits of OSS reuse without giving up control. Some companies only offer their employees the infrastructure for ISS reuse, while others make it part of their development strategy. A systematic literature study and a detailed study of ISS inside an organization showed that the most costly ISS issues are due to integration. In addition to the integration issues related to OSS reuse in general, other challenges like backwards compatibility and the peculiar interplay between the ISS team and other teams in a company were identified. For example, the ISS team can send a “delivery advocate” to other teams to help them integrate the ISS components. However, various activities are company- and ISS reuse-specific. For example, the ISS team initially receives components from a specific team in the organization, but after integration it becomes responsible for the components itself and starts acting as the upstream for the other teams in the organization (even though the original developers still collaborate on the development of the component). In contrast, the OSS distributions and upstream projects studied in this paper are separate, independent entities.

Finally, Van Der Linden (2009) reports on the adoption of OSS and ISS reuse in software product lines (Meyer and Lehnerd 1997; Pohl et al. 2005). The platform on which such product lines are built largely consists of common functionality for which many components are available.
Reuse of OSS and ISS components for such functionality improves the quality and speed of development; however, it also introduces a dependency on the upstream projects, not only for the platform, but for all products based on the platform. In addition to the best practices mentioned before, close collaboration with the upstream projects in a symbiotic fashion is key to keeping track of new features and changes, and can be established by reporting or fixing bugs. Although OSS distributions can be seen as a product line, our study focuses especially on the identification and structured documentation of major integration activities in the context of multi-component integration.

2.2 Open Source Distributions

This paper focuses on the maintenance activities involved in software integration in the context of OSS distributions, since this context enables us to study integration in a multi-component, open source setting. OSS distributions are among the most well-known open source packaging organizations (Gonzalez-Barahona et al. 2009; Ruffin and Ebert 2004). Such distributions integrate a collection of upstream software components consisting of an operating system kernel (e.g., Linux or BSD), core libraries, compilation tools and software for users like desktop applications and web browsers. Thanks to their inclusion in an OSS distribution, the integrated upstream projects can reach millions of users without having to market themselves. Although distributions are especially known in the Linux and BSD world, even commercial products like Microsoft Windows and Mac OS X can be considered distributions (they just ship with more ISS than OSS projects).

There are hundreds of OSS distributions, most of which integrate thousands of upstream components.
Figure 1 shows that the total number of currently active Linux distributions has grown to 380 (in addition to 135 discontinued distributions, which are not shown), increasing by roughly 26 distributions each year (Lundqvist 2013). For the BSD family of open source kernels, there are twelve currently active distributions (Comparison of BSD operating systems 2011), in addition to 22 distributions that are either discontinued or have an unclear status. The most popular Linux distributions, Debian and Ubuntu, each integrate more than 24,000 OSS components, whereas FreeBSD (the most popular BSD distribution) integrates almost 23,000 components. The Debian distribution doubles in size every two years, having passed the mark of 300 MLOC in 2007 (Gonzalez-Barahona et al. 2009).

Despite this large scale, integrating an OSS project’s components into a distribution goes far beyond black-box reuse. First, the upstream components need to be turned into a distributable “package”. Distributions such as Debian, Ubuntu and Fedora compile the components for a particular architecture, then split up the compiled libraries and executables across one or more “binary” packages. Such packages (together with the packages they depend on) can be automatically installed using a distribution-specific package management system, such as “apt”, “dpkg” or “yum”. Source-based distributions, like FreeBSD, distribute the (possibly customized) source code of an upstream component to the end-user as a so-called “source” package (FreeBSD uses the term “port” for this), for compilation on the user’s machine. Unless otherwise specified, the term “package” in this paper refers to both “binary” and “source” (port) packages.

After building and packaging the upstream component, the new package needs to be tested and delivered to the end-user.
Once a package becomes available to end-users (including the integrators), the real integration maintenance work starts, since packages (and their dependent packages) need to be continuously updated to new versions of the packaged component. Similarly, bugs in the package should be detected and fixed promptly, and (if appropriate) patches should be sent back to the upstream project that developed the packaged component. Local changes to the package that have not been sent back, however, need to be maintained and kept up-to-date by the distribution. User complaints should be triaged and processed by the distribution as well, before escalating them to upstream, if appropriate.

Organizations that reuse a component typically appoint a person or group of people, i.e., the “maintainer(s)”, to perform and co-ordinate integration activities on the organization’s behalf (Koshy 2013; Merilinna and Matinlassi 2006). Organizations like OSS distributions dealing with multiple upstream projects and components typically have multiple maintainers, each one responsible for a group of related upstream components. Figure 2 shows the interactions of a distribution’s maintainer (in bold) with the other major actors of the distribution. The maintainer packages and customizes the upstream software component by herself, interacting with the upstream project whenever necessary, for example to understand changes in a new release or to communicate reported bugs. Customizations result in local patches applied to the vanilla upstream component, after which the patched component is packaged using the distribution’s package management tool. The package is tested by the project’s package community, which consists of volunteer contributors and testers. Once stabilized, packages can also be used by end-users, who can contribute bug reports or suggestions by contacting the maintainer.
The maintainer’s work ultimately ends up in an official release of the distribution, hence all maintainers are co-ordinated by the release manager in charge. Some of the common activities of the release manager are discussing release-critical bugs or project-wide packaging policies with the maintainer, and enforcing deadlines.

Given the size of a distribution, most of the maintainers are responsible for multiple components (each of which is packaged into one or more packages). Debian has around 2,400 maintainers (Project participants 2013) for 24,000 integrated components (a ratio of 10 components per maintainer), while FreeBSD has around 400 maintainers (The FreeBSD developers 2013) for 23,000 components (a ratio of 57.5). Ubuntu only has around 150 maintainers (Ubuntu universe contributors team 2013; MOTU team 2013; Ubuntu core development team 2013) for 24,000 components (a ratio of 160), since most of its packages are inherited as-is from Debian, thus requiring less work. Given these high component-to-maintainer ratios, maintainers often team up to share package responsibilities, but even then, they still need to divide their attention and limited time across many components. In addition, the maintainers are not the developers of the packages that they are maintaining, which means that even more time is spent fully understanding changes or contacting the upstream developers about a change (Brownsword et al. 2000; Stol et al. 2011). Finally, various proposals have been launched to shorten the time frame between releases of distributions (Hertzog 2011; Remnant 2011) or even to synchronize releases with those of other distributions (Shuttleworth 2008). This further complicates the task of the package maintainers.

This paper identifies and documents the integration activities that must be done on a daily basis by the maintainers of three of the most successful OSS distributions.
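The component-to-maintainer ratios quoted above follow from simple division; a quick sanity check using the counts reported in the text:

```python
# Component and maintainer counts as reported for the three distributions.
counts = {
    "Debian":  {"components": 24_000, "maintainers": 2_400},
    "FreeBSD": {"components": 23_000, "maintainers": 400},
    "Ubuntu":  {"components": 24_000, "maintainers": 150},
}

for name, c in counts.items():
    ratio = c["components"] / c["maintainers"]
    print(f"{name}: {ratio:g} components per maintainer")
# Debian: 10, FreeBSD: 57.5, Ubuntu: 160
```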
Previous research has focused exclusively on the other stakeholders in Fig. 2: the governance processes of distributions (Sadowski et al. 2008), release management (Michlmayr et al. 2007; van der Hoek and Wolf 2003), the package/developer community (Scacchi et al. 2006), the (evolution of the) size and complexity of packages (Gonzalez-Barahona et al. 2009), and the dependencies of packages (German et al. 2007). Given the central role of package maintainers in the success of a distribution, their responsibilities and challenges need to be understood in order to streamline the interaction between the OSS distribution and the upstream project, and to bring new maintainers quickly up to speed. Furthermore, previous work focused especially on the integration of individual components, while packaging organizations like OSS distributions need to deal with the integration of thousands of components at the same time, with their users expecting the latest versions of each component to be integrated. Finally, open source development forces organizations to collaborate with external parties to reap the full benefits of quality and innovation that can be achieved with open source components. If they do not, organizations waste substantial effort, for example by maintaining their own local patches. Hence, studying the integration activities of distributions will help us understand integration in a multi-component, open source context.

The following section presents the approach that we followed to identify and analyze the major integration activities in three large OSS distributions.

3 Case Study Setup

The goal of this paper is to empirically identify and document the major integration activities in use by packaging organizations for multi-component OSS integration, as existing empirical work focused exclusively on single-component integration.
Since a wide range of packaging organizations exists, as a first step we focus on some of the most experienced integration experts in the area of OSS reuse, i.e., OSS distributions. In particular, we perform a qualitative analysis of three of the largest and most successful OSS operating system distributions, i.e., Debian, Ubuntu and FreeBSD.

Although our results consist of integration activities performed in OSS distributions, these activities are not unique to OSS integration, nor are they just a subset of the integration activities performed by commercial organizations. Whereas in a commercial setting organizations used to buy or develop all dependencies themselves, an OSS setting requires one to collaborate with a variety of external stakeholders to avoid being stuck with one’s own patches and customizations. Avoiding this requires a different set of integration activities than before. In fact, those activities now need to trickle back into the commercial organizations that have started to adopt OSS practices internally (ISS reuse).

To help such organizations, as well as open source projects, this paper addresses the following question: What is the core set of activities in OSS for dealing with the integration of multiple third-party components? This question allows us to empirically study what is being done in OSS integration, how it is being done and what challenges expert integrators still face. In particular, it also helps us understand which state-of-the-art techniques OSS projects use to facilitate their integration activities.

This section discusses the methodology for our study, which is also illustrated in Fig. 3. We first performed a qualitative analysis to identify and document major integration activities, then evaluated these findings with stakeholders from the three distributions.

Fig.
3 Overview of our case study methodology

3.1 Subject Selection

To obtain a representative sample, we selected a mixture of binary and source-based, and derived and independent OSS distributions. A derived (or “child”) distribution automatically inherits the packages of its “parent” distribution. It then customizes some of those packages, and also adds its own packages, in order to enforce a uniform look-and-feel, focus on specific types of packages or specialize to a certain set of users (e.g., office workers vs. music producers). Although a derived distribution saves substantial integration time, it also leads to a unique set of integration activities, since each level of derivation adds an additional layer to the integration process.

When looking at the history of open source distributions (Lundqvist 2013), Debian and Ubuntu clearly stand out as two of the most influential distributions, with 41.0 % of all distributions deriving from Debian (211 out of 380 active and 135 discontinued distributions), 90 from Ubuntu and 17 from FreeBSD. In particular, the Debian distribution has 81 child distributions, 105 distributions deriving from those child distributions (“grand-children”), 24 great-grand-children and 1 great-great-grand-child (Lundqvist 2013). The latter potentially needs to integrate packages from its four ancestors as well as from some upstream OSS projects directly. Ubuntu itself has 79 children and 11 grand-children (Lundqvist 2013), while FreeBSD has 15 children, 1 grand-child and 1 great-grand-child (Comparison of BSD operating systems 2011).

We found that the impact of the above distributions on other distributions also translated well to their popularity in terms of number of users. In contrast to mobile app stores, there is no official popularity poll or ranking of OSS distributions.
However, since May 2001 one of the leading sources on OSS distributions has been the distrowatch.com web site, which contains announcements of new versions of distributions as well as detailed historical overviews of each distribution (either Linux- or BSD-based). One of its major features is that, on a weekly basis, the site keeps track of how many people search or click for each distribution. Although this ranking does not map 1-to-1 to the number of downloads, it does give an important indication of the popularity of OSS distributions.

Despite its age (the first Debian release was made on the 16th of August, 1993), Debian was still the fourth most popular binary distribution at the time of our case study, while Ubuntu was the second most popular binary and derived distribution. We decided not to study the top binary distribution at the time of our case study (i.e., Linux Mint), since it was a rather recent distribution derived from Ubuntu, without sufficient historical data available. The third most popular distribution was Fedora, but since it is independent of the Debian/Ubuntu ecosystem, we did not study this distribution either. As source-based distribution, we picked the most popular source-based BSD distribution, i.e., FreeBSD. Note that FreeBSD is also the most popular BSD distribution in general, according to the 2005 BSD Usage Survey (The BSD Certification Group 2005).

3.2 Data Sampling

We study integration activities by systematically analyzing, categorizing and revising historical package data for Debian, Ubuntu and FreeBSD to create a classification of integration activities. Given the large number of packages and package-versions in the three distributions (Table 2), we could not examine all of them manually.
Instead, for each distribution we sampled enough package-versions to obtain a confidence interval of length 5 % within a 95 % confidence level, taking into account the large population size (Cochran 1963):

\[
\text{sample size} = \frac{ss}{1 + \frac{ss}{\#\text{pkg. versions}}}
\]

with

\[
ss = \frac{Z^2 \cdot p \cdot (1 - p)}{0.05^2}, \qquad Z = 1.96 \text{ for a 95 % conf. level}, \qquad p = 0.5 \text{ for a pop. with unknown variability}
\]

This means that if we find an integration activity to hold for n % of the sampled package-versions, we can say with 95 % certainty that n ± 5 % of all package-versions exhibit that activity. For example, 7 ± 5 % would mean that the activity holds, with 95 % certainty, for 2 % to 12 % of the package-versions. Although the three distributions have a different number of package-versions, the asymptotic nature of the sample size formula yielded the same number of package-versions (384) for each distribution.

Table 2 Characteristics of the data for the three subject distributions

| | Debian | Ubuntu | FreeBSD |
|----------------|--------------|--------------|--------------|
| start of project | 16/08/1993 | 20/10/2004 | 11/1993 |
| start of data | 12/03/2005 | 20/12/2005 | 21/08/1994 |
| end of data | 16/08/2011 | 14/09/2011 | 01/09/2011 |
| #components | 24,263 | 25,345 | 22,733 |
| #packages | 92,277 | 66,595 | 22,733 |
| #pkg. versions | 896,757 | 446,324 | 162,135 |
| #releases | 4 | 14 | 8 major/55 minor |
| #maintainers | 2,400 | 150 | 400 |

3.3 Data Extraction

We randomly sampled 384 package-versions from each distribution, then automatically extracted for each selected package-version the corresponding change log message. Such a change log basically consists of a detailed bullet list (Koshy 2013) containing a high-level, textual summary of all major changes in a particular package-version, as well as the explicit IDs of all fixed bugs.
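The sample-size computation above can be reproduced in a few lines; a minimal sketch (population sizes taken from Table 2):

```python
import math

def sample_size(population: int, z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    """Cochran's sample size with finite-population correction."""
    ss = (z ** 2) * p * (1 - p) / (e ** 2)        # infinite-population estimate (384.16)
    return math.ceil(ss / (1 + ss / population))  # correct for the finite population

# #pkg. versions per distribution, from Table 2
for name, n in [("Debian", 896_757), ("Ubuntu", 446_324), ("FreeBSD", 162_135)]:
    print(name, sample_size(n))  # 384 for each distribution
```

Because the populations are so large relative to ss, the correction barely bites, which is why all three distributions end up with the same sample of 384 package-versions.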
Figure 4 shows an example change log message of a Debian package-version (Ubuntu and FreeBSD use a similar format). Except for two changes, all changes in Fig. 4 fix open bug reports, with the reports’ identifiers pasted inside the change log. As distributions stipulate that each new package-version has to be documented in a change log (Debian project 2011), we used change log data as the starting point for the analysis of each package-version.

To interpret the changes reported in a change log, we then manually analyzed the referenced bug reports via the distributions’ bug repository. As explained below, each distribution uses a different technology for its change logs and bug repository, but we were able to write scripts to automate the fetching of both the logs and reports. The bug reports often contained references to emails on a distribution’s mailing lists, and sometimes contained patches that had been proposed as a possible bug fix. If present, we also studied these messages and patches. Finally, to clarify technical terms or understand particularly unclear bugs or changes, we used the distribution’s developer documentation (accessible from a distribution’s web site) and, as a last resort, any relevant web search, especially for finding relevant communication on online fora. This was only necessary in a small number of cases.

We now discuss how we obtained the above data for each of the three distributions. This data can be found online in the paper’s replication package (Adams et al. 2015). For Debian, we obtained the names of all integrated components across Debian’s entire history from the so-called snapshot archive. This is a server that contains all versions of all packages over time and allows scriptable access via a public JSON-based API.
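Mining the fetched change logs for bug identifiers can be scripted along these lines; a minimal sketch on a fabricated entry in Debian’s changelog format (the package, maintainer and bug numbers are invented for illustration; the “Closes: #NNNNNN” convention is Debian’s):

```python
import re

# A fabricated change log entry in Debian's changelog format (illustrative only).
entry = """\
hello (2.10-1) unstable; urgency=medium

  * New upstream release (Closes: #123456)
  * Refresh local patches for the new release (Closes: #654321, #112233)

 -- Jane Maintainer <jane@example.org>  Mon, 01 Jan 2018 10:00:00 +0000
"""

# Debian lists fixed bugs as "Closes: #NNNNNN[, #NNNNNN ...]"; extract every
# identifier so the corresponding reports can be fetched from the bug repository.
closes = re.findall(r"closes:\s*((?:#\d+[,\s]*)+)", entry, flags=re.IGNORECASE)
bug_ids = [int(n) for clause in closes for n in re.findall(r"#(\d+)", clause)]
print(bug_ids)  # [123456, 654321, 112233]
```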
Then, for every integrated component, we retrieved all version numbers, their timestamps and the list of binary package names associated with the component (since a component can be split across multiple packages). After sampling 384 package-versions, we downloaded the corresponding change logs using a simple script from Debian’s change log repository. Bug reports mentioned in the change logs can be found in the bug repository using the bug identifier. Related email messages and other data mentioned in the bug reports were found using a web search.

For Ubuntu, we used the Python API of the Launchpad collaboration platform to retrieve the names and version numbers of all Ubuntu packages that have ever existed. Because Ubuntu is derived from Debian, we filtered the Ubuntu packages to include only the ones customized by Ubuntu, since the other packages are identical to Debian packages. Ubuntu-customized packages have a version number ending in “-MubuntuN”, where “M” and “N” are numbers following a special convention. We found 133,311 such package-versions, belonging to 26,858 packages. Except for a different location of the change logs and bug reports, we used the same approach for data extraction as for Debian.

1 http://snapshot.debian.org/
2 http://packages.debian.org/changelogs/pool/main
3 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=XYZ with XYZ the bug identifier
4 http://api.launchpad.net/1.0/
5 http://changelogs.ubuntu.com/changelogs/pool/main
6 Manual search using the bug identifier on https://bugs.launchpad.net/ubuntu

For FreeBSD, data extraction was a bit more involved, since it is a source-based distribution. For this reason, we retrieved a copy of the FreeBSD version control system (CVS), which contains all local file changes ever made to all reused components.
Since such CVS changes are too fine-grained to be considered a “version”, but releases are too coarse-grained (multiple port versions can exist between two official releases), we had to reconstruct the port versions by grouping related CVS changes together. For this, we used the FreeBSD convention that each port’s Makefile is expected to have a PORTREVISION variable that is changed “each time a change is made to the port which significantly affects the content or structure of the derived package” (FreeBSD porter’s handbook 2011). If a maintainer does not change the PORTREVISION (nor the related PORTVERSION variable), the corresponding changes are not deemed important enough to be automatically picked up by users during an update of their installation. We interpret this as “changes that do not change the PORTREVISION variable do not define a new port version”, similar to the definition of “version” for binary packages.

In practice, we determined for each port the timestamps of all changes that change PORTREVISION and/or PORTVERSION, then grouped all changes to a port’s files between two consecutive PORTREVISION changes (excluding the first PORTREVISION change) into one port version. We treated all changes up to and including the first Makefile revision as the first PORTREVISION, to account for the initial import of a port. We wrote scripts that queried the CVS repository for all commit log messages between the start and end date of a port version. The change logs of the resulting port versions then correspond to the concatenation of these commit log messages. Finally, bug reports were obtained from FreeBSD’s bug repository based on the bug identifiers mentioned in the change logs.

3.4 Data Analysis

Since we did not have any classification of integration activities to start from, initially the first author studied the Debian distribution as a pilot project.
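As an aside, the PORTREVISION-based grouping from the FreeBSD extraction above can be sketched as follows. The commit tuples are fabricated, and the function encodes one plausible reading of the grouping rule (each version is closed by a commit that bumps PORTREVISION/PORTVERSION and contains every change since the previous bump):

```python
# Each commit is (timestamp, bumps): bumps is True when the commit changes
# PORTREVISION and/or PORTVERSION in the port's Makefile.
commits = [
    (1, False),  # initial import of the port's files
    (2, True),   # first Makefile revision          -> closes port version 1
    (3, False),  # follow-up fix
    (4, True),   # PORTREVISION bump                -> closes port version 2
    (5, False),  # change awaiting the next bump
]

def group_port_versions(commits):
    """Group consecutive commits into port versions, splitting at each bump."""
    versions, current = [], []
    for ts, bumps in commits:
        current.append(ts)
        if bumps:                # the bump closes the current port version
            versions.append(current)
            current = []
    if current:                  # trailing changes not yet part of a new version
        versions.append(current)
    return versions

print(group_port_versions(commits))  # [[1, 2], [3, 4], [5]]
```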
He manually interpreted the changes documented in the change log of each sampled package-version, then looked up the bug reports referenced by the change log in order to understand which bugs had been resolved or which features had been added, and how this was done. For the latter, the bug reports’ comments were an important source of information. To fully understand the scope and context of more complex changes, he sometimes had to consult email messages referenced by the bug reports and patches attached to them. In case of doubt, or when encountering unfamiliar technical terms or inside references, he consulted the distribution’s developer documentation or, as a last resort, performed a web search.

Once it was clear what exactly the integrators had done to produce the analyzed package-version, the package-version was tagged with every observed activity to summarize the rationale behind the version. Two examples of activities could be “new release” or “package dependency change”. More than one tag could be assigned to a version, since a new version of a package typically consists of multiple changes (as seen earlier in Fig. 4). By repeating this procedure for all sampled Debian versions, and constantly revising already analyzed versions when new tags were found, an initial tagging schema was built up, representing the different activities that go into a package-version.

7 ftp3.ie.FreeBSD.org::FreeBSD/development/FreeBSD-CVS/ports/
8 pserver:anoncvs@anoncvs.tw.FreeBSD.org:/home/ncvs
9 http://www.FreeBSD.org/cgi/query-pr.cgi?pr=XYZ with XYZ the bug identifier

After finishing the pilot project on Debian, the first two authors revised the obtained tagging schema, leveraging the second author’s experience as a Debian/Kubuntu maintainer and developer. Some tags were merged, others were renamed, and with the resulting tagging schema in hand, we revised the Debian analysis to standardize the tags used.
Afterwards, both authors analyzed the Ubuntu and FreeBSD data using the same tagging schema as a starting point (and using the same approach as for Debian). Conflicts in tagging between the two authors were resolved through discussion. We did not find additional tags for Ubuntu and FreeBSD, giving us confidence in the completeness of our initial tagging schema. Eventually, we obtained seven very popular tags, two less popular ones and a catch-all tag for multiple unique or less frequent activities unrelated to any of the other tags. We excluded the latter three tags from our analysis, but we come back to them in Section 6. The replication package (Adams et al. 2015) contains the tags and noteworthy observations of the sampled package-versions.

3.5 Identification and Documentation of Activities

The seven most popular tags obtained after the manual analysis all correspond to unique integration activities; however, each distribution could have its own terminology and workflow for such an activity. Hence, in order to abstract the commonalities and variabilities across distributions for a particular activity (tag), all authors together distilled the intent, motivation, common tasks and current practices across the distributions based on (1) the information that we encountered in the change logs, bug reports and mailing lists for the sampled package-versions, as well as (2) the second author’s experience as a Debian/Kubuntu developer. This was an iterative process, trying to separate the essential steps used during an integration activity from implementation details or exceptions in a particular distribution. Typically, each author would refine one or two patterns, then send them to the next author for further refinement until no more changes were made to an activity.

Similar to design patterns (Gamma et al. 1995), we then “captured [the activities] in a form that people can use effectively”.
For each integration activity, we documented in a rigid format its intent, motivation, the major tasks involved in the activity, its participants, possible interactions with other activities and notable instances of the activity in the three studied distributions (Debian, Ubuntu and FreeBSD). Interactions are based on the co-occurrence of activities in our data. We also tried to compare each activity to prior work in the integration literature, to put each activity in context.

During the tagging of integration activities, and their abstraction into pattern form, the authors encountered recurring issues and problems of the package maintainers. Such issues and problems were noted down by each author individually, then compared and clustered to obtain a set of challenges across four research areas. After filtering out challenges that were already addressed by related work, we obtained 13 concrete challenges or limitations that, based on our data, seemed to hold back maintainers in their activities. To cross-check those challenges, together with the activities that we documented, we performed a validation with practitioners in the next step.

3.6 Validation of the Activities by Practitioners

In order to get feedback on the correctness and usefulness of the documented integration activities and challenges, we contacted members of the package maintenance communities of Debian, Ubuntu and FreeBSD. We asked them to (1) verify the correctness of the activities that we derived and abstracted from the change log, bug report and other historical data, as well as of the challenges that we uncovered, and to (2) provide feedback on the usefulness of the activities, as well as on the activities and challenges that we might have missed while analyzing the sampled package-versions.
Based on their extensive experience with the three distribution communities, the second and fourth author first compiled a short-list of package maintainers and release engineers experienced with maintaining large packages. We then contacted the people on the short-list by email, since email is the preferred channel of communication for maintainers (who are volunteers spread across the world, without a fixed office). We also considered filing a bug report for our study, since maintainers closely track the bug repository of their package; however, because bug reports are a public broadcast medium in which others could have chimed in and perhaps influenced the maintainer, we discarded this option.


We eventually received feedback from three maintainers (M1, M2 and M3) active in both Debian and Ubuntu, one (M6) in Debian, one (M5) in Ubuntu, and one (M4) in FreeBSD. All of them have at least five to ten years of experience, since the role of package maintainer or release engineer can only be earned through years of active involvement in a distribution. Note that, to respect their anonymity, we refer to all of them as "maintainers" and use symbolic names.


When contacting the maintainers, we provided them with a draft of this paper and asked for feedback about the documented activities and challenges. In particular, we asked the following questions to evaluate the usefulness and completeness of the activities and challenges:


Q1 What activities did we miss?
Q2 What can the documented activities be used for?
Q3 Which existing tools and techniques for these activities did we miss?
Q4 What challenges did we miss?
Q5 What promising tools/techniques do you see coming up to address some of the challenges?


The maintainers replied to the five questions by email. All six also provided higher-level comments about the paper, with one maintainer providing an annotated PDF with more detailed comments.
Despite their busy schedules and the asynchronous nature of email communication (one cannot force someone to reply), only two maintainers left two or more questions blank. We come back to this in Section 6. The email replies were then analyzed by two of the authors and summarized into a table (Table 5) in order to compare the findings across all six maintainers.


At a high level, the obtained feedback showed us whether the activities as a whole made sense, whereas at a lower level it exposed inaccuracies, missed workarounds and factual errors. We then used this feedback to flesh out the description of the seven documented activities and the 13 challenges, to obtain the final version of the activities documented in the present paper. The contacted members suggested five additional activities; however, since we did not have sufficient empirical support for these activities in our data sample, we did not add them to the documented activities. Instead, we discuss those additional activities in Section 6.


Table 3 Overview of integration activities and their prevalence in the three distributions. Activities below the horizontal line (H, I and J) were not common enough to be documented


| Activity | Explanation | % Deb. | % Ub. | % Fre. |
|--------------------------|---------------------------------------------|--------|-------|--------|
| A. New Package | Integrating a new software project. | 1.04 | 0.78 | 13.54 |
| B. Upstream Sync | Updating to a new upstream version. | 40.89 | 43.75 | 57.81 |
| C. Dependency Management | Managing changes to dependencies. | 38.80 | 30.73 | 28.39 |
| D. Packaging Change | Changing a package's packaging logic. | 43.49 | 44.01 | 38.80 |
| E. Product-wide Concern | Enforcing policies across all packages. | 4.95 | 3.13 | 25.00 |
| F. Local Patch | Patching upstream source code locally. | 22.40 | 28.39 | 12.24 |
| G. Maintainer Transfer | Managing unresponsive maintainers. | 5.73 | 0.00 | 2.86 |
| H. Security | Patching a security vulnerability. | 4.43 | 1.30 | 0.78 |
| I. Internationalization | Internationalization of packages. | 4.17 | 1.56 | 0.26 |
| J. Other | Catch-all for rare activities. | 2.34 | 4.95 | 1.04 |


4 Integration Activities in Distributions


Table 3 gives an overview and short explanation of the seven major integration activities that we documented, as well as three less common ones. The table also provides the percentage of sampled Debian, Ubuntu and FreeBSD package-versions that involve each of the activities (within a confidence interval of 5 %). Those numbers are also plotted in Fig. 5. Since a new version of a component can involve multiple integration activities, the percentages in the plots add up to more than 100 %. Upstream Sync, Dependency Management and Packaging Change are the most frequently occurring activities in Debian and FreeBSD. Local Patch is also common in all three projects, whereas New Package and Product-wide Concern are common in FreeBSD.


The next subsections discuss each of the seven major integration activities in detail. For each activity, we provide:


Intent: Short outline of the goal of the activity.

Motivation: Short description of the role and rationale of an activity.

Major tasks: The major steps involved in the activity.

Participants: A list of stakeholders from Fig. 2 involved in the major tasks of the activity.

Interactions: Activities that co-occurred substantially with a given activity in package-versions, and hence are related.


Fig. 5 Popularity of the integration activities of Table 3 in the 384 sampled (a) Debian, (b) Ubuntu and (c) FreeBSD package-versions (confidence interval with length 5 % for a 95 % confidence level)
Literature: Discussion of prior work and approaches for the activity, as well as the prevalence of the activity outside the context of OSS distributions.

Notable instances: Concrete examples of the activity from the sampled Debian, Ubuntu and FreeBSD package-versions.


A. New Package


Intent: Integrating a previously unpackaged upstream component into a distribution.


Motivation: The users of the distribution or the maintainer of a package require new functionality provided by a component that has been identified but is not yet part of the distribution.


Major Tasks:


Recruiting a Maintainer responsible for integrating the new component and for liaising with the upstream project is one of the most important decisions to take (Koshy 2013; Merilinna and Matinlassi 2006). Most commonly, an upstream developer or motivated end-user requests an upstream component to be integrated in the distribution. One of the distribution's maintainers might pick up this request and become the maintainer. Alternatively, the upstream developer can package the component herself and ask a distribution maintainer to "sponsor" this package, i.e., to review and to upload it to the distribution's package repository. In that case, although the majority of the integration is done upstream, the maintainer still has the end responsibility. Another possibility is that the distribution appoints a maintainer to the integration of a new component because of a clear need in the distribution.


Packaging an Upstream Project requires access to the project's source code (except for binary-only packages like Adobe Flash) and verification of its license. The maintainer then proceeds to determine the build-time and run-time dependencies of the package. If a dependent component is not yet in the distribution, it has to be packaged first. This is a process of trial-and-error, trying to build the package and fixing any dependency problems.
The maintainer might have to customize the software or its makefiles so that it builds correctly in the environment of the distribution. When porting the package to platforms other than Linux- or GNU-based ones, it is often necessary to remove dependencies on Linux- or GNU-specific libraries or functionality. This can take significant effort. Finally, the maintainer needs to make sure that the package follows the distribution's policies, such as specific locations for configuration files and manual pages.


Creating the Package's Metadata. The maintainer is responsible for creating the package metadata, such as the package name, version number and the list of dependent packages. Such metadata is necessary to add the package to the distribution's package management system ("apt" in Debian/Ubuntu, or the port system in FreeBSD) to enable the automatic and systematic building, packaging and deployment of the software project.


Integration Testing. The package must build and run consistently on all supported architectures. Typically, two rounds of tests are used to verify a package. The first round involves only maintainers, who iron out any obvious functionality or platform issues. The second round involves uploading the package to a staging area (e.g., "unstable" in Debian), from where expert end-users can install it for use in their daily work. Bugs identified by these users are reported (together with possible patches) to the maintainer, who incorporates this feedback in a new version of the package that is re-uploaded. Some distributions, like Ubuntu, have tools to automatically run integration tests and identify integration issues.


Publishing the Package. If a staged package contains severe bugs, it might be (temporarily) removed from the staging archive until the bugs are resolved. If the package has been stable for a certain period of time, it becomes eligible for inclusion in an upcoming release.
The package is either moved to that release's archive (Debian/Ubuntu) or to the source code repository (FreeBSD).


Participants: maintainer, upstream developer, package community, expert end-user.


Interactions: New Package is a prerequisite of the other six activities, and usually occurs by itself (i.e., a package-version only involves New Package, and no other activity). In 2.3 ± 5 % of the FreeBSD package-versions, it also involves Local Patch to fix a bug or to make the package compile.


Literature:

In the context of COTS reuse, additional tasks are involved, especially contract negotiations (Information Technology Resources Board 1999; Navarrete et al. 2005). Lewis et al. (2000) note that "Vendors are driven by profits [...] They can be cooperative and responsive when it is in their perceived interest to be so." Various guidelines and risk assessment tools exist to help companies or federal departments select the right COTS components (Information Technology Resources Board 1999; Lewis et al. 2000). For example, they recommend finding COTS components that fit the existing architecture, or possibly adjusting the architecture first, rather than requiring the COTS vendor to customize their component to the system at hand (since that could be very costly). This is different from OSS distributions, where monetary incentives typically do not exist and OSS distributions sometimes carry enough weight to convince upstream projects to adapt to them rather than the other way around.


Although not applicable in the case of packaging organizations like OSS distributions, the identification of COTS/OSS components for reuse is a known challenge as well (Morisio et al. 2002; Stol et al. 2011), typically requiring extensive web or literature research, or insightful recommendations by experts. While maintainer recruitment and integration testing are known research problems, the other tasks have received less attention in research.
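The package metadata created in the "Creating the Package's Metadata" task can be sketched as a minimal Debian control file; the package name, maintainer and dependency values below are hypothetical, not taken from our sample:

```text
Source: hello-otr
Section: net
Priority: optional
Maintainer: Jane Doe <jane@example.org>
Build-Depends: debhelper-compat (= 13), libotr5-dev

Package: hello-otr
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}
Description: hypothetical example package
 Illustrates the name and dependency fields that the package
 management system uses to build, install and upgrade a package.
```

The Depends field is what the package manager consults at install time; FreeBSD ports express similar information through variables in the port's Makefile instead of a control file.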
+ + +Notable Instance: + +A New Package with customization: irssi-plugin-otr (Ubuntu) is an IRC client plugin integrated in July 2008. A first customization changed the location for documentation to the Ubuntu default location. The second customization fixed the package’s build process to not download required header files during the build, since the Ubuntu build servers do not have network access. + + +B. Upstream Sync + + +Intent: + Bringing a package up-to-date with a newer version of the upstream component. + + +Motivation: + As shown in Fig. 5, synchronizing the existing packages of a distribution with a newer upstream version forms the core activity of integration. End-users expect package maintainers to update their packages to the latest features and bug fixes as soon as possible, while maintainers are more concerned about the long-term stability of a package. + + +Major Tasks: + +1. Becoming Aware of a New Upstream Release largely depends on distribution-specific dashboards that automatically track the development progress of upstream projects. For example, Debian’s watch file mechanism specifies (1) the URL of the upstream project’s download page with all releases of a component, as well as (2) a regular expression to +identify the source code and a version number for each release. If the highest version number surpasses the current version, this means that a new release is available. + + +Derived distributions (e.g., Ubuntu) not only need to synchronize with the upstream projects, but also with their own parent distribution, typically at the start of a new release cycle. For example, out of 167 analyzed Ubuntu package-versions involving Upstream Sync, 99 versions were synchronized with the upstream project, 65 were synchronized with the parent distribution (Debian) and 3 were synchronized with both. 
Since the derived distribution can leverage the Upstream Sync and other activities performed by the maintainers of the parent distribution, risk assessment (task 2) becomes slightly easier. However, keeping track of which patch was synchronized from which upstream project requires rigorous book-keeping. Projects use custom dashboards for this, sometimes interfacing with the bug reporting infrastructure. + + + + +Assessing the Risk of an Upstream Release requires the maintainer to review the changes to the previous upstream version (Rodin and Aoki 2011) in order to estimate whether the new version is production-ready. These changes run the risk of breaking important functionality, while end-users do not always need the new features and bug fixes. Despite the importance of this analysis, in practice it currently is a largely manual task supported by basic tools like “diff” (Rodin and Aoki 2011), change and commit log messages, email communication with upstream developers, and experience. + + + + +The outcome of risk assessment is often to not update to a full new release, but to “cherry-pick” a select number of acceptable changes out of all changes made upstream or by another distribution, then merge those changes into the current package-version (discarding the other changes). For example, an upcoming release of a distribution might be too nearby, making the full import of a new version of a component too risky. Instead, maintainers would cherry-pick the show-stopper bug fixes that they are most interested in. Some distributions, like FreeBSD, prefer not to cherry-pick, i.e., they either take a new version of a component as a whole, or do not update to it. + + + + + + +Updating Customization involves revisiting the customizations (patches) performed on earlier versions of the packaged component (e.g., the initial New Package or later Local Patch activities). Maintainers typically submit these patches upstream, to be merged. 
As a consequence, some patches no longer need to be maintained locally and can be discarded by the maintainer. Other patches, however, need to be updated by the maintainer so that they apply cleanly to the new version of the upstream package. Just like task 2, this requires manual analysis of the patch and the new package-version.


Updating the Package's Metadata, cf. task 3 of New Package.

Integration Testing, cf. task 4 of New Package.

Publishing the Package, cf. task 5 of New Package.


Participants: maintainer and upstream developer.


Interactions: Upstream Sync is a pivotal activity that can be accompanied by any other activity, except for New Package (by definition). Upstream Sync occurs mostly together with Packaging Change, Dependency Management, Local Patch and (in source-based distributions) Product-wide Concern.


Literature:

Together with Local Patch, Upstream Sync is the most discussed integration activity in the literature, independent of the type of reuse (COTS/OSS/ISS) or organization (OSS/commercial) (Lewis et al. 2000; Navarrete et al. 2005), and it is the source of most of the issues related to Dependency Management (sometimes even preventing Upstream Sync of other packages). For example, Begel et al. (2009) report that at Microsoft up to 9 % of 775 surveyed engineers rely on other teams to inform them of changes to a component they rely on. Researchers (Merilinna and Matinlassi 2006; de Souza and Redmiles 2008) and practitioners (Koshy 2013) recommend continuously monitoring (or inquiring about) new versions and their impact on the software system, even appointing a specific gatekeeper responsible for doing this. This also helps mitigate one of the largest risks of reuse: the component vendor going out of business (Lewis et al. 2000).


Since reuse induces a dependency on the provider of a COTS/OSS/ISS component (who fully controls the component's evolution (Lewis et al. 2000)), researchers have reported two extreme approaches to deal with this dependency: swiftly updating to each new component version (Brownsword et al. 2000; Stol et al. 2011; Van Der Linden 2009) versus sticking to a particular version and patching it for the organization's particular needs (Merilinna and Matinlassi 2006; Ruffin and Ebert 2004; Van Der Linden 2009). There is no systematic methodology to decide between these two approaches or hybrid approaches in between, like cherry-picking (Lewis et al. 2000); typically, personal experience is the deciding factor (Merilinna and Matinlassi 2006), while other factors, like the safety-critical nature of a software system, can play a role as well (Lewis et al. 2000). Interestingly, many integration issues could in fact be avoided if the new component version were backwards compatible with the previous version (Crnkovic and Larssom 2002; Stol et al. 2011), but this is outside the control of the organization that reuses a component.


Notable Instances:


A low-risk Upstream Sync: Gnash (Ubuntu) is a Flash player that was updated to upstream version 0.8.7 in March 2010 (#522254), right at the start of the Ubuntu feature freeze window (i.e., close to the next release). Since new features are technically not allowed in a freeze window, a member of the Ubuntu release team needed to explicitly approve the Upstream Sync. As Gnash is a package inherited from Debian, and the update mostly contained bug fixes, version 0.8.7 was quickly synced.


An Upstream Sync taking a long time: Krita 2.1.1-1 (Debian), the painting program of the KOffice suite, was broken in early May 2010 because one of the libraries it depends on (libkdcraw7) had been replaced by a newer version (libkdcraw8) in an Upstream Sync of KDE 4.4.3 (#580782).
Unfortunately, the solution (an Upstream Sync to KOffice 2.2.0) took two months, because this new version of KOffice introduced so much new functionality that the package had to be tested more thoroughly.


A patch cherry-picked from another distribution: libpt 1.10.10 (Ubuntu), a cross-platform library, relied on the new gspca webcam driver provided by the 2.6.27 Linux kernel. For this driver to work, all programs and libraries consuming the webcam stream now had to load the libv4l wrapper libraries at run-time, forcing 62 Ubuntu packages to be modified. Since three weeks earlier a patch had been uploaded to Fedora (another distribution) to make these changes for libpt, this patch was cherry-picked into Debian (and Ubuntu).


10 This notation refers to a bug report in the distribution's bug repository.


C. Dependency Management


Intent: Keeping track of the dependencies of a package to make sure it can be properly built and run.


Motivation: Packages depend on other packages to be built (e.g., compilers and static libraries) and to be run (e.g., dynamic libraries and services). For example, in our data set, Debian packages containing dynamic libraries have on average 6.4 packages depending on them directly (median: 2.0), and 47.6 transitively (median: 3.0). If a package on which many other packages ("reverse-dependencies") depend changes, for example because of an Upstream Sync, that change might break its reverse-dependencies.


A special case of such changes are "library transitions", i.e., changes to the public interface of a shared library that might force dozens of packages to be rebuilt or, in the worst case, to be adapted to the new interface via source code changes. For example, if the C runtime library were to change, all packages using C might need to be changed and/or re-built.
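The distinction between direct and transitive reverse-dependencies can be made concrete with a small sketch; the package names and dependency edges here are invented:

```python
# Sketch: direct vs. transitive reverse-dependencies in a package
# dependency graph. Package names and edges are invented.
deps = {            # package -> packages it depends on
    "app-a": ["libfoo"],
    "app-b": ["libbar"],
    "libbar": ["libfoo"],
    "libfoo": [],
}

def reverse_deps(pkg, deps):
    """Packages that directly depend on pkg."""
    return {p for p, ds in deps.items() if pkg in ds}

def transitive_reverse_deps(pkg, deps):
    """Packages that depend on pkg directly or indirectly."""
    result, frontier = set(), {pkg}
    while frontier:
        frontier = {q for p in frontier for q in reverse_deps(p, deps)} - result
        result |= frontier
    return result

print(sorted(reverse_deps("libfoo", deps)))             # ['app-a', 'libbar']
print(sorted(transitive_reverse_deps("libfoo", deps)))  # ['app-a', 'app-b', 'libbar']
```

A change to libfoo would thus directly break app-a and libbar, and transitively also app-b; distribution-scale dependency graphs behave the same way, just with thousands of nodes.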
Major Tasks:


Becoming Aware of Dependency Changes either happens automatically (see Upstream Sync), or based on an announcement by the maintainer of a dependent package that is about to change significantly. The latter announcement is typically sent to the release manager and any affected maintainers, leaving time to discuss the repercussions of the update. If no such announcement has been made, the maintainer should, at the very minimum, notice a change in the API through the updated interface version ("SONAME") of a dynamic library. For example, a dynamic library "libfoo" with interface version 1 would have a SONAME of "libfoo.so.1". If this SONAME suddenly changed to "libfoo.so.2" upstream, maintainers would know that the API of the component has changed substantially.


Assessing the Risk of a Dependency Change is similar to task 2 of an Upstream Sync. Determining which packages broke because of a change, and whose they are, is largely a manual task, requiring insight into how an API is used by other packages, whose implementation and algorithms are typically unknown to the maintainer. Unfortunately, no tool support is available in practice to assist in this task. Typically, the build logs are checked for errors and the package is driven through a small smoke test scenario.


Fixing the Damage either happens atomically, i.e., the changed package and all its reverse-dependencies are updated at once (FreeBSD), or interleaved, i.e., each of the packages is updated independently (Debian/Ubuntu). Atomic updates can delay a new package-version as long as not all broken packages have been updated successfully, but at least the end user will not be impacted by inconsistent packages. Distributions like Fedora and Ubuntu use sandbox build environments to atomically update a transitioning library with all its reverse-dependencies in isolation, without affecting other packages (and hence users) (The Fedora Project 2011).
Whether or not the update model is atomic, the maintainer of the library causing the changes is responsible for performing all rebuilds. The maintainer analyzes the build and test logs to determine which packages failed to build, and attempts to write patches for those, using her knowledge of the API changes. If this fails, she needs to assist the failing packages' maintainers to resolve the transition issues, similar to delivery advocates for ISS reuse (Stol et al. 2011). To keep track of which packages have already been re-built, the release manager and maintainers use a tracking system: Ubuntu and Debian both use a custom library transition tracker, while Ubuntu sometimes uses a bug tracker.


Updating the Packages' Metadata, cf. task 3 of New Package.

Integration Testing, cf. task 4 of New Package, once the whole transition is complete (atomic model) or for each updated package separately (interleaved model).

Publishing the Package, cf. task 5 of New Package.


11 If the maintainer finds out that the interface did change without a SONAME update, she would contact upstream to ask for an update of the SONAME, then perform an Upstream Sync of the updated library before resuming the Dependency Management of the library's reverse-dependencies.


Participants: maintainers of the changed package and those of its reverse-dependencies, release manager.


Interactions: Dependency Management can be accompanied by any other activity, except for New Package. It occurs mostly together with Upstream Sync, Packaging Change, Local Patch and (in source-based systems) Product-wide Concern.


Literature:

Similar to Upstream Sync, Dependency Management is independent of the kind of reuse and organization. Begel et al.
(2009) observed a wide range of mitigation techniques for dependency problems at Microsoft, ranging from minimizing the number of dependencies to explicitly planning backup strategies to deal with dependency issues. Other companies, such as the one studied by de Souza et al. (2004) and de Souza and Redmiles (2008), stressed the importance of vendor-integrator communication to reduce the effort required for “impact management” of reused APIs. Managers first should build an impact network consisting of people affecting or affected by their component, then use frequent email communication or people assigned explicitly to a particular API (or ISS component (Stol et al. 2011)) to manage forward (i.e., on other teams) and backward (i.e., on their team) dependency impact. Similar to other major companies like Google (Whittaker et al. 2012), as well as the studied OSS distributions, a team is required to inform its clients of major API breakage. de Souza and Redmiles (2008) note, however, that one should not forget the ripple effect of “indirect” (i.e., transitive) dependencies. + + +Similar to Upstream Sync, backwards compatibility of dependent packages can avoid many integration issues (Crnkovic and Larssom 2002; Stol et al. 2011). Furthermore, many Dependency Management issues are due to unnecessarily high coupling between components by relying on implementation details (Spinellis et al. 2004) and private APIs (Stol et al. 2011). Hence, using components via explicit (Stol et al. 2011) and stable (Merilinna and Matinlassi 2006) interfaces can avoid many problems. Finally, packaging organizations like distributions can eliminate many dependency issues of their users by providing assemblies (sets) of integrated components instead of individual components. This is why many distributions offer so-called “virtual” packages, for example to integrate all core packages of Perl, KDE or GNOME. 
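The SONAME convention used in the awareness task can be sketched with a small helper; the SONAME strings are illustrative (in practice, maintainers would read the SONAME from the library binary, e.g. with objdump or readelf):

```python
import re

def soname_version(soname):
    """Split a SONAME like 'libfoo.so.1' into (name, interface version)."""
    m = re.fullmatch(r"(.+)\.so\.(\d+)", soname)
    if not m:
        raise ValueError(f"not a versioned SONAME: {soname}")
    return m.group(1), int(m.group(2))

def interface_changed(old, new):
    """True if upstream signalled an incompatible interface change."""
    old_name, old_version = soname_version(old)
    new_name, new_version = soname_version(new)
    if old_name != new_name:
        raise ValueError("SONAMEs belong to different libraries")
    return new_version != old_version

print(interface_changed("libfoo.so.1", "libfoo.so.2"))  # True: rebuild or adapt
print(interface_changed("libfoo.so.1", "libfoo.so.1"))  # False: same interface
```

The libfm instance below is exactly the failure mode this convention is meant to prevent: the interface changed while the SONAME stayed "libfm.so.0".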
+ + +Notable Instances: + + +A surprise library transition: A library interface change to the libfm 0.1.14-1 (Debian) file manager library was not announced by the upstream developer. As a consequence, applications built against the old version of libfm (“libfm.so.0”), such as the pcmanfm file manager, broke (#600387). The dynamic linker had no way of knowing that “libfm.so.0” was no longer the original library version all packages were built against, but rather the new version with a different interface that should have been named “libfm.so.1”. + + +Problems with non-atomic fixes of dependency changes: The transition of Perl 5.10 (Debian), the Perl programming language ecosystem, to Perl 5.12 at the end of April 2011 (#619117) took slightly over two weeks, during which over 400 packages (directly or indirectly depending on Perl), including high-profile ones such as vim, subversion, rxvt-unicode and GNOME, were not installable from the staging area until all their dependencies were rebuilt consistently against Perl 5.12. + + +A dependency change requiring only a rebuild: The chances of acceptance for Boost 1.34.1 (Ubuntu), a general-purpose C++ library, in Ubuntu 7.10 looked slim, since Ubuntu had just entered its “Feature Freeze” (only bug fixes were still accepted for the upcoming release) and all Boost’s reverse-dependencies had to be updated. However, the contributor +championing the new Boost release was able to convey the urgency of the release (fixes to show-stopper bugs) and the package maintainer verified that all reverse-dependencies could just be rebuilt without source code changes. + + +D. Packaging Change + + +Intent: + Changing the packaging logic or metadata to fix packaging bugs, to follow new packaging guidelines or to change the default configuration, either for binary or source packages. + + +Motivation: + The packaging process combines the build process (McIntosh et al. 
2011) of an upstream component with the dependency management and packaging machinery of a distribution. Hence, understanding the packaging process is not trivial, and bugs slip in frequently. Furthermore, as the packaged component evolves, its packaging requirements evolve as well. For example, new features might have been added that need to be configured in the package. The Packaging Change activity covers any such changes to the packaging, building and installation logic and metadata of a package.


Major Tasks:


Replicating Reported Problems is a prerequisite for fixing a packaging problem. Ideally, the maintainer would like to clone the packaging environment of a bug reporter, or at least have a complete description of the build platform and of all installed libraries and their versions. Tools exist to generate such a description when submitting bug reports, yet inexperienced bug reporters often do not know about them or forget to use them.


Understanding the Build and Packaging Process is necessary to fix packaging bugs or enhance the packaging logic. Such understanding is currently based on interpreting the build and execution logs of packages. Furthermore, trial-and-error is commonly used when changing the packaging logic. Since there is no dedicated way to test build and packaging changes, the maintainer verifies the correctness of those changes by manually installing the package and running the unit or user tests of the package.


Integration Testing, cf. task 4 of New Package.

Publishing the Package, cf. task 5 of New Package.


Participants: maintainer, package community (for testing), expert end-user.


Interactions: This activity is performed during most of the other activities, such as New Package and Upstream Sync. Frequently, this activity requires a Local Patch.
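In Debian-based distributions, the packaging logic discussed above typically lives in a debian/rules makefile. A minimal sketch, assuming the common debhelper ("dh") sequencer; the override and its configure flag are hypothetical:

```makefile
#!/usr/bin/make -f
# Delegate all packaging steps (configure, build, install, ...) to the
# debhelper sequencer.
%:
	dh $@

# Hypothetical override: pass an extra flag to the upstream configure step.
override_dh_auto_configure:
	dh_auto_configure -- --disable-network-tests
```

Most real packages add such overrides over time, which is one reason the packaging logic accumulates complexity and bugs.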
Literature:


The Packaging Change activity has not been discussed thoroughly in prior research, except for the well-known difficulty of configuring COTS/OSS/ISS components (Stol et al. 2011). Such configuration issues are due to the fact that, by default, components need to be generic and contain many features, whereas a specific integrator only needs some of those. The need to adapt packaging logic is specific to the domain of packaging organizations (of which OSS distributions are a subset), since they are a mediator between upstream components and final users, and hence require upstream components to fit into their own package management system.


Notable Instances:


A package with missing files: The librt shared library implementing the POSIX Advanced Realtime specification had been dropped without warning from the GNU standard C library on Debian (libc6 2.3.6-18), breaking the XFS file system package (#381881). To resolve this case of Dependency Management for XFS, a Packaging Change was made to libc6's package metadata to indicate that librt was no longer provided.


Broken packaging because of changed guidelines: Versions 2.6 to 3.2 of Python (Ubuntu), the Python programming language ecosystem, suddenly failed to build on Ubuntu (#738213) because essential libraries like libdb and zlib, on which Python depended, could no longer be found on the build platform. The change in directory layout was a result of the work on enabling 32- and 64-bit versions of libraries to be installed on a single machine.


Broken packaging because of upstream changes: The GNU Octave (FreeBSD) developers changed the layout of their web site as well as the build logic of some of their projects (#144512). The maintainer had to fix the code-fetching script and refactor the existing build script shared by all GNU Octave ports into separate scripts for the individual ports.


E.
Product-wide Concern

Intent: Applying product-wide policies and strategic decisions to the integrated packages.

Motivation: Since a distribution integrates thousands of packages, there are important rules and strategic decisions that should be followed in order to make the distribution coherent and consistent. For example, a new standard for package help files should be adopted by all packages, either all at once or at their own pace. Similarly, strategic decisions to transition to a new version of a core library or to move to a new default window manager should be followed up as uniformly as possible by all involved packages.

Major Tasks:
1. Determining Ownership and Timing of Changes happens through discussions between the co-ordinator (release manager or a volunteer) of the product-wide concern and the affected maintainers. The co-ordinator notifies all affected package maintainers about the decision, explaining the motivation of the Product-wide Concern, the end goal and the different steps involved in getting there. Those steps depend on the enforcement strategy in use.

2. Enforcing the Concern happens either through centralized or distributed enforcement. With centralized enforcement, the Product-wide Concern co-ordinator applies the concern’s changes herself on all affected packages at once. Maintainers only need to test if their package still works and report a bug if it does not. With distributed enforcement, the package maintainers, briefed by the co-ordinator, are in charge of the change for their own package. This gives them the freedom to implement a Product-wide Concern as they see fit, but might delay updates to their packages’ reverse-dependencies. While the concern is being enforced, the co-ordinator continuously monitors the status of the concern via dashboards, mailing lists and/or bug reporting systems.

Debian uses distributed enforcement, FreeBSD uses centralized enforcement and Ubuntu uses both.
Derived distributions like Ubuntu automatically leverage Product-wide Concern changes performed by the contributors of the parent distribution. FreeBSD co-ordinators use regular expressions to change the packaging logic of hundreds of ports at once, thanks to the strict naming conventions in the packaging logic. Given the high risk of such product-wide changes in FreeBSD, the co-ordinator needs approval by the release manager, after which the whole distribution is rebuilt on the distribution’s build cluster to check the effects of the product-wide change.

3. Integration Testing, cf. task 4 of New Package.

4. Publishing the Package, cf. task 5 of New Package.

Participants: maintainer, co-ordinator, release manager.

Interactions: Product-wide Concern is typically accompanied by Dependency Management, Upstream Sync or Packaging Change.

Literature:
Similar to Packaging Change, Product-wide Concern is a relatively unknown activity. For example, Curtis et al. (1988) identify the issue that “Projects must be aligned with company goals and [that they] are affected by corporate politics, culture, and procedures”, and they stress that the “inter-team group dynamics” (between an integrator and upstream) significantly complicate the already complex “intra-team group dynamics”. However, no concrete advice or discussion of the tasks involved is provided, especially not in the context of multi-component integration at the scale of OSS distributions (thousands of integrated components).

Notable Instances:
The massive migration to GCC 4 (Debian) in July 2005 is an example of a Product-wide Concern with distributed enforcement. Since the compiler suite broke C++ programs compiled with earlier GCC versions, all C++ packages using GCC had to be rebuilt. An approach typically followed in cases like this,\textsuperscript{12,13} is to (permanently) rename the packages after rebuilding by attaching a suffix like “+b2”.
This ensures the visibility of rebuilt packages, enabling other packages to explicitly depend on the rebuilt versions.

The migration to Dash as the default command shell in Ubuntu 6.10 (October 2006) and Debian Lenny (February 2009) illustrates the differences between centralized and distributed enforcement. The Ubuntu co-ordinator instantaneously made Dash the default shell, breaking many packages’ scripts and build files (centralized). Although several users were enraged, the co-ordinator consistently referred to the maintainers and upstream developers of the failing packages to fix incompatible Bash-specific code (“bashisms”). A web site with official migration strategies and workarounds was provided.

When Debian discussed their move to Dash (independently from the Ubuntu move),\textsuperscript{14} the Ubuntu co-ordinator convinced them about the importance of clear release goals and communication with all stakeholders. The Debian developers then built tools to screen all packages for known bashisms. Maintainers of packages containing bashisms were notified by email and requested to fix the bashisms by a certain date (distributed).

F. Local Patch

Intent: Maintaining local fixes and/or customizations to a package.

Motivation: Integrators and their users will find bugs in packages. Some of these bugs are package-specific, while others are due to the integration of the package in the distribution. Typically, maintainers are encouraged to send the fixes for both kinds of bugs upstream, such that the upstream project will take ownership of the code (and its maintenance) and include it by default in their project. In practice, however, many integration bug fixes are not accepted by upstream (or take time to be adopted) and tend to end up as local patches that need to be maintained and re-applied by the integrator upon each Upstream Sync.
The same holds for customization changes specific to a distribution, for example because of Product-wide Concern.

Major Tasks:
1. Getting a Local Patch Accepted Upstream requires a patch that fixes the bug in a clean way and follows the programming guidelines of the upstream developers. After thorough testing, the maintainer submits the patch to the preferred bug reporting system of the upstream project. The report should be as detailed as possible, making clear what bug is fixed, in which version of the project, and what the impact is on the users of the distribution. Either the patch is accepted in a reasonable period of time, or it is not. If accepted, the maintainer can discard his Local Patch. Otherwise, the maintainer is responsible for maintaining and re-applying the Local Patch across all future versions of the package.

2. Maintaining the Patch upon an Upstream Sync is the maintainer’s responsibility until the Local Patch is accepted by upstream (if ever), cf. task 3 of Upstream Sync. As such, Local Patch is a very common activity, involving 22.1 ± 5% (Debian), 28.4 ± 5% (Ubuntu) and 12.2 ± 5% (FreeBSD) of all package-versions. Of these versions, only 7 ± 5% (Debian), 0.3 ± 5% (Ubuntu) and 0 ± 5% (FreeBSD) had to update an existing Local Patch, whereas 24.7 ± 5% (Debian), 11.9 ± 5% (Ubuntu) and 6.3 ± 5% (FreeBSD) could stop maintaining the Local Patch because it was included into a new upstream version. To keep track of local patches, Debian-based distributions use patch management systems such as “quilt”, “dpatch” and “git”, while FreeBSD maintainers manage patches manually.

3. Updating the Package’s Metadata, cf. task 3 of New Package.

4. Integration Testing, cf. task 4 of New Package.

5. Publishing the Package, cf. task 5 of New Package.

\textsuperscript{12}http://bit.ly/FOCJHf
\textsuperscript{13}http://lwn.net/Articles/160330/
\textsuperscript{14}http://bit.ly/z3ORxT
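The bookkeeping behind maintaining a Local Patch across Upstream Syncs can be illustrated with a toy applicability check. Real patch tools such as quilt match hunks by context with offsets; this sketch (all names invented) only verifies that every line the patch expects to find is still present in the new upstream file.

```python
# Toy check of whether a stored Local Patch still applies after an Upstream
# Sync. Real tools (quilt, dpatch, git) match hunks by context and offsets;
# this sketch merely verifies that every line the patch expects to find
# (context lines and lines to be removed) still exists in the target file.
def patch_still_applies(patch_text, target_text):
    expected = [line[1:] for line in patch_text.splitlines()
                if line[:1] in (" ", "-")
                and not line.startswith("---")]  # skip the old-file header
    target_lines = target_text.splitlines()
    return all(line in target_lines for line in expected)

patch = ("--- a/config.c\n"
         "+++ b/config.c\n"
         "@@ -1,2 +1,2 @@\n"
         " int main(void) {\n"
         "-  return 1;\n"
         "+  return 0;\n")

# The patch applies while the lines it targets survive upstream changes...
assert patch_still_applies(patch, "int main(void) {\n  return 1;\n}\n")
# ...and fails once upstream rewrites them, forcing the maintainer to rework it.
assert not patch_still_applies(patch, "int main(void) {\n  return 2;\n}\n")
```

A failing check of this kind is exactly the moment where the measured patch-maintenance effort above is spent: the maintainer must either rework the patch or confirm it was merged upstream and drop it.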
Participants: maintainer, upstream developer, bug reporter.

Interactions: Local Patch is typically accompanied by Upstream Sync, Packaging Change, or Dependency Management.

Literature:
The paradox of, on the one hand, having to submit a patch upstream to avoid maintenance, while, on the other hand, having a hard time getting the patch accepted, is the most studied integration challenge in the literature, across different kinds of reuse and organizations (Bac et al. 2005; Brownsworth et al. 2000; Merilinna and Matinlassi 2006; Spinellis et al. 2004; Stol et al. 2011). No silver bullet exists, although, similar to Upstream Sync and Dependency Management, close collaboration of an organization with the upstream project is generally recommended (Stol et al. 2011), even in the case of COTS (Morisio et al. 2002). However, such a collaboration takes a lot of time, effort and goodwill, and still does not guarantee that the upstream project will accept and maintain the patch (Ven and Mannaert 2008). In fact, it often happens that even an accepted patch still needs to be maintained by the downstream organization (since the organization has the required expertise) (Jaaksi 2007).

An opposite approach has been successful in the case of ISS, where the ISS team reaches out to the teams that reuse its components to help them with integration (Stol et al. 2011). Alternatively, one could use COTS-style glue or wrapper code to avoid changing the actual code altogether (Di Giacomo 2005; Van Der Linden 2009). However, such approaches are less powerful (one loses the benefits of OSS/ISS) and still require maintenance. As a kind of middle ground, many organizations use packaging organizations like OSS distributions as a maintenance buffer between upstream and themselves (Merilinna and Matinlassi 2006), shifting the problem to the distributions.
In the presence of sufficient industrial partners, one could even consider making an independent fork of an upstream component, but this is quite costly and in the end not that successful in practice (Ven and Mannaert 2008). Note that patches for local usage or configuration will never be picked up upstream, and hence require eternal maintenance. This applies especially to end-users, who might have local patches on top of a distribution’s package.

Notable Instances:

A patch that is quickly adopted upstream: The Debian and Ubuntu packages of the GNOME sensors-applet (Debian/Ubuntu) desktop widget for temperature and other sensors featured “ugly, outdated icons” (#69800) because the newer icons did not comply with the license policy of Debian and Ubuntu. To fix this, the Ubuntu maintainer built a local patch on top of the Debian package to use the newer icons in Ubuntu, while the upstream developer contacted the icon designer to make the new icons compatible with Debian by adding an additional license to the icons (an example of the “Disjunctive” legal pattern (German and Hassan 2009)). The designer complied, and the Ubuntu maintainer reported the license change to the Debian maintainer, such that he could drop his Local Patch.

A Local Patch can cause havoc: A notorious security hole in the OpenSSL Debian package (an implementation of the SSL/TLS protocols) was introduced into Debian by a local patch and lasted from May 2006 until May 2008. A call to the function adding randomness to a cryptographic key had accidentally been commented out by a Local Patch (#363516). The Debian maintainer had contacted upstream, but did not fully disclose himself, nor his plans, and was largely ignored. The patch was never sent upstream for inclusion afterwards. To complicate the issue further, the address of the mailing list contacted by Debian was not the real OpenSSL development list, since that one was hidden from non-developers.
This security hole propagated to over 44 derived distributions, without any of the maintainers or contributors involved identifying the bug.

G. Maintainer Transfer

Intent:
Maintaining a package when its maintainer is absent, unwilling or unable to maintain the package further.

Motivation:
Being a package maintainer is a major responsibility, since it requires mediating between upstream projects and the end-user, typically for multiple packages at a time. However, maintainers may have periods during which they cannot spend the required time on integration, they may lose interest in certain packages, or they could just become unresponsive to bug reports or user requests. In the worst case, a package could even be orphaned when the maintainer quits. To prevent packages (and any products based on them (Van Der Linden 2009)) from stalling, OSS distributions need to provide a means to keep packages evolving, while bypassing or overriding a maintainer.

Major Tasks:
1. Overriding the Maintainer depends on how a distribution organizes package ownership. If package maintenance is shared across all distribution developers collectively, the concept of overriding a maintainer is not relevant. In Ubuntu, for example, packages in the commercially supported Main and Restricted archives are managed by a team known as Core Developers, whereas the packages in the commercially unsupported Universe and Multiverse archives are supported by the community under the guidance of a team known as “Masters Of The Universe” (MOTU). Any developer can modify any package, as long as it is managed by the developer’s collective and the change does not introduce unnecessary divergences compared to upstream. In case of disagreement amongst developers, there are conflict resolution procedures in place, but those rarely need to be used.

\textsuperscript{15}http://lwn.net/Articles/282038/
\textsuperscript{16}http://bit.ly/w7rn04
\textsuperscript{17}http://www.links.org/?p=327
Distributions with individual package ownership, on the other hand, need a Maintainer Transfer policy to take over the role of a maintainer if she becomes unresponsive or disappears altogether. A contributor proposing an Upstream Sync, Dependency Management, Infrastructure Change or a Local Patch that fulfils certain criteria can explicitly mark her change as a Maintainer Transfer. In Debian, for example, this is called a “Non-Maintainer Upload” (NMU), and is only valid for changes that fix an important, known bug. Debian provides the “nmudiff” tool to help contributors submit NMUs.

The unique property of a Maintainer Transfer change is that a timer is attached to it, with a delay depending on the severity of the proposed change (e.g., FreeBSD typically uses a delay of 2 weeks). Unless the maintainer replies to the change in time, the change goes in automatically once the timer expires. If the maintainer replies in time, she can request suspending the timer in order to review the change. If the change is not approved, the contributor needs to revise it according to the maintainer’s comments.

We found that 5.7 ± 5% (Debian) and 2.9 ± 5% (FreeBSD) of all package-versions contain an instance of Maintainer Transfer (Ubuntu has collective package ownership, hence does not have such transfers). The min/median/max number of days until such changes were accepted is 0/1.5/556 days for Debian and 1/16/465 days for FreeBSD. In Debian, the median value is very low, indicating that maintainers often commit a Maintainer Transfer before the timer goes off. In FreeBSD, time-outs are much more common. The cases with the maximum time-out in Debian (#325110) and FreeBSD (#140303) correspond to packages that temporarily were orphaned, i.e., the maintainer officially stepped down.

2. Supporting Orphaned Packages is typically done by an ad hoc team of volunteers, based on casual contributions or reported critical bugs.
In Debian, the QA team typically jumps in to make changes to orphaned packages.

3. Adopting Orphaned Packages either happens by volunteers interested in an orphaned package, or by convention, when a contributor provides patches for an orphaned package and automatically becomes the new maintainer. For example, if no feedback is received for a patch in FreeBSD within three months, the maintainer is deemed to have abandoned the package and any contributor may assume maintainership (The FreeBSD Documentation Project 2011, Section 5.5).

Participants: maintainer, contributor.

Interactions: Maintainer Transfer can co-occur with all other activities, except for New Package.

Literature:
We could not find any reference to the Maintainer Transfer activity in the literature. However, Curtis et al. (1988) and Lewis et al. (2000) do stress the importance of having “system-level thinkers” as maintainers, who are able to sufficiently understand both the specific domain of the integrated component as well as the overall architecture of their own system. According to our analysis, the Maintainer Transfer activity would kick in as soon as the maintainer of a component no longer possesses those skills.

Notable Instances:
An NMU helping out a busy maintainer: httrack 3.40.4-3.1 (Debian), an offline browser, fixed an issue with the file system locations for test files. The bug was reported on the 11th of October 2006, followed one week later by a proposed NMU by a contributor. A couple of hours later the NMU was approved by the maintainer, who noted (#392419): “Thanks a lot, I didn’t yet had [sic] the change [sic] to review the issue”.

An NMU with strings attached: The maintainer of libcdio 0.78.2+dfsg1-2.1 (Debian), a library for accessing CD media, had been warned on the 20th of January 2008 about C++ header file issues with the upcoming release of GCC 4.3 (Product-wide Concern).
Two months later, a contributor sent in an NMU patch fixing the compiler errors. One day later, the maintainer chimed in (#461683): “I don’t object to a NMU (I know I haven’t been handling my libcdio package in the best possible way), but if you wish to NMU, please consider applying the patches that were sent to other bug reports”. The NMU was approved the same day.

A hostile NMU: On the 18th of May 2007, a contributor requested an Upstream Sync to the new upstream release (1.3.2) of libjcalendar-java 1.2.2-6.1 (Debian), a calendar picker component, and also proposed a Packaging Change to support the Kaffe Java VM. However, since nothing happened for one week, the contributor added a comment to both bug reports stating “I am planning a NMU if nothing happens (again)” (#424981, #424982). The next day, the maintainer replied (#424981) “I admit that I’m not very reactive, but before you do your NMU, have you checked that Jcalendar 1.3.2 is backwards compatible with version 1.2?”. Nothing happened for 1.5 months, until the NMU timer had expired and the NMU went in.

5 Identified Integration Challenges

The seven discussed integration activities document the complexity of integration. Even in the simplest case, i.e., black box integration, maintainers still need to package the integrated project (New Package), verify if the integrated product is compatible with each Upstream Sync, and follow up on Dependency Management changes like library transitions. In the case of white box integration, the integrated projects need to be customized or fixed with Local Patches, and streamlined to product-wide policies (Product-wide Concern). All the time, the packaging logic and configuration files need to be kept up-to-date (Packaging Change), and maintainer activity needs to be monitored (Maintainer Transfer).

To paraphrase Curtis et al.
(1988), we “are not claiming to have discovered new insights” for OSS integration; instead, we identified and documented the core integration activities that the maintainers of three large OSS distributions perform on a daily basis “to help identify which factors must be attacked to improve” integration. Although distributions have guidelines on how to address some of these activities (Debian project 2011; The FreeBSD Documentation Project 2011), the differences in terminology (e.g., “NMU” vs. “time-out”) and technical procedures (e.g., centralized vs. distributed Product-wide Concern) make it confusing to understand and compare the activities, or to study possible tools and techniques to improve these activities. Hence, the unifying vocabulary that we provide is key to understanding the integration process of upstream components, complementing existing work on code integration (Coplien et al. 1998; DeLine 1999; Frakes and Kang 2005; Parnas 1976; Pohl et al. 2005) and on the selection of reusable components (Bhuta et al. 2007; Chen et al. 2008; Li et al. 2009). Finally, we also compared the activities to those in prior work, in particular in commercial settings.

Throughout our analyses and the documentation of the seven integration activities, we distilled 13 concrete challenges, summarized in Table 4, across four different research areas. Most of the challenges have been discussed earlier in this paper.
Table 4 Open challenges for integration activities

| Area | Challenge |
|-----------|----------------------------------------------------------|
| packaging | · insight into upstream build process |
| | · automatic build-/run-time dependency extraction |
| | · accurate replication of packaging environment |
| testing | · cross-platform testing of package & its dependencies |
| | · integration testing during packaging |
| | · accurate replication of functionality issues |
| evolution | · determining best moment for Upstream Sync |
| | · insight into upstream changes |
| | · recommendations about important API changes |
| | · management of ownership of package changes |
| merging | · prediction of integration defects |
| | · identifying opportunities for cherry-picking |
| | · insight into merge status of Local Patches |

Ubuntu and Debian are currently in the process of designing an automatic unit and integration testing system for the packaging process. Similar to defect prediction work at the code level, prediction of integration defects and the effort involved with fixing these defects would be extremely useful. There is some initial work on this (Mohamed et al. 2008; Yakimovich et al. 1999), but more work is needed to bring such techniques to practitioners. Similarly, a kind of bugzilla repository for managing ownership of changes, i.e., who should update reverse-dependencies, who should perform a Product-wide Concern or who should act on an NMU, is needed to improve communication across all involved parties. Insight into the upstream build process (Adams et al. 2007; Qiang and Godfrey 2001) currently relies on manual tracing and analysis of build and run-time logs, with only some packages having rudimentary scripts for checking runtime dependencies. In general, however, the ability to accurately replicate bugs in code and build is missing.
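A rudimentary run-time dependency check of the kind mentioned above can be sketched as follows. The `ldd`-style output parsed here is a hand-written sample (real `ldd` output varies by platform), and the function name is invented for illustration.

```python
# Sketch of a rudimentary run-time dependency check: parse ldd-style output
# and flag shared libraries that the dynamic linker cannot resolve.
import re

# Matches lines of the form "<soname> => <resolved path or 'not found'>".
LDD_LINE = re.compile(r"^\s*(\S+)\s*=>\s*(\S+)")

def missing_libraries(ldd_output):
    missing = []
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m and m.group(2) == "not":  # e.g. "librt.so.1 => not found"
            missing.append(m.group(1))
    return missing

sample = ("\tlibrt.so.1 => not found\n"
          "\tlibc.so.6 => /lib/libc.so.6 (0x0000)\n")
assert missing_libraries(sample) == ["librt.so.1"]
```

A check like this would have flagged the librt case from the Packaging Change examples at build time, rather than leaving the breakage to be discovered by end-users.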
Packaging environments can vary widely between users, with certain combinations of package and distribution versions causing subtle packaging or run-time problems. Current bug reporting tools automatically include detailed platform information, yet such information is often insufficient to identify Dependency Management changes.

As the above challenges impact even three of the largest and most popular OSS distributions, more powerful tool and process support is essential for most of the OSS integration activities, complementing the mailing lists, bug repositories, and custom dashboards (for example to track library transitions) currently in use by organizations. Until now, researchers have only been studying some of the challenges, such as API changes (Dagenais and Robillard 2008) and merge defects (Brun et al. 2011; Shihab et al. 2012). Clearly, more research is needed to support maintainers in the field.

6 Evaluation

The six contacted maintainers pointed out some small factual errors in an earlier version of the documented integration activities, and recent advances (e.g., regarding the automatic test systems being built for Debian and Ubuntu). However, no fundamental errors were identified, nor was any activity discarded. The identified inaccuracies have been fixed in the activity descriptions above.

Regarding the completeness and usefulness of the documented activities, Table 5 summarizes the replies of the six contacted maintainers. As explained in Section 3.6, two maintainers (M2 and M5) provided empty replies for at least two questions, while M1 left one question open. Hence, we obtained some empty replies for Q2, Q3 and Q5. We now discuss each question’s answers.

Q1. What activities did we miss? Five of the maintainers pointed out missing activities, although many of them were captured in some form.

A.
“Upstream Lobbying” was in fact mentioned as part of Local Patch, but M4 found that it deserved its own activity. Interestingly, M6 mentioned the inverse kind of lobbying, i.e., lobbying in derived distributions for newly reported or fixed bugs. Instead of splitting up Local Patch, we decided to keep this activity as is, but add more detail about the lobbying part.

B. “Post-release Maintenance” was suggested by M4 and M2 as a dedicated integration activity encompassing all the activities occurring after a new package-version has made it into a new release of the distribution. M4 notes that “while the maintainer isn’t required to support the use of a product they [sic] are often the first person contacted if someone can’t get to build on FreeBSD”. Our activities do not capture this activity by itself, only its outcome, for example in the form of a Packaging Change or Local Patch. This is because many emails could be exchanged regarding a maintenance problem without a corresponding change log item or bug report (i.e., our data set does not capture such discussions). Although this hints at less important integration issues (since they did not need to be fixed or acted upon in some form), future work should analyze the mailing list data of the distributions to uncover this part of the integration work.

| | M1 | M2 | M3 |
|---|---|---|---|
| Q1 | license/copyright analysis | vulnerability resolution, post-release maintenance | no |
| Q2 | people unfamiliar with topic | < no reply > | major activities in easy-to-read way |
| Q3 | < no reply > | < no reply > | more detail/examples |
| Q4 | license tracking | none | none |
| Q5 | DEP5/CDBS license checking, autom. dep. checking | < no reply > | automated testing, autom. dep. checking |

| | M4 | M5 | M6 |
|---|---|---|---|
| Q1 | upstream lobbying, post-release maintenance | package end-of-life | monitoring downstream distributions for bugs/patches |
| Q2 | useful overview, do we document our activities? | < no reply > | nice intro to what being distro dev is about |
| Q3 | what to do? | nothing | none |
| Q4 | timely integration, desktop vs. enterprise, hundreds of variants | none | monitoring the status of all packages in the distribution |
| Q5 | good question :−) | < no reply > | improvements to package process, atomic package updates |

C. “License/Copyright Analysis” was mentioned by M1 as an important activity: “copyright/licensing analysis isn’t mentioned anywhere, yet it’s often a tiresome process when creating a new package (and often forgot [sic] to update on upstream sync)”. License analysis did not occur very often in our data set; for example, in our Ubuntu samples we only found one occurrence (version “0.4-0Ubuntu1” of package “branding-ubuntu”), in which case the license of some files had not been specified as being GPL. For this reason, the activity is captured in our Other category.

D. “Vulnerability Resolution” was pointed out by M2 as a missing activity, i.e., the steps performed to address a vulnerability in a timely manner after release. Although it is not one of the top 7 activities (and hence not documented in detail by us), vulnerability resolution occurred relatively often (Table 3), in 4.4 ± 5% (Debian), 1.3 ± 5% (Ubuntu) and 0.8 ± 5% (FreeBSD) of all package-versions. Our data shows how most of these vulnerabilities were reported and fixed upstream. Similar to Upstream Sync, distributions first have to become aware of vulnerabilities, then update their packages as soon as a fix is available.
For this reason, vulnerability changes tend to use NMUs (see Maintainer Transfer), since the security team wants to update a vulnerable package as soon as possible, overruling the maintainer if necessary. Often, vulnerability fixes are cherry-picked, leaving other upstream changes until the next official Upstream Sync. For example, cups-base revision 1.44 (FreeBSD) (24th of January 2005) fixed a vulnerability in the Cups printer server identified and reported upstream by a university student, while php4 4:4.4.0-3ubuntu1 (Ubuntu) cherry-picked 8 upstream vulnerability fixes for the PHP programming language (19th of December, 2005). Since the full details of vulnerabilities and how they were processed internally are not available in publicly available databases, and since it is less common than the seven documented activities, detailed analysis of this integration activity is future work.

E. “Package End-of-life” was a missing and often overlooked activity according to M5. Some packages lose user and maintainer interest over time; hence, when the distribution evolves and integration activities need to be performed on the package, either nobody steps up or substantial effort is required by other maintainers to keep the package up-to-date. Similarly, if an older version of a library is rendered obsolete by a newer one, or the older version starts to create conflicts with the newer one, the older version needs to be removed from the distribution. However, we did not find evidence of this activity in our data samples. Our Maintainer Transfer activity comes closest, since it occurs when an unmaintained package is “saved” from end-of-life by a new maintainer.

Surprisingly, the Internationalization activity, which is the ninth most frequent activity that we found (Table 3), was not mentioned by any maintainer. This activity comprises all the work related to translation and adaptation of a package to other cultures (e.g., different currencies) (Xia et al.
2013). Since distributions reach significantly more users than an individual upstream project could reach on its own, a packaged project has a higher chance of being used in non-English locales. Hence, distributions typically have dedicated teams addressing the internationalization needs of their packages. For example, the debian-l10n-english team works on the translation templates of packages to facilitate the job of translators (who are often not software engineering experts).

Distributions typically solicit Internationalization patches once development has been frozen, i.e., the basic new functionality has been stabilized and only bug fixes are still allowed. Although Internationalization changes are typically harmless, they can in rare cases keep packages from executing. In January 2006, for example, an incomplete Japanese character prevented the xchat IRC client of FreeBSD from executing. A one-character change to a translation template resolved the issue.

Q2. What can the documented activities be used for? M1, M3 and M4 agree that the documented patterns provide a clear overview of the major integration activities, which is useful for novices (M1) as well as any stakeholder involved in integration (M3/M4). M4 noted that the activities do not necessarily need to be used as direct documentation. They could also be used to check how well the distribution collects data or monitors the progress of each integration activity. M3 informed us that the structured, accessible explanations of the major integration activities piqued the interest of two of his package testers, which he believes to be a success. M6 recommended us to “reach out to developers communities with this documentation. E.g., you could write a blog post providing an introduction to your paper, targeted at distribution devs”. We are planning to follow up on this suggestion.

Q3. What is missing from the documented activities?
+ M3 was interested in getting more details and examples for each activity, while M4 wanted to know what the recommended practices and tools for each activity are. Our documented activities deliberately describe only the major tasks and how they are implemented in the three considered distributions, without a dedicated section on “best practices”. Given the many challenges identified in Section 5 as well as in Section 2, many activities rely on manual work and hence do not yet have established best practices. + + +Q4. What challenges did we miss? + M1 again mentioned license tracking. M4 noted that the largest challenge is not how to perform each activity, but how to perform them on time. Given the ever-shorter time frame between releases (Hertzog 2011; Remnant 2011; Shuttleworth 2008), this is indeed an important constraint on the identified challenges. Furthermore, the right activity to perform at a particular moment also depends on the end-user: “desktop users want updates ASAP while enterprise users don’t want to change their software for multiple years”. This echoes known phenomena such as Microsoft’s monthly “patch Tuesday” (Lemos 2003) and Mozilla’s extended support releases for companies (Khomh et al. 2012). M4 concluded by warning about the challenges posed by the hundreds of variations in build systems, versioning schemes, projects, etc. Slightly related to this, M6 noted that “something orthogonal is the management of a large amount of software packages: getting a global overview from their status is not easy”. This ties into the management-related challenges of Table 4 identified from our data. + + +Q5. What promising tools/techniques do you see coming up to address some of the challenges? + Both M1 and M3 expect automated dependency checking tools to become mainstream, i.e., “It may take some time to make that automatic but we are getting closer every day”. Such tools would improve at least the Upstream Sync and Dependency Management activities.
M1 mentioned two promising license analysis tools, while M3 remarked that “We already have automated testing tools in Ubuntu (see QA team) so we are heading in the right direction here”. M6 saw the advent of atomic Dependency Management and other packaging process improvements as a promising development. + + +Overall, the six maintainers liked the work and found that the documented activities described their daily activities “quite well” (M6). They would not necessarily use our documented representation of the activities themselves (it is targeted more towards novices), except to systematically check which activities their distribution is not tracking (M4). Some important missing activities were identified, in particular license analysis and the tracking of licensing changes, vulnerability resolution, and post-release maintenance, as well as some missing challenges (especially time pressure). Finally, some tool support for dependency checking is expected to arrive in the medium term; however, many challenges remain open.

7 Threats to Validity

+ + +With respect to construct validity, there are several threats to consider. First, we used the change log messages as a representative record of the maintainers’ activities, based on which we identified important bug reports for in-depth manual analysis, complemented (where necessary) by mailing list messages and other kinds of documentation. We did not formally verify the accuracy of these data sources, nor their completeness. Although M6 warned that the log message of the first version of a Debian package does not always mention whether Local Patch has been performed, none of the four instances of New Package that we found suffered from this issue.
+ + +There is no further evidence that suggests that the logs are incorrect: the three analyzed distributions require their maintainers to provide log messages (Debian project 2011; Koshy 2013), since those are the primary input for end users and other maintainers affected by changes to a package. In fact, bug reports and mailing lists form the official means of communication in OSS distributions, together with IRC chat messages. In cases where a bug report identifier was missing (cf. Fig. 4), either the change log item was sufficiently clear or we were able to find a related email message via a web search. + + +Second, we only analyzed a subset of the package-versions, and, hence, change logs. To mitigate this threat, we randomly sampled a large enough subset of package-versions to obtain a confidence interval of ±5% with a 95% confidence level. Furthermore, the activities that we identified for Ubuntu and FreeBSD did not add any new activity on top of those identified for Debian. + + +Third, our algorithm for reconstructing “versions” from the FreeBSD CVS commits depends on conventions that are documented by the FreeBSD project, but not explicitly enforced. It is possible that the recovered versions are either too fine-grained (under-approximating the actual number of activities performed for a version) or too coarse-grained (over-approximating). Feedback from the package maintainers confirmed that the algorithm is correct and that deviations from the guidelines should be minimal. + + +Fourth, since we study individual package-versions, our sample could contain multiple versions of some packages, just one version of other packages, and no version at all of the remaining packages. Such an approach is necessary, since large projects like KDE or GNOME involve more integration effort than smaller projects, and hence need to have more weight in our study. 
In addition, such projects typically also have a larger number of associated packages, which increases their weight further. The risk that this sampling decision biases the observed activities is small, since ecosystems like KDE and GNOME consist of hundreds of different applications and tools, developed by hundreds of developers and packaged by dozens of maintainers. In other words, even inside one such ecosystem, we should still expect a large diversity in integration activities. + + +Regarding internal validity, as mentioned above we rely on the accuracy and completeness of the logs of each package-version. Even in the event that some activities were not documented in the logs, there is no specific reason to believe that some activities would be less documented than others, hence this effect would cancel itself out across the different activities. One exception is Post-release Maintenance, which was missed in our results, since “unimportant” discussions (i.e., those without an explicit bug report or patch attached to them) did not leave any trace in the change log and its referenced bug reports, across all three distributions. + + +Furthermore, the nature of manual classification implies that there might be some misclassifications (both for the activities as well as the challenges). To overcome this, the logs were interpreted by two of the authors, both of whom have experience in integration tasks (one of them is a Debian/Kubuntu developer), and they discussed their decisions with each other in order to resolve differences and obtain consensus. These discussions also resolved possible bias introduced by having the first set of tags derived by only one of the authors. Furthermore, to validate the discovered patterns of integration and open challenges, we reached out to six maintainers/release engineers of Debian, Ubuntu and FreeBSD to evaluate and provide feedback on these patterns.
Nonetheless, the quantitative results of this paper (prevalence of each activity) are exploratory only, and we do not extrapolate them. + + +The evaluation by the six maintainers was performed entirely via email, since this is the preferred means of communication for maintainers (and bug repositories, as discussed, are not suited for this purpose). Furthermore, the asynchronous nature of email provided breathing space to the maintainers and made it easier for them to organize their feedback around their voluntary open source activities and day-time jobs. Even then, we still observed that some of the questions were not addressed. In future work, we might complement asynchronous email messages with synchronous follow-up via, for example, instant messaging (using IRC). + + +The open replies by some of the maintainers, as well as the selection of maintainers for the evaluation, could also introduce bias. M2 provided three open replies, M5 two and M1 one, yielding a total of 6 open replies out of 30 (20%). Due to the distribution of the open replies across the questions, each question obtained at least four concrete replies (two obtained six replies). Furthermore, the open replies are spread across the Debian and Ubuntu maintainers, reducing the overall impact of the missing data even further. Regarding selection bias, all six maintainers were experienced maintainers in their respective OSS distribution, covering a range of packages of different sizes and domains. + + +An alternative evaluation methodology would have been to first perform a survey or interview, after which the research findings would be empirically analyzed and validated on change logs and other data. However, doing so would have biased our results toward the activities that stakeholders think are important, not necessarily all the important activities that they actually perform. Some essential activities would never have surfaced.
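The ±5% confidence interval at a 95% confidence level used for sampling above corresponds to the standard sample-size calculation of Cochran (1963). A minimal sketch, assuming the usual conservative proportion p = 0.5; the population size passed in below is purely illustrative, not the actual number of package-versions in the study:

```python
import math

def cochran_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion at the given margin of error.

    z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the most
    conservative assumption about the unknown true proportion.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite-population correction
    return math.ceil(n)

# Illustrative population of 10,000 package-versions:
print(cochran_sample_size(10_000))  # a few hundred sampled versions suffice
```

For any sizable population the result stays close to the 385 given by the infinite-population formula, which is why a sample of a few hundred package-versions per distribution achieves the stated precision.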
+ + +With respect to external validity, we have analyzed three of the largest OSS distributions as exemplars of packaging organizations. Since integration is the central activity of OSS distributions, we expect the identified activities to be representative of many of the activities that other packaging organizations would face in the case of OSS reuse. For example, packaging organizations like GNOME and KDE, and even “regular” Java or C++ systems that reuse multiple open source libraries, also have to deal with Upstream Sync (e.g., reusing a new version of log4j), Dependency Management (e.g., adding the dependencies of the new version of log4j) and Local Patch (e.g., customizing the new version of log4j to fix a bug). Nevertheless, manual analysis of other kinds of OSS distributions (e.g., Fedora-based), of packaging organizations in general, or of any organization that performs multi-component integration is necessary to confirm these conjectures and validate the generalizability of the seven integration activities. Such an analysis might discover new activities, for example in the case of packaging organizations that do not build products for end-users but rather middleware or frameworks for other companies to build on.

8 Conclusion

+ + +Software reuse is a major tenet of software engineering, yet the integration activities that accompany it, be it in a COTS, OSS or ISS context, introduce unforeseen maintenance costs. Since more empirical research is necessary in this area to help organizations reuse components successfully, and since most studies thus far focused on the integration of individual components and/or non-OSS integration, we performed a large-scale study on three successful OSS distributions, i.e., Debian, Ubuntu and FreeBSD.
+ + +Analysis of a large sample of change log messages, bug reports and other historical integration data resulted in the identification of seven major integration activities, whose processes were documented in a pattern-like fashion to help organizations and researchers understand the responsibilities involved in integration. The activities were shown to be non-trivial and to require a large amount of effort, and they were validated by six maintainers of the three distributions. Based on the seven documented activities, the major challenges turned out to be related to the cherry-picking of safe changes from a new upstream release, the management of dependencies between packages, the testing of packages, and coordination among maintainers. Models and tools are needed to support these integration activities. + + +By providing a unified terminology across distributions and by documenting the integration activities in a structured way, our catalogue of activities enables maintainers of open source distributions, organizations interested in reusing OSS or ISS components, and researchers to better understand the challenges and activities that they face, and to plan policies, tools and methods to address these challenges. Together with other studies on integration, our work could form the basis of a dedicated training program aimed at developers and their managers, with the goal of reducing or at least stabilizing the maintenance costs caused by integration. + + +Finally, and very encouragingly, all distribution maintainers that we contacted hope that the documented activities and challenges will inspire researchers to start up a research program in the domain of reuse and integration. + + +Acknowledgments The authors would like to thank all maintainers and release engineers of Debian, Ubuntu and FreeBSD who participated in our study, either directly (providing feedback on the documented activities) or indirectly (providing insights into the fascinating world of OSS distributions).
+ + +References + + +Adams B, De Schutter K, Tromp H, De Meuter W (2007) Design recovery and maintenance of build systems. In: Proceedings of the Intl. Conf. on Soft. Maint, pp 114–123 + + +Adams B, Kavanagh R, Hassan AE, German DM (2015) Replication package. http://mcis.polymtl.ca/publications/2014/integration_oss_distribution_adams_et_al.zip + + +Bac C, Berger O, Deborde V, Hamet B (2005) Why and how-to contribute to libre software when you integrate them into an in-house application? Proceedings of the 1st Intl Conf on Open Source Systems (OSS):113–118 + + +Basili VR, Briand LC, Melo WL (1996) How reuse influences productivity in object-oriented systems. Commun ACM 39(10):104–116 + + +Begel A, Nagappan N, Poile C, Layman L (2009) Coordination in large-scale software teams. In: Proceedings of the 2009 ICSE Workshop on Cooperative and Human Aspects on Software Engineering, CHASE ’09, pp. 1–7, Washington, DC, USA, IEEE Computer Society + + +Bhuta J, Mattmann C, Medvidovic N, Boehm BW (2007) Framework for the Assessment and Selection of Software Components and Connectors in COTS-Based Architectures. In: WICSA, page 6 +Information Technology Resources Board (1999) Assessing the risks of commercial-off-the-shelf applications. Technical report, ITRB + + +Boehm B, Abts C (1999) COTS integration: Plug and pray? Computer 32(1):135–138 + + +Bowman IT, Holt RC, Brewster NV (1999) Linux as a case study: its extracted software architecture. In: Proceedings of the 21st Intl. Conf. on Software Engineering (ICSE), pp 555–563 + + +Brooks FP, Jr (1995) The Mythical Man-month (Anniversary Ed.) Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA + + +Brownsword L, Oberndorf T, Sledge CA (2000) Developing new processes for COTS-based systems. IEEE Softw 17(4):48–55 + + +Brun Y, Holmes R, Ernst MD, Notkin D (2011) Proactive detection of collaboration conflicts. In: Proceedings of the 19th ACM SIGSOFT Symp. and the 13th European Conf. 
on Foundations of Software Engineering (ESEC/FSE), pp 168–178 + + +Chen W, Li J, Ma J, Conradi R, Ji J, Liu C (2008) An empirical study on software development with open source components in the Chinese software industry. Softw Process 13:89–100 + + +Cochran WG (1963) Sampling Techniques, 2nd edn. John Wiley and Sons, Inc., New York + + +Coplien J, Hoffman D, Weiss D (1998) Commonality and variability in software engineering. IEEE Softw 15:37–45 + + +Crnkovic I, Larsson M (2002) Challenges of component-based development. J Syst Softw 61(3):201–212 + + +Curtis B, Krasner H, Iscoe N (1988) A field study of the software design process for large systems. Commun ACM 31(11):1268–1287 + + +Dagenais B, Robillard MP (2008) Recommending adaptive changes for framework evolution. In: Proceedings of the 30th Intl. Conf. on Software Engineering (ICSE), pp 481–490 + + +de Souza CRB, Redmiles D, Cheng L-T, Millen D, Patterson J (2004) Sometimes you need to see through walls: A field study of application programming interfaces. In: Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, CSCW ’04, pp. 63–71, New York, NY, USA. ACM + + +de Souza CRB, Redmiles DF (2008) An empirical study of software developers’ management of dependencies and changes. In: Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, pp. 241–250, New York, NY, USA, ACM + + +Project participants (2013). http://www.debian.org/devel/people + + +Debian project (2011) Debian Developer’s Reference, 2011 edition + + +DeLine R (1999) Avoiding packaging mismatch with flexible packaging. In: Proceedings of the 21st Intl. Conf. on Software Engineering (ICSE), pp. 97–106 + + +Developer’s Reference Team, Barth A, Di Carlo A, Hertzog R, Nussbaum L, Schwarz C, Jackson I (2011) Debian. The Debian Project + + +Di Cosmo R, Di Ruscio D, Pelliccione P, Pierantonio A, Zacchiroli S (2011) Supporting software evolution in component-based FOSS systems.
Sci Comput Program 76:1144–1160 + + +Di Giacomo P (2005) COTS and open source software components: are they really different on the battlefield? In: Proceedings of the 4th intl. conf. on COTS-Based Software Systems (ICCBSS), pp 301–310 + + +Dogguy M, Glondu S, Le Gall S, Zacchiroli S (2010) Enforcing type-safe linking using inter-package relationships. In: Proc. of the 21st Journées Francophones des Langages Applicatifs (JFLA), p. 25p + + +Frakes W, Terry C (1996) Software reuse: metrics and models. ACM Comput Surv 28(2):415–435 + + +Frakes WB, Kang K (2005) Software reuse research: status and future. IEEE Trans Softw Eng 31:529–536 + + +FreeBSD porter’s handbook (2011). http://bit.ly/FQDPhP + + +The FreeBSD developers (2013). http://www.freebsd.org/doc/en/articles/contributors/staff-committers.html + + +Gaffney JE, Durek TA (1989) Software reuse – key to enhanced productivity: some quantitative models. Inf Softw Technol 31(5):258–267 + + +Gamma E, Helm R, Johnson R, Vlissides J (1995) Design patterns: elements of reusable object-oriented software. Addison-Wesley Longman Publishing Co., Inc., MA + + +German DM, Gonzalez-Barahona JM, Robles G (2007) A model to understand the building and running inter-dependencies of software. In: Proceedings of the 14th Working Conf. on Reverse Engineering (WCRE), pp 140–149 + + +German DM, Hassan AE (2009) License integration patterns: addressing license mismatches in component-based development. In: Proceedings of ICSE, pp 188–198 + + +German DM, Webber JH, Di Penta M (2010) Lawful software engineering. In: Proceedings of the FSE/SDP wrksh. on Future of Soft. Eng. research (FoSER), pp. 129–132 +Gonzalez-Barahona JM, Robles G, Michlmayr M, Amor JJ, German DM (2009) Macro-level software evolution: a case study of a large software compilation. Empirical Softw Engg 14:262–285 + + +Goode S (2005) Something for nothing: management rejection of open source software in Australia’s top firms.
Inf Manage 42(5):669–681 + + +The BSD Certification Group (2005) BSD usage survey. Technical report, The BSD Certification Group + + +Hauge Ø, Ayala C, Conradi R (2010) Adoption of open source software in software-intensive organizations - a systematic literature review. Inf Softw Technol 52(11):1133–1154 + + +Hauge Ø, Sørensen C-F, Conradi R (2008) Adoption of open source in the software industry. In: Proceedings of the 4th IFIP WG 2.13 Intl. Conf. on Open Source Systems (OSS), vol 275, pp 211–221 + + +Herbsleb JD, Grinter RE (1999) Splitting the organization and integrating the code: Conway’s law revisited. In: Proceedings of the 21st International Conference on Software Engineering, ICSE ’99, pp. 85–95, New York, NY, USA, ACM + + +Herbsleb JD, Mockus A, Finholt TA, Grinter RE (2001) An empirical study of global software development: distance and speed. In: Proceedings of the 23rd International Conference on Software Engineering, ICSE ’01, pp. 81–90, Washington, DC, USA, IEEE Computer Society + + +Hertzog R (2011) Towards Debian rolling: my own Debian CUT manifesto. http://raphaelhertzog.com/2011/04/27/towards-debian-rolling-my-own-debian-cut-manifesto/ + + +Jaaksi A (2007) Experiences on product development with open source software. In: Proc. of the IFIP Working Group 2.13 on Open Source Soft, volume 234, pp 85–96. Springer + + +Khomh F, Dhaliwal T, Zou Y, Adams B (2012) Do faster releases improve software quality? – an empirical case study of Mozilla Firefox. In: Proceedings of the 9th IEEE Working Conf. on Mining Software Repositories (MSR), pp 179–188, Zurich, Switzerland + + +Koshy J (2013) Building products with FreeBSD. http://www.freebsd.org/doc/en/articles/building-products/ + + +Lemos R (2003) Microsoft details new security plan.
http://news.cnet.com/Microsoft-details-new-security-plan/2100-1002.3-5088846.html + + +Lewis P, Hyle P, Parrington M, Clark E, Boehm B, Abts C, Manners R (2000) Lessons learned in developing commercial off-the-shelf (COTS) intensive software systems. Technical report, SERC + + +Li J, Conradi R, Bunse C, Torchiano M, Slyngstad OPN, Morisio M (2009) Development with off-the-shelf components: 10 facts. IEEE Softw 26:80–87 + + +Li J, Conradi R, Slyngstad OP, Torchiano M, Morisio M, Bunse C (2008) A state-of-the-practice survey of risk management in development with off-the-shelf software components. IEEE Trans Softw Eng 34:271–286 + + +Li J, Conradi R, Slyngstad OPN, Bunse C, Khan U, Torchiano M, Morisio M (2005) An empirical study on off-the-shelf component usage in industrial projects. In: Proceedings of the 6th intl. conf. on Product Focused Software Process Improvement (PROFES), pp. 54–68 + + +van der Linden FJ, Schmid K, Rommes E (2007) Software product lines in action: the best industrial practice in product line engineering. Springer, Berlin Heidelberg + + +Van Der Linden F (2009) Applying open source software principles in product lines. Eur J Informa Prof (UPGRADE) 3:32–40 + + +Lundqvist A (2013) GNU/Linux distribution timeline. http://futurist.se/gldt/ + + +Mattsson M, Bosch J, Fayad ME (1999) Framework integration problems, causes, solutions. Commun ACM 42(10):80–87 + + +McCamant S, Ernst MD (2003) Predicting problems caused by component upgrades. In: Proceedings of the Symposium on the Foundations of Software Engineering, pp. 287–296 + + +McIntosh S, Adams B, Kamei Y, Nguyen T, Hassan AE (2011) An empirical study of build maintenance effort. In: Proceedings of ICSE, pages 141–150 + + +Merilinna J, Matinlassi M (2006) State of the art and practice of opensource component integration. In: Proceedings of the 32nd Conf. on Software Engineering and Advanced Applications (EUROMICRO), pp 170–177 + + +Meyer MH, Lehnerd AP (1997) The power of product platforms. 
Free Press, New York + + +Michlmayr M, Hunt F, Probert D (2007) Release management in free software projects: practices and problems. In: Open Source Development, Adoption and Innovation, v. 234, pp. 295–300 + + +Mistrík I, Grundy J, van der Hoek A, Whitehead J (2010) Collaborative software engineering: challenges and prospects, chapter 19, 1st edn. Springer, Berlin Heidelberg, pp 389–402 + + +Mohamed A, Ruhe G, Eberlein A (2008) Optimized mismatch resolution for COTS selection. Softw Process 13(2):157–169 + + +Morisio M, Seaman CB, Basili VR, Parra AT, Kraft SE, Condon SE (2002) COTS-based software development: processes and open issues. J Syst Softw 61(3):189–189 +Navarrete F, Botella P, Franch X (2005) How agile COTS selection methods are (and can be)? In: Proceedings of the 31st EUROMICRO Conference on Software Engineering and Advanced Applications, EUROMICRO ’05, pp 160–167, Washington, DC, USA. IEEE Computer Society + + +Orsila H, Geldenhuys J, Ruokonen A, Hammouda I (2008) Update propagation practices in highly reusable open source components. In: Proceedings of the 4th IFIP WG 2.13 Int. Conf. on Open Source Systems (OSS), vol 275, pp 159–170 + + +Parnas DL (1976) On the design and development of program families. IEEE Trans Softw Eng 2:1–9 + + +Pohl K, Böckle G, van der Linden FJ (2005) Software product line engineering: foundations, principles and techniques. Springer, New York + + +Remnant SJ (2011) A new release process for Ubuntu? http://netsplit.com/2011/09/08/new-ubuntu-release-process/ + + +Rodin J, Aoki O (2011) Debian New Maintainers’ Guide. The Debian Project + + +Ruffin M, Ebert C (2004) Using open source software in product development: a primer. IEEE Softw 21(1):82–86 + + +Sadowski BM, Sadowski-Rasters G, Duysters G (2008) Transition of governance in a mature open software source community: evidence from the Debian case.
Inf Econ Policy 20(4):323–332 + + +Scacchi W, Feller J, Fitzgerald B, Hissam S, Lakhani K (2006) Understanding free/open source software development processes. Softw Process: Improv Pract 11(2) + + +Seaman CB (1996) Communication costs in code and design reviews: an empirical study. In: Proceedings of the 1996 Conference of the Centre for Advanced Studies on Collaborative Research, CASCON ’96, pp 34–. IBM Press + + +Shihab E, Bird C, Zimmermann T (2012) The effect of branching strategies on software quality. In: Proceedings of the ACM/IEEE intl. symp. on Empirical Software Engineering and Measurement (ESEM), pp 301–310 + + +Shuttleworth M (2008) The art of release. http://www.markshuttleworth.com/archives/146 + + +Sojer M, Henkel J (2010) Code reuse in open source software development: quantitative evidence, drivers, and impediments. J Assoc Inf Syst 11(12) + + +Spinellis D, Szyperski C (2004) Guest editors’ introduction: how is open source affecting software development? IEEE Softw 21(1):28–33 + + +Stol K-J, Babar MA, Avgeriou P, Fitzgerald B (2011) A comparative study of challenges in integrating open source software and inner source software. Inf Softw Technol 53(12):1319–1336 + + +Szyperski C (1998) Component software: beyond object-oriented programming. Addison-Wesley Publishing Co., MA + + +The Fedora Project (2011) Package update HOWTO. http://fedoraproject.org/wiki/Package_update + + +The FreeBSD Documentation Project (2011) FreeBSD Porter’s Handbook. The FreeBSD Foundation + + +Tiangco F, Stockwell A, Sapsford J, Rainer A, Swanton E (2005) Open-source software in an occupational health application: the case of Heales Medical Ltd. Procs 1:130–134 + + +Trezentos P, Lynce I, Oliveira AL (2010) Apt-pbo: solving the software dependency problem using pseudo-boolean optimization. In: Proceedings of the IEEE/ACM intl. conf. on Automated Software Engineering (ASE), pp. 427–436 + + +Qiang T, Godfrey M (2001) The build-time software architecture view.
In: Proceedings of ICSM, pp. 398– + + +MOTU team (2013). https://launchpad.net/%7Emotu/+members + + +Ubuntu core development team (2013). https://launchpad.net/%7Eubuntu-core-dev/+members + + +Ubuntu universe contributors team (2013). https://launchpad.net/universe-contributors/+members + + +van der Hoek A, Wolf AL (2003) Software release management for component-based software. Softw Pract Exper 33:77–98 + + +Ven K, Mannaert H (2008) Challenges and strategies in the use of open source software by independent software vendors. Inf Softw Technol 50(9-10):991–1002 + + +Whittaker J, Arbon J, Carollo J (2012) How google tests software. Addison-Wesley Professional, MA + + +Comparison of BSD operating systems (2011). http://en.wikipedia.org/wiki/Comparison_of_BSD_operating_systems + + +Xia X, Lo D, Zhu F, Wang X, Zhou B (2013) Software internationalization and localization: an industrial experience. In: Proceedings of the 18th Intl. Conf. on Engineering of Complex Computer Systems (ICECCS), pp. 222–231 + + +Yakimovich D, Bieman JM, Basili VR (1999) Software architecture classification for estimating the cost of COTS integration. In: Proceedings of the 21st Intl. Conf. on Software Engineering (ICSE), pp. 296–302 +Bram Adams is an assistant professor at Polytechnique Montréal (Canada). He obtained his PhD at the GHSEL lab at Ghent University (Belgium), and was an adjunct assistant professor in the Software Analysis and Intelligence Lab at Queen’s University (Canada). His research interests include software release engineering in general, as well as software integration and software build systems in particular. His work has been published at premier software engineering venues such as TSE, ICSE, FSE, ASE, EMSE, MSR and ICSME. In addition to co-organizing RELENG 2013 to 2015 (and the 1st IEEE SW Special Issue on Release Engineering), he co-organized the PLATE, ACP4IS, MUD and MISS workshops, and the MSR Vision 2020 Summer School. 
He is PC co-chair of SCAM 2013, SANER 2015 and ICSME 2016. + + +Ryan Kavanagh is a Bachelor of Computing (Honours) student in Computing and Mathematics at Queen’s University. He has been a research assistant at the SAIL lab of Dr. Hassan, at McGill University, and at Microsoft Research Cambridge. Ryan started contributing to Ubuntu and its derived distributions in February 2006 (while still in high school), and in December 2011 he became an official Debian developer. In his spare time, Ryan is an avid piper, with various Canadian titles under his belt. +Ahmed E. Hassan is the Canada Research Chair (CRC) in Software Analytics, and the NSERC/BlackBerry Software Engineering Chair at the School of Computing at Queen’s University, Canada. His research interests include mining software repositories, empirical software engineering, load testing, and log mining. Hassan received a PhD in Computer Science from the University of Waterloo. He spearheaded the creation of the Mining Software Repositories (MSR) conference and its research community. Hassan also serves on the editorial boards of IEEE Transactions on Software Engineering, Springer Journal of Empirical Software Engineering, and Springer Journal of Computing. Contact him at ahmed@cs.queensu.ca. + + +Daniel German is a professor of Computer Science at the University of Victoria. He completed his PhD at the University of Waterloo in 2000. His work spans the areas of mining software repositories, open source, and intellectual property in software engineering.
Variant Forks – Motivations and Impediments

+ + +John Businge,∗ Ahmed Zerouali,‡ Alexandre Decan,† Tom Mens,† Serge Demeyer,∗ and Coen De Roover,‡ + + +∗University of Antwerp, Antwerp, Belgium +†University of Mons, Mons, Belgium +‡Vrije Universiteit Brussels, Brussels, Belgium + + +{ john.businge | serge.demeyer }@uantwerpen.be +{ alexandre.decan | tom.mens }@umons.ac.be +{ ahmed.zerouali | coen.de.roover }@vub.be + + +Abstract—Social coding platforms centred around git provide explicit facilities to share code between projects: forks, pull requests, and cherry-picking, to name but a few. Variant forks are an interesting phenomenon in that respect, as they permit different projects to peacefully co-exist, yet explicitly acknowledge the common ancestry. Several researchers have analysed forking practices on open source platforms and observed that variant forks are created frequently. However, little is known about the motivations for launching such a variant fork. Is it mainly technical (e.g., diverging features), governance (e.g., diverging interests), legal (e.g., diverging licences), or do other factors come into play? We report the results of an exploratory qualitative analysis of the motivations behind creating and maintaining variant forks. We surveyed 105 maintainers of different active open source variant projects hosted on GitHub. Our study extends previous findings, identifying a number of fine-grained common motivations for launching a variant fork and listing concrete impediments to maintaining the co-existing projects. + + +Index Terms—Mainlines, Variants, GitHub, Software ecosystems, Maintenance, Variability + + +I. INTRODUCTION + + +The collaborative nature of open source software (OSS) development has led to the advent of social coding platforms centred around the git version control system, such as GitHub, BitBucket, and GitLab.
These platforms bring the collaborative nature and code reuse of OSS development to another level, via facilities like forking, pull requests and cherry-picking. Developers may fork a mainline repository into a new forked repository and take governance over the latter while preserving the full revision history of the former. Before the advent of social coding platforms, forking was rare and was typically intended to compete with the original project [1]–[6].

With the rise of pull-based development [7], forking has become more common and the community typically characterises forks by their purpose [8]. Social forks are created for isolated development with the goal of contributing back to the mainline. In contrast, variant forks are created by splitting off a new development branch to steer development into a new direction, while leveraging the code of the mainline project [9].

Several studies have investigated the motivations behind variant forks in the context of OSS projects [1]–[6]. However, most were conducted before the rise of social coding platforms, and it is known that GitHub has significantly changed the perception and practices of forking [8]. In this social coding era, variant projects often evolve out of social forks rather than being planned deliberately [8]. Moreover, social coding platforms often enable mainlines and variants to peacefully co-exist rather than compete. Little is known about the motivations for creating variants in the social coding era, making it worthwhile to revisit the motivation for creating variant forks (why?).

Social coding platforms offer many facilities for code sharing (e.g., pull requests and cherry-picking). So if projects co-exist, one would expect variant forks to take advantage of this common ancestry and frequently exchange interesting updates (e.g., patches) on the common artefacts. Despite these advanced code-sharing facilities, Businge et al.
observed very limited code integration, using the git and GitHub facilities, between the mainline and its variant projects [10]. This suggests that code-sharing facilities in themselves are not enough for graceful co-evolution, making it worthwhile to investigate impediments to co-evolution (how?).

We therefore explore two research questions:

RQ1: Why do developers create and maintain variants on GitHub? The literature pre-dating git and social coding platforms identified four categories of motivations for creating variant forks: technical (e.g., diverging features), governance (e.g., diverging interests), legal (e.g., diverging licences), and personal (e.g., diverging principles). RQ1 aims to investigate whether those motivations for variant forks are still the same, or whether new factors have come into play.

RQ2: How do variant projects evolve with respect to the mainline? If, despite advanced code-sharing facilities, there is limited code integration between the mainline and the variant projects, a possible cause could be related to how the teams working on the variants and the mainline are structured. Therefore, RQ2 investigates the overlap between the teams maintaining the mainline and variant forks, and how these teams interact. As such, we hope to identify impediments to co-evolution.

The investigations are based on an online survey conducted with 105 maintainers involved in different active variant forks hosted on GitHub.

Our contributions are manifold: we identify new reasons for creating and maintaining variant forks; we identify and categorize different code reuse and change propagation practices between a variant and its mainline; we confirm that little code integration occurs between a variant and its mainline, and uncover concrete reasons for this phenomenon.
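To make the mechanics of such change propagation concrete, the following is a minimal, self-contained sketch (using throwaway local repositories; the repository and file names are hypothetical, not taken from any surveyed project) of how a variant fork can cherry-pick a single mainline bug fix without merging the rest of the mainline’s history:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A hypothetical "mainline" repository with one commit.
git init -q mainline
cd mainline
git config user.email mainline@example.com
git config user.name mainline
echo "core logic" > lib.txt
git add lib.txt
git commit -q -m "add lib"
cd ..

# "Fork" the mainline into a variant; a clone preserves the full history.
git clone -q mainline variant

# The mainline lands a bug fix after the fork point.
cd mainline
echo "bug fix" >> lib.txt
git commit -q -am "fix bug"
fix=$(git rev-parse HEAD)

# The variant selectively integrates just that one commit.
cd ../variant
git config user.email variant@example.com
git config user.name variant
git fetch -q origin
git cherry-pick "$fix" >/dev/null
tail -n 1 lib.txt   # -> bug fix
```

A pull request on a social coding platform plays the same role in the opposite direction, offering a variant’s commits back to the mainline.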
We discuss the implications of these findings and how tools can help to achieve efficient code integration and collaboration between mainlines and diverging variant forks. Our replication package can be found online.(^1)

II. RELATED WORK

Previous research has focused on (A) motivations for creating or maintaining variant forks; and (B) interaction between variant forks and their mainline.

A. Motivations for creating or maintaining variant forks

Several studies have investigated motivations for creating and maintaining variant forks. However, most of these studies were carried out on SourceForge, pre-dating the advent of social coding platforms like GitHub [1]–[5], [11]. Several of those early studies report perceived controversy around variant forks [5], [12]–[17]. Jiang et al. [18] state that, although forking may have been controversial in the OSS community, it is now encouraged as a built-in feature on GitHub. They further report that developers create social forks of repositories to submit pull requests, fix bugs, and add new features. Zhou et al. [8] conclude that most variant forks started as social forks and that perceptions of forks have changed with the advent of GitHub. Robles and González-Barahona [2] carried out a comprehensive pre-GitHub study on a carefully filtered list of 220 potential forks referenced on Wikipedia. They report motivations and outcomes for forking on these 220 projects.

The literature has uncovered a number of motivations for creating variants. Below, we present those where both the mainline and variant co-evolve. The motivation of reviving an abandoned project is not considered in this study since it does not involve co-evolution of the variants.

○ Technical (addition of functionality). Sometimes developers want to include new functionality into the project, but the main developer(s) do not accept the contribution. An example is Poppler, a fork of xpdf relying on the poppler library [2].
○ Governance disputes. Some contributors from the community create a variant project because they feel that their feedback is not heard, or because the maintainers of the mainline are unresponsive or too slow at accepting their patches. A well-known example is a fork of GNU Emacs (originally Lucid Emacs) which was created as a result of the significant delays in bringing out a new version to support the Energize C++ IDE [19].

○ Legal issues. This includes disagreements on licenses and trademarks, and changes to conform to rules and regulations. An example is X.Org, which originated from XFree86 [2], [19]. XFree86 was originally distributed under an MIT/X open source license that is GPL-compatible, but was later changed to a license that was not GPL-compatible. This caused many practical problems and a serious uproar in the community, resulting in the project fork X.Org.

○ Personal reasons. In some situations, the developer team disagrees on fundamental issues (beyond mere technical matters) related to the software development process and the project. An example is the OpenBSD fork from NetBSD. One of the developers of NetBSD had a disagreement with the rest of the core developers and decided to fork and focus his efforts on OpenBSD [20].

Focusing on variant forks in the Android ecosystem, Businge et al. [21] found that re-branding, simple customizations, feature extension, and implementation of different but related features are the main motivations to create forks of Android apps. Zhou et al. [8] interviewed 18 developers of hard forks on GitHub to understand reasons for forking in social coding environments that explicitly support forking. The motivations they observed align with the findings of the aforementioned studies.

Sung et al. [9] investigated variant forks in an industrial case study to uncover the implications of frequent merges from the mainline and the resulting merge conflicts in the variant forks.
They implemented a tool that can automatically resolve up to 40% of 8 types of mainline-induced build breaks.

While the pre-GitHub studies reported perceived controversy around variant forks, Zhou et al. [8] report that this controversy has diminished with the advent of GitHub. Jiang et al. [18] report that, while forking is considered controversial in traditional OSS communities, it is actually embraced as a built-in feature in GitHub. Our study builds on these previous studies to identify whether the motivations for variant forks are still the same or whether new factors have come into play.

B. Interaction between variant forks and their mainline

We have encountered only two studies that investigated the interaction between variant forks and mainlines [8], [10]. Zhou et al. [8] conducted 18 semi-structured developer interviews. Many respondents indicated being interested in coordination across repositories, either to eventually merge changes back into the mainline, or to monitor activity in the mainline repository and select and integrate interesting updates into their variant project. Businge et al. [10] also investigated the interaction between mainline and variants. The authors quantitatively investigated code propagation among variants and their mainline in three software ecosystems. They found that only about 11% of the 10,979 mainline–variant pairs had integrated code between them. Since the mainlines and variants share a common code base, and given the collaborative maintenance facilities of git and the pull-based development model, one would expect more interactions between the mainline and its variants. We hypothesise that there are impediments preventing such interactions. Since the two aforementioned studies do not report any such impediments, we decided to carry out an exploratory qualitative survey with variant maintainers to identify possible impediments.

III.
STUDY DESIGN

To understand the motivations behind the creation and maintenance of variant forks, we conducted an online survey with maintainers of variant forks. In this section, we explain how we (i) designed the survey protocol; (ii) collected mainline–variant pairs and extracted the maintainers of the variant forks; and (iii) recruited the survey participants.

A. Survey Protocol Design

We designed a 12-question survey that would last at most 15 minutes. Since we aimed to learn from a large number of projects, we used an online survey, as this data collection approach is known to scale well [22]. The survey can be found here.(^2) The questions were designed to cover our two main research questions. 8 of the 12 questions were closed-ended and respondents could answer them either via multiple choice or Likert scales. An optional free-text form was provided for 3 of the 8 closed-ended questions to allow respondents to share additional thoughts and feedback. The 4 remaining questions were open-ended. All questions were carefully formulated so as not to bias respondents towards a specific answer. We validated them by subjecting them to the critical eye of 7 colleagues and by conducting trial runs of the survey with the same 7 participants.

B. Identifying variant projects and participants

Given the scope of the survey, we target respondents involved in the creation and maintenance of variant projects. Therefore, we first needed to identify such variants. To this end, we relied on two data sources: Libraries.io and GitHub.

Libraries.io contains metadata about projects distributed through various package registries. We collected the metadata for all projects of some of the largest package registries (npm, Go, Maven, PyPI and Packagist). We relied on this metadata to identify those projects that are variants of another one, following the variant identification method proposed by Businge et al. [10], [23].
We only considered variants that are actively maintained in parallel with their mainline counterparts. We extracted variants for which the mainline–variant pair was created before 2019-04-01 and updated at least once after 2020-04-01 (i.e., active projects). This process yielded 227 mainline–variant project pairs.

We collected additional mainline–variant pairs from GitHub directly. To do so, we searched for mainline projects using the GitHub search endpoint. We looked for popular (> 50 stars and forks), long-lived (created before 2018) and active (still updated in 2020) repositories. We focused on software development repositories whose main language is among the 17 most popular languages used on GitHub (e.g., JavaScript, Java, Go, Python, Ruby, C, etc.). For all the mainline projects we found, we tried to identify and collect variant forks. This process is subject to a known threat to validity, since previous studies revealed that the majority of forks on GitHub are inactive [24], [25] or are social forks [21]. To reduce this threat, we filtered forks based on the following heuristics: ≥ 10 stars, ≥ 10 commits ahead of the mainline, ≥ 5 closed pull requests, and diverging README files. We manually verified the remaining forks to ensure they corresponded to variants of the corresponding mainline. This process yielded 264 additional mainline–variant project pairs, leading to a total of 491 collected mainline–variant pairs.

C. Participant Recruitment

Based on this collection of mainline–variant pairs, we identified contributors that had integrated at least one pull request into the variant. We retrieved their public-facing emails (if available) using the GitHub API, while ensuring to respect the GitHub Privacy Statement.(^3) We individually contacted a total of 762 variant maintainers from the 491 variant projects, and received a total of 105 responses (response rate 14%), representing a total of 105 variant forks (21%).
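The numeric part of the fork-filtering step described in Section III-B can be applied mechanically once the metadata is in hand. A minimal sketch over hypothetical, made-up fork metadata (real values would come from the GitHub API; the fork names and numbers below are invented for illustration):

```shell
# Hypothetical fork metadata: name, stars, commits ahead of mainline, closed PRs.
cat > forks.tsv <<'EOF'
fork-a	25	40	12
fork-b	3	100	0
fork-c	120	9	7
fork-d	15	33	6
EOF

# Keep only candidates meeting all three thresholds
# (>= 10 stars, >= 10 commits ahead, >= 5 closed PRs);
# README divergence and the final manual check happen afterwards.
awk -F'\t' '$2 >= 10 && $3 >= 10 && $4 >= 5 { print $1 }' forks.tsv
# -> fork-a
#    fork-d
```

Thresholding alone is deliberately coarse; it only prunes obvious social or inactive forks before the manual verification pass.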
All participants were required to read and accept an informed consent form before taking part in the survey.

D. Analysis

We used open card sorting [26] on the 3 open-ended questions to identify common responses reported by the participants. In the analysis, we grouped similar responses from the open-ended questions into themes. We did not start with any pre-defined themes in mind, but instead derived the themes from the open-ended answers, iterating as many times as needed until reaching a saturation point. The first iteration of coding themes was performed by the first author of the paper, and any responses the first author was unsure of were decided by discussion with the second author. Once the first two authors agreed on the themes, a virtual meeting was set up with all six authors to discuss the resulting themes and come to a negotiated agreement [27]. This allowed us to remove duplicates and, in some cases, to generalize or specialize themes.

(^2)10.5281/zenodo.5855808

(^3)https://docs.github.com/en/github/site-policy/github-privacy-statement

IV. RQ1: Why do developers create and maintain variants on GitHub?

RQ1 aims to investigate whether the motivations for creating variant forks have changed since the advent of social coding platforms. To do so, we asked the survey participants the following questions:

SQ1a: Was the motivation for creating the variant an individual or a community decision?

SQ1b: What was the motivation for creating the variant of the mainline project?

SQ1c: What are the motivation details relating to the motivation in SQ1b?

For SQ1a, we presented a multiple choice question. SQ1b presented Likert-scale answer options, while SQ1c was an optional open-ended question. For the latter, we coded the responses into themes and categorised common themes. When quoting the survey respondents, we refer to them using [R N] notation, where N is the respondent’s ID.
The respondents’ answers that include the selection of multiple choice answers, as well as the themes resulting from coding open-ended answers, are underlined. The open-ended responses are presented in italics. Where applicable, we integrate and compare our findings with related research findings.

A. Results

Fig. 2 summarises the responses for SQ1a and SQ1b. Fig. 2(a) shows that the majority of the participants responded that the decision was individual. Fig. 2(b) shows that the majority ranked the technical motivation for creating variants highly. We also see quite a number of highly ranked motivations of governance and others.

While previous studies have investigated the motivations for creating variants, no study has investigated the details of those motivations (SQ1c). To identify these details, two optional open-ended questions allowed respondents to provide details on their Likert-scale answer to SQ1b. The two questions were (1) Kindly provide details for your selected answer(s) on the motivation; and (2) If there are any links that are documented relating to your choice of answers on motivation detail, kindly point us there.
----------------------------------------
-------------------------------
Section 279:
100 of the 105 survey respondents answered the optional open-ended question SQ1c. Fortunately, during the coding process (cf. Section III-D), we were able to identify possible answers for the 5 respondents that did not answer SQ1c by comparing the information in the readme.md files of the variant and mainline. 30 of the 105 respondents provided links to documents (pull requests, issues, and blogs) relating to their choice of answers on motivation detail.

Fig. 3 presents a Sankey diagram summarising the details of the respondents’ choice of motivation based on the coded themes.
The figure presents the distribution of the responses to all questions relating to RQ1 and how these responses relate to each other. The thickness of an edge represents the number of respondents shared between two entities.

Focusing on the decision and motivation axes, we can confirm the observation from Fig. 2(b) that the majority of respondents had an individual and technical motivation. The majority of respondents that answered the question original developers? selected none, implying that most of the variants were started by different developers. Since the answers to SQ1b were presented on a Likert scale, participants were asked to rank the appropriate motivation(s) for why they created the variant. While coding the motivation details, we identified respondents who ranked more than one motivation category highly and also provided a response in the open-ended question to support each highly ranked motivation category. In this scenario, each highly ranked motivation category would have a motivation detail for the same respondent. In the end, we found that the 105 survey participants chose 145 motivation categories: 84 technical, 34 governance, 3 legal and 24 others. Below we present the common motivation themes and some specific responses we found very interesting.

Technical. Maintenance is the most frequently mentioned reason for the technical motivation. 19 of the 84 survey participants who selected technical mentioned phrases related to performing bug/security fixes.

[R59] ranked both technical and governance highly and mentioned “The PR to merge the fork’s new capabilities into the mainline code was too large, [...] and my attempts to incorporate feedback into the PR [...] ended upsetting the primary maintainer who has been studiously ignoring the pull request for three years”. The respondent also provided a GitHub link to his pull request to the mainline.
Indeed, we found that the PR was made in February 2018 and was accompanied by a discussion of 218 comments between the mainline maintainer and the respondent. As of October 2021, the PR was still open.

• “I forked the original project in order to fix a bug. However, the way the original was architected made this very challenging, so I ended up rewriting it instead of submitting a patch to the original.” [R79]

The next prominent technical motivation detail was different goals. 17 respondents who selected technical mentioned phrases related to variants present different goals / content / communities / directions:

• “[We] list websites that accept Bitcoin Cash cryptocurrency, as opposed to the mainline that lists websites with 2 factor authentication.” [R1]

• “The original goal of the mainline is completely different from the fork variant.” [R4]

• “We wanted to take the project in a different direction” [R100].

An equally prominent technical motivation detail was new features. 17 respondents who selected technical mentioned phrases related to introduction of new features not in the mainline:

• “[...] to add support for a feature I knew would not get merged into the main project.” [R53]

• “Mainline developer only does bugfixes and eventual underlying runtime/SDK upgrades to stay current. He did not add new features due to lack of interest [...]” [R67]

• “Our variant introduces new experimental functionality that is not yet ready for use in the mainline.” [R80]

Another technical motivation was customization. 8 respondents who selected technical mentioned phrases related to variant customizes the mainline features:

• “The “bones” were good, but I wanted to add some aesthetics [...]
so, I forked it to make it pretty and my own.” [R10]

• “The new version is a vectorized, accelerated version of the original.” [R37]

• “[We] added some syntactic sugar and some improvements by itself [...]” [R42]

The next technical motivation was unmaintained feature. 8 respondents who selected technical mentioned phrases related to one of the mainline features used by the variant being no longer maintained.

• “The ‘shiny’ component of mainline was declared to be no longer maintained around the time I created our fork. [...] I did not like many of the architectural decisions of the original project, I opted to create a fork instead of volunteer to maintain the original.” [R65]. The respondent provided an extra link. An issue about the ‘shiny’ component was opened in July 2015 and closed in July 2017. The issue contained 93 comments from 35 participants. When closing the issue, the maintainer stated that “[...] If somebody or bodies from the community wants to fork the source code and run with it, they have my blessing [...]”. The variant was created in August 2017.

• “The mainline project had made a radical shift from providing one set of features to a different, disjoint set of features. The maintainer had thought about it very well, but some users (including myself) had built their workflows around one of the old features. For this reason, I lifted that particular feature into a separate project that was also published under a different name to the package index.” [R23]. The respondent also provided us a GitHub issue link discussing the details. The issue was opened by the variant maintainer in July 2015 and was eventually closed in April 2018. The issue had 33 comments involving 17 participants.

• “Mainline dropped support for a small subset of the code and asked for community support to create a fork to support that subset” [R66].

A final technical motivation was technology.
7 respondents who selected technical mentioned phrases related to variant created to depend on a different technology.

• “Added support for Open Street Maps as an available map provider [...] mainline was not willing to accept this kind of contribution.” [R8]. This was also ranked as governance.
• “The mainline wasn’t updated to use .NET Core which I was using in my project, so I updated it” [R29]
• “[...] to keep the source code compatible with the language/compiler version that we use (Swift / Xcode). [...] if the maintainer of the mainline is supporting a different one, then we could not compile our dependency anymore.” [R54]

Governance. After technical, governance is the second most popular motivation, with responsiveness being the most prominent governance category. 18 of the 34 respondents who selected governance mentioned phrases related to mainline was unresponsive to pull requests or issues for a long time. Most of the respondents that ranked governance highly as their motivation also ranked other motivation options highly. Only 4 of the 34 ranked only governance.

• “[They] had a series of commits that fixed functionality for newer PHP versions, but never made into a release. After waiting for more than a year for a release, a fork was done just to push a newer release into Composer/Packagist.” [R21]
• “We submitted some bug fixes [...], but didn’t hear back from the maintainer for a while and needed to progress to meet our own goals so we forked. I followed up over email with the maintainer and he merged the patches about a month later, at which point we closed down and archived our fork and returned to using the mainline.” [R15]. Merging back into the original corresponds to one of the outcomes of variant forking reported in [2].
• “[...] due to lack of response from mainline maintainer (more than months) and need of release. This lead to release of a new variant. [...]
there is no intention to submit changes to mainline anymore (even when the first PR was merged into mainline after more than year).” [R56]

The next governance motivation was feature acceptance. 15 respondents who selected governance mentioned phrases related to mainline hesitant to or not willing to accept feature.

• “TECHNICAL: Added support for Open Street Maps as an available map provider. GOVERNANCE: not exactly governance, but mainline was not willing to accept this kind of contribution” [R8]. This was coded as technology under technical. The respondent also provided a GitHub PR link containing extra information. The PR included 45 conversations among 15 participants between June 2018 and March 2021, when it was closed.
• “Mainline was not ready to accept those changes in part because the maintainers were not responsive. Since that time all of the issues have been dealt with and my variant is no longer needed, though the infrastructure for creating a new release of the variant remains in place in the event that it might be needed in the future.” [R44]
• “[...] even main repo maintainer was saying he is busy and please use your fork for thing X and Y. We don’t know the exact reason why he stopped maintaining it and also did not allow us to maintain his repo” [R89]. In one of the multiple choice answers, the respondent indicated that the variant was created through a community decision. The respondent also provided an extra link, revealing that three contributors from the community were interested in a couple of new features that were missing in the mainline, but the mainline maintainer seemed busy. In the end, two members of the community took over the fork maintenance, introduced the missing features, and advertised the additions in the readme.md file of the fork as well as in the issue.

Others. The most prominent motivation for others is supporting personal projects.
8 of the 24 respondents who selected others mentioned phrases related to variant was created to support personal projects.

• “[The] maintainer was not interested in a PR that added functionality needed by a project I’m developing. [It] was considerably easier to add the logic into the [new] library than bolt it on.” [R18]. This was ranked as technical, governance, and others. As we can see in the participant’s response, it contains phrases like “adding logic” (new features, technical), “was not interested in a PR” (feature acceptance, governance), and “functionality needed by a project I’m developing” (supporting personal projects, others).
• “In Oct 2017 [...] has changed its API and these changes broke the mainline project. I used this project daily and needed to fix it ASAP. After quick fix I started to add my own features. [...] the mainline project has been fixed and refactored, but my other projects were already depending on my own fork.” [R56]
• “[...] to make sure that no matter what happen to the mainline repository, we can maintain source access to this library, which is an essential dependency of our project. [...]” [R54]. This response is in line with Nyman et al. [1], who reported that forking provides a mechanism for safeguarding against despotic decisions by the project lead, who is thus guided in their actions to consider the best interest of the community.

The next motivation for others was supporting mainline, which was mentioned by 7 respondents who selected others:

• “We have a fork that is the “main fork”, which is [...], and the “development fork” is [FORKNAME]. In this case, our modeling tool [...] is only maintained as the fork [...] we synchronize everything between both forks while the [FORKNAME] one is mainly used to develop new features, which are then pushed as PRs to the main fork.” [R61]
• “Preparation of mainline pull requests. mainline repo should not be spammed by WIP PRs by students.
Supervisors do coaching and try to improve the quality by the initial mainline pull request. [...] Keeping the PR open on the fork, reduces the number of PRs.” [R73]
• “We needed a repository for tracking our ideas to keep the number of issues of the main repository low.” [R83]. The extra link that was provided revealed that the mainline and variant are owned by the same developer: “this repository is used by [X] to make his ideas transparent. He collects the issues here to avoid flooding the “official” issue tracker. - Refined issues will be migrated to the official issue tracker”.

The next motivation detail for others was code quality. 3 respondents who selected others mentioned phrases related to mainline low code quality.

• “The mainline [...] was clearly written by someone who isn’t a professional software engineer.” [R63]

• “The way the original was architected made this very challenging, so I ended up rewriting it instead of submitting a patch to the original.” [R79]

Legal. The legal motivation was the least popular, corresponding to only 3 of the 105 respondents, who indicated phrases related to closed source. Below we present their corresponding responses.

• “[The] main reason is creating [an] open source and commercial product which has much more features” [R7]. This motivation detail was also categorised as (new features, technical) and (supporting personal projects, others).

• “5 years ago the permissions model for GitHub and Travis is not what it is today. I wanted to use Travis but if I granted Travis access to my primary github account, it would have read access to all the github repos [...], which would expose private customer code. I forked the repo [but] the permissions model has evolved [and I] deleted the fork” [R24].

• “The founders of the mainline had been absent from the project for several years, but came back and booted the maintainers off and [...]
shifted the project to a closed source.” [R36]. The respondent provided a link with extra information showing that three of the maintainers that were booted from the original project, together with a fourth one from the community, joined forces and are now maintaining the variant. The variant currently has over 739 stars, is used by 35 developers, has 101 pull requests and 195 issues.

B. Discussion and Implications

RQ1 mainly focused on determining the motivations for creating and maintaining variants, especially those that are actively maintained in parallel with their mainline counterparts. We identified that the decision to create a variant is mostly initiated by individuals and less often by the community. Our observations thereby confirm the findings in the literature. Our study also extends the state of the art by providing fine-grained reasons for creating and maintaining variants relating to the reported motivations. Furthermore, our study revealed new reasons that have not been reported in the literature (categorised as others in our survey), which include: 1) supporting the mainline, 2) the variant supporting other personal projects, 3) localization purposes, and 4) variant developers not trusting the code quality of the mainline. The reported findings are very useful to guide follow-up studies investigating the co-evolution of mainline and variant projects.

Fig. 3 presented an overview of how the detailed motivations relate to who is involved in creating and maintaining the variants. The motivations mostly related to developers outside the core contributors of the mainlines (82%). We also observed a significant number of respondents (24%) reporting that the decision to create the variant was initiated by the community. We observed from the open-ended responses that, before the transition from social to variant fork, some variant maintainers engage with the mainline maintainers through discussions in issues and pull requests.
This is in line with Zhou et al., who reported that many variant forks start as social forks [8]. + + +Besides the motivations for creating and maintaining variants, the respondents reported some interesting software reuse practices by the variants, like those categorized in the themes of: different goals, new features, customization, technology, supporting personal projects, supporting upstream, localization. A specific example from [R70], categorized in the different goals theme, stated that in the cryptocurrency world, all applications inherit code from the mother project bitcoin/bitcoin. Downstream applications also monitor their immediate upstream and others in the hierarchy for important updates like bug and security fixes as well as other specific updates. These cryptocurrency applications can be considered as a software family [21] or software ecosystem [28]. Variants are also likely to occur in other dedicated software ecosystems like Eclipse, Atom, Emacs, software library distributions for Java, C, C++, Python, Go, Ruby, and OS distributions for macOS, Linux, Windows, and iOS. To this end, our study opens up different research directions that can aim at deeply investigating different reuse practices in software families and software variants. A deeper understanding of these reuse practices can aid in developing tools that can support more effective software reuse. + + +Summary – RQ1: + Many variant forks start as social forks. The decision to create/maintain the forks is either community-driven (contributing up to 24%) or individual (76%). The majority of the developers (82%) creating the forks are not maintainers of the mainlines. We identified 18 variant creation/maintenance motivation details categorized in the motivations of technical (accounting for 58% of the responses), governance (24%), others (16%) and legal (2%). The detailed motivations in the others category are newly introduced since the social coding era. + + +V. 
RQ2: How do variant projects evolve with respect to the mainline? + + +RQ2 + aims to identify the impediments to co-evolution between the mainline and variant projects. This question led to two specific focuses reflecting the who and the how, respectively. The who focus aimed at identifying who the developers involved in maintaining variants are. The how focus aimed to understand how variant forks evolve w.r.t. the mainline. As for RQ1, we refer to the responses using underlined, italics and [R.N]. + + +A. Results for the “who?” focus + + +To understand who is creating and maintaining variant forks, we asked two multiple-choice questions: + + + + +$SQ^2_a$: How many of the original developers of the mainline maintained the variant in its first 6 months? + + +$SQ^2_b$: Do the variant and mainline have common active maintainers? + + + + +Fig. 4(a) and Fig. 4(b) summarise the answers to $SQ^2_a$ and $SQ^2_b$, respectively. The majority of the respondents chose the options of none for $SQ^2_a$ (none of the creators of the variant were part of the mainline) and no for $SQ^2_b$ (they do not have common active maintainers). This implies that +most developers involved in the creation and maintenance of variants are not core maintainers of the mainline from where the variant was forked. Fig. 3 reveals the difference in the numbers of participants who selected none for $SQ^2_a$ and no for $SQ^2_b$. Focusing on how the responses of $SQ^2_a$—original developers? and $SQ^2_b$—common active maintainers? are associated, one can observe that most respondents that selected option none in $SQ^2_a$ went ahead to select option no in $SQ^2_b$. Other associations between responses of $SQ^2_a$ and $SQ^2_b$ can be observed as well. 
+ + +Anecdotally, [R36] responded to $SQ^2_a$ that 6–10 developers from the mainline were involved in the creation of the variant, and responded to $SQ^2_b$ with the option yes & no—“They used to have common maintainers in the early stages of the variant, but now the projects have technically diverged away from each other, there are no more common maintainers”. Respondents [R51] and [R57] selected for $SQ^2_a$ the options 6–10 and 2–5 respectively, while selecting the option no for $SQ^2_b$. This implies that at least two maintainers involved in fork creation are not (or no longer) contributing to the mainline. + + +Summarising our observations for $SQ^2_a$ and $SQ^2_b$, we conclude that +variant forks are created and maintained by developers different from those in the mainline counterparts. + This observation concurs with the earlier findings of Businge et al. [10]. +---------------------------------------- +------------------------------- +Section 280: +B. Results for the “how?” focus + + +To understand how variant forks evolve w.r.t. the mainline, we asked two additional questions: + + +$SQ^2_c$: Do the variant forks and the upstream still discuss the main directions of the project? + + +$SQ^2_d$: Do the variant developers integrate changes to and from the upstream repository? + + +For $SQ^2_c$ we presented four multiple choice answer options, corresponding to the first four answers reported in Fig. 5, gathering the highest number of responses. We allowed respondents to provide an open-ended answer if they felt that their choice was not among the four proposed options. The open-ended answers were coded into themes (listed in Fig. 5 from variant follows mainline→to variant is a mirror of the mainline). Fig. 5 shows that more than half of the respondents chose the option of never (corresponding to: no, there has never been any discussion since the creation of the variant). 
Even if there was some discussion, 10.7% of the respondents signal that they technically diverged (corresponding to: “They used to discuss but not anymore since the projects have technically diverged from each other”). The open-ended answers also revealed responses from variants that do not discuss the directions of the project, like mainline hostile to variant, not very active, in contact but rarely discuss and only once. + + +An explanation for the high number of variant developers that do not discuss the project direction with the mainline developers can be derived from the findings of $SQ^2_a$ and $SQ^2_b$. The majority of the variants are created and maintained by developers that are not core developers of the mainline. Also, most of the motivation details in RQ1 could explain the high numbers of never. For example, we observed that the majority of the variants in the motivation details category of different goals, unmaintained features in the mainline, those having issues with the mainline responsiveness, those whose features will not be accepted by the mainline (feature acceptance), selected never in $SQ^2_c$. We conclude that the reasons for the majority of variant forks not to discuss the project directions with the mainline could be attributed to a diverging range of motivations for creating the variant as well as to the variant creators not being part of the mainline’s core development team. + + +Anecdotally, 5 respondents indicated phrases related to variant follows mainline. Respondent [R77] indicated that “in the crypto world, the mainline inherits changes from BITCOIN, for example, security commits, and the variant merges those changes in. So the variant is very interested in every change in the Mainline. However, the variant must maintain the specific new features that we added separately, and the Mainline is not interested in helping the Variant do this.” We also observed two interesting cases where the variants merged back to the +mainline. 
This is in line with Robles and González-Barahona [2] who reported that one of the outcomes of forking is the fork merging back. + + +For $SQ^2_d$ we asked respondents two closed-ended questions: (1) How often do the maintainers of the variant integrate the following types of changes from the mainline?; and (2) How often do the maintainers of the variant integrate the following types of changes into the mainline? We provided Likert-scale options for the two questions. For each of the two questions, we presented optional follow-up questions with open-ended answers, allowing respondents to provide extra information. + + +Fig. 6(a) presents the answers from the respondents on what they value most when integrating changes from the mainline. The highest scored changes are bug fixes and security fixes. One can observe that most respondents were leaning towards the negative side of the Likert scale, implying that most variants are not interested in integrating changes from the mainline. Fig. 6(b) focuses on integrations from variants towards the mainline. We observe a similar trend to Fig. 6(a), with an even more pronounced negative inclination. + + +Fig. 6(c) and Fig. 6(d) present the coded themes of the extra information gathered from the open-ended answers corresponding to the results in Fig. 6(a) and Fig. 6(b), respectively. Fig. 6(c) summarises the results of 28 respondents who provided the extra information, while Fig. 6(d) summarises the results of only 17 respondents, most likely because most variants do not submit changes to the mainline. The most prominent response in Fig. 6(c) was related to being kept in sync, signaling the desire of variants to keep in sync with the changes made in the mainline. The next prominent response was related to occasionally pull from mainline implying that variants from time to time pull changes made in the mainline. 
Some respondents mentioned phrases related to specific changes are pulled; for example, [R63] indicated that “It’s mostly changes that make the library for specific iRobot Roomba models (new ones for example)”. Other respondents mentioned phrases related to everything except specific changes; for example, [R48] mentioned that “All non-compiler specific changes are pulled”. In Fig. 6(d) there were two prominent answers: PRs are suggested, for example, “Made PRs with changes but those have just been ignored. They’re still “open” with 0 comments from the mainline dev” [R67]. The other prominent answer is changes are out of scope, for example, “We use this as a dependency in another project [. . . ] which is often diverging from the language version of the mainline, so there is little reason for us to push this to mainline” [R54]. + + +C. Discussion and Implications + + +The results of $RQ_2$ revealed that variants are created and maintained by developers that are not core developers of the mainline. We also observed limited interaction between the mainline and its variant(s). Although we found there is little code integration, the integration from mainline to variant is more frequent than from variant to mainline. Our study confirms and extends the findings of Businge et al. [10]: we provide concrete reasons for the limited integration between the mainline and variants, which include: + + +1) technical divergence: variants and mainlines pursue different goals, implement different technologies, or the variant maintains a part of the mainline that is frozen; +2) governance disputes: mainlines are unresponsive to pull requests and issues from the variants, and mainlines are unwilling or hesitant to accept some features from the variants. 
One respondent also reported that the mainline is actively very hostile to variants as a result of the mainline’s license changing to proprietary; +3) distinct developers: another reason for the lack of code integration is that most of the variants are maintained by developers that are not part of the core team of the mainline. Furthermore, we observed that the few mainline–variant pairs that do interchange code are mostly interested in patch sets (security fixes and bug fixes). + + +Although maintenance and collaboration have improved through dedicated tooling, especially through distributed version control systems like Git [29] and transparency mechanisms on social coding platforms like GitHub [30], these tools are only ideal for social forks, which aim to sync all the changes between repositories. For example, code integration using pull requests and git tools like merge/rebase may not be the best fit when integrating changes between mainline and variant forks, since they involve syncing the upstream/downstream with all changes missing in the current branch. + + +This study reveals that some variant maintainers are only interested in integrating commits with specific changes. A suitable integration mechanism would be commit cherry-picking, since the developers can choose the exact commits they want to integrate. However, GitHub’s current setup does not make it easy to identify commits to cherry-pick without digging through the branch’s history to identify relevant changes since the last code integration. Additionally, even though the variants have diverged from their mainlines, we do believe that since they share common code, some of the common code may go through maintenance to perform some bug and security fixing. Since these mainline–variant repository pairs are maintained by different developers, chances are that these fixes could be missed or applied at different times by different developers, resulting in duplicated effort. 
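The cherry-picking mechanism discussed above can be illustrated with plain git commands. The following sketch, using hypothetical repository names, file names and commit messages, shows a variant fork integrating a single upstream security fix without merging the rest of the mainline history:

```shell
# Hypothetical setup: a local "mainline" repo and a "variant" fork of it.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q mainline
git -C mainline -c user.email=m@example.com -c user.name=mainline \
    commit -q --allow-empty -m "initial"
git clone -q mainline variant

# The mainline later lands a security fix (among other, unrelated changes).
echo 'patched input validation' > mainline/auth.c
git -C mainline add auth.c
git -C mainline -c user.email=m@example.com -c user.name=mainline \
    commit -q -m "security: fix input validation"
fix_sha=$(git -C mainline rev-parse HEAD)

# The variant fetches the upstream and integrates only that one commit,
# rather than merging or rebasing onto the full upstream history.
cd variant
git remote add upstream ../mainline
git fetch -q upstream
git -c user.email=v@example.com -c user.name=variant cherry-pick "$fix_sha"
cat auth.c
```

By contrast, a merge or rebase against the upstream branch would pull in every mainline change missing from the variant, which is exactly what a diverged variant wants to avoid.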
Our findings are very relevant to builders of code integration tools between mainline and variants, helping them prioritise certain categories of mainline–variant pairs by targeting specific changes. Ideally, such tooling would help identify potentially important fixes in commits and recommend these commits to mainline or variant developers to support more efficient reuse. Some promising studies in this direction have focused on providing the mainline with facilities to explore non-integrated changes in forks to find opportunities for reuse [31] and cross-fork change migration [32]. More experimental ideas have focused on virtual product-line platforms for unified development of multiple variants of a project [33]–[37]. + + +Summary–RQ2 +: Variant forks do not usually interact with the mainline during their co-evolution. The lack of interaction could be attributed to a variety of reasons including: (i) technical divergence, where variants and mainlines offer different features or implement different technologies and have nothing to share; (ii) governance disputes, where mainlines are unresponsive to the requests from the community and also uninterested in some features suggested by the community; (iii) distinct development teams that no longer interact; (iv) diverging licenses, where the mainline has changed its license and integration is no longer possible. As a result of these divergences, it is likely that important security or patch updates could be missed or duplicated. + + +VI. Threats to Validity + + +Construct validity +. The response categories for the closed questions in the survey originated from a thorough literature review. The questions were carefully phrased to avoid biasing the respondent towards a specific answer. We validated the questions by consulting seven colleagues from three different universities and through trial runs of the survey with seven participants. Social desirability bias may also have influenced the answers [38]. 
To mitigate this issue, we informed participants that the responses would be anonymous and evaluated in a statistical form. + + +Internal validity +. We used an open coding process to classify the participants’ responses received from open-ended questions. The coding process is known to increase processing and categorization capacity at the cost of some accuracy of the original response. To alleviate this loss of accuracy, we allowed more than one code to be assigned to the same answer. + + +Generalizability +. Our study is limited to variants of mainline repositories that are hosted on GitHub. We do not claim that our findings generalize to other social coding platforms. In addition, the set of participants we interviewed corresponds to those who decided to make their e-mail public and who accepted to take part in our study. As such, they are not de facto representative of all maintainers of variant forks. + + +VII. Conclusions + + +Thanks to social coding platforms like GitHub, software reuse through forking to create variant projects is on the rise. We carried out an exploratory study with 105 maintainers of variants, focusing on answering two key research questions: + + +1) +Why do developers create and maintain variants on GitHub? + We observed that the motivations reported by studies carried out in the pre-GitHub era still hold. We identified 18 motivation details for variant creation and maintenance, categorized in the motivations of technical (58% of the responses), governance (24%), others (16%) and legal (2%). Some of these motivations are newly introduced in the social coding era. + + +2) +How do variant projects evolve with respect to the mainlines? + We have found that there is little interaction between the variants and their mainlines during the co-evolution and reported possible reasons for this lack of interaction. 
These include: (i) technical (i.e., diverging features), where variants and mainlines offer different goals or implement different technologies and have nothing to share; (ii) governance (i.e., diverging interests), where mainlines are unresponsive to the requests from the community and also uninterested in some features suggested by the community; (iii) legal (e.g., diverging licenses), where the mainline has changed its license and integration is no longer possible. + + +Our findings are very useful to guide follow-up studies in investigating the co-evolution and reuse practices between mainline and variants. A deeper understanding of these practices can aid code integration tool builders in developing tools to support more effective software reuse between mainline projects and their variant forks. + + +Acknowledgment + + +This work is supported by the joint FWO-Vlaanderen and F.R.S.-FNRS Excellence of Science project SECO-ASSIST under Grant number O.0157.18F-RG43. +REFERENCES + + +[1] L. Nyman, T. Mikkonen, J. Lindman, and M. Fougère, “Perspectives on code forking and sustainability in open source software,” in Open Source Systems: Long-Term Sustainability, 2012, pp. 274–279. + + +[2] G. Robles and J. M. González-Barahona, “A comprehensive study of software forks: Dates, reasons and outcomes,” in Open Source Systems: Long-Term Sustainability, 2012, pp. 1–14. + + +[3] R. Viseur, “Forks impacts and motivations in free and open source projects,” International Journal of Advanced Computer Science and Applications, vol. 3, no. 2, February 2012. + + +[4] L. Nyman and J. Lindman, “Code forking, governance, and sustainability in open source software,” Technology Innovation Management Review, vol. 3, pp. 7–12, January 2013. + + +[5] L. Nyman and T. Mikkonen, “To fork or not to fork: Fork motivations in SourceForge projects,” in Open Source Systems: Grounding Research, 2011, pp. 259–268. + + +[6] J. Gamalielsson and B. 
Lundell, “Sustainability of open source software communities beyond a fork: How and why has the LibreOffice project evolved?” Journal of Systems and Software, vol. 89, pp. 128–145, 2014. + + +[7] G. Gousios, M. Pinzger, and A. van Deursen, “An exploratory study of the pull-based software development model,” in International Conference on Software Engineering, 2014, pp. 345–355. + + +[8] S. Zhou, B. Vasilescu, and C. Kästner, “How has forking changed in the last 20 years? A study of hard forks on GitHub,” in International Conference on Software Engineering. ACM, 2020, pp. 268–269. + + +[9] C. Sung, S. K. Lahiri, M. Kaufman, P. Choudhury, and C. Wang, “Towards understanding and fixing upstream merge induced conflicts in divergent forks: An industrial case study,” in International Conference on Software Engineering. ACM, 2020, pp. 172–181. + + +[10] J. Businge, M. Openja, S. Nadi, and T. Berger, “Reuse and maintenance practices among divergent forks in three software ecosystems,” Journal of Empirical Software Engineering, 2021. + + +[11] A. S. Laurent, Understanding Open Source and Free Software Licensing. O’Reilly Media, 2008. + + +[12] B. B. Chua, “A survey paper on open source forking motivation reasons and challenges,” in Pacific Asia Conference on Information Systems, 2017. + + +[13] J. Dixon, “Different kinds of open source forks: Salad, dinner, and fish,” https://jamesdixon.wordpress.com/2009/05/13/different-kinds-of-open-source-forks-salad-dinner-and-fish/, 2009. + + +[14] N. A. Ernst, S. M. Easterbrook, and J. Mylopoulos, “Code forking in open-source software: A requirements perspective,” ArXiv, vol. abs/1004.2889, 2010. + + +[15] L. Nyman, “Hackers on forking,” in The International Symposium on Open Collaboration, 2014, pp. 1–10. + + +[16] E. S. Raymond, The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. O’Reilly Media, Inc., 2001. + + +[17] P. 
Bratach, “Why Do Open Source Projects Fork?” https://thenewstack.io/open-source-projects-fork/, 2017. + + +[18] J. Jiang, D. Lo, J. He, X. Xia, P. S. Kochhar, and L. Zhang, “Why and how developers fork what from whom in GitHub,” Empirical Softw. Engg., vol. 22, no. 1, pp. 547–578, Feb. 2017. + + +[19] D. A. Wheeler, “forking,” https://dwheeler.com/oss_fs_why.html#forking, 2009, revised as of July 18, 2015. + + +[20] T. de Raadt, “Theo de Raadt’s dispute w/ NetBSD,” https://zeus.theos.com/deraadt/coremail.html, 2006, retrieved October 2021. + + +[21] J. Businge, M. Openja, S. Nadi, E. Bainomugisha, and T. Berger, “Clone-based variability management in the Android ecosystem,” in International Conference on Software Maintenance and Evolution. IEEE, 2018, pp. 625–634. + + +[22] F. Uwe, An Introduction to Qualitative Research. London: Sage Publications, 2014. + + +[23] J. Businge, A. Decan, A. Zerouali, T. Mens, and S. Demeyer, “An empirical investigation of forks as variants in the npm package distribution,” in The Belgium-Netherlands Software Evolution Workshop, ser. CEUR Workshop Proceedings, vol. 2912. CEUR-WS.org, 2020. + + +[24] J. Businge, M. Openja, D. Kasvales, E. Bainomugisha, F. Khomh, and V. Filkov, “Studying Android app popularity by cross-linking GitHub and Google Play store,” in International Conference on Software Analysis, Evolution and Reengineering, 2019, pp. 287–297. + + +[25] J. Businge, S. Kawuma, E. Bainomugisha, F. Khomh, and E. Nabaasa, “Code authorship and fault-proneness of open-source Android applications: An empirical study,” in PROMISE, 2017. + + +[26] T. Zimmermann, “Card-sorting: From text to themes,” in Perspectives on Data Science for Software Engineering. Elsevier, 2016, pp. 137–141. + + +[27] D. Garrison, M. Cleveland-Innes, M. Koole, and J. Kappelman, “Revisiting methodological issues in transcript analysis: Negotiated coding and reliability,” The Internet and Higher Education, vol. 9, pp. 1–8, 03 2006. + + +[28] A. Decan, T. 
Mens, and P. Grosjean, “An empirical comparison of dependency network evolution in seven software packaging ecosystems,” Empirical Softw. Engg., vol. 24, no. 1, pp. 381–416, Feb. 2019. + + +[29] C. Rodríguez-Bustos and J. Aponte, “How distributed version control systems impact open source software projects,” in Working Conference on Mining Software Repositories. IEEE, 2012, pp. 36–39. + + +[30] L. Dabbish, C. Stuart, J. Tsay, and J. Herbsleb, “Social coding in GitHub: Transparency and collaboration in an open software repository,” in Conference on Computer Supported Cooperative Work, 2012, pp. 1277–1286. + + +[31] L. Ren, S. Zhou, and C. Kästner, “Poster: Forks insight: Providing an overview of GitHub forks,” in The International Conference on Software Engineering: Companion (ICSE-Companion), 2018, pp. 179–180. + + +[32] L. Ren, “Automated patch porting across forked projects,” in Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 1199–1201. + + +[33] M. Antkiewicz, W. Ji, T. Berger, K. Czarnecki, T. Schmorleiz, R. Lämmel, Ş. Stănciulescu, A. Wąsowski, and I. Schaefer, “Flexible product line engineering with a virtual platform,” in Companion of the International Conference on Software Engineering, 2014, pp. 532–535. + + +[34] S. Fischer, L. Linsbauer, R. E. Lopez-Herrejon, and A. Egyed, “Enhancing clone-and-own with systematic reuse for developing software variants,” in International Conference on Software Maintenance and Evolution, 2014, pp. 391–400. + + +[35] L. Montalvillo and O. Díaz, “Tuning GitHub for SPL development: Branching models & repository operations for product engineers,” in International Conference on Software Product Lines, 2015, pp. 111–120. + + +[36] J. Rubin and M. Chechik, “A framework for managing cloned product variants,” in International Conference on Software Engineering. IEEE, 2013, pp. 1233–1236. + + +[37] Ş. Stănciulescu, T. Berger, E. Walkingshaw, and A. 
Wąsowski, “Concepts, operations, and feasibility of a projection-based variation control system,” in International Conference on Software Maintenance and Evolution (ICSME), 2016, pp. 323–333. + + +[38] A. Furnham, “Response bias, social desirability and dissimulation,” Personality and Individual Differences, vol. 7, no. 3, pp. 385–400, 1986. +---------------------------------------- +------------------------------- +Section 281: +“Nip it in the Bud”: Moderation Strategies in Open Source Software Projects and the Role of Bots + + +JANE HSIEH, Carnegie Mellon University, USA +JOSELYN KIM, Carnegie Mellon University, USA +LAURA DABBISH, Carnegie Mellon University, USA +HAIYI ZHU, Carnegie Mellon University, USA + + +Much of our modern digital infrastructure relies critically upon open sourced software. The communities responsible for building this cyberinfrastructure require maintenance and moderation, which is often supported by volunteer efforts. Moderation, as a non-technical form of labor, is a necessary but often overlooked task that maintainers undertake to sustain the community around an OSS project. This study examines the various structures and norms that support community moderation, describes the strategies moderators use to mitigate conflicts, and assesses how bots can play a role in assisting these processes. We interviewed 14 practitioners to uncover existing moderation practices and ways that automation can provide assistance. Our main contributions include a characterization of moderated content in OSS projects, moderation techniques, as well as perceptions of and recommendations for improving the automation of moderation tasks. We hope that these findings will inform the implementation of more effective moderation practices in open source communities. + + +CCS Concepts: • Human-centered computing → Open source software; Empirical studies in HCI; Empirical studies in collaborative and social computing. 
+ + +Additional Key Words and Phrases: moderation, automation, coordination, open source + + +ACM Reference Format: +Jane Hsieh, Joselyn Kim, Laura Dabbish, and Haiyi Zhu. 2023. "Nip it in the Bud": Moderation Strategies in Open Source Software Projects and the Role of Bots. Proc. ACM Hum.-Comput. Interact. 7, CSCW2, Article 301 (October 2023), 29 pages. https://doi.org/10.1145/3610092 +---------------------------------------- +------------------------------- +Section 282: +1 INTRODUCTION + + +Online social coding platforms such as GitHub facilitate the production of open source software (OSS), which modern digital infrastructure relies heavily upon. However, excess volumes of issues and requests filed by users can overload volunteer project maintainers [86]. Aggravating the situation, open source developers can become toxic and hostile in the course of technical or ideological disagreements [26]. Incivility suppresses productivity, creativity and quality in the workplace [84], and for semi-professional software production platforms like GitHub, such misbehaviors have caused growing concerns over the mental well-being of contributors and maintainers [73]. + + +Moderation, as a non-technical form of labor, is a necessary but often overlooked and understudied task that maintainers undertake to sustain the community around an OSS project. To date, it is not well understood how maintainers grapple with toxic or undesirable behavior on their projects, particularly at scale. Research has described the different types of conversations around code... +contributions [104] and categorized toxic content as insults, trolling, as well as displays of arrogance and entitlement [26, 75]. At the same time we know that responding to issues and pull requests is an important part of maintenance work in open source [34, 43]. 
As Geiger points out, maintainers must delicately navigate instances where there is a mismatch between the work required to merge a contribution and non-maintainers’ desires to integrate a certain piece of functionality [43]. We also know that responses and interactions around public conversations on OSS projects in GitHub are an important signal to potential contributors and users of a project, underscoring the importance of dealing with toxicity [30, 85]. + + +A growing body of research in CSCW examines how users moderate their own content in online communities and increasingly leverage automation to more efficiently control bad behavior [66, 93, 106]. These studies describe the challenges of moderation on different platforms (e.g. [56, 58]), explore novel moderation techniques and tools (e.g. [21, 22]) and examine the effectiveness of different moderation behaviors and strategies (e.g. [92, 93]). For example, Jhaver et al. find that moderation transparency matters: offering removal explanations on Reddit reduces the likelihood of future post removals [55]. In [66], Lampe and Resnick observe that timeliness trades off with accuracy in distributed moderation systems like Slashdot. And while automated moderation systems scale well in removing obviously undesirable content (e.g. spam and malware links), Chancellor et al. note how they can magnify errors [20], making human decisions preferable for nuanced [19, 54] and high stakes contexts [47]. 
First, social media forums and text-based discussion groups are typically informal public spaces where people gather to share compelling and interesting information, converse, as well as build communities [81], whereas open source communities aim to collaboratively produce software, which can entail complex organizational structures and highly technical discussions tied with code artifacts and software that is utilized for professional purposes [14]. Secondly, each individual contributor’s activities on GitHub have implications for employment prospects and reputation, both within OSS and the professional community more broadly [1]. Unlike their peers on discussion groups like Reddit, where participants are pseudonymous or anonymous, a large portion of GitHub users are real name identified, and often their accounts are listed on personal CVs or resumes [29, 95]. Finally, the types of inappropriate behaviors and harmful content that present in OSS communities diverge from what is traditionally found in social media. Past work has uncovered that passive-aggressive behaviors such as name-calling and entitlement are more prevalent among conversations between OSS developers [36, 75], and our findings support these results. Thus, the distinctive user goals, behaviors and inappropriate content found in OSS communities might necessitate the adoption of unconventional moderation strategies. + + +In this study, we qualitatively examine community moderation in open source repositories, that is existing strategies, structures and techniques used for mitigating and preventing inappropriate activity and conversation. Moderation here includes activities to manage behavior in conversations around issues and code contributions as well as the code itself (e.g. the use of potentially offensive variable names). 
Specifically, we sought to answer the following research questions to investigate moderation in open source communities:

Research questions:

(1) What does moderation look like in OSS?
    (a) Who performs moderation actions in projects, and in what capacity?
    (b) What strategies do moderators use to respond to, defuse and prevent conflicts?
(2) What are the current limitations of automation for moderation, and what are potential future improvements?

In order to address these questions, we conducted interviews with 14 maintainers across 10 projects to identify how moderation actions are performed on projects of different scales, as well as attitudes towards algorithmic support for toxicity moderation and prevention. We find that (1) moderation in open source is conducted by different roles depending on the size and structure of projects, (2) moderators leverage several strategies to mitigate and prevent emergent conflicts, and (3) future efforts will need to address concerns around customizability and detection accuracy before deploying automation tools to help offload the labor of moderation. By documenting the structures and forms of labor performed around moderation within open source projects, we hope to inform future practitioners of the available strategies for moderating digitally-mediated software development contexts. By characterizing the potentials and limitations of automation tools for moderation, we support practitioners in understanding and anticipating the challenges and impacts of adopting such automation. We also encourage tool designers and developers to build on these findings, so that future tools for moderation can provide improved and wider services to open source community members.
2 BACKGROUND

Software development is a product-oriented and collaborative endeavor, making open source development environments semi-formal working spaces that expect professional conduct from their participants. During the development process, project collaborators may encounter a myriad of technical and interpersonal conflicts that impede their work. In the following, we present our study platform, notable types of open source conflicts that prior work and our participants have reported, as well as relevant prior work on moderation and automation for toxicity detection in various online communities.

2.1 Study Context: GitHub

We focused our data collection on open source software projects hosted on the GitHub platform. GitHub facilitates collaboration and communication among developers, users and owners of software projects [30, 71], and is arguably the most popular hosting site for such projects. As of June 2022, GitHub reports having over 83 million developers [5] and more than 200 million repositories (including at least 28 million public repositories).

Projects on GitHub are organized into code repositories (or repos for short), which can be owned by a personal account (usually the creator or another maintainer) or by an organization, which comprises multiple users. Collaborators on a repository have direct write access to make commits, and they work together with the owner to maintain the project. Users primarily consist of software consumers, and they can star repos to express interest in a project or to save it for later reference. Within each repository, contributors and users can plan work, track bugs, request new features or express maintenance concerns by creating issues [71].
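As a concrete (and hedged) illustration, the issue and comment objects described above are also what moderation tooling operates on through GitHub’s REST API. The sketch below only constructs the endpoint URLs a tool might call; the `octocat`/`hello-world` repository and issue number are illustrative placeholders, not projects from this study.

```python
# Hedged sketch: endpoint URLs for GitHub REST API calls that a moderation
# tool might use. No requests are sent; owner/repo/number are placeholders.
API_ROOT = "https://api.github.com"

def issue_comments_url(owner: str, repo: str, number: int) -> str:
    # Lists comments on an issue; pull requests share this endpoint,
    # since a pull request is a special kind of issue.
    return f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}/comments"

def lock_conversation_url(owner: str, repo: str, number: int) -> str:
    # Locking a conversation is a PUT to this endpoint; the request body
    # may carry a lock_reason such as "too heated" or "spam".
    return f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}/lock"

print(issue_comments_url("octocat", "hello-world", 42))
```

An actual tool would send authenticated HTTP requests against these URLs; the point here is only that both reading conversations and moderating them (e.g. locking) are addressable per-issue.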
When an external (non-collaborator) developer has changes to propose, they can submit a pull request – a special issue for posting code contributions so that others can review and integrate them into the existing codebase. However, a pull request requires approval by one or more authorized collaborators before it can be merged. To communicate about developments, collaborators can comment under issues, pull requests, as well as individual lines of code.

2.2 Conflicts and Incivility in Open Source

Open source project maintainers are responsible for tremendous amounts of unseen civic labor that underlies our digital infrastructure, and many have documented how overwhelming volumes of such invisible labor harm maintainers’ mental well-being [33, 68]. Maintainers are seldom recognized sufficiently for their stewardship, causing individual stress and burnout [86], leaving projects vulnerable to undermaintenance, and threatening the overall sustainability of the open source ecosystem [26, 86]. Due to factors like a lack of corporate management structure and geographic dispersion, open source maintainers are required to undertake a plethora of complex interpersonal and organizational work [43]. Community maintenance tasks include providing support to internal contributors as well as technical assistance to external users so they can make use of the product. Previous investigations have found such organizational and interpersonal labor to play a critical role in traditional software engineering contexts [74, 80, 101]. Due to the fully public and largely voluntary nature of discussions and actions in open source development, moderation is one necessary task that maintainers must undertake to avoid an overwhelming amount of negative content and harmful interactions.
Prior work has extensively documented the presence of incivility, conflict and, in general, negative emotions across multiple activities of open source development, including code reviews [10, 11, 16, 31, 32, 36, 87], issue discussions [37, 75] as well as comments on these activities [41, 50]. Negative interactions occur among different members of the community (e.g. core collaborators, external contributors, as well as maintainers across different projects), and stem from multiple sources, ranging from language and cultural differences to political disagreements, personal feuds, software dependencies and mismatches in expectations [38, 43, 71]. Conflicts among internal contributors can be difficult to moderate, since organization members cannot ban each other and interventions between familiar and respected contributors can get tricky, but politically charged misconduct from external and banned members can be harmful as well. Uncivil behaviors in the semi-professional volunteer software development environment endanger the sustainability of open source by decreasing the intrinsic motivation of contributors, reducing their productivity, and heightening dropout rates of newcomers [63, 76, 84]. Rather than categorizing the types of conflict in open source (the focus of [26, 38, 75]), one aim of our study (via RQ1) is to characterize the strategies and structures that maintainers use to moderate such uncivil situations.

While incivility originating from internal contributors of a project has been well-studied [10, 11, 16, 31, 32, 36, 87, 102], frustration that follows from unrealistic expectations of user support can also cause toxic insults [75] and entitlement directed at maintainers, demanding their time and attention [43]. User support involves providing assistance to consumers of the software who have difficulties making use of it, either because of existing defects or the consumer’s misunderstanding of some aspect of the software [65].
Swarts identified usability and transparency issues as causes of user support needs in open source [100]. As projects scale, user support becomes a tedious task, overwhelming maintainers with issues and requests and demanding their time and emotional labor [43]. Unlike commercial vendors, which generally rely on institutional infrastructures such as paid and dedicated IT or tech support teams, open source software provides informational user support free of charge, via a small group of volunteer users and maintainers [65].

2.3 Governance and Moderation

The non-technical labor of moderation is often overlooked but essential for understanding the infrastructure of open source [33, 69, 88]. According to Grimmelmann, moderation consists of the “governance mechanisms that structure participation in a community to facilitate cooperation and prevent abuse” [48]. For the context of open source, we define community moderation to be the set of activities that maintainers and designated moderators leverage to manage behavior in conversations around issues, code contributions, and the code itself, in an effort to minimize harmful and abusive activities and to foster a collaborative and welcoming environment for contributors.

Much like their social media (e.g. Discord [58], Reddit [23, 62], Twitter [57]) and peer production contemporaries (e.g. Wikipedia [12, 40, 44]), GitHub communities engage in volunteer-based community moderation, as opposed to platform-wide commercial moderation. The voluntary nature of moderation and maintenance in open source forces members of the community (e.g., maintainers or volunteer contributors) to bear the responsibility of providing support and assistance to users. But unlike support providers of commercial software products, the services of volunteer contributors are uncompensated [65].
Exacerbating their workload, prior studies report that maintainers find user support to be an “overwhelming and never-ending chore, particularly for projects that use GitHub-style collaboration platforms” [43]. The staggering volume of demands for user support and feature requests through GitHub’s issue-posting mechanisms is an instance of overuse – a form of deviant behavior in Grimmelmann’s categorization of abuses that leads to congestion and cacophony, making it harder for information to get through and thereby hindering users’ information search and retrieval processes [48].

Existing systems of platform content moderation have been found to vary in terms of actions, styles, philosophies and values. In a systematic review of 86 related papers, Jiang et al. described such tradeoffs and compared the various moderation techniques against Grimmelmann’s four broad categories [59]: exclusion, the act of depriving people of access to the online community, often through bans or timeouts; organizing, consisting of measures like removing and annotating content; norm-setting, a practice of issuing warnings or “indirect policing” to denounce bad behavior; and monetary pricing, a way of using market forces to raise the price of participation for users – though social media users were not found to engage with this last category [48, 59]. In a study of volunteer moderators on Reddit, Facebook and Twitch, Seering et al. showed how moderators used excluding and norm-setting actions (e.g., bans and warnings) at increasingly restrictive rates and relied heavily on general community members to report and flag misbehaviors [94]. While the actions of excluding, organizing and norm-setting may be transferable to open source moderation, we expect that the distinct forms of inappropriate content might motivate the adoption of other, unique strategies and governance structures.
We sought to characterize the moderation structures, norms and roles involved in open source via RQ2.

While some past work has examined conflict management strategies for peer review [52] and the emergence of early governance structures on GitHub [82], we lack knowledge of the specific strategies that maintainers use to moderate inappropriate and problematic behaviors in open source. Among the many forms of intervention techniques available for such purposes, Renee et al. investigated how the code of conduct – a document that “defines standards for how to engage in a community... signals an inclusive environment that respects all contributions ...[and] outlines procedures for addressing problems between members” [6] – is used for moderation. Other moderation tools include documents such as contributing guidelines (which provide “potential project contributors with a short guide to how they can help with your project” [9]) and moderation policies, as well as built-in platform features such as bans and the locking of conversations [8]. However, Geiger et al. uncovered that contributors are not as intrinsically motivated to engage in non-technical maintenance work (e.g. community support and documentation) as they are to complete more technical tasks (e.g. feature implementation or debugging) [45], indicating a need for more comprehensive and higher-level strategies for conducting moderation in complex situations and interpersonal conflicts. Maintainers can be especially discouraged from performing moderation work, since it has been found to cause psychological and emotional distress [98], and automated assistance for moderation can be an appealing solution, with the potential to minimize maintainers’ time and labor on tedious tasks and increase developer productivity [35, 107].

2Though GitHub did develop a set of platform-wide Acceptable Use Policies [7].
However, there exists a gap in our understanding of how OSS moderation is executed in practice, both in terms of strategies as well as the roles and structures that are established to support and facilitate moderation. In this study, we qualitatively investigate such infrastructures and approaches, and uncover maintainers’ perspectives on how automation can support moderation.

2.4 Automated Moderation Bots in Open Source

Sentiment Bot and Safe Space are examples of tools that leverage existing sentiment analysis models to help maintainers detect and regulate toxic comments on GitHub. The Sentiment Bot is a GitHub App built with GitHub’s Probot framework that “replies to toxic comments with a maintainer designated reply and a link to the repo’s code of conduct” [4], while Safe Space is a GitHub Action that leverages TensorFlow’s toxicity classification model to “detect potential toxic comments added to PRs and issues so authors can have a chance to edit them and keep repos a safe space” [3]. Both of these bots use machine learning classifiers to detect toxic content within pull request or issue threads, and whenever problematic content is detected, they respond with a comment that urges the original author to modify or delete it. Underlying such tools are sentiment analysis detectors, and numerous models have emerged in the field of software engineering to improve the accuracy and domain specificity of such models. These include classifiers of negative interactions trained on conversations surrounding issues [41, 61, 78, 86, 89], code reviews [10, 16, 37], commits [50, 51], codes of conduct [97] as well as data from other contexts such as IT support [15] and Stack Overflow [18].

However, bot use in open source contexts has its own associated challenges. Wessel et al.
found that bot-generated noise (in the form of verbosity or excessive/undesirable tasks) causes annoyance for contributors, disrupts their workflow, and creates additional labor for maintainers [108]. Meanwhile, Huang et al. discovered that contributors react negatively to automated encouragements [52]. Outside of open source, Jhaver et al. described how subpar removal explanations provided by bots on Reddit brewed community resentment [55]. In voice-based communities like Discord, bots face challenges in identifying rule violations that hinge on nuances such as tone and accent, despite the widespread adoption of bots to automate features [58]. Jiang et al. highlighted the tradeoff that while automation helps communities achieve moderation at massive scale and with faster turnarounds, human involvement is required to understand contextual nuances, provide clear removal explanations, and conduct negotiations around norms that contribute toward community building [59]. Moderators of the three platforms that Seering et al. studied also expressed the desire to personally deal with harder, more nuanced situations, despite being content to have automated tools deal with the most egregious and unwanted content; the authors argue that these desiderata are motivated by moderators’ inclination to make context-specific judgments and impact community development [94]. Smith et al. identified community values related to the design and usage of machine learning-based predictive tools in content moderation on Wikipedia [96].

In open source, maintainers’ and moderators’ stances toward automation are likely to differ, as open source contributors are more habituated to using tooling to increase productivity and efficiency, whereas the efficiency of moderation has been found to trade off with quality [59, 66].
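The bot workflow described in Section 2.4 – classify each new comment and, when it looks toxic, reply with a nudge to edit plus a code-of-conduct link – can be sketched as follows. This is a minimal illustration, not the implementation of Sentiment Bot or Safe Space: the keyword-based `toxicity_score`, the `post_reply` hook, and the code-of-conduct URL are all stand-in assumptions (the real tools use trained toxicity models and GitHub’s comment APIs).

```python
from typing import Callable

def toxicity_score(text: str) -> float:
    # Illustrative stand-in for a trained toxicity classifier; the real
    # bots use ML models. Returns a score in [0, 1].
    flagged = {"stupid", "idiot", "garbage"}
    words = {w.strip(".,!?").lower() for w in text.split()}
    return 1.0 if words & flagged else 0.0

def moderate_comment(text: str, post_reply: Callable[[str], None],
                     threshold: float = 0.5,
                     coc_url: str = "https://example.org/CODE_OF_CONDUCT.md") -> bool:
    # Mirror the described behavior: on a flagged comment, post a reply
    # urging the author to edit or delete it. Returns True if a reply
    # was posted, False otherwise.
    if toxicity_score(text) >= threshold:
        post_reply("This comment may violate our community guidelines; "
                   f"please consider editing or removing it. See {coc_url}")
        return True
    return False

replies: list[str] = []
moderate_comment("why are you people so stupid", replies.append)
print(len(replies))  # one nudge queued for the toxic comment
```

Note that the design choice of replying rather than deleting keeps the human author in the loop, which matches the self-moderation norms our participants describe in Section 4.2.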
The second part of RQ2 aims to provide insights on how well current moderation bots support human maintainers in open source contexts and what improvements are needed to reduce friction and concerns in adoption.

3 METHOD

To learn how maintainers and moderators maintain their communities, we interviewed 14 individuals who moderate or maintain projects of varied sizes, ranging from 500 to 87,000 stars and 30 to 4,000 contributors. Before beginning the interview and recruitment process, we obtained institutional IRB approval, and we briefed participants on the type of questions to expect prior to starting the interviews to ensure ethical conduct.

3.1 Recruitment

Participants were recruited through publicly available information on GitHub. The eligibility criteria were that participants had to be (1) at least 18 years old and (2) either a current or past maintainer or moderator of a collaborative open source project. We started recruiting participants by emailing owners of repos that used moderation bots. But we soon realized that most of these owners had limited moderation experience, since their bot setup resulted from a forked template project. We expanded to recruiting from repos with designated moderation teams or contributing guidelines, using search terms such as “moderating” or “moderation team” on GitHub, and also conducted snowball sampling by asking participants to refer us to other potential interviewees. If a maintainer’s contact information was public, we requested an interview via email. Of the 40 potential participants we emailed, 14 agreed to be interviewed, one of whom was female; this proportion of women is on par with the overall representation of women in open source (which is below 5%) [103]. We concluded the recruiting process when the addition of participants stopped generating new emergent themes, signaling theoretical saturation [28].
Table 1 displays a summary of participants’ projects, their respective roles, as well as descriptive project information.

3.2 Interview Protocol

We started the semi-structured interviews by following a protocol of scripted questions, which included questions about negative and positive interactions, detection and moderation strategies, codes of conduct, and bot use. Across these categories of questions, our main goal was to learn what strategies maintainers used to respond to negative interactions, such as violations of codes of conduct, and what issues they saw with bot usage (after we introduced the Sentiment Bot). Specifically, we inquired about the responsibilities of moderating members, the expected norms and behaviors of a community and, whenever applicable, their resolution strategies for disruptive behaviors in the past and how they set precedents for future incidents. Each interview lasted 30-60 minutes, and participants were compensated $15 for their time via PayPal or a donation to a charity or organization of their choice.

3.3 Analysis

Using the interview recordings and transcripts, a team of two researchers engaged in a bottom-up, thematic analysis of the interviews. The experience of this team in open source contributions ranges from novice to knowledgeable. We adopted a thematic analysis approach to analyze the transcribed video recordings, and held a shared open coding session to calibrate coding granularity. The first two authors developed the initial lower-level codes for each participant’s data and synced weekly to resolve disagreements. After resolving any disagreements among the coders, we conducted a bottom-up affinity diagramming process to iteratively refine and group the resulting 375 unique codes into 32 first-level themes, which were then clustered into four main themes that we present below.
| ID | Project Pseudonym | Project Area | Role on Project | # Contributors | Stars |
|----|-------------------|--------------|-----------------|----------------|-------|
| P1 | Honeysuckle | Visual diagramming platform | Maintainer/Contributor | >20 | >5k |
| P2 | Receptive | Differential privacy library | Maintainer/Contributor | ∼50 | >300 |
| P3 | Apex | Runtime environment | Moderation Team Member | >3k | >85k |
| P4 | JaguarAPI | Web framework for building APIs | Owner/Founder | ∼300 | >40k |
| P5 | Grunge | Programming language | Designated Moderator | >3.5k | >65k |
| P6 | Hyundai | Alternative firmware | Designated Moderator | >200 | >17k |
| P7 | Vessel | Container management | Moderation Team Member | >3k | >87k |
| P9 | | | Owner/Founder | >200 | >500 |
| P10 | Silverback | Object storage | Owner/Founder | >90 | >9.5k |
| P11 | Community Manager | | Owner/Founder | >80 | >400 |

Table 1. Participant summaries. Project details are redacted to preserve anonymity. All references to projects are by pseudonyms.

4 RESULTS

We start by characterizing the types of inappropriate behaviors that moderators observed and monitored, separating the common types of rule violations found in other domains from the more implicit forms of conflict that emerge from the technical development environment of open source. Next, we describe the types of moderation roles and structures that individuals or groups assume or set up in order to more effectively address and govern misconduct. We then discuss the specific strategies that moderators use to react to, address and prevent misbehavior and incivility. Finally, we summarize maintainers’ stances on the adoption of tools to automate moderation, highlighting concerns such as over-censorship, technical incapabilities, and limited customizability.
4.1 Inappropriate behaviors in OSS

In most well-studied online communities, intolerable behaviors largely comprise deliberately abusive and disruptive misconduct such as harassment and hate speech [56, 58, 94]. In the context of open source, however, explicitly inappropriate behaviors are accompanied by more subtle acts and borderline behaviors such as miscommunication and resistance against new practices. When we inquired about moderation, many maintainers brought up strategies they used to respond to and mediate miscommunication, as well as ways of organizing and curbing the excessive volume of demands. While prior work categorizing toxic behaviors on GitHub has also uncovered less severe misbehaviors such as technical disagreements and arrogance [75], we make the distinction between the clearly disruptive content that is detectable by toxicity classifiers and the more covert forms of incivility that require human judgment to identify. In the following subsection we outline some of the more disruptive acts of misconduct (e.g., hate speech, snarky humor) as well as the more subtle forms of misbehavior that OSS moderators observed and guarded against, and follow with the strategies they leveraged to address these in Section 4.3 (Moderation Strategies).

4.1.1 Explicitly Aggressive and Disruptive Behaviors. The first class of misbehaviors consisted of explicitly harmful or ill-intended content. We start by presenting misconduct that is obvious (e.g., spam) or egregious (e.g. hate speech, harassment) and follow with examples of more concealed (but still harmful) forms of hostility, which include passive-aggressiveness and snarky humor.

Spam, hate speech and harassment.

Much of P8’s job as a moderator consisted of “moderating spam users”, which includes instances of a “bot that’s leaving nonsensical comments, opening garbage pull requests that are wasting people’s time”.
Even in smaller projects such as Hyundai, “spammers come with things (political) that doesn’t have anything to do with Hyundai, they occur twice a year” (P6).

Hate speech like “someone coming in and saying ‘why are you people so stupid’, or worse than that” can happen, but fortunately “those are very spotty” (P8). In one case, a banned member threatened to “send collaborators bombs” and afterwards “he got arrested, like by the FBI, because he made bombs in his house” (P8).

While commonsense rules like “no sexual harassment or no discrimination” seem obvious, P4 pondered how “in some cases it has to be very explicitly stated, because the people that violate those things are probably the people that wouldn’t guess that”.

Passive-aggressiveness & snarky humor.

Both destructive and contagious [77], “passive-aggressive comments” sadly do appear in OSS contexts. They include arrogant “things like ‘I have been working for 10 years 20 years and I had never seen a solution like what you’re proposing’ – something that is not very exclusively saying what you’re proposing is dumb, but . . . kind of implicitly saying you’re inexperienced . . . in a very, very hidden way” (P4), or demeaning insults such as “can’t you ask an intelligent question”, which P4 reports as content that “we often get within questions threads”.

In a similar vein, snarky humor is also advised against because “it’s so easy to offend someone with that” and “it’s really hard to convey what you mean while being snarky on the internet, where nobody can see your face” (P5).

Entitled demands & heated complaints.

Users and contributors who feel entitled to receive responses can take up a significant amount of maintainers’ time – “the thing that makes most of the time is the questions, issues” and “80% of the time, or like 90% it will be just like a feature request or a question or [demand for which] like ‘I’m never really in the user scope’ ” (P4).
While some requests are easy to address (e.g., simple questions and feature requests), others can get quite heated: “[One user complained that] Hyundai was not good because it wasn’t working on their device (person didn’t read documentation) . . . It started off aggressive, and ended up with the user complaining the documentation wasn’t good enough” (P6). Ironically, “in many cases [it] is just like errors in the code of the developer (who’s asking) and they didn’t realize” (P4). Yet someone must attend to the issues, because “If you ignore people they get more mad . . . and they act out more and more” (P11). And the problem with entitled comments isn’t the comment itself, “it’s the knock on effects of that comment . . . other people will see that and think it’s okay to behave that way . . . [and] feel more entitled because they’ve seen entitlement be normalized” (P3).

4.1.2 Misunderstandings, Technical Disagreements, and Resistance Against New Practices.

In contrast to explicitly aggressive misbehaviors, our moderator participants also reported monitoring for more subtle disagreements and misunderstandings that arise from the technical and collaborative nature of OSS projects. Aside from intentional misconduct, “many times bad behavior is just misunderstandings” and “it boils down to like miscommunication and not understanding the issue like people talking past each other and people getting a little bit heated” (P9). According to P13, miscommunications occurred frequently: “If you dig into old threads, you see a lot of them are full of miscommunication and people shouting over each other about who should have had dealt with what”.
Technical disagreements surface easily in development environments, because “sometimes people simply get riled up, they have an idea of what is right or wrong and someone else has a different idea, which in tech can happen” (P5). In one instance, when people got heated after “a disagreement with the licensing of the code”, which was “from another project library”, some contributors unfortunately “felt [the need to use] ‘accusatory’ language”.

Technical projects often need to adopt new pipelines and packages to keep up with recent updates and practices, but sometimes new standards are met with “resistance initially, usually because of large changes such as build pipelines” (P2). So first they must “get through the transition period” (P3), but “over time there’s acceptance” (P2), “and the new norms will just be the way it is, and everyone will be horrified that it used to be worse” (P3).

4.2 Moderation Roles and Structures

While open source communities were once perceived as decentralized and bazaar-like, emergent governance structures form over time [13, 64, 82]. Maintainers in our sample employed a plethora of strategies to overcome the interpersonal and technical challenges of social coding. Depending on the size of the project or organization, maintainers varied their governing structures and strategies. Specific moderation actions were performed by members of the community, a moderation team, or maintainers themselves. The most basic form of moderation involved contributors performing self-censorship. Beyond that, volunteer moderators described how they reported potentially harmful content and actions to maintainers or formal moderation teams. In the following sections we describe how participants in our sample described collaboration between different roles and governing powers to conduct moderation together.

4.2.1 Self-moderation and Volunteer Moderators.
When a particular individual violates a community rule or norm, self-moderation constitutes the first line of defense. Unlike the broader notion of community self-moderation that Seering proposes [91], we consider self-moderation to be the individual self-corrective action of the author to edit and fix their own content, regardless of who first noticed the questionable content. In the case of large projects like Apex, maintainers may institute “an explicit policy to ask organization members to self-moderate”, with rules that “allow [maintainers] a way to say: ‘if you just made a mistake, and you apologize and don’t do the behavior again, you’ll be fine’ . . . in a way that displays those norms for the community” (P3).

Member status affected who received self-moderation requests – when the original author was “not a[n internal] collaborator, then the moderation team can just summarily do what we decide is the best” (P3). However, when internal organization members exhibited problematic behaviors, “then the first thing [we are required to do] is to always ask them to self-moderate” (P3). In requesting self-moderation from contributors, maintainers asked for specific actions like editing or deleting the offensive comment, so as to avoid public shaming directed at the author or other escalations.

3Acts of self-moderation erase many public records of accidentally posted harmful content, so we suspect that the practice is prevalent but often undetected. While Apex was the only project reporting self-moderation, it is also one of the most established OSS projects. Hence we expect self-moderation to appear in other projects as well, and we encourage future work to explore the detection and frequency of self-initiated moderation.

Since social coding platforms like GitHub are working environments for producing software, team members are expected to treat each other with civility. So even if “You don’t have to like each other . . .
you have to be professional” (P13). Therefore, when P13 asked contributors who harbor negative feelings toward each other to “to self-moderate, . . . they did” and in general “people . . . are usually regretful that the comment was hurtful . . . [and] will be eager and happy to self-moderate.” + + +But in some cases, uncooperative contributors refused to conduct self-moderation, and one cause of this behavior was a difference in cultures. In one case, P13 asked for self-moderation by posting a request along the lines of “Hey this comment is perceived as . . . problematic, can you please consider self-moderating it”. If the recipient is from the US, then they would understand that – “you’re really telling them to do that”. But “in Israeli culture, it’s perfectly acceptable for them to say ‘No, I considered it and I think I have a better understanding than you’”. When contributors refused to cooperate, moderators escalated to more direct measures to intervene, which we cover in 4.3. + + +To delegate some of their responsibilities, maintainers of more popular libraries such as Apex distributed moderation work by relying on community members: “most of the time somebody reports it . . . they can surface it [by] say[ing] ‘hey check out this’ in a moderation repo that’s private to org members”. While maintainers would prefer to hide contentious content from contributors and users - “in an ideal world, we don’t require somebody to report it before we fix it”, some of it inevitably goes undetected in larger projects: “there’s a scalability thing there”, and community reporting can serve as a “a useful filter to prevent all of our time being taken up by hunting down problems” (P3). + + +4.2.2 Formal Moderation Teams. Larger and more mature projects designated particular volunteer members from the community to form an official moderation team for the organization. The Apex moderation team, for example, consisted of “8 to 10 people, 5-6 who are regularly active” (P13). 
Moderation team members were self-nominated, and the role was not exclusive to contributors: “any member who is on the project. . . [can] say ‘hey I want to be a moderator’, and if nobody objects for seven days, they join the team”. Team members were recertified annually by a Technical Steering Committee (TSC), which guided and advised the organization with higher-level directives.
+
+
+Among the ten projects we interviewed, six had designated moderators, all of whom were appointed due to existing demand. For instance, when P12’s moderately sized “project first started getting popular”, he had “no clue how to moderate”. But growing attention eventually convinced him to assign moderators: “people were demanding moderators - so very quickly I had to choose a moderator . . . [and these] moderators [would] tell people to calm down and most people are respectful”.
+
+
+Maintainers often encouraged contributor interactions with moderators to help offload their maintenance responsibilities. In one case, P5 of the popular programming language Grunge would tell users “If you have a question about anything that disturbs you or that you may think has disturbed others, contact the moderators”. P7 of the mature project Vessel also reported often redirecting users and contributors to “talk to a moderator on Slack” whenever the community had questions and doubts around governance actions such as instituted bans, so that moderators could provide the appropriate explanations. Even for more nascent repos such as Silverback, P11 explicitly “set up community values that proactively explains to people what the community will look like”, so that “if someone is blocked, and they don’t know why they were blocked, or they think they should be unblocked, they know where to get in touch with us.” (P8).
+
+
+Outside of moderating, members may be responsible for onboarding tasks such as taking a training course: “It was an online course, we went on to Zoom (the whole team) for like a few weeks and we did a training. We should probably do another round [of refreshers] because some people joined” (P13). And since the labor of performing moderation actions (e.g. providing explanations) can be draining [59], it is a moderator’s own responsibility to self-assess and take breaks to avoid burnout: “Moderation is something I do for a while, I stop doing it for a while, I do for a while, I stop doing it for a while, ’cause I burn out.” (P8)
+
+
+4.2.3 Power Sharing Structures. While moderation team members held the power to execute governing actions (e.g., interaction limits or bans), they also experienced power restrictions. Restrictions typically originated from higher-up governing bodies such as the Technical Steering Committee, but efforts to decentralize and democratize moderation also encouraged community members to review and call out misjudgments by moderators.
+
+
+Technical Steering Committees (TSCs) tend to appear only in larger projects (only 3 of the 10 projects we interviewed formed one – Apex, Vessel and Grunge), where the sizable number of internal project members calls for top-down governance. In Apex for instance, P13 was blocked from directly removing an internal member because “once you’re a collaborator you can’t really be removed”. In order to remove an internal collaborator, “the Technical Steering Committee needs to vote . . . we [the moderation team] wouldn’t [typically] remove a collaborator” (P13).
+
+
+The TSC shoulders many technical and governing responsibilities, serving as “the unifying factor” of the project (P13). But the TSC also exhibits “a very strong bias towards inaction, by design . . . because . . .
making the wrong technical decision is a lot riskier than not making [a] technical decision.” Finally, the TSC also consists of “a lot of people who are very technical, [so] they don’t like dealing with interpersonal issues” (P13). The combination of limited bandwidth, a composition of technical members, and a tendency toward inaction means that the TSC is slow in approving requests for actions like removing collaborators. As a result, maintainers of larger projects eventually “determined that we needed a separate body from the Community Committee and the Technical Steering Committee to handle these [governance actions] because membership on the TSC does not mean you have any idea how to handle a code of conduct report” (P8), leading to the formation of an official moderation team in project Vessel.
+
+
+The TSC holds powers above the moderation team (e.g., the ability to remove internal collaborators), and moderators must additionally “do a weekly report to the TSC about what moderation actions have happened . . . [to adhere to] . . . our governance documentation” (P8). In addition, moderation teams set up structures that encourage project members to check moderators’ judgments, so as to ensure a more democratic distribution of moderating powers:
+
+
+“We always invite people to call the other mods to check that we are actually right because we get it wrong. Because otherwise if we wouldn’t have rules to follow then it would be, well ‘this mod didn’t like my nose so he banned me’ ” (P5)
+
+
+4.2.4 Reporting Mechanisms. To support the reporting of misconduct by volunteers, moderators of larger projects set up “a private moderation repo” so that “collaborators (∼500-600ish of them)” can “open issues there to notify the moderation team that ‘here it’s something that . . . you need to look at’ ” (P8).
These moderation repos for community reporting work for larger projects because “for very contentious topics [in] issues and pull requests (which happen occasionally in most projects), someone will notice and surface it even though there’s nothing bad yet”. In addition to providing a centralized place for members to submit reports, this strategy enables moderation team members to “start subscribing to it and jump really quickly when something happens”.
+
+
+In addition to moderation efforts from the community, reports to GitHub constitute another avenue for escalation if moderators don’t have the power to edit particular posts or close specific user accounts. For instance, P8 related how “There are definitely some blind spots and missing parts . . . certain types of comments you can’t edit or delete . . . that’s a bit of a problem. We have to contact GitHub if it gets really bad – but if it gets really bad you just report the user and eventually all their stuff gets deleted because GitHub just deletes the user”. Spammers who occasionally attacked the mid-sized project Hyundai were dealt with in a similar way: “spammers that come with things (political) that doesn’t have anything to do with Hyundai, they occur twice a year and we have to delete/close issues. They are also reported to GitHub, who may close their accounts”.
+
+
+Proc. ACM Hum.-Comput. Interact., Vol. 7, No. CSCW2, Article 301. Publication date: October 2023.
+| Moderation Strategy | Description | Example Actions |
+|---------------------|-------------|-----------------|
+| Punitive | Reactive measures taken to eliminate harmful content and prohibit interactions that cause rapid and excessive negative engagement. Used when someone acts in a clearly outlawed manner or engages in activities that provoke high levels of community response. | Hiding/deleting comments, bans, interaction limits, locking conversations, calling out bad behavior. 
|
+| Mediations | Diplomatic interventions taken to resolve small-scale misunderstandings and disagreements. Used for disagreements between a small number of (usually internal) contributors. | Correcting misunderstandings, brokering negotiations. |
+| Preventative | Inhibitory: Precautionary measures used to prevent the development and further escalation of conflicts. Used in situations that maintainers perceive to have the potential to escalate, such as expressions of indirect hostility, inside jokes, and belittling comments. | Issuing warnings, calling out behaviors perceived to have the potential to escalate. |
+| | Proactive: Setting up rules and workflows to avoid the repetition of similar mistakes and future user/contributor frustrations. Used after repeated offenses. | Setting up private moderation repos, codes of conduct, linters, templates, topic-specific channels. |
+| Reformative | Educational approaches to rehabilitate misbehavior and set acceptable standards. Used after unintended neglect of rules or repeated violations by multiple members. | Offering explanations, polite admonishment. |
+
+
+Table 2. Summary of Moderation Strategies
+
+
+4.3 Moderation Strategies
+
+
+In an ideal world, maintainers should not have to monitor and respond to negative interactions. But despite their best intentions, contributors did end up engaging in heated conversations that escalated quickly out of control. When such unexpected situations occurred, maintainers reacted by utilizing a set of existing tools on GitHub to help limit, de-escalate or remove the interaction. But sometimes it took more in-depth intervention to resolve a conflict, in which case maintainers and moderators performed the role of conciliator to mediate the dispute. Fortunately, many misbehaviors can be anticipated and prevented once moderators have witnessed and intervened in similar incidents.
In such instances, moderators took preventative actions to deter further escalations and avoid future mistakes, or reformative strategies so that newcomers to the project could distinguish acceptable behaviors from inappropriate ones. Once established, norms guided contributors toward more productive, healthy and efficient interactions. Table 2 shows definitions of the moderation strategies we uncovered, and example actions associated with each strategy.
+
+
+4.3.1 Punitive Strategies. Punitive strategies consist of reactive moderating actions such as content removal, bans, locking of conversations, or strict enforcement of code of conduct guidelines to eliminate harmful content and disruptive behaviors. These were usually taken immediately after severe situations such as unexpected debates and outbursts, so as to limit the impact of inappropriate actions and prevent further escalation of conflict.
+
+
+When content removal was sufficient to conclude and archive an exchange, moderators simply hid or deleted comments. P3 of Apex related his preference for hiding comments over deletion since GitHub introduced public deletion receipts: “I don’t delete comments anymore because GitHub leaves a record that you’ve done it . . . because of that it’s more effective to hide them all as abusive or off-topic”.
+
+
+Unlike deletion (which now leaves a public trail of delete receipts), folding the content via hiding offered transparency, which was found to 1.) improve legitimacy and accountability, 2.) increase perceived consistency, and 3.) prevent confusion and frustration [59]: “People can still read it (which sucks), but then there’s not an illusion of censorship, which is worse than people reading the content, but not as good as the content being erased.” (P3).
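The hide-rather-than-delete workflow described here can also be scripted: GitHub’s GraphQL API exposes a minimizeComment mutation whose classifiers include ABUSE and OFF_TOPIC. The sketch below only builds the request (the token and comment node id are placeholders), and the mutation shape should be verified against GitHub’s current API documentation before relying on it.

```python
# Sketch: hiding ("minimizing") a GitHub comment as off-topic via GraphQL.
# Token and comment id are placeholders; verify against GitHub's API docs.
import json
import urllib.request

MUTATION = """
mutation($id: ID!, $why: ReportedContentClassifiers!) {
  minimizeComment(input: {subjectId: $id, classifier: $why}) {
    minimizedComment { isMinimized minimizedReason }
  }
}
"""

def build_request(comment_node_id: str, classifier: str, token: str):
    """Build (but do not send) the GraphQL request that hides a comment."""
    body = json.dumps({
        "query": MUTATION,
        "variables": {"id": comment_node_id, "why": classifier},
    }).encode()
    return urllib.request.Request(
        "https://api.github.com/graphql",
        data=body,
        headers={"Authorization": f"bearer {token}"},
    )

req = build_request("MDEyOklzc3VlQ29tbWVudDE=", "OFF_TOPIC", "ghp_example")
# urllib.request.urlopen(req) would perform the hide; omitted here.
```

Hiding via the API leaves the comment readable behind a fold, matching the transparency rationale P3 describes, whereas a REST `DELETE` on the comment would remove it outright.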
Prior to public deletion receipts, “if there was a hugely toxic exchange that was irrelevant to the issue, I could sum it up and then delete all the comments and nobody had to see that toxic exchange had happened” (P3).
+
+
+Moderators found it crucial to enforce existing rules to maintain a healthy and supportive environment. In the case of political spam in Hyundai, P6 recounted having to delete and close issues, as well as report the accounts to GitHub. To clearly outline desired behaviors, it can be helpful to have a “code of conduct, and being open about enforcing it helps a lot, because people know what they’re getting if they go that way. . . acceptable behavior is pretty much laid out [there]” (P5). P5 additionally emphasized the importance of invoking existing rules:
+
+
+“We have the moderation team to enforce this; we want to be constructive at all times. We do not accept people harassing other people or calling them names or generally being negative, it’s rather frowned upon. Basically we criticize the code not the person: be constructive, be on the point.”
+
+
+Moderators also called out clearly toxic behaviors that were not yet explicitly delineated in existing rules. For instance, within the project of the popular language Grunge: “we call out bad behavior when we see it.” (P5). These concerns were raised directly on GitHub or through other media: P9 and his team members on Vessel “call out bad behavior by sending screenshots over the team’s Slack”, while P13 and his team on Apex post in a moderation repo to encourage accountability: “You open an issue in moderation repo, so they see that you’re aware of it . . . and often that’s enough to get them to de-escalate and no other people are watching.”
+
+
+Reactive approaches require quick responses, since escalations tend to unfold quickly: “Either we don’t notice it, or we say ‘hey it’s banned’ or ‘work it out’ . . . But . . .
sometimes it’s a bad thing to catch it like one day too late, and at that point it’s too late.” (P13). Should disagreements develop into more heated debates, moderators would institute temporary bans: “There will sometimes be very heated discussions, we may institute a one-, or even in some cases the seven-day ban, so they can cool off and then come back, refreshed, hopefully.” (P5).
+
+
+4.3.2 Mediations. “In an OS community, the implicit foundation of it is that all contributions are valid, or that everybody has an equal stake in doing something” (P11). But disagreements occurred when maintainers and contributors had mismatched expectations for the future state of the software [43]. During such interpersonal conflicts, it fell to moderators to hear out all perspectives and mediate the underlying conflicts to resolve disagreements and limit the development of toxic behaviors.
+
+
+Mediation involves communicating with the multiple parties involved in a conflict (individually or as a group) to resolve any misunderstandings or negotiate any conflicting objectives when collaborating on a decision in the project. P13 described how one party engaged in a conflict sought out moderators to mediate: “You find the moderator that is respected by both parties that are involved in the conflict. . . . Then you talk to them, if they’re nice they usually agree to facilitate things. You get them to hear both sides, they take it from there.” Having conducted mediations himself, P13 elaborated on the sequential process. To start off, the moderator speaks with individuals from both sides of the conflict: “you just talk to the sides, . . . you try to figure out the conflict, you try to get them to see the other person’s perspective”. In some cases someone actually did commit a wrongdoing: “Sometimes there is a clear person who is right in the conflict . . .
usually the other party will either admit that or dig in.” But more likely it’s just a miscommunication: “Often there is not [someone in the wrong], it’s just like a misunderstanding, and just getting people to see the misunderstanding and the other person’s perspective is usually enough”.
+
+
+P13 of project Apex also recounted approaching mediation by giving all parties the benefit of the doubt: “Most of these people are good people and good engineers, and there is very little malice in the project. Just assuming good faith and trying to approach it from a point of like ‘these are reasonable, decent human beings’ is often sufficient, in terms of figuring out the right side.” Meanwhile, P14 of a smaller project found mediation to be a matter of negotiation: “It’s all about negotiation. You talk with engineer A, you tell them what you don’t like. You try to talk with engineer B, you try to see if what engineer A is proposing will work with engineer B, and you try to come up with a tradeoff.”
+
+
+Some maintainers were happy to act as an intermediary from the beginning. For instance, P1 related how “I would rather be [a] middleman than to call out anyone for toxicity”, while P14 helped his contributors ask for clarifications: “Sometimes people can come to me and say: ‘I read this, not sure how to take it, if it’s personal or something’. Usually I know all interested parties and I try to ask the reviewer to rephrase the message, to clarify it.” But founders of more mature projects like JaguarAPI were not as comfortable with mediating - “I’m trying to mediate, which is strange because that’s not something I would normally do. I wouldn’t normally engage in an aggressive conversation” (P4). But due to the hypervisibility of the project and obligations to protect community members, P4 wound up learning how to mediate anyway: “I feel like I have to protect the community and the people that are around there, around my family. So I end up having to stop whoever is kind of harassing us”.
+
+
+4.3.3 Preventative Strategies. Mediations and punitive strategies describe ways that moderators react to conflicts of different scales. While these techniques can be taught and directly performed by any new moderator, it takes more experience to anticipate and prevent budding or future disputes. Kiesler et al. presented ways to limit both the impacts of misbehaviors and the performance of bad behaviors in their meta-analysis [2]. Below we categorize these two types of strategies as inhibitory and proactive preventions, where moderators used the former to prevent escalations and the latter to proactively set up workflows that prevent frustrations and ensure conformity to standards.
+
+
+Inhibitory Preventions. Not all conflicts end up escalating into full-blown arguments between contributors, and most of the time it was up to human moderators to predict the onset of harmful behaviors. Inhibitory preventions involve warning-based, reproachful techniques that moderators leverage to target indirectly hostile behaviors (e.g. inappropriate jokes, passive aggressive behaviors), so as to limit harm and avoid further escalations. Indirectly hostile behavior in open source projects is analogous to the concept of “toxicity elicitation” in online text-based communities [110]: comments or behaviors that elicit highly toxic responses but do not necessarily contain toxic language themselves. The preventative actions targeting these behaviors included monitoring conversations, calling out and correcting misbehaviors, or issuing warnings.
+
+
+Passive aggressive behaviors were a classic example of indirect hostility that participants brought up, and P5 of Grunge recounted how “we always stop this. You have to nip it in the bud, because people new to the language come there to ask questions, that’s always a delicate situation”.
To reduce the chances of newcomers dropping off, “we’re extra careful there to protect those people from know-it-alls and people who just ooze negativity” (P5). P11 from Silverback similarly practiced firm enforcement of rules to prevent escalations: “you just firmly enforce it, and that itself creates a good culture because you nip these things in the bud. You don’t let them escalate out” (P11). In the absence of existing rules, moderators issued preventative warnings to de-escalate situations: “Other than bans, it can even just be a proverbial slap on the wrist. We call out bad behavior and if they fix it directly then that’s totally okay, we appreciate that not everyone is at their best” (P5).
+
+
+Even though these comments were not outright or blatantly harmful, they did contribute to the normalization of hidden hostility:
+
+
+“The problem isn’t the offensive or toxic comments, that’s not actually the issue. It’s not actually a problem that someone is entitled, in a comment directly. It’s the knock on effects of that comment, it’s that other people will see that and think it’s okay to behave that way, it’s that other people will feel more entitled because they’ve seen entitlement be normalized.” (P3)
+
+
+Proactive Preventions. After observing repeated instances of misconduct, moderators proactively established rules and workflow standards to avoid the repetition of similar mistakes, minimize the amount of harm that bad actors can perform, and guide new contributors toward desired standards and practices - an area that has been identified as a challenge for newcomers [43]. Specific structures include codes of conduct, private moderation repos, formatting linters, templates that help contributors better frame their questions and suggestions, and channels for organizing existing answers.
While these structures were not always put in place directly after an offense, they provided support and information dissemination, thereby minimizing the questions and issues raised by users and contributors.
+
+
+In the case of P8 from Apex, an entire moderation team was set up in reaction to a conflict: “The moderation team is set up in reaction to Apex botching . . . [a situation]. It was a public relations fiasco.” A team member, P3, also described how, besides moderation teams, contributors set up codes of conduct after instances of conflict: “anyone who has run into the need for moderation or codes of conduct, is going to be very quick to implement it in a community they enter or create”.
+
+
+To minimize the edits that maintainers need to make to contributors’ submissions, templates helped list out the necessary components of new pull requests or issues: “In the repository, in the topmost folder, there is a Contributing document that says: your pull request should have this title, commit messages should be in this format” (P3), or assisted users in drafting issues: “I added a load of information to the template and a lot of requisites to ask people to build a very simple example of what is it that you want.” (P4).
+
+
+4.3.4 Reformative Strategies. Not all acts of misconduct are born of malicious intentions [59]. Sometimes when moderators observed repeated instances of misbehavior, they employed a more nurturing, reformative approach that does not castigate contributors for unintentional offenses. Unlike the punitive or preventative strategies that remove a member’s content or right to interact, reformative techniques are more educational and gentle, consisting of actions like polite admonishments or providing explanations. Over the long term, artifacts from reformative approaches (e.g. explanations) benefit the community by establishing acceptable behavioral standards, even if they take some time for communities to adopt.
By offering benefits such as transparency and a way to establish new norms for subsequent community members, reformative approaches have garnered increased advocacy from researchers in recent years [53, 79].
+
+
+Reformative strategies were well received among open source practitioners as well. P3 of Apex related positive feedback from his community: “The polite admonishment (when I word it eloquently enough) tends to gather lots of heart and thumb-up emoji reactions, and the person will either apologize or just dip out and be quiet. So it’s the most effective form of response.” In a newer project, P11 redirected a raised issue to demonstrate a more efficient response for typos: “Thanks for point[ing] this out. However, instead of raising this as an issue, if you ever see small typos, please feel free to just put in a pull request to fix them”. However, providing explanations is a nontrivial amount of work, so sometimes maintainers fall back on other strategies: “I don’t always have the energy for that so sometimes I’m hostile back . . . sometimes biting comments in response are effective, at the cost of other people seeing me as a jerk, but it still establishes that behavior is not acceptable” (P3).
+
+
+One side effect of politely admonishing community members is the potential loss of a contributor, but often that risk is outweighed by the knock-on effects of unaddressed misbehaviors:
+
+
+“Establishing what behavior is acceptable and what is not . . . [it] is performative - it’s showing everyone else in the arena that that behavior is not okay, even if that means that person is not going to improve. And while it’s always preferred to rehabilitate someone, or convince them to re-evaluate. . . I’d rather lose a person forever from the community than have the rest of the community see toxic behavior go unchallenged.”
+
+
+In addition to polite admonishment, P2 of a new differential privacy library showed how reformative actions also offer explanations of newly established norms and practices: “New pipelines are introduced through meetings, and introducers explain why they’re better, they are then more accepted by contributors after explanation”. Some of these standards took a period of transition for communities to adopt, demonstrating a case of the normative conflict identified in [38]:
+
+
+“When a community starts moderating it’s overwhelming for a while. . . The goal is to get everyone to be as open and tolerant and respectful as possible and that goal is not . . . most efficiently achieved by immediately jumping to a list of all the things that are potentially a problem . . . each medium has to get there on its own time, in its own way so the norms can be established and everyone can accept them.” (P3)
+
+
+However, once communities adopted a good practice, they grew to appreciate it over the long run: “And then norms are established; they know that it’s safe to admonish newcomers to behave that way. And the incidence of reports just plummets. People just don’t screw up when they know what the norms are supposed to be. We’ll get through the transition period and the new norms will just be the way it is, and everyone will be horrified that it used to be worse.” (P3). Among collaborators, such adoption frictions were usually mitigated by group meetings and discussion (P2).
+
+
+4.4 Automation of Moderation
+
+
+Most of the interactions on social coding platforms are text-based, making them well-suited for automation when compared to their social media counterparts [58].
As a result, bots and GitHub Actions can easily leverage the available repo artifacts to facilitate various workflows and protection mechanisms for their projects [108]. Half of our participants mentioned using or considering automated tools to facilitate community moderation. While most of them did not currently have moderation tools set up on their repos, Hyundai had a positive experience using the Sentiment Bot, and Silverback had installed the alex bot, which detects instances of “gender favoring, polarizing, race related, or other unequal phrasing in text” [109].
+
+
+However, our interviewees perceived current bots to be inadequate for conducting moderation beyond simple reactive warnings. Moderators reported that community members can view automated moderation tools as over-censoring and policing forces that threaten their freedom of speech, especially given their tendency toward false triggers. Furthermore, the more subtle forms of misbehavior found in the professional space of software development (such as those covered in Section 4.1.2: misunderstandings, technical disagreements, and resistance against new practices) are difficult for the language models underlying moderation tools to anticipate. Meanwhile, the tools used for moderation are seldom adapted to the development context and lack access to cross-platform information, increasing the chance of false alarms. Finally, the absence of customization options for privacy and notifications breached social boundaries between users, contributors and maintainers by exposing deletions and callouts to the public and unnecessarily demanding maintainers’ attention with excessive and overly public notifications. So despite the potential for bots to perform automated moderation on behalf of maintainers, many of our participants expressed concerns and adoption frictions. Below we highlight some of these existing tensions as well as maintainers’ stances on the utility and impact of moderation bots.
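To illustrate why list-based language tools can feel like false-triggering, over-censoring forces in a development context, consider a deliberately naive sketch of word-list flagging (the term list and comments are invented for illustration; this is not how alex or the Sentiment Bot is actually implemented):

```python
# Naive word-list moderation check, illustrating the false-positive problem
# discussed in Section 4.4: ordinary developer jargon trips the same filter
# as hostile language. The FLAGGED_TERMS list is invented for illustration.
FLAGGED_TERMS = {"kill", "abort", "dead", "dummy"}

def flag_comment(comment: str) -> list[str]:
    """Return the flagged terms found in a comment, ignoring case."""
    words = {w.strip(".,!?:;()").lower() for w in comment.split()}
    return sorted(words & FLAGGED_TERMS)

# A hostile comment is caught...
assert flag_comment("Your patch is dead on arrival, dummy") == ["dead", "dummy"]
# ...but ordinary developer jargon triggers the same filter (false positives):
assert flag_comment("kill the orphaned process, then abort the transaction") == ["abort", "kill"]
```

Because the check has no model of technical context, every process-management discussion would generate alerts, which is one source of the review burden and perceived over-policing that participants describe.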
+
+
+4.4.1 Automated Moderation Breeds Over-Censorship. In public online spaces, the right to free individual expression inevitably trades off with concerns of wellbeing and public safety [49], and our interview participants perceived the potential for automation to lead to over-censorship. Free speech has long-standing associations with source code, and the metaphor was leveraged by early supporters of F/OSS to protect the right to use, modify and distribute software [27]. As a result, open source communities have strongly embraced and valued the right to free speech. However, in volunteer-based development contexts, it is also important to foster a safe space that welcomes contributions from everyone, especially given the limited diversity and inclusion in modern open source communities [71]; P8 of Apex pointed out this shortcoming: “As with a lot of organizations . . . we struggle with representation.” But as Gibson has found, moderators use punitive strategies more within safe spaces [46], which leads to a sense of over-censorship. Teammate P3 described the two opposing value systems:
+
+
+“There’s a spectrum of how much we want to modify our language to avoid offending people. And the folks who generally resist political correctness are the ones who are on the side of saying, ‘I’m going to say whatever I want, if you’re offended that’s your problem’ and I think that sucks and I don’t want to be there. But the other extreme is problematic too because it damages the message by turning into policing.”
+
+
+Maintainers delicately balanced this friction between individual contributors’ desire for free speech and the community’s need to create a welcoming space for female, LGBTQ, and other underrepresented groups [71, 105].
Depending on contextual needs, maintainers balanced the freedom of expression for contributors with broader project goals of promoting a respectful and inclusive community, or as P14 eloquently expressed: “the tradeoff between this [moderation] and freedom. You have to balance, don’t want to restrict people, but you also want everybody to play nice.”

While some maintainers struggled with the tradeoff between free expression and enforced civility, P10 (founder of Silverback) expressed active resistance against demands for free speech: “This is a slippery slope, [as if] we’re going to ‘lose all of our words’ and ‘it’s going to be 1984’ and ‘we can’t express ourselves’.” To assert that automations are not powerful enough to suppress creativity and the right to speech, P10 challenged users/contributors who complained to test out AlexJS, telling them that: “if you lose more than like 15 words in a year, we can rediscuss this. But . . . you’re smart enough and creative enough, and the language is large enough, that you’re going to be just fine”.

Most of the automation assistance our participants considered focused on language use, and less so on other forms of misbehavior. However, excessive attention to and moderation of language misuse also derailed conversation topics away from the development of software: “even the people who were philosophically aligned with the idea of avoiding gender words were still irritated that the normal topic of the channel was distracted or disrupted by that conversation so frequently” (P3). Such concerns are another set of nuances that bots would have a hard time taking into account: “if you get a tool like that – it can legitimately be seen as being too pedantic or too tightly wound about certain words. And [if] the culture of that community isn’t ready for that yet, then that’s worse than not saying anything” (P3).

4.4.2 Perceived Technical Limitations of Automating Moderation.
Participants perceived that moderation bots deployed on GitHub doubly suffered from context specificity due to 1) situation-specific nuances that are difficult for current tools to pick up on and 2) technical terms used in software development environments. On the one hand, a lack of nuanced understanding of situational contexts made it difficult for models to detect new and more subtle variations of misbehavior. On the other hand, the underlying language models lacked contextual sensitivity to technical terms, triggering false positives that required additional human labor to review. The combination of these shortcomings caused hesitation in delegating moderation responsibilities to automation tools.

Inability to Anticipate False Negatives. Human collaborators can easily retrieve information distributed across platforms by toggling between them, but most automation tools can only access a single deployment context or platform: “a lot of the mediums [where] they have discussions are not in public – the bot wouldn’t have access” (P13). Without multiplatform contextual clues, bots failed to pick up on interpersonal relationships or intended meanings from a working environment, and “even inside the discussion there is a lot of background [and] the bot would have a very hard time to figure it out” (P13). For instance, P6 of Hyundai pointed out that “when people are passive aggressive, the bot cannot understand that, and it’s better to interact with a human”. Hence, much like the moderators of social media contexts, open source moderators prefer human-reviewed decisions “for the interpersonal conflict inside of projects [since] it would be impossible” for automated assistance of moderation to work [94]. P3 also shared the stance that humans should be involved in moderation decisions: “If it was just maintainers that see it: fine, I can make a judgment call”.

Context Specificity Raises False Positives.
General-purpose sentiment analysis models struggle to pick up on connotations of context-dependent terms, causing learning-based models to falsely trigger on common software engineering terms that carry negative denotations when used in everyday contexts. Consequently, maintainers had to manually review instances of such false positive triggers to ensure accuracy, exacerbating their already limited bandwidth. For instance, P3 described how AlexJS could offend individuals by aggressively flagging words with slightly negative connotations: “AlexJS . . . ended up . . . [being similar to] the archeology conference where they couldn’t say the word bone, because some software flagged it as offensive.” In project Silverback, P10 also observed how “Alex triggers on ‘master’ because of [the upstream dependency] Vessel” because project Vessel had not yet renamed their ‘master’ branch to ‘main’.

One particular example of contextual information missing from detection models was a contributor’s primary language. When someone’s native tongue is not English, their comments can accidentally trigger bot reactions by unintentionally using phrases that carry negative connotations and innuendos: “English not being a first language may affect them” (P8). Li et al. have highlighted how moderators must use intuition (i.e., guessing) to discern behaviors needing intervention from unintentional offenses caused by language differences [71].

Participants also worried about how bots would treat self-directed anger. In one instance, the founder of project Hyundai “was answering a question and said ‘Oh yes, don’t worry about that feature, it was a rubbish feature and we already fixed it’ and the bot was triggered” (P6).
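The false-positive pattern described above can be illustrated with a minimal sketch (our illustration, not any participant’s actual tooling): a naive keyword flagger trips on everyday software engineering vocabulary unless it is given a domain-specific allowlist. The term list and allowlist below are assumed for illustration only.

```python
# Illustrative sketch (not a real moderation bot): a naive flagger trips on
# common software engineering terms ("kill", "master") that carry negative
# meanings only in everyday, non-technical contexts.
NEGATIVE_TERMS = {"kill", "abort", "master", "dead", "rubbish"}  # assumed list
SE_ALLOWLIST = {"kill", "abort", "master", "dead"}  # benign in dev contexts

def flag_terms(comment: str, domain_aware: bool = False) -> set[str]:
    """Return the terms in `comment` that a naive model would flag."""
    words = {w.strip(".,!?'\"").lower() for w in comment.split()}
    hits = words & NEGATIVE_TERMS
    if domain_aware:
        hits -= SE_ALLOWLIST  # suppress known domain-specific false positives
    return hits

comment = "Please kill the stale process and rebase onto the master branch."
print(flag_terms(comment))                     # naive flagger fires twice
print(flag_terms(comment, domain_aware=True))  # domain-aware: nothing to flag
```

A static allowlist is of course far cruder than the contextual sensitivity participants asked for, which is precisely why terms like ‘master’ (inherited from an upstream dependency) still require human review in practice.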
Likewise, P10 “worries about the use of negative language in code due to personal experience writing code (mostly self-directed).” While P6 was able to find humor in the situation (“Sometimes the Sentiment Bot flags not so aggressive phrases and it’s a funny occurrence”), other instances may be more frustrating, especially if the triggering words are frequently used.

Most maintainers held the opinion that false positives cause harm, especially if they add noisy information: “false positives become a big problem, especially because they’re a distraction” (P5). But the potential for false negatives to cause disruptions depends on context: “Sometimes false positives are acceptable, better than missing something . . . but there’s also sometimes where false anything is not acceptable and it’s better to say nothing, than to have a false result” (P3). Too many false triggers can numb maintainers’ responses to warnings (a behavior consistent with the findings of Wessel et al. [108]), causing them to perceive the warnings as noise and thereafter ignore them altogether: “We have something called a stale bot . . . it periodically will just put comments on tickets and send emails, which is not bad. But for whatever reason we’ve learned to ignore it sometimes.” (P10).

4.4.3 Customizations and Boundaries. Participants reported a strong need to tweak and customize tooling based on specific project needs. In existing automation tools, the lack of personalization options harmed adoption rates. For instance, the absence of notification settings caused information overload and fatigue for maintainers, especially given the possibility of the above-mentioned false positive triggers.

In Sections 4.2.1 and 4.2.4 we discussed the distribution of moderation work to volunteer contributors. P3 brought up an in-house GitHub feature that supports this volunteer reporting framework: “GitHub has this reporting facility that I don’t think too many people know to turn on.
It allows arbitrary users to say this is a problem, pay attention”. Unfortunately the feature lacks notification settings: “In [Apex] we actually turned that off (it was on for a while) because you can only turn it on for the whole org or none of it.”

Privacy is another setting that maintainers wanted to configure. While transparency has been found to improve collaboration in open source [30], not all maintainers are ready “to be that transparent, and that direct”, and P3 is of the mindset that completely transparent configurations are “going to have consequences that those folks didn’t anticipate and that a private system allows for more bias”. However, they also conceded later that privacy “also allows . . . for a more refined response”, hence it is important to have the agency to configure private notifications: “Having it surface just so I notice it . . . [it should also be that] I can also tweak how sensitive it is, instead of having a default setting”.

Other bots are intended to lessen the burden of maintainers but have instead crossed social boundaries that maintainers were not entirely comfortable with. For example, P3 worried about how the automatic closing of issues deprioritized the feelings of community contributors: “I don’t use Probot at all primarily [because] most of the usage of it I’ve seen has been programmatically closing issues (like stale issues) and I think that’s insanely user hostile . . . prioritizing maintainer time over the feelings of users and I think that’s not a good trade off.” In a similar vein, P1 also claimed he “would not consider anything that’d directly communicate with the contributor, because we value every single one of them”, as allowing direct communication between automated moderation tools and contributors could risk offending and losing valuable community members.

4.4.4 Anticipated Role of Bots in Moderation.
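The customization gaps participants described (per-repo rather than org-wide toggles, tunable sensitivity, maintainer-only surfacing) can be sketched as a hypothetical per-repo configuration object. None of these names correspond to a real GitHub, Probot, or bot API; this is purely an illustration of the knobs participants asked for.

```python
from dataclasses import dataclass, field

# Hypothetical per-repo moderation settings; illustrates the customization
# options participants asked for, not any real GitHub or Probot API.
@dataclass
class ModerationConfig:
    reporting_enabled: bool = False      # per-repo toggle, not org-wide only
    sensitivity: float = 0.5             # 0 = flag nothing, 1 = flag eagerly
    private_notifications: bool = True   # surface flags to maintainers only
    allowlist: set[str] = field(default_factory=set)  # project-specific terms

    def should_surface(self, score: float, term: str) -> bool:
        """Surface a flag only above the tuned threshold and off-allowlist."""
        return (self.reporting_enabled
                and term not in self.allowlist
                and score >= 1.0 - self.sensitivity)

cfg = ModerationConfig(reporting_enabled=True, sensitivity=0.3,
                       allowlist={"master", "kill"})
print(cfg.should_surface(0.9, "rubbish"))  # strongly scored, not allowlisted
print(cfg.should_surface(0.9, "master"))   # suppressed: allowlisted term
```

The point of the sketch is that each setting maps to a complaint from Section 4.4.3: the per-repo toggle addresses the org-or-nothing reporting facility, the threshold addresses “tweak how sensitive it is”, and `private_notifications` addresses the desire to have flags “surface just so I notice it”.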
Perhaps due to the shortcomings outlined above, maintainers for repos of various sizes indicated that projects should not (solely) depend on automation for moderation. As a moderator of the popular project Grunge, P5 thought its presence would be extraneous: “the best thing it could do was alert as to situations that may arise. But then again, people already do that”. As the creator of the more nascent Silverback project, P11 also thought that his community “should never need it, other than catching slip-ups” because “if we’re relying on the bot to solve moderation problems, we’ve gone so far off course”.

Fortunately, the future for bot adoption is not entirely dismal. P8 expressed appreciation for the depersonalized nature of automated interventions, suggesting that it may have a place in initiating interpersonal interactions such as mediations: “it’s nice that the tool depersonalizes the intervention”. But like bots on any platform, user abuse is a possibility: “as soon as a repo has that system on it . . . a bunch of people are going to go brigade it and just drop every offensive word they can think . . . just to see how much they can respond” (P3).

| Strategy | Self-moderation | Volunteer Moderators | Moderation Teams |
|----------|-----------------|----------------------|------------------|
| Punitive | Current bot interventions are suitable for reactive self-moderation, but customizations, contextual sensitivity and higher accuracy can increase usage. | Current volunteer moderators often experience false positive triggers from moderation bots; customized notifications, increased accuracy and contextual sensitivity can encourage adoption. | Bot interventions can help improve the efficiency of content moderation in large projects with moderation teams. There are opportunities for bots to help team members make collaborative decisions or onboard new members. |
| Mediations | N/A (Conflicts involving mediation have usually escalated beyond self-moderation.) | Bots can ask for clarifications on behalf of a contributor (acting as a mediator) in place of moderators. | Bots can help depersonalize mediations, but there exists room for improvement in detecting situations that are in need of mediation in large projects. |
| Preventative | Inhibitory: There are opportunities for detecting instances of potential toxicity, such as indirect hostility, that could develop into more serious conflicts (e.g., passive aggressiveness, inside jokes, minor transgressions). Proactive: Bots can provide suggestions of improved workflows after observing repeated mistakes and nonconformity to existing standards. | | |
| Reformative | Bots can help enforce template use and surface rules, community guidelines and codes of conduct to writers when they are composing a potentially harmful comment. | | |

Table 3. Design Recommendations: how automation may support moderation structures and strategies

Additionally, our participants considered scenarios where moderation bots can be leveraged to execute some of the Moderation Strategies; Table 3 overviews some ways that bots can support moderation in the future. For situations needing immediate response, such as warnings administered through Punitive Strategies, P12 recalled an instance of a demanding user where a bot could have intervened. The user had commented “THIS BUG HAS BEEN OPEN FOR A YEAR, WHERE IS THE FIX AT” in all caps: “I can definitely see using it [a moderation bot] for something like that.” P4 also contemplated a situation where the Sentiment Bot could have taken on the frontline, reactive work of moderation: “if it was able to take a lot of those first conversations I think that will be very useful”. P8 imagined that such a tool could help self-moderation by alerting well-intentioned commenters when they accidentally make a mistake: “This is gonna be good for people who are good faith commenters, it’s not gonna be effective for the trolls”.
In communities where malpractices were pervasive, P14 imagined that moderation bots could help facilitate reform: “If I see this type of behavior becoming a bad practice in the team/overall community, I would definitely consider doing [adopting] something like that”.

5 DISCUSSION

Through our examination of moderation norms and practices among communities of various sizes, we found a diverse set of structures and practices that maintainers leverage to manage and prevent conflicts. While self-moderation and volunteer-based moderation have pervaded and been well-studied in neighboring communities such as Wikipedia [40] and Stack Overflow [25], we found that moderation in open source required a different set of strategies and, in the case of larger projects, more formal structures such as moderation teams. We also discovered that there are still many gaps in the forms of moderation assistance that bots can offer, both in terms of whom they serve and the type of moderation strategies they automate. Inspired by some speculations of our participants, we present below a comparison between the moderation structures, strategies and opportunities for automation in open source versus other platforms, as well as some design recommendations to help guide the future of automation tools for moderation.

5.1 Moderation in open source versus other platforms

5.1.1 Moderated Content. In terms of content, prior works on content moderation in social media largely documented the presence of more explicit forms of misbehavior, such as the infamous triad of “flaming, spamming and virtual rape,” among other forms of inappropriate content such as hate speech, insults or harassment [24, 56, 58, 94].
In our discussions with practitioners on GitHub, we gathered that moderators also watched out for more borderline actions such as technical disagreements or resistance to new norms, which may not be as immediately apparent. The evidence of such subtle forms of disputes means that moderators are more likely to leverage Mediations as an approach to conflicts between contributors. Another implication is that automated tools powered by language models are unlikely to detect these less obvious misbehaviors, because not only are they subtle, but they also tend to be situational and technical – and therefore highly context-dependent.

5.1.2 Structures and Roles. While prior works discuss the potential of self-moderation in community-based platforms such as Facebook groups, Wikipedia and Reddit [17, 90], most considered moderation to be a community-level effort where platform peers helped one another moderate, similar to the volunteer moderation that we attribute to community members in this study. Among our participants, self-moderation was considered an individually-initiated action where contributors self-monitor and edit their own content. Such behaviors are likely to benefit from automated assistance, as presented in Table 3. Our terminology was consistent with one other study on YouTube [72], while another investigation of subreddits called the phenomenon self-censorship [46].

Many of the communities that practice community-level self-moderation rely on volunteers to conduct moderation, as opposed to more centralized models of corporate moderation [58, 94]. However, past work suggests that the reliance on online volunteers to conduct moderation labor may be exploitative, meriting re-examination from an ethical perspective [70]. Our results revealed that moderators in OSS shared governing powers with higher-up authorities such as the TSC, as well as with the community members more broadly.
Prior work suggests that these mechanisms for distributing power across multiple hierarchical levels are beneficial and expected for larger projects, arguing that 1) power limitations on moderators can increase the perceived legitimacy of their decisions [2], and 2) the growth of communities increases the decentralization of moderation on platforms such as Wikipedia [40]. The establishment of formal structures (such as the moderation teams we introduced in Section 4.2.2) has been found to improve the communication of norms to newcomers [40], perhaps by increasing the usage of actions such as Reformative Strategies.

5.1.3 Moderation Strategies. Punitive actions such as the hiding or deletion of content, as well as the banning and calling out of rule-breaking behaviors, resemble much of the organizing actions found on Reddit, Discord and Twitter [24, 56, 58], and we found evidence that such strategies are transferable to the OSS context. Similarly, inhibitory warnings used for preventing conflict resemble norm-setting practices adopted by moderators in Wikipedia, Facebook, Twitch, as well as Reddit [39, 94]. The transition from inhibitory warnings to punitive actions reflects Ostrom’s fifth design principle of graduated sanctions [83]; and though our participants did not explicitly discuss such escalations, we encourage future work to more closely examine their prevalence in OSS contexts. Ferreira et al. [36] advocated for both proactive and reactive (or punitive) approaches for addressing known issues and conducting damage control, and our findings provide evidence of moderators employing such strategies in practice. Finally, mediation was a strategy almost never observed in the extant literature, except in Wikipedia, perhaps because of its similarities with open source as a collaborative peer-production platform [12].

5.1.4 Usage and Perception of Automation.
Prior works on Wikipedia have found semi- and fully-automated tools valuable in providing moderators with an information infrastructure that connected editors from a decentralized network and facilitated valuation, negotiation, and administration, thereby enabling new moderating actions independent of existing norms [44]. For open source, Ferreira et al. anticipated the deployment of similar automated assistance for moderation [36], yet many others critiqued that existing toxicity detectors are not yet tailored enough for the software engineering context [10, 36, 60], which our findings corroborated. Beyond the challenges induced by limited domain adaptation, we additionally uncovered the presence of subtle misbehaviors that may contribute to the inability of such models to anticipate more nuanced situations. Lastly, we also highlighted how the absence of customization options caused maintainers to resist adoption, the implementation of which is made difficult by the lack of transparency in the underlying black-box models [42, 62].

5.2 Design Implications for Automating Moderation

Punitive Strategy. The first set of strategies we uncovered were employed at the early stages of conflict; these included punitive measures that halted escalation and removed toxic content. Presently, we found that moderation bots are of most assistance to human contributors in this reactive capacity, i.e., by pointing out cases of rule violations and harmful content so that authors can become aware when they unintentionally compose inappropriate content. However, when bots were tasked with calling out bad behaviors, our participants observed that they were prone to hypersensitivity, causing false positive triggers. Such false alarms do not scale well and negatively impact moderators by contributing to their already overloaded maintenance burdens [43, 108], making existing tools helpful only for cases of self-moderation.
To extend the scope of reactive support toward volunteer moderators and formal moderation teams, the sentiment models underlying current moderation bots need improved contextual sensitivity to the nuances of language used in software engineering in order to increase accuracy. More customization options can also be incorporated into these tools to increase transparency, explainability and trust among the community.

Mediation Strategy. Once conflicts developed, moderators engaged in different approaches to mitigate and resolve issues, depending on the specifics of the situation. When encountering disagreements among small parties of contributors, moderators took mediating actions to reconcile differences. During mediations, moderation bots help facilitate depersonalized interventions between contributors, but further advancements can help moderators detect disputes that require mediation and request clarifications from a fellow contributor when one side is uncertain about the presence of conflict or the potentially negative connotations of a comment.

Preventative Strategy. When contributors engaged in more indirect forms of toxicity, such as passive aggressiveness or inappropriate jokes, maintainers leveraged inhibitory preventions to limit the extent of bad behaviors. Bots can support moderators by expanding their detection scope to include such forms of indirect hostility. After repeated instances of behavioral mistakes occurred among different contributors, moderators proactively established new rules and standards to prevent future violations. To contribute toward proactive preventions, bots can help monitor and detect repeated offenses, identify associated workflows that cause such nonconformity, and suggest improvements based on practices observed in other communities.

Reformative Strategy.
For mistakes repeated by multiple contributors, moderators took reformative approaches to set up standards and proactively prevented future cases of similar violations by introducing new rules and workflows. To help moderators initiate reformation in the community, automation can be utilized to surface existing guidelines in real time while authors are writing content, so as to prevent the public posting of potentially harmful content.

5.3 Relation between Workflow Automations and User Frustration

While our study did not set out to identify the types of technical and interpersonal conflicts that lead to toxic and uncivil behaviors in open source, an emergent theme pointed to entitlement and user frustration, especially among more prominent projects with larger user bases, highlighting the shortage of technical support for users and contributors. These problems usually surfaced when participants discussed the types of strategies or workflows they used or set up (mostly described in Sections 4.3.3 and 4.4.1), many of which were established to reduce the masses of questions or requests. While prior work has touched upon how the time-sensitive and never-ending chore of user support is one of the most emotionally draining tasks for maintainers [43, 100], little is known about the types of technical complexities and misunderstandings that cause such extensive amounts of frustration. Future work can seek to address this missing link between the specific forms of technical (or interpersonal) issues that cause these emotionally charged conflicts and the mitigation strategies that maintainers described to us above.
5.4 Limitations

Our results indicated three main themes around moderation and the potential of automation in open source, which we presented in the previous sections. However, despite our efforts to recruit a diverse group of participants and projects for our interviews (with a particular focus on a variety of project sizes), we do not claim that our sample is representative of all open source developers. The number of interviews we conducted and the snowball sampling technique both limit the representativeness of our sample. We also focused solely on projects hosted on GitHub, which means that the scope of our theory and results may not generalize to other social coding platforms, such as Bitbucket or GitLab. Furthermore, while it would have been ideal to highlight the experiences and perspectives of more marginalized and underrepresented groups in open source, the scarce availability of such participants did not present us with the opportunity – we encourage future work to explore this gap in our understanding of moderators in OSS.

6 CONCLUSION

In this paper, we examined moderation practices in open source communities by conducting 14 semi-structured interviews with moderators and maintainers. Specifically, we characterized the norms, roles and practices of who performs moderation and how different techniques are employed in various contexts (RQ1). We further investigated automation tools for moderation and identified concerns against adoption, as well as potential ways that future bots can support different groups of moderators in various capacities. Based on the implications of these results, we presented a set of design recommendations for practitioners and researchers, which can guide the future development of automation tools for moderation.
ACKNOWLEDGMENTS

This work was supported by the National Science Foundation (NSF) under Award Nos. 1939606, 2001851, 2000782 and 1952085. We are grateful to Allen Yao, Pranav Khadpe, Jim Herbsleb, Christian Kastner, David Widder as well as anonymous reviewers for their crucial input and feedback on the initial and subsequent drafts of this work. Finally, we would like to thank our participants for offering us their time to share their expertise and insights.

REFERENCES

[1] 2011. https://code.dblock.org/2011/07/14/github-is-your-new-resume.html.
[2] 2012. Regulating behavior in online communities. In Building Successful Online Communities. The MIT Press.
[3] 2021. Safe space - Github action. https://github.com/charliegerard/safe-space.
[4] 2021. sentiment-bot. https://github.com/behaviorbot/sentiment-bot.
[5] 2022. https://github.com/search?q=type:user&type=Users.
[6] 2022. Adding a code of conduct to your project. https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-code-of-conduct-to-your-project.
[7] 2022. GitHub Acceptable Use Policies. https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies.
[8] 2022. Moderating comments and conversations. https://docs.github.com/en/communities/moderating-comments-and-conversations.
[9] n.d. Wrangling Web Contributions: How to Build a CONTRIBUTING.md. https://mozillascience.github.io/working-open-workshop/contributing/.
[10] Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, and Shahram Rahimi. 2017. SentiCR: A customized sentiment analysis tool for code review interactions. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). 106–111.
[11] K D Singh Arneja. 2015. Code reviews do not have to be stressful. https://medium.com/idyllic-geeks/code-reviews-do-not-have-to-be-stressful-919e0a8377a1. Accessed: 2022-7-13.
[12] Matt Billings and Leon A Watts. 2010.
Understanding dispute resolution online: using text to reflect personal and substantive issues in conflict. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA) (CHI ’10). Association for Computing Machinery, New York, NY, USA, 1447–1456.
[13] Christian Bird. 2011. Sociotechnical coordination and collaboration in open source software. In 2011 27th IEEE International Conference on Software Maintenance (ICSM). 568–573.
[14] Christian Bird, David Pattison, Raissa D’Souza, Vladimir Filkov, and Premkumar Devanbu. 2008. Latent social structure in open source projects. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 24–35.
[15] Cássio Castaldi Araujo Blaz and Karin Becker. 2016. Sentiment analysis in tickets for IT support. In Proceedings of the 13th International Conference on Mining Software Repositories (Austin, Texas) (MSR ’16). Association for Computing Machinery, New York, NY, USA, 235–246.
[16] Amiangshu Bosu and Jeffrey C Carver. 2013. Impact of Peer Code Review on Peer Impression Formation: A Survey. In 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 133–142.
[17] Lia Bozarth, Jane Im, Christopher Quarles, and Ceren Budak. 2023. Wisdom of Two Crowds: Misinformation Moderation on Reddit and How to Improve this Process—A Case Study of COVID-19. (2023).
[18] Fabio Calefato, Filippo Lanubile, Federico Maiorano, and Nicole Novielli. 2018. Sentiment Polarity Detection for Software Development. Empirical Software Engineering 23, 3 (June 2018), 1352–1382.
[19] Stevie Chancellor, Andrea Hu, and Munmun De Choudhury. 2018. Norms Matter: Contrasting Social Support Around Behavior Change in Online Weight Loss Communities. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI ’18, Paper 666). Association for Computing Machinery, New York, NY, USA, 1–14.
[20] Stevie Chancellor, Yannis Kalantidis, Jessica A Pater, Munmun De Choudhury, and David A Shamma. 2017. Multimodal Classification of Moderated Online Pro-Eating Disorder Content. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’17). Association for Computing Machinery, New York, NY, USA, 3213–3226.
[21] Eshwar Chandrasekharan, Chaitrali Gandhi, Matthew Wortley Mustelier, and Eric Gilbert. 2019. Crossmod: A Cross-Community Learning-based System to Assist Reddit Moderators. Proc. ACM Hum.-Comput. Interact. 3, CSCW (Nov. 2019), 1–30.
[22] Eshwar Chandrasekharan, Shagun Jhaver, Amy Bruckman, and Eric Gilbert. 2022. Quarantined! Examining the Effects of a Community-Wide Moderation Intervention on Reddit. 26 pages.
[23] Eshwar Chandrasekharan, Umashanthi Pavalanathan, Anirudh Srinivasan, Adam Glynn, Jacob Eisenstein, and Eric Gilbert. 2017. You Can’t Stay Here: The Efficacy of Reddit’s 2015 Ban Examined Through Hate Speech. Proc. ACM Hum.-Comput. Interact. 1, CSCW (Dec. 2017), 1–22. https://doi.org/10.1145/3134666
[24] Eshwar Chandrasekharan, Umashanthi Pavalanathan, Anirudh Srinivasan, Adam Glynn, Jacob Eisenstein, and Eric Gilbert. 2017. You Can’t Stay Here: The Efficacy of Reddit’s 2015 Ban Examined Through Hate Speech. Proceedings of the ACM on Human-Computer Interaction 1, CSCW (2017), 1–22. https://doi.org/10.1145/3134666
[25] Jithin Cheriyan, Bastin Tony Roy Savarimuthu, and Stephen Cranefield. 2020. Norm violation in online communities – A study of Stack Overflow comments. (April 2020). arXiv:2004.05589 [cs.SI]
[26] Sophie Cohen. 2021. Contextualizing toxicity in open source: a qualitative study. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, 1669–1671.
[27] Gabriella Coleman. 2009. CODE IS SPEECH: Legal tinkering, expertise, and protest among free and open source software developers. Cult. Anthropol. 24, 3 (Aug. 2009), 420–454.
[28] John W Creswell and Cheryl N Poth. 2016. Qualitative inquiry and research design: Choosing among five approaches. Sage Publications.
[29] Laura Dabbish, Colleen Stuart, Jason Tsay, and James Herbsleb. 2012. Leveraging transparency. IEEE Software 30, 1 (2012), 37–43.
[30] Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. 2012. Social coding in GitHub: transparency and collaboration in an open software repository. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (Seattle, Washington, USA) (CSCW ’12). Association for Computing Machinery, New York, NY, USA, 1277–1286.
[31] Erik Dietrich. 2020. How to Deal with an Insufferable Code Reviewer. Retrieved September (2020).
[32] Carolyn D Egelman, Emerson Murphy-Hill, Elizabeth Kammer, Margaret Morrow Hodges, Collin Green, Ciera Jaspan, and James Lin. 2020. Predicting Developers’ Negative Feelings about Code Review. In 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). 174–185.
[33] Nadia Eghbal. 2016. Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure. Ford Foundation.
[34] Nadia Eghbal. 2020. Working in Public: The Making and Maintenance of Open Source Software. Stripe Press.
[35] Linda Erlenhov, Francisco Gomes de Oliveira Neto, and Philipp Leitner. 2020. An empirical study of bots in software development: characteristics and challenges from a practitioner’s perspective.
[36] Isabella Ferreira, Jinghui Cheng, and Bram Adams. 2021. The “Shut the f**k up” Phenomenon: Characterizing Incivility in Open Source Code Review Discussions. 35 pages.
[37] Isabella Ferreira, Ahlaam Rafiq, and Jinghui Cheng. 2022. Incivility Detection in Open Source Code Review and Issue Discussions. (June 2022). arXiv:2206.13429 [cs.SE]
[38] Anna Filippova and Hichang Cho. 2015. Mudslinging and Manners: Unpacking Conflict in Free and Open Source Software. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (Vancouver, BC, Canada) (CSCW ’15). Association for Computing Machinery, New York, NY, USA, 1393–1403.
[39] Andrea Forte and Amy Bruckman. 2008. Scaling consensus: Increasing decentralization in Wikipedia governance. In Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS 2008). IEEE, 157–157.
[40] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia Governance. Journal of Management Information Systems 26, 1 (July 2009), 49–72.
[41] Daviti Gachechiladze, Filippo Lanubile, Nicole Novielli, and Alexander Serebrenik. 2017. Anger and Its Direction in Collaborative Software Development. In 2017 IEEE/ACM 39th International Conference on Software Engineering: New Ideas and Emerging Technologies Results Track (ICSE-NIER). 11–14.
[42] R Stuart Geiger and Aaron Halfaker. 2016. Open algorithmic systems: lessons on opening the black box from Wikipedia. AoIR Selected Papers of Internet Research (2016).
[43] R Stuart Geiger, Dorothy Howard, and Lilly Irani. 2021. The Labor of Maintaining and Scaling Free and Open-Source Software Projects. Proc. ACM Hum.-Comput. Interact. 5, CSCW1 (April 2021), 1–28.
[44] R Stuart Geiger and David Ribes. 2010. The work of sustaining order in wikipedia: the banning of a vandal. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (Savannah, Georgia, USA) (CSCW ’10). Association for Computing Machinery, New York, NY, USA, 117–126.
[45] R Stuart Geiger, Nelle Varoquaux, Charlotte Mazel-Cabasse, and Chris Holdgraf. 2018. The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries. Comput. Support. Coop. Work 27, 3 (Dec. 2018), 767–802.
+ + +[46] Anna Gibson. 2019. Free Speech and Safe Spaces: How Moderation Policies Shape Online Discussion Spaces. Social Media + Society 5, 1 (Jan. 2019), 2056305119832588. +[47] Joanne E Gray and Nicolas P Suzor. 2020. Playing with machines: Using machine learning to understand automated copyright enforcement at scale. +Big Data & Society + 7, 1 (Jan. 2020), 2053951720919963. + + +[48] James Grimmelmann. 2015. The virtues of moderation. +Yale JL & Tech. + 17 (2015), 42. + + +[49] Ted Grover and Gloria Mark. 2019. Detecting Potential Warning Behaviors of Ideological Radicalization in an Alt-Right Subreddit. +ICWSM + 13 (July 2019), 193–204. + + +[50] Emitza Guzman, David Azócar, and Yang Li. 2014. Sentiment analysis of commit comments in GitHub: an empirical study. In +Proceedings of the 11th Working Conference on Mining Software Repositories + (Hyderabad, India) (MSR 2014). Association for Computing Machinery, New York, NY, USA, 352–355. + + +[51] Emitza Guzman and Bernd Bruegge. 2013. Towards emotional awareness in software development teams. In +Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering + (Saint Petersburg, Russia) (ESEC/FSE 2013). Association for Computing Machinery, New York, NY, USA, 671–674. + + +[52] Wenjian Huang, Tun Lu, Haiyi Zhu, Guo Li, and Ning Gu. 2016. Effectiveness of Conflict Management Strategies in Peer Review Process of Online Collaboration Projects. In +Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing + (San Francisco, California, USA) (CSCW ’16). Association for Computing Machinery, New York, NY, USA, 717–728. + + +[53] Shagun Jhaver, Darren Scott Appling, Eric Gilbert, and Amy Bruckman. 2019. Did you suspect the post would be removed? +Proc. ACM Hum.-Comput. Interact. + 3, CSCW (Nov. 2019), 1–33. + + +[54] Shagun Jhaver, Iris Birman, Eric Gilbert, and Amy Bruckman. 2019. Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator. 
+ACM Trans. Comput.-Hum. Interact. + 26, 5 (July 2019), 1–35. + + +[55] Shagun Jhaver, Amy Bruckman, and Eric Gilbert. 2019. Does Transparency in Moderation Really Matter? User Behavior After Content Removal Explanations on Reddit. +Proc. ACM Hum.-Comput. Interact. + 3, CSCW (Nov. 2019), 1–27. + + +[56] Shagun Jhaver, Sucheta Ghoshal, Amy Bruckman, and Eric Gilbert. 2018. Online Harassment and Content Moderation: The Case of Blocklists. +ACM Trans. Comput.-Hum. Interact. + 25, 2 (March 2018), 1–33. + + +[57] Shagun Jhaver, Sucheta Ghoshal, Amy Bruckman, and Eric Gilbert. 2018. Online harassment and content moderation: The case of blocklists. +ACM Transactions on Computer-Human Interaction (TOCHI) + 25, 2 (2018), 1–33. + + +[58] Jialun Aaron Jiang, Charles Kiene, Skyler Middler, Jed R Brubaker, and Casey Fiesler. 2019. Moderation Challenges in Voice-based Online Communities on Discord. +Proc. ACM Hum.-Comput. Interact. + 3, CSCW (Nov. 2019), 1–23. + + +[59] Jialun Aaron Jiang, Peipei Nie, Jed R Brubaker, and Casey Fiesler. 2022. A Trade-off-centered Framework of Content Moderation. (June 2022). arXiv:2206.03450 [cs.HC] + + +[60] Robbert Jongeling, Subhajit Datta, and Alexander Serebrenik. 2015. Choosing Your Weapons: On Sentiment Analysis Tools for Software Engineering Research. +2015 IEEE International Conference on Software Maintenance and Evolution (ICSME) + (2015), 531–535. https://doi.org/10.1109/icsm.2015.7332508 + + +[61] Robbert Jongeling, Proshanta Sarkar, Subhajit Datta, and Alexander Serebrenik. 2017. On negative results when using sentiment analysis tools for software engineering research. +Empirical Software Engineering + 22, 5 (Oct. 2017), 2543–2584. + + +[62] Prerna Juneja, Deepika Rama Subramanian, and Tanushree Mitra. 2020. Through the Looking Glass: Study of Transparency in Reddit’s Moderation Practices. +Proc. ACM Hum.-Comput. Interact. + 4, GROUP (Jan. 2020), 1–35. + + +[63] Rajdeep Kaur and Kuljit Kaur. 2022. 
Insights into Developers’ Abandonment in FLOSS Projects. , 731–740 pages. + + +[64] Terhi Kilamo, Valentina Lenarduzzi, Tuukka Ahoniemi, Ari Jaaksi, Jurka Rahikkala, and Tommi Mikkonen. 2020. How the Cathedral Embraced the Bazaar, and the Bazaar Became a Cathedral. In +Open Source Systems +. Springer International Publishing, 141–147. + + +[65] Karim R Lakhani and Eric von Hippel. 2004. How open source software works: “free” user-to-user assistance. In +Produktentwicklung mit virtuellen Communities +. Springer, 303–339. + + +[66] Cliff Lampe and Paul Resnick. 2004. Slash(dot) and burn: distributed moderation in a large online conversation space. In +Proceedings of the SIGCHI Conference on Human Factors in Computing Systems + (Vienna, Austria) (CHI ’04). Association for Computing Machinery, New York, NY, USA, 543–550. + + +[67] Noam Lapidot-Lefler and Azy Barak. 2012. Effects of anonymity, invisibility, and lack of eye-contact on toxic online disinhibition. +Comput. Human Behav. + 28, 2 (March 2012), 434–443. + + +[68] Nolan Lawson. 2017. What it feels like to be an open-source maintainer. +Read the Tea Leaves +. https://nolanlawson.com/2017/03/05/what-it-feels-like-to-be-an-open-source-maintainer (2017). + + +[69] Charlotte P Lee, Paul Dourish, and Gloria Mark. 2006. The human infrastructure of cyberinfrastructure. In +Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work + (Banff, Alberta, Canada) (CSCW ’06). Association for Computing Machinery, New York, NY, USA, 483–492. + + +[70] Hanlin Li, Leah Ajmani, Moyan Zhou, Nicholas Vincent, Sohyeon Hwang, Tiziano Piccardi, Sneha Narayan, Sherae Daniel, and Veniamin Veselovsky. 2022. Ethical Tensions, Norms, and Directions in the Extraction of Online Volunteer Work. In +Companion Publication of the 2022 Conference on Computer Supported Cooperative Work and Social Computing +. Proc. ACM Hum.-Comput. Interact., Vol. 7, No. CSCW2, Article 301. Publication date: October 2023. 
+[71] Renee Li, Pavitthra Pandurangan, Hana Frluckaj, and Laura Dabbish. 2021. Code of Conduct Conversations in Open Source Software Projects on Github. +Proc. ACM Hum.-Comput. Interact. + 5, CSCW1 (April 2021), 1–31. + + +[72] Renkai Ma and Yubo Kou. 2021. "How advertiser-friendly is my video?": YouTuber’s Socioeconomic Interactions with Algorithmic Content Moderation. +Proceedings of the ACM on Human-Computer Interaction + 5, CSCW2 (2021), 1–25. + + +[73] Pia Mancini et al. 2017. Sustain: A one day conversation for open source software sustainers–the report. In +Technical report, Sustain Conference Organization +. + + +[74] Gerardo Matturro. 2013. Soft skills in software engineering: A study of its demand by software companies in Uruguay. In +2013 6th international workshop on cooperative and human aspects of software engineering (CHASE) +. IEEE, 133–136. + + +[75] Courtney Miller, Sophie Cohen, Daniel Klug, Bogdan Vasilescu, and Christian Kästner. 2022. “Did You Miss My Comment or What?” Understanding Toxicity in Open Source Discussions. In +44th International Conference on Software Engineering (ICSE’22) +. + + +[76] Courtney Miller, David Gray Widder, Christian Kästner, and Bogdan Vasilescu. 2019. Why Do People Give Up FLOSSing? A Study of Contributor Disengagement in Open Source. In +Open Source Systems +. Springer International Publishing, 116–129. + + +[77] BUCUREAN Mirela. 2019. A QUALITATIVE STUDY ON PASSIVE-AGGRESSIVE BEHAVIOUR AT WORKPLACE. +Annals of the University of Oradea, Economic Science Series + 28, 2 (2019). + + +[78] Alessandro Murgia, Parastou Tourani, Bram Adams, and Marco Ortu. 2014. Do developers feel emotions? an exploratory analysis of emotions in software artifacts. In +Proceedings of the 11th Working Conference on Mining Software Repositories + (Hyderabad, India) (MSR 2014). Association for Computing Machinery, New York, NY, USA, 262–271. + + +[79] Sarah Myers West. 2018. 
Censored, suspended, shadowbanned: User interpretations of content moderation on social media platforms. +New Media & Society + 20, 11 (Nov. 2018), 4366–4383. + + +[80] Nachiappan Nagappan, Brendan Murphy, and Victor Basili. 2008. The influence of organizational structure on software quality: an empirical case study. In +Proceedings of the 30th international conference on Software engineering +. 521–530. + + +[81] Ray Oldenburg. 1999. +The great good place: Cafes, coffee shops, bookstores, bars, hair salons, and other hangouts at the heart of a community +. Da Capo Press. + + +[82] Siobhán O’Mahony and Fabrizio Ferraro. 2007. The Emergence of Governance in an Open Source Community. +AMJ + 50, 5 (Oct. 2007), 1079–1106. + + +[83] Elinor Ostrom. 2000. Collective action and the evolution of social norms. +Journal of economic perspectives + 14, 3 (2000), 137–158. + + +[84] Christine Porath and Christine Pearson. 2013. The price of incivility. +Harv. Bus. Rev. + 91, 1-2 (Jan. 2013), 114–21, 146. + + +[85] Huilian Sophie Qiu, Yucen Lily Li, Susmita Padala, Anita Sarma, and Bogdan Vasilescu. 2019. The Signals that Potential Contributors Look for When Choosing Open-source Projects. +Proc. ACM Hum.-Comput. Interact. + 3, CSCW (Nov. 2019), 1–29. + + +[86] Naveen Raman, Minxuan Cao, Yulia Tsvetkov, Christian Kästner, and Bogdan Vasilescu. 2020. Stress and burnout in open source: toward finding, understanding, and mitigating unhealthy interactions. In +Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: New Ideas and Emerging Results + (Seoul, South Korea) (ICSE-NIER ’20). Association for Computing Machinery, New York, NY, USA, 57–60. + + +[87] Philipp Ranzhin. 2020. I ruin developers’ lives with my code reviews and I’m sorry. Retrieved September (2020). + + +[88] David Ribes, Steven Jackson, Stuart Geiger, Matthew Burton, and Thomas Finholt. 2013. Artifacts that organize: Delegation in the distributed organization. 
+Information and Organization + 23, 1 (Jan. 2013), 1–14. + + +[89] Jaydeb Sarker, Asif Kamal Turzo, and Amiangshu Bosu. 2020. A Benchmark Study of the Contemporary Toxicity Detectors on Software Engineering Interactions. + + +[90] Joseph Seering. 2020. Reconsidering Self-Moderation. +Proceedings of the ACM on Human-Computer Interaction + 4, CSCW2 (2020), 1–28. https://doi.org/10.1145/3415178 + + +[91] Joseph Seering. 2020. Reconsidering Self-Moderation: the Role of Research in Supporting Community-Based Models for Online Content Moderation. +Proc. ACM Hum.-Comput. Interact. + 4, CSCW2 (Oct. 2020), 1–28. + + +[92] Joseph Seering, Robert Kraut, and Laura Dabbish. 2017. Shaping Pro and Anti-Social Behavior on Twitch Through Moderation and Example-Setting. In +Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing + (Portland, Oregon, USA) (CSCW ’17). Association for Computing Machinery, New York, NY, USA, 111–125. + + +[93] Joseph Seering, Tony Wang, Jina Yoon, and Geoff Kaufman. 2019. Moderator engagement and community development in the age of algorithms. +New Media & Society + 21, 7 (July 2019), 1417–1443. https://doi.org/10.1177/1461444818821316 + + +[94] Joseph Seering, Tony Wang, Jina Yoon, and Geoff Kaufman. 2019. Moderator engagement and community development in the age of algorithms. +New Media & Society + 21, 7 (2019), 1417–1443. +[95] Giuseppe Silvestri, Jie Yang, Alessandro Bozzon, and Andrea Tagarelli. 2015. Linking Accounts across Social Networks: the Case of StackOverflow, Github and Twitter. In +KDWeb +. 41–52. + + +[96] C Estelle Smith, Bowen Yu, Anjali Srivastava, Aaron Halfaker, Loren Terveen, and Haiyi Zhu. 2020. Keeping Community in the Loop: Understanding Wikipedia Stakeholder Values for Machine Learning-Based Systems. In +Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems +. Association for Computing Machinery, New York, NY, USA, 1–14. + + +[97] Megan Squire and Rebecca Gazda. 
2015. FLOSS as a Source for Profanity and Insults: Collecting the Data. In +2015 48th Hawaii International Conference on System Sciences +. 5290–5298. + + +[98] Miriah Steiger, Timir J Bharucha, Sukrit Venkatagiri, Martin J Riedl, and Matthew Lease. 2021. The Psychological Well-Being of Content Moderators: The Emotional Labor of Commercial Moderation and Avenues for Improving Support. In +Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems + (Yokohama, Japan) ( +CHI ’21, Article 341 +). Association for Computing Machinery, New York, NY, USA, 1–14. + + +[99] John Suler. 2004. The online disinhibition effect. +Cyberpsychol. Behav. + 7, 3 (June 2004), 321–326. + + +[100] Jason Swarts. 2019. Open-Source Software in the Sciences: The Challenge of User Support. +Journal of Business and Technical Communication + 33, 1 (Jan. 2019), 60–90. + + +[101] Damian A Tamburri, Patricia Lago, and Hans van Vliet. 2013. Organizational social structures for software engineering. +ACM Computing Surveys (CSUR) + 46, 1 (2013), 1–35. + + +[102] Xin Tan and Minghui Zhou. 2019. How to Communicate when Submitting Patches: An Empirical Study of the Linux Kernel. +Proc. ACM Hum.-Comput. Interact. + 3, CSCW (Nov. 2019), 1–26. + + +[103] Bianca Trinkenreich, Igor Wiese, Anita Sarma, Marco Gerosa, and Igor Steinmacher. 2022. Women’s participation in open source software: a survey of the literature. +ACM Transactions on Software Engineering and Methodology (TOSEM) + 31, 4 (2022), 1–37. + + +[104] Jason Tsay, Laura Dabbish, and James Herbsleb. 2014. Let’s talk about it: evaluating contributions through discussion in GitHub. In +Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering + (Hong Kong, China) ( +FSE 2014 +). Association for Computing Machinery, New York, NY, USA, 144–154. + + +[105] Bogdan Vasilescu, Daryl Posnett, Baishakhi Ray, Mark G J van den Brand, Alexander Serebrenik, Premkumar Devanbu, and Vladimir Filkov. 2015. 
Gender and Tenure Diversity in GitHub Teams. In +Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems + (Seoul, Republic of Korea) ( +CHI ’15 +). Association for Computing Machinery, New York, NY, USA, 3789–3798. + + +[106] Gang Wang, Bolun Wang, Tianyi Wang, Ana Nika, Haitao Zheng, and Ben Y Zhao. 2014. Whispers in the dark: analysis of an anonymous social network. In +Proceedings of the 2014 Conference on Internet Measurement Conference + (Vancouver, BC, Canada) ( +IMC ’14 +). Association for Computing Machinery, New York, NY, USA, 137–150. + + +[107] Mairieli Wessel, Alexander Serebrenik, Igor Wiese, Igor Steinmacher, and Marco A Gerosa. 2020. What to Expect from Code Review Bots on GitHub? + + +[108] Mairieli Wessel, Igor Wiese, Igor Steinmacher, and Marco Aurelio Gerosa. 2021. Don’t Disturb Me: Challenges of Interacting with Software Bots on Open Source Software Projects. +Proc. ACM Hum.-Comput. Interact. + 5, CSCW2 (Oct. 2021), 1–21. + + +[109] Titus Wormer. 2015. alex: Catch insensitive, inconsiderate writing. https://alexjs.com/. Accessed: 2022-7-14. + + +[110] Yan Xia, Haiyi Zhu, Tun Lu, Peng Zhang, and Ning Gu. 2020. Exploring Antecedents and Consequences of Toxicity in Online Discussions: A Case Study on Reddit. +Proc. ACM Hum.-Comput. Interact. + 4, CSCW2 (Oct. 2020), 1–23. + + +[111] Gi Woong Yun, Sasha Allgayer, and Sung-Yeon Park. 2020. Mind Your Social Media Manners: Pseudonymity, Imaginary Audience, and Incivility on Facebook vs. YouTube. +Int. J. Commun. Syst. + 14, 0 (June 2020), 21. 
License Update and Migration Processes in Open Source Software Projects

Chris Jensen
Institute for Software Research
University of California, Irvine
Irvine, CA USA 92697-3455
Phone: +1 (949) 824-0573
Email: cjensen@ics.uci.edu

Walt Scacchi
Institute for Software Research
University of California, Irvine
Irvine, CA USA 92697-3455
Phone: +1 (949) 824-4130
Email: wscacchi@ics.uci.edu

Abstract

Open source software (OSS) has increasingly been the subject of research efforts. Central to this focus is the nature under which the software can be distributed, used, and modified, and the causes and consequent effects on software development, usage, and distribution. At present, we have little understanding of what happens when these licenses change, what motivates such changes, and how new licenses are created, updated, and deployed. Similarly, little attention has been paid to the agreements under which contributions are made to OSS projects and the impacts of changes to these agreements. This paper addresses these questions with case studies of the processes by which the Apache Software Foundation created and migrated to Version 2.0 of the Apache Software License and by which the NetBeans project migrated to the Joint Licensing Agreement.

Keywords

Open source, license evolution, process, Apache, NetBeans

Introduction

Software process research has investigated many aspects of open source software (OSS) development in the last several years, including release processes, communication and collaboration, community joining, and project governance.
The central point of Lawrence Lessig's book "Code" is that the hardware and software that make up cyberspace also regulate cyberspace. He argues that code both enables and protects certain freedoms, but also serves to control cyberspace. Software licenses codify these freedoms and regulations by setting forth the terms and conditions for the use, modification, and distribution of a system and any changes made to it. For that reason, others have suggested that licenses serve as contracts for collaboration. In the case of non-OSS licenses, that contract may indicate no collaboration, but rather strict separation between users and developers. OSS licenses, by contrast, range widely in permissiveness, some granting more rights to the original authors and some granting more rights to consumers of OSS software. While research has examined OSS licenses in great detail, research into license evolution is only beginning. Just as OSS code is not static, neither are the licenses under which it is distributed. When licenses change, so too do the contracts for collaboration. This paper seeks to provide an incremental step toward understanding how changes in software licensing affect software development processes.

Why does understanding license update and migration matter? Companies using OSS software need to know how changes affect their use, modification, and distribution of a software system. License compatibility in OSS has long been a topic of debate, and research is only beginning to provide tools that assist in resolving software license compatibility [1]. OSS project participants need to understand why changes are being made and whether the changes align with their values and business models (e.g., enabling new avenues of license compatibility that offer strategic benefit, or opening up new channels of competition).
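Tool support of the kind cited above can be sketched minimally. The license identifiers and the compatibility table below are illustrative assumptions for the sketch, not the output of any real compatibility analysis and certainly not legal advice:

```python
# Minimal sketch of a license-compatibility check (illustrative only).
# The compatibility table is a toy assumption, not legal advice.

# Pairs (component_license, combined_work_license) assumed compatible.
COMPATIBLE = {
    ("BSD-3-Clause", "BSD-3-Clause"),
    ("BSD-3-Clause", "Apache-2.0"),
    ("BSD-3-Clause", "GPL-2.0"),
    ("Apache-2.0", "Apache-2.0"),
    ("GPL-2.0", "GPL-2.0"),
}

def incompatible_components(components, target_license):
    """Return components whose licenses conflict with the target license."""
    return [name for name, lic in components
            if (lic, target_license) not in COMPATIBLE]

# Hypothetical dependencies of a project considering an Apache-2.0 release.
deps = [("libfoo", "BSD-3-Clause"), ("libbar", "GPL-2.0")]
print(incompatible_components(deps, "Apache-2.0"))  # ['libbar']
```

A real tool would consult a vetted compatibility matrix (e.g., one keyed by SPDX identifiers) rather than a hand-written set, but the shape of the check is the same: a proposed license change is evaluated against every inbound dependency.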
As a project sponsor or host, you may be concerned with how best to protect the software system and your user community, but also your business model. You typically want a license that will attract a large number of developers to your project [2] while at the same time allowing you to make a profit and stay in business.

While licenses such as the GNU General Public License (GPL), the Berkeley Software Distribution (BSD) license, and the Apache License are well known, we rarely consider another type of license agreement critical to understanding collaboration in OSS projects: individual contributor license agreements (CLAs) and organizational contributor license agreements (OCLAs), for contributors from organized entities. In non-OSS software development, the contract for collaboration is typically an employment contract, often stating that all intellectual property rights pertaining to source code written by an employee are the property of the employer. This provides the employer with complete control over the rights granted in licensed software. In OSS development, by contrast, many developers contribute to a single software system. Without copyright assignment or a CLA, changing a software license requires the consent of every contributor to that system. We observed this situation in the case of the Linux kernel, which suggested that without a CLA, license evolution can become inhibited or prevented as the number of contributors, each with differing values and objectives, increases. To understand how changes in software licenses affect software development processes, we must also investigate changes in CLAs.

We address these issues with two case studies. The first examines the creation and deployment of the Apache Software License, Version 2.0. The second looks at an update to the contributor license agreement in the NetBeans project.

Background Work

Legal scholars, such as St.
Laurent [3] and Larry Rosen [4], former general counsel and secretary for the Open Source Initiative (OSI), have written extensively on license selection. They note that quite often the choice of license is somewhat outside the control of a particular developer. This is certainly the case for code that is inherited or that depends on code that is reciprocally licensed or, at the very least, requires a certain license for the sake of compatibility. Outside such cases, however, both St. Laurent and Rosen advocate the use of existing, well-tested, and well-understood licenses rather than the creation of new ones. Such license proliferation is seen as a source of confusion among users and is often unnecessary given the extensive set of licenses that already exist for a diverse set of purposes. Lerner and Tirole [5] observe specific determinant factors in license selection. Of the 40,000 SourceForge projects studied, projects geared toward end users tended toward more restrictive license terms, while projects directed toward software developers tended toward less restrictive licenses. Highly restrictive licenses were also found to be more common in consumer software (e.g., games) but less common for software on consumer-oriented platforms (e.g., Microsoft Windows) as compared to non-consumer-oriented platforms. Meanwhile, Rosen specifically addresses the issue of relicensing, commenting that license changes made by fiat are likely to fracture the community. This case of relicensing is exactly the focus of our case studies here.

The drafting and release of the GNU General Public License, Version 3.0 was done in a public fashion, inviting many prominent members of the OSS community to participate in the process. In fact, we even see a sort of prescriptive process specification outlining, at a high level, how the new license was to be created.
This license revision process is interesting in that the license in question is not used by just one project or one foundation; rather, it is an update of the most commonly used open source license in practice. As such, the process of its update and the impact of its revision on software development are both wide-ranging and widely discussed.

Di Penta et al. [6] examined changes to license headers in source code files in several major open source projects. Their three primary research questions sought to understand how frequently licensing statements in source code files change, the extent of those changes, and how copyright years change in source code files. Their work shows that most of the observed changes to source code files are small, though even small changes can signify a migration to a different license. The authors also note that little available research speaks to license evolution, pointing to the need for greater understanding in this area.

Lindman et al. [2] examine how companies perceive open source licenses and what major factors contribute to license choice in companies releasing open source software. The study reveals a tight connection between licensing choice and business model, patent potential, the motivation for community members to participate in development, control of project direction, company size, and network externalities (compatibility with other systems).

Lindman et al. also provide a model of a software company, its developers, and users in the context of an OSS system developed and released from a corporate environment [2]. However, few systems are developed in complete isolation. Rather, they leverage existing libraries, components, and other systems developed by third parties.

Figure 1. A model of software production and consumption with open source licensing
Moreover, as Goldman and Gabriel point out, open source is more than just source code in a public place released under an OSS license [7]; communities matter. Figure 1 shows the production and consumption of open source software, highlighting the impact of software licenses and contributor license agreements.

Going a step further, Oreizy [8] describes a canonical high-level software customization process for systems and components, highlighting intra-organizational software development processes and resource flow between a system application developer, an add-on developer, a system integrator, and an end user.

Similarly, we have examined such concepts in the context of software ecosystems [9] and process interaction. Software license change can precipitate integrative forms of process interaction in the case of dual and multi-licensing by enabling new opportunities for the use of software systems upstream of a project to provide added functionality or support, as well as downstream, via use as a library, plugin development, support tool development, and customization and extension. In such cases, software source code becomes a resource flowing between interacting projects. However, license change can also trigger interproject process conflict if new license terms render two systems incompatible. At that point, the resource flow between projects can be cut off, and downstream consumers of the software source code no longer receive updates. A more common example with non-OSS software is license expiration. License-based interproject process conflicts can also manifest as unmet dependencies in software builds or an inability to fix defects or add enhancements, resulting in process breakdown and, failing recovery, project failure. OSS licenses, however, guarantee that even when conflict occurs, recovery is possible because the source is available and can be forked.
Methodology

The case studies in this report are part of an ongoing, multi-year research project discovering and modeling open source software processes. Our research methodology is ethnographically informed, applying a grounded theory approach to the analysis of artifacts found in OSS projects. The primary data sources in this study come from the mailing list archives of the Apache and NetBeans projects. We also found supplementary documentation on each project's website that served to inform our study. These supplementary documents were often, though not always, referenced by messages on the mailing lists. The NetBeans cases all took place between April and June of 2003 and involved over 300 email messages, whereas the Apache cases were spread over several discrete time periods and consisted of more than 350 messages.

Case selection happened in two ways. For NetBeans, the cases arose during our study of requirements and release processes, having stood out as prominent issues facing the community during the period studied. Although we observed additional incidents appropriate for discussion, the three cases selected fit together as a cohesive story. This approach was also used in the study of the Apache project. However, due to a lower incident frequency, we expanded our study over a longer time period to find incidents that proved substantial. As a testament to the nature of project interaction, issues raised in mailing list discussions proved to be short-lived, either because they were resolved quickly or because the conversation simply ceased. This may well be the normal behavior pattern for both projects. A few issues proved to be outliers, having more focused discussions, and these were selected for further study. We also observed a tendency for discussions to play out in a series of short-lived discussion sessions.
A topic would be raised, receiving little or no attention. Then, at a later time, it would be raised again. The JCA discussion in NetBeans and the Subversion migration discussion in the Apache project demonstrated such conversational resurgence. We observed, in general, that discussion topics carry a certain conversational momentum. Topics with a high degree of momentum tended to have lengthier discussion periods or frequent discussion sessions until fully resolved or abandoned, while topics with a low degree of momentum were addressed quickly or simply died off. The causes and factors affecting changes in momentum were not investigated, as they lay too far afield from the focus of this study. We do note that although consensus by attrition has been cited in other communities (e.g., [10, 11]), we did not observe it in effect in any of the cases studied; rather, the primary participants in discussions remained active in their respective projects for several months following the reported incidents. The creation of the Apache License, version 2.0 was brought to our attention by a colleague familiar with the project. Data for the Apache licensing case was gathered from email messages sent to a mailing list established for the purpose of discussing the proposed changes.

Having experienced difficulties building our own search engine to support process discovery, we still faced the challenge of keeping track of process data once we found it as we were building our models. Up until this point, our approach to providing process traceability was simply to include links to project artifacts in our models. However, this strategy did not help us build the models themselves. Looking for more lightweight support for discovery, we returned the search problem to the projects themselves, using their own search engines to locate process data.
Our current strategy for providing computer support for process discovery returns to using each project's own search engine to locate process information. First, we operationalized the reference model as an OWL ontology with the Protégé ontology editor [12], using only the OWL class and individual constructs to store process concepts and their associated search queries, respectively. Second, we built a Firefox plugin, Ontology [13], to display the reference model ontology in the Firefox web browser. Next, we enlisted the Zotero citation database Firefox plugin [14] to store process evidence elicited from project data, integrating the two plugins such that each datum added to the citation database from a project artifact is automatically tagged with the selected reference model entities.

The use of a citation database as a research data repository may seem unintuitive. Zotero, however, has proven well suited to our needs. Like many Firefox plugins, Zotero can create records simply from highlighted sections of a web document, though the creation of arbitrary entries (not gleaned from document text selections) is also possible. It can also save a snapshot of the entire document for later review, which is useful given the high frequency of changes of some web documents, changes that evidence steps in a software process. The tag, note, and date fields for each entry are useful for recording reference model associations and memos about the entry for use in constructing process steps and ascertaining their order. A screenshot of Zotero with Ontology appears in Figure 2.

The plugin integration greatly facilitates the coding of process evidence and provides traceability from raw research data to analyzed process models. As the tool set is browser-based, it is not limited to analysis of a particular data set, whether local or remote.
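The encoding described above (OWL classes for process concepts, OWL individuals for their associated search queries, and automatic tagging of captured evidence) can be mimicked in a few lines of plain Python. All names and queries below are illustrative stand-ins, not the actual reference-model ontology or Zotero's data model:

```python
from dataclasses import dataclass, field

@dataclass
class ProcessConcept:
    """Plays the role of an OWL class in the reference model."""
    name: str
    queries: list = field(default_factory=list)  # its OWL "individuals"

@dataclass
class EvidenceRecord:
    """Plays the role of a Zotero entry captured from a project artifact."""
    url: str
    excerpt: str
    tags: list = field(default_factory=list)

def capture(record, selected_concepts):
    """Mimic the plugin integration: each datum added to the citation
    database is tagged with the currently selected reference-model entities."""
    record.tags.extend(c.name for c in selected_concepts)
    return record

# Hypothetical concept with a site-search query attached as its individual.
license_change = ProcessConcept("LicenseChange", ['"proposed license" review'])

rec = capture(
    EvidenceRecord("http://example.org/list/msg42", "second draft posted"),
    [license_change],
)
print(rec.tags)  # -> ['LicenseChange']
```

The real tool set performs the same mapping: concepts carry queries run against the project's own search engine, and every captured datum stays linked to the concepts it evidences.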
Moreover, the tool set does not limit users to a single ontology or Zotero database, thereby allowing users to construct research models using multiple ontologies describing other (e.g., non-OSS process) phenomena and to reuse the tool set for analysis of additional data sets. Thus, it may be easily appropriated for grounded theory research in other fields of study.

The elicitation of process evidence is still search driven. Rather than use one highly customized search engine for all examined data repositories, the search task has been shifted back to the organizations under study. This decision has several implications in comparison with the previous approach, both positive and negative. Using an organization's own search engine limits our ability to extract document-type-specific metadata; however, among the organizations we have studied, their search tools provide greater coverage of document and artifact types than Lucene handled at that time. Furthermore, this approach does not suffer the data set limitations imposed by web crawler exclusion rules. The ability to query the data set in a scripted fashion has been lost, yet some scientists would see this as a gain: the use of computer-assisted qualitative data analysis software (CAQDAS) historically has put into question the validity of both the research method and results [15, 16].

This tool set was still quite unfinished as we began governance process discovery and modeling. As we added functionality, we had to return to some of our data sources and recapture data. Although we had high hopes of using the integrated timeline feature to assist in process activity composition and sequencing, Zotero's native date format was insufficiently precise. With provisions only for year, month, and day, there is no ability to capture action sequences that happen on the same day.
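Since the underlying evidence is mailing-list messages, which carry full RFC 2822 Date headers, intra-day ordering can still be recovered even when the capture tool stores only year, month, and day. A sketch using only Python's standard library; the record format is hypothetical, with excerpts paraphrasing the 14 November 2003 exchange ([22], [23]):

```python
from email.utils import parsedate_to_datetime

# Hypothetical evidence records: each keeps the Date header of the
# mailing-list message it was captured from.
evidence = [
    {"excerpt": "alternative wording proposed",
     "date_header": "Fri, 14 Nov 2003 16:49:09 GMT"},
    {"excerpt": "objection to termination trigger",
     "date_header": "Fri, 14 Nov 2003 14:52:54 GMT"},
]

# Parsing yields timezone-aware datetimes, so two actions on the same
# calendar day sort into their true sequence.
ordered = sorted(evidence,
                 key=lambda e: parsedate_to_datetime(e["date_header"]))
print([e["excerpt"] for e in ordered])
# -> ['objection to termination trigger', 'alternative wording proposed']
```

This is the kind of same-day sequencing that Zotero's year/month/day fields alone could not express.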
After adding support for finer-grained dates and times, we found having to enter the date and time for every piece of data we captured rather tedious. Eventually we had to prioritize completion of discovery and modeling ahead of computer support for process discovery, and we disabled the time and date entry. Unable to utilize Zotero to our intended effect in discovery and modeling, our efforts with Zotero remain in progress, pending usability improvements.

Creation and Migration to the Apache License, Version 2.0

The Apache Software Foundation created a new version of its license at the end of 2003 and beginning of 2004. Roy Fielding, then director of the ASF, announced the license proposal on 8 November 2003 [17], inviting review and discussion on a mailing list set up specifically for that purpose. Per Roy's message, the motivations for the proposed license included

Reducing the number of frequently asked questions about the Apache License

Allowing the license to be usable by any (including non-Apache) projects

Requiring a patent license on contributions that necessarily infringe the contributor's own patents

Moving the full text of the license and specific conditions outside the source code

Roy further indicated a desire to have a license compatible with other OSS licenses, notably the GPL.

As can be seen from Figure 3, most of the discussion took place in mid-November of 2003. In fact, given that the ApacheCon conference ran from 16-19 November, we can see a high message density in the days leading up to ApacheCon, with a steady rate continuing for a few days afterward. Beyond this, the frequency becomes sparse. An update to the proposed license was announced on 24 December 2003, after some internal review, a part of the process that is not publicly visible. This update prompted a brief discussion.
A second active time period is observable in January 2004, when Fielding announces a final update (20 January 2004) and that the final version of the license has been approved by the board [18, 19] (21 January 2004).

The primary discussion point of the creation and migration to the 2.0 version of the Apache License centered on a patent clause in the proposed license. According to Brian Behlendorf, who was serving on the ASF board of directors at the time, the ASF's patent-related goals were to “prevent a company from sneaking code into the codebase covered by their own patent and then seeking royalties from either the ASF or end-users” [20]. The clause in question read:

Reciprocity. If You institute patent litigation against a Contributor with respect to a patent applicable to software (including a cross-claim or counterclaim in a lawsuit), then any patent licenses granted by that Contributor to You under this License shall terminate as of the date such litigation is filed. In addition, if You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work itself (excluding combinations of the Work with other software or hardware) infringes Your patent(s), then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. [21]

Consequences of this clause sparked discussion in a few areas, mainly surrounding the first sentence of the clause regarding license termination. Legal representatives from industry stated objections to losing usage rights for patent litigation regarding any software, even software unrelated to that covered by the license [22], proposing alternative wordings to achieve the stated license goals while restricting the trigger to litigation pertaining to patents covered by the ASF-licensed code [23].
Uncertainty regarding the roles of people in the license revision process [24] and proposed changes [25] created additional confusion regarding the patent reciprocity stance.

Eben Moglen, General Counsel for the Free Software Foundation (FSF), adds that the first sentence of the license clause carries great risk of unintended and serious consequences, and is an inappropriate vehicle for protecting free software against patent litigation [26]. As such, the FSF deems that the clause renders the license incompatible with version 2 of the GPL, failing one of the goals of the proposed Apache License.

Brian Carlson reports that the Debian community's consensus is that the proposed license does not meet the criteria for free software licenses under the Debian Free Software Guidelines [27]. Consequently, code licensed as such would be sandboxed into the non-free archive and, therefore, not automatically built for Debian distributions, nor would it receive quality assurance attention. Again, the license termination aspect of the reciprocity clause is cited as the critical sticking point [28], with several members of the Debian community arguing that free software licenses should restrict only modification and distribution, not usage, of free software.

The patent reciprocity clause was not entirely rejected. There was support for extending it to provide mutual defense against patent litigation attacks on all open source software [29]. The idea was quickly nixed on the grounds that it could lead to users being attacked and unable to defend themselves if someone were to maliciously violate a user's patent on an unrelated piece of software and create an open source version. In such a scenario, the user would have to choose between using Apache-licensed software and losing all their patents [30].
On 18 November, Fielding indicates that there have been “several iterations on the patent sentences, mostly to deal with derivative work” [24], mentioning he will probably include the suggested changes in the patent language recommended by one of the legal representatives from industry. Fielding notes that he has been in contact with representatives from other organizations, among them Apple, Sun, the OSI, and Mozilla, as well as a few independent attorneys, although the details of these portions of the process remain hidden.

The next milestone in the process occurs on 24 December, when Fielding mentions that a second draft, version 1.23, has been prepared after internal review due to extensive changes [31] and has been posted to the proposed licenses website [32] and the mailing list. The new proposed license [33] incorporates many of the proposed changes, including the removal of the contested first sentence of the patent reciprocity clause, leaving the generally agreed-upon patent termination condition:

If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

The 1.23 version of the license received little feedback on the license discussion mailing list. Aside from definition clarifications, there was an inquiry about GPL compatibility. Behlendorf commented that Moglen's suggestions had been incorporated to address the two issues with GPL compliance, and that Moglen had been contacted earlier in the week to take a look at the current draft [34]. Behlendorf (on 7 January 2004) offers that the issues presented have been addressed to his satisfaction and that he is willing to propose the license to the board at the January 2004 meeting [35].
However, before the board meeting, Fielding announces a version 1.24, featuring a change to the definition of “Contributor” [36], and very shortly thereafter a version 1.25 to address the way “Copyright” is represented, owing to various laws and the use of “(C)” to indicate copyright [37]. Finally, the Apache License, Version 2.0 was approved by the ASF board by a unanimous vote on 20 January 2004 [18] and announced to the mailing list by Fielding the following day [19]. Per the board meeting minutes:

WHEREAS, the foundation membership has expressed a strong desire for an update to the license under which Apache software is released,

WHEREAS, proposed text for the new license has been reworked and refined for many, many months, based on feedback from the membership and other parties outside the ASF,

NOW, THEREFORE, BE IT RESOLVED, that the proposed license found at http://www.apache.org/licenses/proposed/LICENSE-2.0.txt is officially named the Apache Software License 2.0. To grant a sufficient transition time, this license is to be used for all software releases from the Foundation after the date of March 1st, 2004.

The conversation continued on, briefly, to address two points. The first was a return to the GPL compatibility discussion. Don Armstrong requested verification as to whether Moglen and the FSF had identified the license as GPL compatible (Fielding's announcement claimed it was) [38]. Fielding responds, saying Moglen sent a private communication commenting on the license compatibility and, furthermore, that it was the belief of the ASF that “a derivative work consisting of both Apache Licensed code and GPL code can be distributed under the GPL,” and, as such, there wasn't anything further to consider as far as the ASF was concerned [39]. Incidentally, the FSF's position is that, due to the patent issue, the Apache License 2.0 is compatible with GPLv3 but not GPLv2 [40].
Second, Vincent Massol requested information about moving his Apache sub-project to the ASL2 license and what file license headers should be used [41], to which Behlendorf responded [42]. A flow graph of the license creation and migration process appears in Figure 4.

Introduction of the Joint License Agreement

Rosen [4] suggests that copyright assignment is sought for two purposes:

So the project can defend itself in court without the participation and approval of its contributors.

To give the project (and not the contributor) the right to make licensing decisions, such as relicensing, about the software.

The NetBeans case is interesting because it is not simple copyright assignment, but rather affords both the contributor and the project (Sun Microsystems, specifically) equal and independent copyright to contributed source.

The Joint License Agreement (JLA) was introduced to the NetBeans project on 28 April 2003 by Evan Adams, a prominent project participant working for Sun Microsystems [43]. Adams stated that the JLA was being introduced in response to observations by Sun's legal team of Mozilla and other open source projects; the team believed that Sun required full copyright authority to protect the NetBeans project from legal threats and to provide Sun with the flexibility to adapt the NetBeans license over time. Under the proposed agreement, contributors (original authors) would retain all copyrights independently for project contributions, and any previous contributions whose authors did not agree to the terms of the agreement would have to be removed from the source tree. The discussion spanned ninety messages from seventeen individuals over nearly two months, with a follow-up discussion consisting of forty-six messages from fourteen individuals (eleven of whom participated in the earlier discussion) over a third month.
The discussion, which began at the end of April 2003, continued through July (with a few sporadic messages extending out to September), long after the deadline for requiring the JLA for project contributions.

On its face, the process for the license change seems simple. The particulars of the proposed license received early focus in the discussion. As the discussion progressed, concern shifted away from details of the license agreement to the way in which the change was proposed. In the course of discussion, it was revealed that switching to the JLA was an idea proposed by Sun's legal counsel, and the decision to adopt it was made internally, unilaterally, and irrevocably by Sun without the involvement of the project at large. The adoption decision raised questions regarding decision rights and transparency within the project.

While recognizing that Sun-employed contributors were responsible for a majority of project effort, non-Sun contributors took the lack of transparency and consideration in the decision-making process as disenfranchisement. In a follow-up discussion, project members further expressed fears that giving Sun full copyright of contributed code could lead to reclassification of volunteer-contributed code in objectionable ways. More significantly, they feared the change could affect copyright of projects built upon the NetBeans codebase but not contributed back to the NetBeans source repository.

In time, most of the “corner case” concerns about the license agreement were addressed. Ultimately, however, non-Sun-employed contributors were still in the position of having to trust Sun to act in an acceptable manner with a grant of full copyright. Moreover, the discussion drew out larger concerns regarding Sun's position of leadership and control of the project, and regarding transparency in decision making. A flow graph of the JCA introduction process appears in Figure 5.
Discussion and Conclusions

The two cases presented are not directly comparable. The Apache study looks at the process of creating a new license to be used by all projects under the domain of the Apache Software Foundation. The NetBeans study focuses on the adoption of a new license agreement for contributors to the NetBeans IDE and platform. Software source licenses govern the rights and responsibilities of software consumers to (among other things) use, modify, and distribute software. Contributor license agreements (CLAs), on the other hand, govern the rights and responsibilities regarding contributions: (among other things) the rights to use, modify, and distribute contributions granted to the organization to which they are submitted, and those retained by the contributor. The new CLA stated that copyright of project contributions would be jointly owned by the originating contributors as well as the project's benefactor, Sun Microsystems. Code contribution agreements may not be of interest to end users of software executables. However, the OSS movement is known for its tendency towards user-contributors: users who contribute to the development of the software, and developers who use their own software.

If we consider, specifically, the license changes in the Apache and NetBeans projects, both were introduced as inevitable changes by persons of authority in each project (founder Roy Fielding of Apache and Evan Adams of Sun Microsystems for NetBeans). The initiators of the discussion both presented the rationale for making the changes. For Apache, the move was motivated by desires to increase compatibility with other licenses, reduce the number of questions about the Apache license, move the license text outside the source code, and require a patent license on contributions where necessary. For NetBeans, the motivations were to protect the project from legal threats and provide Sun the ability to change the license in the future.
In the Apache case, the motivations for making the changes went unquestioned. The discussion focused on what objectives to achieve with the change and how best to achieve them. The former had to do with a (minority) subset of participants who saw the license change as an opportunity to affect software development culture, altering the direction of the software ecosystem as a means of governance on a macro level. The latter had to do with making sure the verbiage of the license achieved its intended objectives without unintended consequences (such as those of the former's nature). In the NetBeans case, the discussion focused on the differences between the licenses and their effect on non-sponsoring-organization participants (meso-level project governance). Given the context of the surrounding cases, the structural and procedural governance of the project was also questioned.

The area of the NetBeans license change that received the greatest push-back was granting the sponsoring organization the right to change the license unilaterally at any point in the future. This right was similarly granted to the ASF in the Apache contributor license agreement (CLA) [44], a point that was not lost on participants in the NetBeans license change discussions [45]. Why did this issue receive push-back in NetBeans and not Apache? West and O'Mahony [46] caution that, unlike community-initiated projects, sponsored OSS projects must achieve a balance between establishing pre-emptive governance design (as we saw here) and establishing boundaries between commercial and community ownership and control. The surrounding cases served to create an atmosphere of distrust within the project.
The distrust led to fears that contributions from the sponsoring organization would become closed off from the community, perhaps saved for the organization's commercial version of the product, leaving the sponsoring organization as a free-rider [47, 48] profiting off the efforts of others without giving back [49], or that the organization would otherwise limit what project participants could do with project code.

Perhaps the most striking difference in the way the two license changes were introduced is that the Apache case invited project participants (as well as the software ecosystem and the public at large) to be a part of the change, whereas the NetBeans case did not. Participants in the NetBeans project were left without transparency in the decision-making process, in that the change was imposed on them without any warning before the decision was made. Moreover, they were left without representation in the decision-making process, in that they did not participate in determining the outcome of a decision that had a large impact on them. This is not to say that the Apache case was entirely transparent. There are clear indications from the messages on the list that conversations were held off-list. Likewise, there were misconceptions over what roles participants played and over participant affiliation. However, neither the process nor the result was questioned.

In conclusion, we have taken a first step toward understanding how license change processes impact software development processes by discovering and modeling the update process for the Apache License and the update to the contributor license agreement in the NetBeans project. We observed how differences in the processes of introducing change influenced response to the changes. To put these cases into context, NetBeans has undergone two license changes since the events described above, neither of which received significant push-back from the community. The first shifted the license to the CDDL.
The second was a move to dual-license NetBeans under the GPLv2. This second licensing shift was considered by Sun “at the request from the community” [50]. Unlike the introduction of the JCA, the GPL shift was presented to the community by Sun for feedback (in August 2007) as an added option (rather than a complete relicensing) before the change was made. Thus, we can clearly see further change in the processes used to govern the community in a way that directly addressed the defects in the project's governance processes circa 2003. Shah [51] echoes these concerns, observing that code ownership by firms creates the possibility that non-firm-employed contributors will be denied future access to project code. In other projects, such threats can lead to forking of the source, as happened when the MySQL corporation was purchased by Sun Microsystems, which, in turn, has recently been acquired by Oracle.

Acknowledgements

The research described in this report is supported by grants from the Center for Edge Power at the Naval Postgraduate School and the National Science Foundation, #0534771 and #0808783. No endorsement implied.

References

[1] Scacchi, W.; Alspaugh, T.; and Asuncion, H. The Role of Software Licenses in Open Architecture Ecosystems, Intern. Workshop on Software Ecosystems, Intern. Conf. Software Reuse, Falls Church, VA, September 2009.

[2] Lindman, J.; Paajanen, A.; and Rossi, M. Choosing an Open Source Software License in Commercial Context: A Managerial Perspective, 2010 36th EUROMICRO Conference on Software Engineering and Advanced Applications, pp. 237-244, 2010.

[3] St. Laurent, A. M. 2004. Understanding Open Source and Free Software Licensing. O'Reilly Media, Inc., Sebastopol, CA.

[4] Rosen, L. 2005. Open Source Licensing: Software Freedom and Intellectual Property Law. Prentice Hall.

[5] Lerner, J. and Tirole, J. 2005.
The Scope of Open Source Licensing. The Journal of Law, Economics, & Organization, 21(1): 20-56.

[6] Di Penta, M.; German, D.; Guéhéneuc, Y.; and Antoniol, G. 2010. An exploratory study of the evolution of software licensing. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1 (ICSE '10), Vol. 1. ACM, New York, NY, USA, 145-154.

[7] Goldman, R. and Gabriel, R. 2004. Innovation Happens Elsewhere: How and Why a Company Should Participate in Open Source. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

[8] Oreizy, P. Open Architecture Software: A Flexible Approach to Decentralized Software Evolution. Ph.D. dissertation, Information and Computer Sciences, University of California, Irvine, Irvine, CA, 2000.

[9] Jensen, C. and Scacchi, W. 2005. Process Modeling Across the Web Information Infrastructure. Software Process: Improvement and Practice 10(3):255-272.

[10] Hedhman, N. Mailing list message dated 16 Dec 2004 07:18:55 -0000 “Re: [ANN] Avalon Closed,” available online at http://www.mail-archive.com/community@apache.org/msg03889.html, last accessed 15 September 2009.

[11] Dailey, D. Mailing list message dated Wed, 02 May 2007 10:38:26 -0400 “Re: Support Existing Content / consensus through attrition?” available online at http://lists.w3.org/Archives/Public/public-html/2007May/0214.html, last accessed 15 September 2009.

[12] The Protégé Ontology Editor Project, available online at http://protege.stanford.edu/, last accessed 23 June 2008.

[13] The Firefox Ontology Plugin project, available online at http://rotterdam.ics.uci.edu/development/padme/browser/ontology, last accessed 23 June 2008.

[14] The Zotero Project, available online at http://www.zotero.org/, last accessed 23 June 2008.

[15] Bringer, J. D.; Johnston, L. H. and Brackenridge, C. H.
Using Computer-Assisted Qualitative Data Analysis Software to Develop a Grounded Theory Project. Field Methods, 2006, 18(3): 245-266.

[16] Kelle, U. Theory Building in Qualitative Research and Computer Programs for the Management of Textual Data. Sociological Research Online, 1997, 2(2), available online at http://www.socresonline.org.uk/socresonline/2/2/1.html, last accessed 23 June 2008.

[17] Fielding, R. Mailing list message dated Sat, 08 Nov 2003 02:39:09 GMT “Review of proposed Apache License, version 2.0,” available online at http://mailarchives.apache.org/mod_mbox/archive-license/200311.mbox/%3cBAAB287A-1194-11D8-842D-000393753936@apache.org%3e, last accessed 14 August 2009.

[18] Board meeting minutes of The Apache Software Foundation, January 2004, available online at http://apache.org/foundation/records/minutes/2004/board_minutes_2004_01_21.txt, last accessed 13 August 2009.

[19] Fielding, R. Mailing list message dated Sat, 24 Jan 2004 01:34:36 GMT “Apache License, Version 2.0,” available online at http://mailarchives.apache.org/mod_mbox/archive-license/200401.mbox/%3C781EEF08-4E0D-11D8-915D-000393753936@apache.org%3E, last accessed 13 August 2009.

[20] Behlendorf, B. Mailing list message dated Sat, 22 Nov 2003 07:31:40 GMT “RE: termination with unrelated trigger considered harmful,” available online at http://mailarchives.apache.org/mod_mbox/archive-license/200311.mbox/%3C20031121232552.X38821@fez.hyperreal.org%3E, last accessed 13 August 2009.

[21] Carlson, B. M. Mailing list message dated Sat, 8 Nov 2003 10:03:55 +0000 “Re: [fielding@apache.org: Review of proposed Apache License, version 2.0],” available online at http://lists.debian.org/debian-legal/2003/11/msg00053.html, last accessed 12 August 2009.

[22] Peterson, S.K.
Mailing list message dated Fri, 14 Nov 2003 14:52:54 GMT “termination with unrelated trigger considered harmful,” available online at http://mailarchives.apache.org/mod_mbox/archive-license/200311.mbox/%3C6D6463F31027B14FB3B1FB094F2C744704A11176@tayexc17.americas.cpqcorp.net%3E, last accessed 13 August 2009.

[23] Machovec, J. Mailing list message dated Fri, 14 Nov 2003 16:49:09 GMT “Re: termination with unrelated trigger considered harmful,” available online at http://mailarchives.apache.org/mod_mbox/archive-license/200311.mbox/

[24] Fielding, R. Mailing list message dated Tue, 18 Nov 2003 02:10:27 GMT “Re: [fielding@apache.org: Review of proposed Apache License, version 2.0],” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c60AEF3C1-196C-11D8-A8F4-000393753936@apache.org%3e, last accessed 13 August 2009.

[25] Engelfriet, A. Mailing list message dated Mon, 17 Nov 2003 20:59:53 GMT “Re: [fielding@apache.org: Review of proposed Apache License, version 2.0],” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c20031117205953.GA95846@stack.nl%3e, last accessed 13 August 2009.

[26] Moglen, E. Mailing list message dated Fri, 14 Nov 2003 21:28:32 GMT “FSF Comments on ASL 2.0 draft,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c16309.18688.540989.283163@new.law.columbia.edu%3e, last accessed 13 August 2009.

[27] Carlson, B. M. Mailing list message dated Thu, 13 Nov 2003 05:39:49 GMT “DFSG-freeness of Apache Software Licenses,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c20031113053949.GD23250@stonewall%3e, last accessed 13 August 2009.

[28] Armstrong, D.
Mailing list message dated Fri, 14 Nov 2003 04:39:50 GMT “Re: DFSG-freeness of Apache Software Licenses,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c20031114043950.GM2707@donarmstrong.com%3e, last accessed 13 August 2009 + + +[29] Johnson, P. Mailing list message dated Wed, 12 Nov 2003 02:09:14 GMT “Mutual defence patent clause,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c003d01c3a8c1$f9b55170$c6ba400c@protocol.com%3e, last accessed 12 August 2009 + + +[30] Behlendorf, B. Mailing list message dated Wed, 12 Nov 2003 21:09:32 GMT “Re: Mutual defence patent clause,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c20031112130508.H497@fez.hyperreal.org%3e, last accessed 13 August 2009 + + +[31] Fielding, R. Mailing list message dated 12/24/2003 04:16 AM “Re: Review of proposed Apache License, version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200312.mbox/%3c464B4006-3604-11D8-9A9F-000393753936@apache.org%3e, last accessed 12 August 2009 + + +[32] Apache License Proposal Website, available online at http://www.apache.org/licenses/proposed/, last accessed 13 August 2009 + + +[33] Apache License, Version 1.23, available online at http://mail-archives.apache.org/mod_mbox/archive-license/200312.mbox, last accessed 13 August 2009 + + +[34] Behlendorf, B. Mailing list message dated Fri, 09 Jan 2004 22:42:52 GMT “Re: Review of proposed Apache License, version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3c20040109143803.G31301@fez.hyperreal.org%3e, last accessed 13 August 2009 + + +[35] Behlendorf, B. 
Mailing list message dated Wed, 07 Jan 2004 22:16:36 GMT “Re: Review of proposed Apache License, version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3c20040107140658.A23429@fez.hyperreal.org%3e, last accessed 13 August 2009 +[36] Fielding, R. Mailing list message dated Wed, 14 Jan 2004 20:25:50 GMT “Re: Review of proposed Apache License, version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3cD81EA136-46CF-11D8-B08A-000393753936@apache.org%3e, last accessed 13 August 2009 + + +[37] Fielding, R. Mailing list message dated Wed, 14 Jan 2004 20:54:26 GMT “Re: Review of proposed Apache License, version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3cD6DB9454-46D3-11D8-B08A-000393753936@apache.org%3e, last accessed 13 August 2009 + + +[38] Armstrong, D. Mailing list message dated Sat, 24 Jan 2004 02:13:50 GMT “Re: Apache License, Version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3C20040124021350.GG3060@archimedes.ucr.edu%3E, last accessed 13 August 2009 + + +[39] Fielding, R. Mailing list message dated Sat, 24 Jan 2004 02:29:29 GMT “Re: Apache License, Version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3C23385101-4E15-11D8-915D-000393753936@apache.org%3E, last accessed 13 August 2009 + + +[40] Free Software Foundation Licenses webpage, available online at http://www.fsf.org/licensing/licenses/index_html#GPLCompatibleLicenses, last accessed 14 August 2009 + + +[41] Massol, V. Mailing list message dated Sun, 25 Jan 2004 16:01:19 GMT “How to use the 2.0 license?,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3C012f01c3e35c$78e229d0$2502a8c0@vma%3E, last accessed 13 August 2009 + + +[42] Behlendorf, B. 
Mailing list message dated Sun, 25 Jan 2004 20:17:06 GMT “Re: How to use the 2.0 license?,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3C20040125121456.H396@fez.hyperreal.org%3E, last accessed 13 August 2009 + + +[43] Adams, E. NBDiscuss mailing list message: “Joint Copyright Assignment,” available online at http://www.netbeans.org/servlets/ReadMsg?list=nbdiscuss&msgNo=2228, last accessed 6 August, 2009 + + +[44] The Apache Software Foundation Individual Contributor License Agreement, Version 2.0 available online at http://www.apache.org/licenses/icla.txt, last accessed 20 October 2009. + + +[45] Brabant, V. mailing list message dated Tue, 15 Jul 2003 18:52:36 +0200 “[nbdiscuss] Re: licenses and trees,” available online at http://www.netbeans.org/servlets/ReadMsg?listName=nbdiscuss&msgNo=2547, last accessed 20 October 2009. + + +[46] West, J. and O'Mahony, S. 2005. Contrasting Community Building in Sponsored and Community Founded Open Source Projects. In Proceedings of the Proceedings of the 38th Annual Hawaii international Conference on System Sciences - Volume 07 (January 03 - 06, 2005). HICSS. IEEE Computer Society, Washington, DC, 196.3. + + +[47] Lerner, J. and J. Tirole. 2000. The simple economics of open source. NBER Working paper series, WP 7600, Harvard University, Cambridge, MA. + + +[48] von Hippel, E. and von Krogh, G. 2003. Open source software and the 'private-collective' innovation model: Issues for organizational science. Organization Science, 14(2):209-223. + + +[49] Hedhman, N. mailing list message dated Sun, 29 Jun 2003 13:31:48 +0800 “[nbdiscuss] Re: licenses and trees (was: Anti-Sun Animosity),” available online at http://www.netbeans.org/servlets/ReadMsg?listName=nbdiscuss&msgNo=2578, last accessed 21 October 2009. +[50] NBDiscuss mailing list message. Available online at http://www.netbeans.org/servlets/ReadMsg?list=nbdiscuss&msgNo=3784 last accessed 28 February 2009 + + +[51] Shah, S.K. 2006. 
Motivation, governance and the viability of hybrid forms in open source software development, Management Science, 52(7), 1000-1014. +---------------------------------------- +------------------------------- +Section 291: +Software Reuse in Open Source: A Case Study + + +Andrea Capiluppi, Brunel University, UK +Klaas-Jan Stol, Lero (The Irish Software Engineering Research Centre), University of Limerick, Ireland +Cornelia Boldyreff, University of East London, UK + + +ABSTRACT + + +A promising way to support software reuse is based on Component-Based Software Development (CBSD). Open Source Software (OSS) products are increasingly available that can be freely used in product development. However, OSS communities still face several challenges before taking full advantage of the “reuse mechanism”: many OSS projects duplicate effort, for instance when many projects implement a similar system in the same application domain and in the same topic. One successful counter-example is the FFmpeg multimedia project; several of its components are widely and consistently reused in other OSS projects. Documented is the evolutionary history of the various libraries of components within the FFmpeg project, which presently are reused in more than 140 OSS projects. Most use them as black-box components; although a number of OSS projects keep a localized copy in their repositories, eventually modifying them as needed (white-box reuse). In both cases, the authors argue that FFmpeg is a successful project that provides an excellent exemplar of a reusable library of OSS components. + + +Keywords: Case Study, Component-Based Software Development, Empirical Study, Open Source Software, Quantitative Study, Software Evolution, Software Reuse + + +INTRODUCTION + + +Reuse of software components is one of the most promising practices of software engineering (Basili & Rombach, 1991). 
Enhanced productivity (as less code needs to be written), increased quality (since assets proven in one project can be carried through to the next) and improved business performance (lower costs, shorter time-to-market) are often pinpointed as the main benefits of developing software from a stock of reusable components (Sametinger, 1997; Sommerville, 2004).


Although much research has focused on the reuse of Off-The-Shelf (OTS) components, both Commercial OTS (COTS) and Open Source Software (OSS), in corporate software production (Li et al., 2009; Torchiano & Morisio, 2004), the reusability of OSS projects in other OSS projects has only recently started to draw the attention of researchers and developers in OSS communities (Lang et al., 2005; Mockus, 2007; Capiluppi & Boldyreff, 2008). A vast amount of code is created, modified and stored daily in OSS repositories, and the philosophy inherent in OSS indeed promotes reuse. Yet, software reuse in OSS projects is hindered by various factors, both psychological and technical. For instance, the project to be reused could be written in a programming language that the hosting project dislikes or is incompatible with; the hosting project might not agree with the design decisions made by the project to be reused; finally, individuals in the hosting project may dislike individuals involved in the project to be reused (Senyard & Michlmayr, 2004). A search for the "email client" topic in the SourceForge repository (http://www.sourceforge.net) produces 128 different projects (SourceForge, 2011): this suggests that similar features in the same domain are implemented by different projects, and that code and feature duplication play a significant role in the production of OSS code.
The interest of practitioners and researchers in the topic of software reuse has focused on two predominant questions: (1) from the perspective of OSS integrators (Hauge et al., 2007), how to select an OSS component to be reused in another (potentially commercial) software system, and (2) from the perspective of end-users, how to provide a level of objective "trust" in available OSS components. This interest is based on sound reasoning; given the increasing amount of source code and documentation created and modified daily, it is becoming a (commercially) viable solution to browse for components in existing code and select existing, working resources to reuse as building blocks of new software systems, rather than building them from scratch.


Among the reported cases of successful reuse within OSS systems, components with clearly defined requirements that hardly affect the overall design (i.e., the "S" and "P" types of systems following the original S-P-E classification by Lehman (1980)) have often proven to be the resources typically reused by OSS projects. Reported examples include the "internationalization" (often referred to as I18N) component (which produces different output text depending on the language of the system), and the "install" module for Perl subsystems (involved in compiling the code, testing it and installing it in the appropriate locations) (Mockus, 2007). To the best of our knowledge, there is no academic literature about the successful reuse of OSS, and an understanding of the internal characteristics of what makes a component reusable in the OSS context is lacking.


The main focus of this paper is to report on the FFmpeg project (http://ffmpeg.org/) and its build-level components, and to show how some of these components are currently reused in other projects. This project is a cornerstone in the multimedia domain; several dozen OSS projects reuse parts of FFmpeg, one of the most widely reused being the libavcodec component.
In the domain of OSS multimedia applications, libavcodec is the most widely adopted and reused audio/video codec (coding and decoding) resource. Its reuse by other OSS projects is so widespread because it represents a crosscutting resource for a wide range of systems, from single-user video and audio players to converters and multimedia frameworks. As such, FFmpeg represents a unique case (Yin, 2003, p. 40), which is why we selected the project for this study.


In particular, the study is an attempt to evaluate whether the reusability principle of "high cohesion and loose coupling" (Fenton, 1991; Macro & Buxton, 1987; Troy & Zweben, 1981) has an impact on the evolutionary history of the FFmpeg components.


This paper makes two contributions:


1. It studies how the size of FFmpeg components evolves: the empirical findings show that the libavcodec component (contained in FFmpeg) is an "evolving and reusable" component (an "E-type" system) (Lehman, 1980), and as such it poses several interesting challenges when other projects integrate it; and


2. It studies how the architecture of FFmpeg components evolves, and how these components evolve when separated from FFmpeg: the empirical findings show two emerging scenarios in the reuse of this resource. On the one hand, the majority of projects that reuse the FFmpeg components do so with a "black-box" strategy (Szyperski, 2002), as such incurring synchronization issues due to the independent co-evolution of the project and the component. On the other hand, a number of OSS projects apply a "white-box" reuse strategy, by maintaining a private copy of the FFmpeg components. The latter scenario is further empirically analyzed in order to obtain a better understanding of how the component is not only reused, but also integrated into a host system.


The remainder of this paper is structured following the guidelines for reporting case study research proposed by Runeson and Höst (2009).
The next section provides relevant background information and an overview of related work on software components and OSS systems. This is followed by a presentation of the research design of our study. After this, the results of the empirical study are presented, followed by the threats to validity of this study. The last section concludes with the key findings and provides directions for future work.


BACKGROUND AND RELATED WORK


This section presents background and related work that is relevant for the remainder of the paper. The first subsection discusses Component-Based Software Development (CBSD) and the terminology used in this paper. This is followed by a brief discussion of research on OSS reuse, and then by an overview of a useful and relevant categorization of components. Since this work considers the evolution of software components, a brief summary of Lehman's classification of software programs is also provided. The section concludes with a brief discussion of related work regarding software decay and architectural recovery.


Component-Based Software Development and Terminology


As mentioned, Component-Based Software Development (CBSD) has been proposed as a promising approach to large-scale software reuse. It is important, however, first to define clearly what is meant by the term "component." The word "component" is often used in the context of CBSD to denote a reusable piece of software, either Commercial Off-The-Shelf (COTS) or Open Source. For instance, Torchiano and Morisio (2004) have derived the following definition: "A COTS product is a commercially available or open source piece of software that other software projects can reuse and integrate into their own products." This definition considers a COTS or Open Source software product as an independent unit that can be reused. However, a number of authors have provided more specific definitions; a commonly cited definition can be found in Szyperski (2002, p.
41): "A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties."


As De Jonge (2005) points out, "Component-Based Software Engineering (CBSE) is mostly concerned with execution-level components (such as COM, CCB, or EJB components)." Szyperski (2002, p. 3) also speaks of software components as being "executable units of independent production, acquisition, and deployment that can be composed into a functioning system."


In this paper, following De Jonge (2005), we use the term "build-level component." De Jonge speaks of build-level components as "directory hierarchies containing ingredients of an application's build process, such as source files, build and configuration files, libraries, and so on." In an earlier paper, De Jonge (2002) uses the term "source code component." In this context, we interpret the meaning of "build-level" component to be equivalent to the term "module," as used by Clements et al. (2010, p. 29). They indicate that a module refers to a unit of implementation, and as such, can be source code or other implementation artifacts. Eick et al. (2001) also interpret a module to be a directory in the source code file system, which contains several files, though they note that this terminology is not standard. Tran et al. (1999, 2000) considered individual source files as modules. Clements et al. define a "component" to be a runtime entity, which is consistent with the definition by Szyperski. Although important issues are already known when incorporating and reusing whole systems into larger, overarching projects (as in the case of Linux distributions; German & Hassan, 2009), in the remainder of this paper, we use the term "component" to refer to a build-level component.
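Operationally, treating build-level components as directories that contain source files lends itself to a simple detection pass over a source tree. The following Python sketch illustrates the idea; the extension set and function name are our own assumptions, not part of the paper:

```python
import os

# FFmpeg is written in C, so we look for C sources and headers;
# this extension set is an illustrative assumption.
SOURCE_EXTS = {".c", ".h"}

def build_level_components(root):
    """Return directories under `root` that directly contain source
    files, as candidate build-level components in De Jonge's sense
    (directory hierarchies holding an application's build ingredients)."""
    components = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(os.path.splitext(name)[1] in SOURCE_EXTS for name in filenames):
            components.append(os.path.relpath(dirpath, root))
    return sorted(components)
```

Applied to an FFmpeg checkout, such a pass would surface folders like libavcodec and libavutil as candidate components, matching the semantically rich source tree composition discussed later in the paper.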
Components can be reused in different ways, as briefly mentioned: black-box reuse and white-box reuse (Szyperski, 2002). Black-box reuse refers to the reuse of a component as-is, without any alterations. The component can only be viewed in terms of its input and output. This is typically the case when proprietary (COTS) components are used, as the source code is usually not available for proprietary software. On the other hand, when the component's source code is available, the integrator can perform white-box reuse. The integrator may make changes to a component to fit his or her intended purpose. Obviously, the availability of the source code makes OSS components particularly suitable for white-box reuse.


The two scenarios are summarized in Figure 1. As an example, the MPlayer project keeps a copy of the library in its repository (possibly modifying, or "forking," it for its own purposes, in a white-box reuse scenario), while the VLC project, at compilation time, requires the user to provide the location of an up-to-date version of the FFmpeg project (black-box reuse).


Research on Open Source Software Reuse


There is a growing body of empirical research on the use of OSS components in CBSD (Ayala et al., 2007; Hauge et al., 2009; Capiluppi & Knowles, 2009; Li et al., 2009; Ven & Mannaert, 2008). There is an increasing number of OSS products available, many of which have become viable alternatives to commercial products (Fitzgerald, 2006), and adopting OSS components to build products is a common scenario (Hauge et al., 2010).


Research on OSS reuse can be classified along two dimensions. The first dimension considers the question of who reuses the software: either an Independent Software Vendor (ISV), or other OSS communities. The second dimension considers the software that is reused, in particular the granularity of components. Haefliger et al.
(2008) identified different granularities of code reuse: algorithms and methods, single lines of code, and components. Components themselves may be of a coarse granularity, i.e., complete software systems. A common example of this is the so-called "LAMP stack" (Wikipedia, n.d.), which is an "ensemble" of Linux, Apache, MySQL, and a scripting language such as Python, Perl, PHP or Ruby. Much of the literature on OSS reuse focuses on such coarse-grained components reused by ISVs, though it is noteworthy that granularity cannot be measured on a discrete scale but rather on a continuous one. German et al. (2007) discuss dependencies between packages (which they define as an installable unit of software), such as found in Linux distributions. They define a model to represent and analyze such dependencies. Other work led by German investigated the issue of licenses when reusing different OSS components (German & Hassan, 2009; German & González-Barahona, 2009).


On the other hand, reuse can be done with components of a finer granularity. There are few studies of this, all of which focus on reuse by other OSS projects. The study presented in this paper also considers components of relatively small granularity, which is why we discuss this related work in more detail. Table 1 provides an overview of the study objectives as well as research methods and samples.


One of the first studies to quantify reuse in Open Source Software is by Mockus (2007). That study identifies reuse by detecting directories of source code files that share a number (defined by a threshold) of file names; therefore, the study only considers white-box reuse. Mockus studied reuse in a large sample of 38,700 unique projects with 5.3 million unique file name paths, and found that approximately half of the files are used in more than one project, which indicates significant reuse among OSS projects.


Haefliger et al.
(2008) conducted a study of 15 OSS projects, six of which were studied in-depth. The goal of this study was to investigate the influence of several factors identified in the literature on the support of code reuse in OSS development. Factors included standards and tools, quality ratings and certificates, and incentives as found in commercial software development firms. The study shows that all studied projects reuse software, and that black-box reuse was the predominant form.


Sojer and Henkel (2010) conducted a survey to investigate quantitatively the relationship between developer and project characteristics on the one hand and the degree of software reuse in OSS projects on the other. The survey among 686 OSS developers identified a number of factors, such as developers' experience in OSS projects, that affect software reuse in OSS projects. Unlike the aforementioned studies by Mockus and Haefliger et al., this study does not investigate actual reuse within OSS projects, but rather developers' behavior and opinions on the topic.


Heinemann et al. (2011) studied reuse in a sample of 20 OSS projects written in the Java programming language, using clone detection techniques complemented with manual inspection. Their study investigated whether OSS projects reuse software, and to what extent such reuse happens as white-box and black-box. They found that reuse is common in the OSS Java projects studied, in particular black-box reuse, as previously found by Haefliger et al. (2008). It must be noted that their measurements also counted reuse of the Java standard libraries.
----------------------------------------
-------------------------------
Section 292:
Table 1. Overview of previous studies of reuse in OSS


| Authors | Study objective | Method and sample |
|--------------------|---------------------------------------------------------------------------------|--------------------------------------------------------|
| Mockus (2007) | To identify and quantify large-scale reuse in OSS. | Survey of 38,700 projects, 13.2 MLOC |
| Haefliger et al. (2008) | Is code reuse supported in OSS? | Multiple case study, 15 projects, in-depth analysis of 6 projects, 6 MLOC |
| Sojer and Henkel (2010) | How important is code reuse in OSS projects? What are perceived benefits, issues and impediments of code reuse? How is code reuse affected by characteristics of developers and project? | Web-based survey, 686 responses |
| Heinemann et al. (2011) | Do OSS projects reuse software? How much black-box/white-box? | Empirical study, 20 OSS Java projects, 3.3 MLOC |


Component Characterization


Components, as defined, can be characterized into different categories depending on their relationships to other components. Lungu et al. (2006) distinguish between four types of (Java) packages:


Silent package: no dependency relations between the package and other packages;


Consumer package: a dependency relation from the package to other packages (that is, the package depends on, or consumes, functionality from other packages);


Provider package: a dependency from other packages to the package (that is, the package provides functionality to other packages);


Hybrid package: the package is both a consumer and a provider at the same time (that is, it both consumes functionality from and provides functionality to other packages).


Though Lungu et al. refer to Java packages, which, they argue, are the main mechanism for the decomposition and modularization of a software system written in Java, we argue that the same four types can be used to characterize components as directories containing source code files (as defined in the previous subsection). That is, a provider is a component that provides services to other components (which therefore become dependent upon the provider).
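The four types map directly onto the in- and out-degree of a node in a component dependency graph. The following Python sketch derives them from a set of directed (consumer, provider) edges; the component names in the example are illustrative, not an actual dependency analysis of FFmpeg:

```python
def classify_components(components, dependencies):
    """Classify components into Lungu et al.'s four types, given
    directed dependency edges of the form (consumer, provider)."""
    kinds = {}
    for c in components:
        consumes = any(src == c for src, _dst in dependencies)  # outgoing edge
        provides = any(dst == c for _src, dst in dependencies)  # incoming edge
        if consumes and provides:
            kinds[c] = "hybrid"
        elif consumes:
            kinds[c] = "consumer"
        elif provides:
            kinds[c] = "provider"
        else:
            kinds[c] = "silent"
    return kinds

# Illustrative graph: a player uses libavcodec, which uses libavutil.
example = classify_components(
    ["player", "libavcodec", "libavutil", "docs"],
    [("player", "libavcodec"), ("libavcodec", "libavutil")],
)
# player is a consumer, libavcodec a hybrid, libavutil a provider,
# and docs (no edges) is silent.
```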
Likewise, a consumer relies on functionality provided in other components (and is therefore dependent upon those). Incidentally, Java packages are in fact represented as directories in a source code file system.


Software Evolution and Program Classification


There is a continuous pressure on software systems to evolve in order to prevent their becoming obsolete (Lehman, 1978). Lehman (1980) stated a number of "laws of software evolution" and presented a classification of programs into three classes, S, P and E, which relates to how programs evolve. The three program types are briefly summarized below.


S-Programs


Lehman (1980) described S-Programs as "programs whose function is formally defined by and derivable from a specification." These are programs that solve a specific, completely defined problem. The specification of the problem "directs and controls the programmer in his creation of the program that defines the desired solution" (Lehman, 1980). Changes may of course be made to the program, for instance, to improve resource usage or maintainability. However, such changes must not change the mapping between the input and output. If changes are made due to a changed specification, the result is a different program that solves a new problem. Typical examples of S-type programs are library routines that implement mathematical operations, for instance the sine and cosine functions.


P-Programs


P-Programs implement a solution to a problem that is well-defined but whose implementation must be limited to an approximation to achieve practicality. The problem statement of P-Programs "is a model of an abstraction of a real-world situation, containing uncertainties, unknown, arbitrary criteria and continuous variables" (Lehman, 1980). Whereas the correctness of an S-Program depends on its specification, the value and validity of P-Programs depend on the solution acquired in a real-world environment.
As the environment or world in which the program is used changes, P-Programs themselves must also change. Examples suggested by Lehman are a software program implementing the game of chess, and weather prediction software.


E-Programs


The defining characteristic of the third class of programs, E-Programs, is that the installation of a program itself changes the nature of the problem that it is solving. As Lehman (1980) stated: "Once the program is completed and begins to be used, questions of correctness, appropriateness and satisfaction arise […] and inevitably lead to additional pressure for change." In other words, the environment (or world) in which the program was originally conceived is changed by the introduction of the program itself. Stated in more abstract terms, the introduction of a solution (the software program) to a problem changes the nature of the problem itself. This leads to the need for continuous change to E-type programs. As examples of such programs, Lehman mentions operating systems and air-traffic control software (Lehman, 1980).


Software Architecture, Decay and Architectural Recovery


The empirical analysis of the FFmpeg components reported below revealed several changes in the components and in their connections to the core of the system: these changes revealed (in at least one case) a decay in how some of the components are internally structured and externally connected to other components. This work is therefore also related to the study of software architectures, as it concerns components and their mutual relationships (Bass et al., 2003).


It is now widely accepted that a system's software architecture has different views (IEEE, 2000); well known is the 4+1 view model of architecture (Kruchten, 1995), which defines the logical, development, process and physical views, plus a use-case view.
As outlined, our study considers components as directories containing source code files, which would be represented in the development view. One related aspect also considered in the present study is how such structural characteristics decay over time: how components become less cohesive and how the connections between them infringe the original design constraints.


One important aspect of software architectures and components is modularity (Parnas, 1972): the division of a system into modules (or components) helps in separating the functionality and responsibilities of the various modules. Reusability is a quality attribute that is directly related to a component's (or system's) modularity; examining the inter-component couplings (Bass et al., 2003) may provide valuable insights that help to assess the reusability of a component (or system). The analysis of coupling and cohesion of object-oriented systems has also shown that a good degree of modularity is achieved by observing the "loose coupling and high cohesion" principle for components (Fenton, 1991; Macro & Buxton, 1987; Troy & Zweben, 1981).


As software systems evolve over time, the software engineering literature has firmly established that software architectures and the associated code suffer from software decay (Eick et al., 2001). Perry and Wolf (1992) speak of architectural erosion and architectural drift. The former occurs as a result of violating the (conceptual) software architecture. The latter is due to an insensitivity of stakeholders to the architecture, which may lead to an obscuration of the architecture, which in turn may cause violations of it. As a result, software systems have a progressive tendency to lose their original structure, which makes it difficult to understand and further maintain them (Schmerl et al., 2006).
Among the most common discrepancies between the original and the degraded structures, the phenomenon of highly coupled, weakly cohesive modules has been known since 1972 (Parnas, 1972) and is an established topic of research.


Architectural recovery is one of the recognized counter-measures to this decay (Dueñas et al., 1998). Several earlier works have focused on the architectural recovery of proprietary software (Dueñas et al., 1998), closed academic software (Abi-Antoun et al., 2007), COTS-based systems (Avgeriou & Guelfi, 2005) and OSS (Bowman et al., 1999; Godfrey & Lee, 2000; Tran et al., 2000). In all of these studies, systems were selected in a specific state of evolution, and their internal structures analyzed for discrepancies between the conceptual and concrete architectures (Tran et al., 2000). Researchers have addressed this issue by proposing frameworks (e.g., Sartipi et al., 2000), methodologies (e.g., Krikhaar et al., 1999), and guidelines and concrete advice to developers (e.g., Tran et al., 2000).


Architectural recovery provides insights into the concrete architecture, which in turn may help developers and integrators. For instance, certain architectural styles (Clements et al., 2010) may be identified, which can provide valuable insights into a system's quality attributes (Bass et al., 2003; Harrison & Avgeriou, 2011). Recovery is also very important to ensure the maintainability of a software product; if the conceptual architecture is not respected, the resulting concrete architecture may become a "spaghetti" architecture, which can be an obstacle to making necessary changes to the system. In the context of software reuse, and this research in particular, components (as defined) may be identified that can be reused in other systems (i.e., OSS projects).


RESEARCH DESIGN


The study presented in this paper is a quantitative, descriptive case study (Yin, 2003).
As Easterbrook et al. (2008) pointed out, there is some confusion in the software engineering literature over what constitutes a case study; they distinguish between a case study as a "worked example" and a case study as an "empirical method". Case studies can also be conducted in different contexts, for instance in industry ("in vivo") or in a research/laboratory setting ("in vitro"). This study is an empirical, "in vitro" case study of one OSS project, namely FFmpeg. As such, this study presents the description and analysis of a system, and following the classification by Glass et al. (2002) the research approach can therefore be classified as "descriptive."

The remainder of this section proceeds as follows. First, we provide further information on the FFmpeg project. Second, we introduce the research questions that guided the research. Third, we present the definitions that operationalize this research. The section concludes with a discussion of data collection and analysis procedures.

Selection and Description of the FFmpeg System

This paper presents a case study of reuse of build-level components in the FFmpeg project. We selected this project as an example of software reuse for several reasons:

• It has a long history of evolution as a multimedia player that has grown and refined several build-level components throughout its life cycle. Some of these components appear to be "E"-type systems, instead of traditional "S" or "P" types, which have a lower propensity for software evolution.

• Several of its core developers have also been collaborating on the MPlayer (http://www.mplayerhq.hu) project, one of the most commonly used multimedia players across OSS communities. Eventually, the libavcodec component was incorporated (among others from FFmpeg) into the main development trunk of MPlayer, increasing FFmpeg's visibility and widespread usage.
• Its components are currently reused on different platforms and architectures, through both static linking and dynamic linking. Static linking involves the inclusion of source code files or pre-compiled libraries at compile time, while dynamic linking involves the inclusion of a (shared) binary library at runtime.

• Finally, the static-linking reuse of the FFmpeg components presents two opposite scenarios: either a black-box reuse strategy, with "update propagation" issues reported when the latest version of a project has to be compiled against a particular version of the FFmpeg components (Orsila et al., 2008); or a white-box reuse strategy.

As mentioned, the FFmpeg system has successfully become a highly visible OSS project partly due to its components, libavcodec in particular, which have been integrated into a large number of OSS projects in the multimedia domain.

In terms of the global system design, the FFmpeg project does not yet provide a clear description of either its internal design or how the architecture is decomposed into components and connectors. Nonetheless, by visualizing its source tree composition (de Jonge, 2002), the folders containing the source code files appear to be semantically rich, in line with the definitions of build-level components (de Jonge, 2005) and source tree composition (de Jonge, 2002). The first column of Table 2 summarizes which folders currently contain source code and subfolders within FFmpeg.

As shown, some components act as containers for other subfolders, apart from source files, as shown in columns two and three, respectively. Typically these subfolders specify/restrict the functionalities of the main folder in particular areas (e.g., the libavutil folder is further divided into the various supported architectures, such as Intel x86, ARM, PPC, etc.; as mentioned, Lungu et al. (2006) refer to this structural "pattern" as an Archipelago).
The fourth column describes the main functionalities of the component. It can be observed that each directory provides the build and configuration files for itself and the subfolders it contains, following the definition of build-level components (de Jonge, 2005). The fifth column of Table 2 lists the month in which the component was first detected in the repository. Apart from the miscellaneous tools component, each of these is currently reused as an OSS component in other multimedia projects as a development library; for example, the libavutil component is currently redistributed as the libavutil-dev package.

Table 2 shows that the main components of this system originated at different dates, and that the older ones (e.g., libavcodec) are typically more articulated into several directories and multiple files. The libavcodec component was created relatively early in the history of this system (08/2001), and it has now grown to some 220,000 source lines of code (SLOC) alone.

| Component name | Folder count | File count | Description | First detected |
|----------------|--------------|------------|-------------|----------------|
| libavcodec | 12 | 625 | Extensive audio/video codec library | 08/2001 |
| libpostproc | 1 | 5 | Library containing video postprocessing routines | 10/2001 |
| libavformat | 1 | 205 | Audio/video container mux and demux library | 12/2002 |
| libavutil | 8 | 70 | Shared routines and helper library | 08/2005 |
| libswscale | 6 | 20 | Video scaling library | 08/2006 |
| tools | 1 | 4 | Miscellaneous utilities | 07/2007 |
| libavdevice | 1 | 16 | Device handling library | 12/2007 |
| libavfilter | 1 | 11 | Video filtering library | 02/2008 |

As is visible in the time-line in Figure 2, other components have coalesced since then; each component appears modularized around a specific "function," according to the "Description" column in Table 2, and as such the components have become more identifiable and hence reusable in other
systems (and are in fact repackaged as distinct OSS projects, http://www.libav.org).

Research Questions

This research has been guided by three research questions:

RQ1: How does the size of FFmpeg components evolve?
Rationale: At first, we were interested in how the components of FFmpeg behave in terms of their size, when they become available, and whether there is a limit to growth in such components that affects their ability to be reused properly.

RQ2: How does the architecture of FFmpeg components evolve?
Rationale: We were interested in understanding how the various FFmpeg components relate to one another in terms of coupling and cohesion. We consider these measures to be a representation of the software architecture.

RQ3: How do FFmpeg components evolve when separated from FFmpeg (e.g., in white-box reuse)?
Rationale: As mentioned, the FFmpeg components have so far been reused in either a black-box or a white-box scenario. OSS components are particularly suitable for white-box reuse due to the availability of the source code, and a number of FFmpeg components have in fact been reused using a white-box approach. Since in such a scenario a copy of the component is made and maintained by a new hosting project, the component is likely to evolve separately from its original host project (i.e., FFmpeg). Therefore, it is interesting to study how FFmpeg components evolve when they are reused as white-box components.

Definitions and Operationalization

This section introduces a number of definitions that are relevant to the research presented in this paper. We use terminology and definitions provided in related and previous studies.

The previous section already discussed our interpretation of the term component. To summarize, we consider a directory in the source code file system, containing several source code files, to be a build-level component (de Jonge, 2005); such components are subsequently used as units of composition.
Others have used the word "module" for this (e.g., Clements et al., 2010).

In order to measure the evolution of components and their architectural evolution, we use a number of measurements that are well established in the software engineering measurement literature, namely coupling and cohesion. Coupling is further divided into outbound coupling (fan-out) and inbound coupling (fan-in). Furthermore, we have considered the concept of "connection," which states whether two components are related or not.

• Coupling: Coupling is a measure of the degree of interdependence between modules (Fenton, 1991). There are several types of coupling, such as common coupling, where modules reference a global data area, and control coupling, where control data is passed between modules. An extensive classification of types of coupling is presented by Lethbridge and Laganière (2001, p. 323). In this study, we define coupling as the union of "routine call" coupling and "inclusion/import" coupling. Routine call coupling refers to function calls from a component A to a component B. Inclusion/import coupling refers to dependencies expressed using the #include directive of the C preprocessor. We used the Doxygen tool (http://www.doxygen.org/) to extract this information. Since the empirical study is based on the definition of build-level components, two further conversions have been made:

• The file-to-file and function-to-function couplings have been "lifted" (Krikhaar, 1999, p. 38, p. 85) into folder-to-folder couplings, as also done by Tran and Holt (1999); this is graphically illustrated in Figure 3. A stronger coupling link between folders A and B is found when many elements within A call elements of folder B.
• Since the behavior of build-level components is studied here, couplings to subfolders of a component have been redirected to the component itself; hence a coupling A → B/C (with C being a subfolder of B) is reduced to A → B. This is graphically illustrated in Figure 4.

• Outbound coupling (fan-out): for each component, the percentage of couplings directed from any of its elements to elements of other components, as in requests of services. A component with a large fan-out, "controlling" many components, provides an indication of poor design, since the component is probably performing more than one function.

• Inbound coupling (fan-in): for each component, the percentage of couplings directed to it from all the other components, as in "provision of services." A component with a high fan-in is likely to perform often-needed tasks, invoked by many components, which is regarded as acceptable design behavior.

• Cohesion: for each component, the sum of all couplings, as a percentage, between its own elements (files and functions).

• Connection: distilling the couplings as defined, one can state, in a Boolean manner, whether two folders are linked by a connection or not, disregarding the strength of the link itself. The overall number of these connections for the FFmpeg project is recorded monthly in Figure 5; the connections of a folder to itself are not counted (per the encapsulation principle), while a two-way connection is counted just once (since we are only interested in which folders are involved in a connection).

Data Collection and Analysis

The source code repository (SVN) of FFmpeg was parsed monthly, resulting in some 100 temporal points, after which the tree structures were extracted for each of these points. The monthly extraction of the raw data was achieved by downloading the repository on the first day of each month.
As an example, to retrieve the snapshot for 02/2008, the following command was issued:

svn -r {2008-02-01} checkout svn://svn.ffmpeg.org/ffmpeg/trunk

On the one hand, the number of source folders (but not yet build-level components) of the corresponding tree is recorded in Figure 5. On the other hand, in order to produce an accurate description of the tree structure as suggested by Tran et al. (2000), each month's data has been further parsed using Doxygen, with the aim of extracting the common coupling among the elements (i.e., source files and headers, and source functions) of the system. Doxygen generates so-called .dot files in the process. Each of these .dot files represents a file (or a class), or a cluster of files, and its couplings towards others in the system. In order to generate the .dot files (and keep them available after the process), the Doxygen configuration file (http://mastodon.uel.ac.uk/IJOSSP2012/Doxygen_base.txt) contains these two settings:

HAVE_DOT = YES
DOT_CLEANUP = NO

Various scripts are then applied to obtain the summary of function calls (http://mastodon.uel.ac.uk/IJOSSP2012/ffmpeg-2008-02-01-summary_ALL_FUNCTION_CALLS.txt), dependencies and include relationships. The information in the summary files is at the atomic level of functions or files: in order to define inter-relationships between components, these relations are lifted (Krikhaar, 1999) to the level of the build-level components (i.e., folders) that contain them, as mentioned above.

The analysis of size growth has been performed using the SLOCCount tool (Wheeler, n.d.). For each build-level component summarized in Table 2, a study of its relative change in terms of the contained SLOC along its lifecycle has been undertaken.
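The lifting of file- and function-level couplings to folder-level couplings described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' actual scripts, and the file-level couplings shown are hypothetical:

```python
from collections import Counter
from pathlib import PurePosixPath

def component_of(path):
    """Map a file path to its top-level folder, i.e., its build-level component."""
    return PurePosixPath(path).parts[0]

def lift(file_couplings):
    """Lift (caller file, callee file, count) triples to folder-to-folder couplings.

    Couplings to a subfolder B/C are implicitly redirected to the component B,
    since only the first path segment is kept.
    """
    lifted = Counter()
    for src, dst, n in file_couplings:
        lifted[(component_of(src), component_of(dst))] += n
    return lifted

def connections(lifted):
    """Boolean 'connection' relation: unordered pairs of distinct linked folders."""
    return {frozenset(pair) for pair in lifted if pair[0] != pair[1]}

# Hypothetical file-level couplings, as might be extracted from Doxygen .dot output
raw = [
    ("libavcodec/h264.c", "libavcodec/golomb.h", 12),   # internal: counts towards cohesion
    ("libavcodec/h264.c", "libavutil/x86/cpu.c", 3),    # lifted to libavcodec -> libavutil
    ("libavformat/utils.c", "libavcodec/avcodec.h", 5), # lifted to libavformat -> libavcodec
]
print(lift(raw))
print(connections(lift(raw)))
```

Note how the subfolder coupling to `libavutil/x86` collapses onto `libavutil`, and how self-couplings are excluded from the connection count, matching the definitions given earlier.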
In addition, a study of the architectural connections has been performed by analyzing temporally:

• The number of couplings involving only elements of the same component (as per the definition of cohesion);

• The number of couplings consisting of links to or from other components (as per the definitions of inbound and outbound couplings, respectively).

Previous studies that present recovered architectures have used "box-and-line" (or box-and-arrow) diagrams (e.g., Bowman et al., 1999). We use UML package diagrams (rather than component diagrams) to graphically visualize (build-level) components, as defined in the previous section.

RESULTS AND DISCUSSION

This section provides the results of the empirical investigation, addressing the three research questions identified in the previous section. First, the size growth of the FFmpeg components is presented (Table 2). This is followed by an analysis of the architectural evolution of the components. The section concludes with a discussion of the deployment of libavcodec in other OSS projects.

Size Growth of FFmpeg Components

As a general result, two different evolutionary patterns can be observed, which have been clustered in the two graphs of Figure 6 and Figure 7; the measures are all relative to the highest values recorded, and they are presented as percentages on the Y-axis. In the top graph, three components (libavcodec, libavutil and libavformat, in blue, yellow and red, respectively) show a linear growth as a general trend (relative to the maximum size achieved by each). In the following, these components are referred to as E-type components. On the other hand, the other components in FFmpeg (Table 2) show a more traditional evolution that is typical for library packages, and are referred to as either "S-type" or "P-type" systems (as presented in the background section).
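The relative measures plotted on the Y-axes of Figures 6 and 7 amount to a simple normalization against the peak value, which can be sketched as follows (the SLOC series below is illustrative, not the measured FFmpeg data):

```python
def relative_growth(sloc_series):
    """Express each monthly SLOC count as a percentage of the peak value recorded."""
    peak = max(sloc_series)
    return [round(100.0 * s / peak, 1) for s in sloc_series]

# Hypothetical monthly SLOC counts for a libavcodec-like E-type component
monthly_sloc = [18_000, 60_000, 140_000, 220_000]
print(relative_growth(monthly_sloc))  # -> [8.2, 27.3, 63.6, 100.0]
```

Normalizing each component to its own maximum makes the growth shapes comparable across components of very different absolute sizes.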
Size Growth in E-Type Components

Considering the top diagram in Figure 6, the libavcodec component started out as a medium-sized component (18 KSLOC), but its size has now reached over 220 KSLOC, an increase of over 1,100%. The libavformat component has gone through a comparable pattern of growth (a 250% increase), but with a smaller size overall (from 14 to 50 KSLOC). Although reusable resources are often regarded as "S-type" or "P-type" systems, since their evolutionary patterns manifest a reluctance to grow (as in the typical behavior of software libraries), these two components follow an "E-type" evolutionary pattern even while heavily reused by several other projects. The studied cases appear to be driven mostly by adaptive maintenance (Swanson, 1976), since new audio and video formats are constantly added and refined among the functions of these components.

Using a metaphor from botany, these software components appear and grow as "fruits" from the main "plant" (the "trunk" in the version control system). Furthermore, these components behave as "climacteric" fruits (such as bananas), meaning that they ripen off the parent plant (and in some cases, they must be picked in order to ripen; that is, a component needs to be separated from the parent project in order to allow it to mature and evolve). These FFmpeg components have continued to evolve even when separated from the project they belonged to (i.e., FFmpeg), similarly to climacteric fruits.

Size Growth in S- and P-Type Components

The bottom diagram in Figure 7 details the relative growth of the remaining components. Figures 6 and 7 show that these remaining components exhibit a more traditional, library-style type of evolution. Maintenance activities in these components are more likely to be of a corrective or perfective nature (Swanson, 1976).
The components libpostproc and libswscale appear to be hardly changing at all, even though they were formed several years ago within the main project (Figure 2). Libavdevice, when created, was already at 80% of its current size; libavfilter, in contrast, although achieving a larger relative growth, does so because it was created at a very small size (600 SLOC), which has since more than doubled (1,400 SLOC). These resources are effectively library-type systems, and their reuse is simplified by the relative stability of their characteristics, that is, the type of problem they solve. Using the same metaphor as above, the components ("fruits") following this behavior are unlikely to ripen any further once they have been picked. Outside the main trunk of development, these components remain unchanged, even when incorporated into other OSS projects.

Architectural Evolution of FFmpeg Components

The observations related to the growth in size have been used to cluster the components based on their coupling patterns. As mentioned, each of the 100 monthly checkouts of the FFmpeg system was analyzed in order to extract the common couplings of each element (functions or files), and these common couplings were then converted (lifted) into connections between components.

As observed also with the growth in size, the E-type components present a steadily increasing growth of couplings compared to the more stable S-type and P-type components. In the following section, we study whether the former also display a more modularized growth pattern, resulting in a more stable and defined behavior.

Coupling Patterns in E-Type Components

Figures 8 through 10 present the visualization of the three E-type components identified.
For each component, four trends are displayed:

• The overall amount of its common couplings;
• The amount of couplings directed towards its own elements (cohesion);
• The amount of its outbound couplings (fan-out);
• The amount of its inbound couplings (fan-in).

As before, these trends are measured relative to the highest values recorded in each trend, and the results are presented as percentages on the Y-axis.

Each component shows a continuous growth trend in the number of couplings affecting it. The libavutil component has one sudden discontinuity in this growth, which is explained below. As a common trend, it is also visible that both the libavcodec and libavformat components have a strong cohesion factor, which remains above the 75% threshold throughout their evolution. In other words, in these two components, more than 75% of the total number of couplings are consistently between internal elements. The cohesion of libavutil, on the other hand, degrades until it becomes very low, revealing a very high fan-in. After the restructuring at around one fifth of its lifecycle (June 2006), this component becomes a provider (Lungu et al., 2006), fully providing services to other components (more than 90% of the overall amount of its couplings, around 3,500, are either towards its own elements or serving calls from other components).

When observing the three components as part of a common, larger system, the changes in one component become relevant to the other components as well. For example, the general trend of libavcodec is intertwined with the other two components (i.e., libavutil and libavformat) in the following ways:

• The overall cohesion decreases during a time interval in which no overall couplings (i.e., the blue trend) were added; therefore this attribute has decayed.
• In parallel with the cohesion decay, the fan-out of libavcodec (top of Figure 5) abruptly increases, topping some 17% at the latest studied point: on closer inspection, this larger fan-out (e.g., requests of services) is increasingly directed towards the libavutil component, which around the same period (middle of Figure 5) experiences a sudden increase of its fan-in (i.e., provision of services).

• Also, the fan-in of libavcodec decreases: in the first part of its evolution, libavcodec served numerous requests from the libavformat component; throughout the evolution, these links have been converted into connections to libavutil instead, decreasing the fan-in of libavcodec.

Performing a similar analysis for libavformat, it becomes clear that its fan-out degrades, becoming gradually larger, the reason being an increasingly stronger link to the elements of both libavcodec and libavutil. This form of inter-component dependency is a form of architectural decay (Eick et al., 2001). This has been reproduced for the latest available data point in Figure 11: both libavformat and libavcodec depend heavily on libavutil (1,093 and 1,748 overall couplings, respectively); furthermore, the same two components are also intertwined by 523 calls by libavformat that are served by libavcodec.

Figure 8. Coupling patterns of E-type components: libavcodec.

Figure 9. Coupling patterns of E-type components: libavutil.

Figure 11 shows that most of the couplings of these displayed components are amongst themselves; for instance, 68% of the couplings of libavformat (4,051 couplings) are couplings to itself (i.e., its cohesion); 18% (1,093) are to libavutil, and 9% are to libavcodec. Ninety-five percent of libavformat's couplings are found within these three components; the remaining 5% are couplings to other components.
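The percentage breakdown quoted above can be reproduced from the raw coupling counts. The self, libavutil and libavcodec counts below are taken from the text, while the "other" remainder is a hypothetical figure chosen only to make the example total plausible:

```python
def breakdown(couplings):
    """Express each coupling destination as a rounded percentage of the total."""
    total = sum(couplings.values())
    return {dst: round(100.0 * n / total) for dst, n in couplings.items()}

libavformat_couplings = {
    "libavformat": 4051,  # couplings to its own elements, i.e., cohesion
    "libavutil": 1093,
    "libavcodec": 523,
    "other": 290,         # hypothetical remainder (roughly the 5% mentioned)
}
print(breakdown(libavformat_couplings))
# -> {'libavformat': 68, 'libavutil': 18, 'libavcodec': 9, 'other': 5}
```

With these counts, cohesion (couplings to itself) comes out at 68% and the three E-type components together account for roughly 95% of libavformat's couplings, matching the figures reported for Figure 11.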
When comparing these results with the plots in Figures 8 through 10 (especially the one representing the libavcodec component), it becomes clear how its architecture has decayed. At the earliest points, libavcodec represented an excellent component, with cohesion accounting for 90% of all its couplings and a fan-in of 10% of all its couplings. No fan-out was recorded, so essentially libavcodec had no need for services from other components. The latest available point (Figure 11), instead, shows a component that has decayed, that needs more from libavutil (16% of all its couplings), and whose fan-out has increased to some 18% of its overall couplings.

The graph in Figure 11 shows another result, representing in fact the typical trade-off between encapsulation and decomposition: several of the common files accessed by both libavformat and libavcodec have recently been "relocated" (Tran & Holt, 1999) to a third location (libavutil), which acts as a provider (Lungu et al., 2006) to both. This in turn has a negative effect on reusability: when trying to reuse (some of) the functionality of libavcodec, it will be necessary to also include (some of) the contents of libavutil, since a large number of calls are issued by libavcodec towards libavutil. Even worse, when trying to reuse (some of) the functionality of libavformat, it will be necessary to also include (some of the functionality of) libavutil and libavcodec, since the three components are heavily intertwined.

Coupling Patterns in S- and P-Type Components

The characteristics of the E-type components described above can be summarized as follows:

• High cohesion;
• Fan-out under a certain threshold; and
• Clear, defined behavior as a component (e.g., a "provider," as achieved by the libavutil component).

The second cluster of components identified (the "S-" and "P-type") revealed several discrepancies from the results observed previously.
A list of key results is summarized here:

• As also observed for the growth of components, the number of couplings affecting this second cluster of components reveals a difference of one (libswscale, libavdevice and libavfilter) or even two (libpostproc) orders of magnitude with respect to the E-type components.

• Slowly growing trends in the number of couplings were observed in libavdevice and libavfilter, but their cohesion remains stable. On the other hand, a high fan-out was consistently observed in both, with values of 0.7 and 0.5, respectively. Observing more closely, these dependencies are directed towards the three E-type components defined above. This suggests that these components are not yet properly designed, which may also be due to their relatively young age. Their potential reuse is contingent on the inclusion of other FFmpeg libraries as well.

To summarize, this second type of component can be classified as slowly growing, less cohesive and more connected with other components in the same system. They can be acceptable reuse candidates, but resolving the inter-connections with other components from the same project could prove difficult.

Deployment of libavcodec in Other OSS Projects

Although identified as "E-type" components, the three components libavcodec, libavformat and libavutil have been shown to be highly reusable, based on coupling patterns and size growth attributes. This is interesting, as it seems to contradict the expectation that E-type software is less reusable, due to the need to continuously evolve. In order to observe how these components are actually reused and deployed in other hosting systems, this section summarizes the study of the deployment of the libavcodec component in four OSS projects: avifile (http://avifile.sourceforge.net/), avidemux (http://fixounet.free.fr/avidemux/), MPlayer and xine (Freitas, Roitzsch, Melanson, Mattern, Langauf, Petteno et al., 2002).
The selection of these projects for the deployment study is based on their current reuse of these components. Each project hosts a copy of the libavcodec component in its code repository, thereby implementing a white-box reuse strategy of this resource. In other words, these projects maintain their own copy of the libavcodec component. The issue to investigate is whether these hosting projects maintain the internal characteristics of the original libavcodec, hosted in the FFmpeg project. In order to do so, the coupling attributes of this folder have been extracted from each OSS project, and the number of connected folders has been counted, together with the total number of couplings. The results are shown in Figure 12.

Each diagram in Figure 12 represents a hosting project: the libavcodec copy presents some degree of cohesion (the re-entrant arrow), and its specific fan-in and fan-out (inward and outward arrows, respectively). The number of connections (i.e., distinct source folders) responsible for the fan-in and fan-out is displayed by the number in the (multi-)module diagram in the upper-left and upper-right corners. The following observations can be made:

• The total amount of couplings in each copy is always lower than in the original FFmpeg copy; this means that the whole FFmpeg project is not reused, but only some specific resources.

• In each copy, the fan-in/fan-out ratio is approximately 2:1. In the xine copy, this is reversed, apparently because xine does not host a copy of the libavformat component.

• For each graph, the connections between libavcodec and libavutil, and between libavcodec and libavformat, have been specifically detailed: the fan-in from libavformat alone is typically of the same order of magnitude as all the remaining fan-in.

• The fan-out towards libavutil typically accounts for a much larger ratio.
This confirms the presence of a consistent dependency between libavcodec and libavutil, which must therefore be reused together. The avidemux project moved the necessary dependencies on libavutil into the libavcodec component; therefore no build-level component for libavutil is detectable.

THREATS TO VALIDITY

We are aware of a few limitations of this study, which are discussed below. Threats may occur with respect to construct validity, reliability and external validity. Since we do not seek to establish any causal relationships, we do not discuss threats to internal validity.

Construct Validity

Construct validity is concerned with establishing correct operational measures for the concepts being studied (Yin, 2003). We used coupling and cohesion measures to represent inter-component connections. These measures are widely used within the software engineering literature in relation to software module inter-connectivity. We interpreted the term "component" as "build-level" component, as previously done in other studies (e.g., de Jonge, 2005).

Furthermore, the build-level components presented in Table 2 (though probably accurate) are automatically assigned; they could be only subcomponents of a larger component (e.g., one composed of both libavutil and libavcodec).

Reliability

Reliability is the extent to which the operational aspects of the study, such as data collection and analysis procedures, are repeatable with the same results (Yin, 2003, p. 34). At the time of our study, FFmpeg was hosted in a Subversion repository, which was parsed monthly, as discussed in the research design section. Guba (1981) states that an inquiry can be affected by "instrumental drift or decay," which may produce effects of instability. In order to guard against this, we have established an audit trail of the data extraction process, which is a recommended practice to establish reliability (Guba, 1981).
A snapshot (of the example given in the research design section) is made publicly available (http://mastodon.uel.ac.uk/IJOSSP2012/ffmpeg-2008-02-01.tar.gz). The generated .dot files (which represent individual files, classes or clusters of files, and contain their couplings to other modules in the system) are also publicly available (http://mastodon.uel.ac.uk/IJOSSP2012/ffmpeg-2008-02-01-dots.tar).

External Validity

External validity is concerned with the extent to which the results of a study can be generalized. In our study, we have focused on one case study (FFmpeg), which is written mostly in the C programming language. If a similar study were performed on a system written in, for instance, an object-oriented language (e.g., C++ or Java), the results could be quite different. However, as outlined in the introduction, it is not our goal to present generalizations based on our results. Rather, the aim of this paper is to document a successful case of OSS reuse by other OSS projects.

CONCLUSION AND FUTURE WORK

This section presents the conclusion of this study, followed by directions for future work.

Conclusion

Empirical studies of the reusability of OSS resources should proceed along two strands: first, they should provide mechanisms to select the best candidate component to act as a building block in a new system; second, they should document successful cases of reuse, where OSS components have been deployed in other OSS projects. This paper contributes to the second strand by empirically analyzing the FFmpeg project, whose components are currently widely reused in several multimedia OSS applications.
The empirical study was performed on project data covering the last eight years of its development, studied at monthly intervals, to determine and extract the characteristics of its size, evolutionary growth, and coupling patterns, in order to identify and understand the attributes that made its components a successful case of OSS reusable resources. After these characteristics had been studied, four OSS projects were selected from among those implementing a white-box reuse of the FFmpeg components; the deployment and reuse of these components were studied from the perspective of their interaction with their hosting systems.

In our case study of FFmpeg, a number of findings were obtained. First, it was found that several of its build-level components make a good starting point in the selection of reusable components. They coalesce, grow and become available at various points in the life cycle of this project, and all of them are currently available as building blocks for other OSS projects to use. Second, it was possible to classify (using Lehman's S-P-E program type categories) at least two types of components: one set presents the characteristics of evolutionary (E-type) systems, with sustained growth throughout. The other set, albeit of more recent formation, is mostly unchanged, thereby manifesting the typical attributes of software libraries.

The two clusters were compared again in the study of the connections between components. The first set showed components with either a clearly defined behavior or an excellent cohesion of their elements. It was also found that these three components have become increasingly mutually connected, which results in the formation of one single super-component. The second set appeared less stable, with accounts of a large fan-out, which suggests poor design or immaturity of the components.
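Coupling indicators of the kind discussed above (fan-out, mutual connections) can be computed mechanically from a dependency graph such as the publicly available .dot files mentioned in the Reliability section. The following is a minimal sketch only, with hypothetical module names and a simplified edge format, not the actual FFmpeg data:

```python
import re
from collections import defaultdict

# Hypothetical .dot content: each edge "a -> b" records a coupling
# from module a to module b (module names invented for illustration).
dot_text = """
digraph example {
  "libavcodec/h264.c" -> "libavutil/mem.c";
  "libavcodec/h264.c" -> "libavcodec/golomb.c";
  "libavformat/utils.c" -> "libavcodec/avcodec.c";
}
"""

def coupling_counts(text):
    """Return per-module fan-out (outgoing couplings) and fan-in."""
    fan_out = defaultdict(int)
    fan_in = defaultdict(int)
    for src, dst in re.findall(r'"([^"]+)"\s*->\s*"([^"]+)"', text):
        fan_out[src] += 1
        fan_in[dst] += 1
    return fan_out, fan_in

fan_out, fan_in = coupling_counts(dot_text)
print(fan_out["libavcodec/h264.c"])  # 2
```

A consistently large fan-out for one cluster of modules, as reported above for the second set of components, would show up directly in such counts.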
One of the reusable resources found within FFmpeg (i.e., libavcodec) was analyzed when deployed into four OSS systems that have reused it using a white-box approach. Its cohesion pattern appeared similar to that of the original copy of libavcodec, while it emerged more clearly that its reuse is currently facilitated when the libavformat and libavutil components are reused, too. Given that most of the projects reusing the libavcodec library link it "dynamically" (i.e., black-box reuse) to their code, any change made to the libavcodec library has a propagation issue (Orsila et al., 2008): the linking projects need to adapt their code whenever a new version of libavcodec is released. On the other hand, the projects hosting their own copy of the same library (i.e., white-box reuse) will face less of the propagation issue, since the changes pushed onto the original version of libavcodec will not affect their copies.

Future Work

This work has several open strands to follow: first, it would be interesting to replicate this study on other systems that are currently widely reused. In particular, it is necessary to start defining and distinguishing the reuse of whole systems "as libraries" (such as the project zlib) from the reuse of components within larger projects (such as the component libavcodec within the FFmpeg project). In the first case, the whole project is reused as-is, and it seems likely that only a subset of functions will be reused. In the latter, the implications are more interesting: researchers and practitioners should try to automatically extract libraries that comply with reusability principles, and avoid reusing whole systems.

The second research direction that needs to be addressed concerns the evolution of reusable resources. It needs to address the following questions:

Do libraries need to remain mostly unchanged to be reusable?
What are the main issues of forking reusable libraries to avoid the effects of "cascade updates"?

In this respect, OSS developers and interested parties have to produce a strategy for the upgrade of their resources when such resources rely heavily on external libraries.

Thirdly, the example of the components becoming available at different times in FFmpeg shows that other evolving projects might be able to offer a similar benefit to the OSS communities, by signaling the presence of reusable libraries that could benefit other projects apart from their own.

Finally, the presence of so many available OSS projects implementing similar applications (e.g., our example of over 100 projects implementing an "email client") should be analyzed further to detect how much code duplication, code cloning or component reuse is visible in these projects.

ACKNOWLEDGMENTS

The authors would like to thank Dr Daniel German for the clarification on the potential conflicts of licenses in the FFmpeg project, Thomas Knowles for the insightful discussions, and Nicola Sabbi for the insider knowledge of the MPlayer system. We thank the anonymous reviewers for their constructive feedback, which has improved this paper. This work was, in part, supported by Science Foundation Ireland grant 10/CE/I1855 to Lero—The Irish Software Engineering Research Centre (www.lero.ie). This paper is a revised version of: Capiluppi, A., Boldyreff, C. & Stol, K. (2011) Successful Reuse of Software Components: A Report from the Open Source Perspective, in: Hissam, S. A., Russo, B., de Mendonça Neto, M. G. & Kon, F. (Eds.) Open Source Systems: Grounding Research, Springer, Advances in Information and Communication Technology (AICT) vol. 365, pp. 159-176.

REFERENCES

Abi-Antoun, M., Aldrich, J., & Coelho, W. (2007). A case study in re-engineering to enforce architectural control flow and data sharing. Journal of Systems and Software, 80(2), 240–264.
doi:10.1016/j.jss.2006.10.036

Avgeriou, P., & Guelfi, N. (2005). Resolving architectural mismatches of COTS through architectural reconciliation. In X. Franch & D. Port (Eds.), Proceedings of the 4th International Conference on COTS-Based Software Systems (LNCS 3412, pp. 248-257).

Ayala, C., Sørensen, C., Conradi, R., Franch, X., & Li, J. (2007). Open source collaboration for fostering off-the-shelf components selection. In Feller, J., Fitzgerald, B., Scacchi, W., & Sillitti, A. (Eds.), Open source development, adoption, and innovation. New York, NY: Springer. doi:10.1007/978-0-387-72486-7_2

Basili, V. R., & Rombach, H. D. (1991). Support for comprehensive reuse. IEEE Software Engineering Journal, 6(5), 303–316.

Bass, L., Clements, P., & Kazman, R. (2003). Software architecture in practice (2nd ed.). Reading, MA: Addison-Wesley.

Bowman, I. T., Holt, R. C., & Brewster, N. V. (1999). Linux as a case study: Its extracted software architecture. In Proceedings of the 21st International Conference on Software Engineering (pp. 555-563).

Capiluppi, A., & Boldyreff, C. (2008). Identifying and improving reusability based on coupling patterns. In H. Mei (Ed.), Proceedings of the 10th International Conference on Software Reuse: High Confidence Software Reuse in Large Systems (LNCS 5030, pp. 282-293).

Capiluppi, A., & Knowles, T. (2009). Software engineering in practice: Design and architectures of FLOSS systems. In Proceedings of the 5th IFIP WG 2.13 International Conference on Advances in Information and Communication Technology (Vol. 299, pp. 34-46).

Clements, P., Bachmann, F., Bass, L., Garlan, D., Ivers, J., Little, R., … Stafford, J. (2010). Documenting software architectures: Views and beyond (2nd ed.). Reading, MA: Addison-Wesley.

de Jonge, M. (2002). Source tree composition. In C.
Gacek (Ed.), Proceedings of the 7th International Conference on Software Reuse: Methods, Techniques, and Tools (LNCS 2319, pp. 17-32).

de Jonge, M. (2005). Build-level components. IEEE Transactions on Software Engineering, 31(7), 588–600. doi:10.1109/TSE.2005.77

Dueñas, J. C., de Oliveira, W. L., & de la Puente, J. A. (1998). Architecture recovery for software evolution. In Proceedings of the 2nd Euromicro Conference on Software Maintenance and Reengineering (pp. 113-119).

Easterbrook, S., Singer, J., Storey, M.-A., & Damian, D. (2008). Selecting empirical methods for software engineering research. In Shull, F., Singer, J., & Sjøberg, D. I. K. (Eds.), Guide to advanced empirical software engineering (pp. 285–311). New York, NY: Springer. doi:10.1007/978-1-84800-044-5_11

Eick, S. G., Graves, T. L., Karr, A. F., Marron, J. S., & Mockus, A. (2001). Does code decay? Assessing the evidence from change management data. IEEE Transactions on Software Engineering, 27(1), 1–12. doi:10.1109/32.895984

Fenton, N. E. (1991). Software metrics: A rigorous approach. London, UK: Chapman & Hall.

Fitzgerald, B. (2006). The transformation of open source software. Management Information Systems Quarterly, 30(3), 587–598.

Freitas, M., Roitzsch, M., Melanson, M., Mattern, T., Langauf, S., Petteno, D., … Lee, A. (2002). Xine multimedia engine. Retrieved from http://www.xine-project.org/home

German, D. M., & González-Barahona, J. M. (2009). An empirical study of the reuse of software licensed under the GNU general public license. In Proceedings of the 5th IFIP WG 2.13 International Conference on Open Source EcoSystems: Diverse Communities Interacting (pp. 185-198).

German, D. M., Gonzalez-Barahona, J. M., & Robles, G. (2007). A model to understand the building and running inter-dependencies of software. In Proceedings of the 14th Working Conference on Reverse Engineering (pp. 140-149).

German, D. M., & Hassan, A. E. (2009).
License integration patterns: Addressing license mismatches in component-based development. In Proceedings of the 31st IEEE International Conference on Software Engineering (pp. 188-198).

Glass, R. L., Vessey, I., & Ramesh, V. (2002). Research in software engineering: An analysis of the literature. Information and Software Technology, 44(8), 491–506. doi:10.1016/S0950-5849(02)00049-6

Godfrey, M. W., & Lee, E. H. S. (2000). Secrets from the monster: Extracting Mozilla's software architecture. In Proceedings of the 2nd Symposium on Constructing Software Engineering Tools (pp. 15-23).

Guba, E. (1981). Criteria for assessing the trustworthiness of naturalistic inquiries. Educational Communication and Technology, 29, 75–92.

Haefliger, S., von Krogh, G., & Spaeth, S. (2008). Code reuse in open source software. Management Science, 54(1), 180–193. doi:10.1287/mnsc.1070.0748

Harrison, N. B., & Avgeriou, P. (2011). Pattern-based architecture reviews. IEEE Software, 28(6), 66–71. doi:10.1109/MS.2010.156

Hauge, Ø., Ayala, C., & Conradi, R. (2010). Adoption of open source software in software-intensive organizations - A systematic literature review. Information and Software Technology, 52(11), 1133–1154. doi:10.1016/j.infsof.2010.05.008

Hauge, Ø., Østerlie, T., Sørensen, C.-F., & Gerea, M. (2009, May 18). An empirical study on selection of open source software - Preliminary results. In Proceedings of the 2nd ICSE Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development, Vancouver, BC, Canada (pp. 42-47).

Hauge, Ø., Sørensen, C.-F., & Røsdal, A. (2007). Surveying industrial roles in open source software development. In Feller, J., Fitzgerald, B., Scacchi, W., & Sillitti, A. (Eds.), Open source development, adoption and innovation (pp. 259–264). New York, NY: Springer. doi:10.1007/978-0-387-72486-7_25

Heinemann, L., Deissenboeck, F., Gleirscher, M., Hummel, B., & Irlbeck, M. (2011).
On the extent and nature of software reuse in open source Java projects. In K. Schmid (Ed.), Proceedings of the 12th International Conference on Software Reuse: Top Productivity through Software Reuse (LNCS 6727, pp. 207-222).

IEEE. (2000). IEEE Std 1471-2000: IEEE recommended practice for architectural description of software-intensive systems. Piscataway, NJ: IEEE.

Krikhaar, R. (1999). Software architecture reconstruction (Unpublished doctoral dissertation). University of Amsterdam, Amsterdam, The Netherlands.

Krikhaar, R., Postma, A., Sellink, A., Stroucken, M., & Verhoef, C. (1999). A two-phase process for software architecture improvement. In Proceedings of the IEEE International Conference on Software Maintenance (pp. 371-380).

Kruchten, P. B. (1995). The 4+1 view model of architecture. IEEE Software, 12(5), 42–50. doi:10.1109/52.469759

Lang, B., Abramatic, J.-F., González-Barahona, J. M., Gómez, F. P., & Pedersen, M. K. (2005). Free and proprietary software in COTS-based software development. In X. Franch & D. Port (Eds.), Proceedings of the 4th International Conference on Composition-Based Software Systems (LNCS 3412, p. 2).

Lehman, M. M. (1978). Programs, cities, students, limits to growth? Programming Methodology, 42-62.

Lehman, M. M. (1980). Programs, life cycles, and laws of software evolution. Proceedings of the IEEE, 68(9), 1060–1076. doi:10.1109/PROC.1980.11805

Lethbridge, T. C., & Laganière, R. (2001). Object-oriented software engineering: Practical software development using UML and Java (2nd ed.). London, UK: McGraw-Hill.

Li, J., Conradi, R., Bunse, C., Torchiano, M., Slyngstad, O. P. N., & Morisio, M. (2009). Development with off-the-shelf components: 10 facts. IEEE Software, 26(2), 80–87. doi:10.1109/MS.2009.33

Lungu, M., Lanza, M., & Gîrba, T. (2006). Package patterns for visual architecture recovery.
In Proceedings of the 10th European Conference on Software Maintenance and Reengineering.

Macro, A., & Buxton, J. (1987). The craft of software engineering. Reading, MA: Addison-Wesley.

Mockus, A. (2007). Large-scale code reuse in open source software. In Proceedings of the First International Workshop on Emerging Trends in FLOSS Research and Development.

Orsila, H., Geldenhuys, J., Ruokonen, A., & Hamouda, I. (2008). Update propagation practices in highly reusable open source components. In Proceedings of the IFIP 20th World Computer Congress on Open Source Software (Vol. 275, pp. 159-170).

Parnas, D. L. (1972). On the criteria to be used in decomposing systems into modules. Communications of the ACM, 15(12), 1053–1058. doi:10.1145/361598.361623

Perry, D. E., & Wolf, A. L. (1992). Foundations for the study of software architectures. ACM SIGSOFT Software Engineering Notes, 17(4).

Runeson, P., & Höst, M. (2009). Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering, 14(2), 131–164.

Sametinger, J. (1997). Software engineering with reusable components. Berlin, Germany: Springer-Verlag.

Sartipi, K., Kontogiannis, K., & Mavaddat, F. (2000). A pattern matching framework for software architecture recovery and restructuring. In Proceedings of the 8th International Workshop on Program Comprehension (pp. 37-47).

Schmerl, B., Aldrich, J., Garlan, D., Kazman, R., & Yan, H. (2006). Discovering architectures from running systems. IEEE Transactions on Software Engineering, 32(7), 454–466. doi:10.1109/TSE.2006.66

Senyard, A., & Michlmayr, M. (2004). How to have a successful free software project. In Proceedings of the 11th Asia-Pacific Software Engineering Conference (pp. 84-91).

Sojer, M., & Henkel, J. (2010). Code reuse in open source software development: Quantitative evidence, drivers, and impediments.
Journal of the Association for Information Systems, 11(12), 868–901.

Sommerville, I. (2004). Software engineering (International Computer Science Series) (7th ed.). Reading, MA: Addison-Wesley.

SourceForge. (2011). Email client. Retrieved from http://sourceforge.net/directory/?q=email%20client

Swanson, E. B. (1976). The dimensions of maintenance. In Proceedings of the 2nd International Conference on Software Engineering (pp. 492-497).

Szyperski, C. (2002). Component software: Beyond object-oriented programming (2nd ed.). Reading, MA: Addison-Wesley.

Torchiano, M., & Morisio, M. (2004). Overlooked aspects of COTS-based development. IEEE Software, 21(2), 88–93. doi:10.1109/MS.2004.1270770

Tran, J. B., Godfrey, M. W., Lee, E. H. S., & Holt, R. C. (2000). Architectural repair of open source software. In Proceedings of the 8th International Workshop on Program Comprehension (pp. 48-59).

Tran, J. B., & Holt, R. C. (1999). Forward and reverse repair of software architecture. In Proceedings of the Conference of the Centre for Advanced Studies on Collaborative Research.

Troy, D. A., & Zweben, S. H. (1981). Measuring the quality of structured designs. Journal of Systems and Software, 2(2), 113–120. doi:10.1016/0164-1212(81)90031-5

Ven, K., & Mannaert, H. (2008). Challenges and strategies in the use of open source software by independent software vendors. Information and Software Technology, 50(9-10), 991–1002. doi:10.1016/j.infsof.2007.09.001

Wheeler, D. A. (n.d.). SLOCCount. Retrieved from http://www.dwheeler.com/sloccount/

Wikipedia. (n.d.). LAMP (software bundle). Retrieved from http://en.wikipedia.org/wiki/LAMP_(software_bundle)

Yin, R. K. (2003). Case study research: Design and methods (3rd ed.). Thousand Oaks, CA: Sage.
ENDNOTES

1. Of course, a full structural evaluation of these 128 projects should be performed before arguing that no features are reused among these projects.

2. A list of OSS and commercial projects integrating libavcodec is given and maintained at http://ffmpeg.org/projects.html

3. The term "connection" is not intended to cover the term "dependency" between packages in a distribution, since this paper only analyses the internal architecture of components.

Andrea Capiluppi has been a Lecturer in Software Engineering at Brunel University since May 2012. Before that, he was a Senior Lecturer at the University of East London from February 2009 to April 2012, and a Senior Lecturer at the University of Lincoln, UK, for three years, from January 2006 to February 2009. He gained his PhD from Politecnico di Torino, Italy, in May 2005, and has held a Researcher position and a Consultant position at the Open University in the UK. In November 2003 he was a Visiting Researcher in the GSyC group at the University of Rey Juan Carlos de Madrid, Spain, one of the partners of the project proposal. His publications include some 50 papers, published in leading international conferences and journals, mostly devoted to the Open Source Software topic. He has been a consultant to several industrial companies and has published works in which results of FLOSS research have been disseminated on commercial sites. He has taken part in one of the packages of the CALIBRE project, a €1.5 million pan-European EU research project focused on the use of FLOSS in industry.

Klaas-Jan Stol is a researcher at Lero, the Irish Software Engineering Research Centre, where he has worked since 2008.
He holds a PhD in Software Engineering from the University of Limerick, Ireland, and an MSc in Software Engineering from the University of Groningen, the Netherlands. His research interests are in Open Source Software (OSS), software development methods (including OSS development practices), software architecture, component-based software development, software reuse and empirical software engineering.

Cornelia Boldyreff is the Associate Dean (Research and Enterprise) at the School of Architecture, Computing and Engineering at the University of East London. She gained her PhD in Software Engineering from the University of Durham. In 2004 she moved to the University of Lincoln to become the first Professor of Software Engineering at the university, where she co-founded and directed the Centre for Research in Open Source Software. She has over 25 years' experience in software engineering research and has published extensively on her research in the field. She is a Fellow of the British Computer Society and a founding committee member of the BCSWomen Specialist Group. She has been actively campaigning for more women in SET throughout her career.

Changes in free and open source software licenses: managerial interventions and variations on project attractiveness

Carlos Denner dos Santos Jr

Abstract

The license adopted by an open source software project is associated with its success in terms of attractiveness and the maintenance of an active ecosystem of users, bug reporters, developers, and sponsors, because what can and cannot be done with the software and its derivatives, in terms of improvement and market distribution, depends on the legal terms there specified.
By knowing this licensing effect through scientific publications and their own experience, project managers became able to act strategically: loosening the restrictions associated with their source code due to sponsor interests, for example, or, on the contrary, tightening restrictions to guarantee source code openness, adhering to the "forever free" strategy. But have project managers actually behaved strategically like that, changing their projects' licenses? Up to this paper, we did not know if and what types of changes in these legal allowances project managers have made and, more importantly, whether such managerial interventions are associated with variations in the intervened projects' attractiveness (i.e., related to their numbers of web hits, downloads and members). This paper accomplishes these two goals and demonstrates that: 1) managers of free and open source software projects do change the distribution rights of their source code through a change in the (group of) license(s) adopted; and 2) variations in attractiveness are associated with the strategic choice of a licensing schema. To reach these conclusions, a unique dataset of open source projects that have changed license was assembled in a comparative form, analyzing intervened projects over their monthly periods under different licenses. Based on a sample of more than 3500 active projects over 44 months obtained from the FLOSSmole repository of Sourceforge.net data, 756 projects that had changed their source code distribution allowances and restrictions were identified and analyzed. A dataset on these projects' types of changes was assembled to enable a descriptive and exploratory analysis of the types of license interventions observed over a period of almost four years, anchored on projects' attractiveness. More than 35 types of interventions were detected.
The results indicate that variations in attractiveness after a license intervention are not symmetric; that is, if a change from license schema A to B is beneficial to attractiveness, a change from B to A is not necessarily prejudicial. This and other interesting findings are discussed in detail. In general, the results reported here support the current literature knowledge that the restrictions imposed by the license on source code distribution are associated with market success vis-à-vis project attractiveness, but they also suggest that the state of the science is superficial in terms of what is known about why these differences in attractiveness can be observed. The complexity of the results indicates to free software managers that no licensing schema should be seen as the right one, and that its choice should be carefully made, considering project strategic goals as perceived relevant to stakeholders of the application and its production. These conclusions create awareness of several limitations of our current knowledge, which are discussed along with guidelines to understand them more deeply in future research endeavors.

Keywords: Open source software, Attractiveness, Software license, Intellectual property, GPL, Free software, Governance, Project and people management, Information technology, Software project, Open source

Correspondence: carlosdenner@unb.br
Department of Management (PPGA/ADM), University of Brasilia (UnB), Brasília, Brazil

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
1 Introduction: collective production and legal issues

Society and its creations have become increasingly complex as our body of knowledge has grown and information retrieval technologies have evolved. Innovating and competing on a global scale is no activity for an individual alone. Searching for partners and peers to collaborate with on projects is a crucial task in most fields, notably in science, software engineering and public policy management [1–3]. Experts have noticed this and expressed the notion by saying that modern inventors are organizations, not individuals, and that production processes are best dealt with in an open and public fashion, as opposed to the proprietary and private economic model of firm production [3–5]. This change, of course, raises concerns about how the rights to such collective goods (properties) should be regulated and managed so as to prevent disincentives for entrepreneurship and cooperation, and thus keep the labor market active and sustainable [6–8].

The digitalization of the world has stimulated this trend of working in collectivities by decreasing the costs of searching for collaborators and using communication technologies to coordinate production activities. The asynchronicity of production activities over the web has led many investigators and developers to engage in geographically distributed projects, such as for software development [9, 10]. For at least the last 20 years, this phenomenon of "collective production" has been particularly prominent in the development of free and open source software (free software, for short), reshaping the information technology (IT) industry as it became a strategic player. Nowadays, there are hundreds of thousands of free software projects online, each representing a computer-supported cooperative work opportunity for generating an active and growing ecosystem of users and contributors capable of joint development at an unprecedented scale [11, 12].
Free software projects (FSP) reflect the intention of a founder, the original owner of the property rights, to share the costs of continuous software improvement, user base expansion, and visibility growth [13–15]. The ability to attract peers to co-create with the founder is understood as the attractiveness of the project [12]. Richard Stallman and Linus Torvalds are among the first and most famous people to publicize this type of intention, bringing forth the GNU operating system and Linux, an incredibly successful project that alone deeply impacted the IT industry. Unsurprisingly, inspired by the Linux case, many organizations have created FSP as a deliberate organizational strategy, known as open sourcing, an alternative to the classic outsourcing possibility [11]. When successful, FSP involve active communities structured as networks for the evolution of public software through a resourceful communication channel between users, developers and sponsors. Nevertheless, in these terms, success has been achieved by only a small fraction of the total number of FSP, making the investment of releasing intellectual property to the public and assembling a proper IT infrastructure risky and worthy of managerial consideration, as a failed attempt wastes an organization's limited resources [12–16].

In this scenario of uncertainty and competition over whether the attention of users and developers will be obtained, knowledge on how to effectively create and manage FSP to better suit the demands and interests of stakeholders, be it a sponsor or a co-developer, is useful and timely. Founders and managers should take the stakeholders' demands and interests into account, as they expect that to translate into increasing software adoption and intention to contribute (i.e., people reporting and developers fixing bugs).
One of the central issues in the open source project literature affecting intention to adopt and contribute, i.e., attractiveness, is the license terms: the legal specifications under which the software has been released to regulate further improvement and distribution [6, 7, 16–18].

The influence of the license choice has been discussed on many grounds, from legal [6], strategic [3, 8] and sociological [7] standpoints. The main effects can be summarized as related to people's motivation to get involved, as some in the community (stakeholders) believe that private property should not be a derivative of a public one; a legal restriction that has been found to scare corporations' investments away from software obliged to be always free and open (e.g., software licensed under the GPL 2.0). This duality of effects creates a tension in which the interests of all cannot be met at once, forcing FSP managers to choose a strategic path and "pick a side" in terms of licensing and distribution rights.

A major concern has been the terms under which the application source code is allowed to be modified and redistributed. Free software can be modified, and the result of that modification distributed in sold hardware, for example, with the source code of the embedded software kept proprietary, or not, depending on the license chosen. According to previous studies, the intellectual property policy delineated by the chosen license schema has the power to drive people and organizations away from adopting and contributing to FSP, and operates as a governance mechanism, thereby impacting the attractiveness of the project and consequently its production activities [6–8, 12, 17–19].

In a nutshell, the license is believed to influence FSP's attractiveness, production activities and, thereby, success.
As this strategic effect becomes known to FSP founders and managers, assuming their rationality in the attempt to be successful, an expectation is created that they should act in practice and change their project licenses to affect attractiveness. This paper represents a methodological advance over previous studies, as it verifies this theoretically derived expectation of a relationship between license and attractiveness by performing a longitudinal study with a large sample observed in natura over a wide time frame. This methodological approach was specifically developed to answer the following research questions: 1) Do intellectual property interventions, i.e., license changes, occur in practice? 2) Are the different licensing schemas chosen by project managers associated with FSP attractiveness? These questions are answered with a sampling strategy designed to identify the projects that have changed licenses, followed by a statistical analysis of the various types of license interventions that FSP managers have decided to make, thereby changing the legal restrictions of their software (and thereby their project attractiveness). Besides this methodological improvement, this paper also contributes in the sense that most previous empirical studies have assumed that an open source project has only one license, even though many of these projects have more than one. This paper incorporates that fact into its methodological procedures and improves the classic way of classifying licenses, based on Lerner and Tirole's work, into a more realistic, empirically based schema. Furthermore, the unique dataset assembled to produce this paper is released openly, free of charge, along with its publication, which is another form of contribution to future research endeavors (Additional file 1).
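The core of the comparison described above, contrasting a project's attractiveness in the months before a license change with the months after it, can be sketched as follows. This is a simplified, illustrative stand-in for the paper's actual statistical analysis, and the monthly download figures are invented:

```python
from statistics import mean

# Hypothetical monthly download counts for one project; the license
# change takes effect at index `change_month`.
downloads = [120, 130, 125, 140, 210, 230, 260, 255]
change_month = 4

def attractiveness_delta(series, t):
    """Difference between mean attractiveness (here proxied by downloads)
    under the new license and under the old one."""
    before, after = series[:t], series[t:]
    return mean(after) - mean(before)

print(attractiveness_delta(downloads, change_month))  # 110.0
```

The asymmetry reported in the results would correspond to `attractiveness_delta` for an A-to-B change not simply being the negative of the delta observed for B-to-A changes in other projects.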
The scientific basis grounding these theoretical expectations is stated next in more detail, followed by a methods section describing the specific steps taken to obtain the sample, and by the results, discussed before the conclusions.

1.1 Theoretical foundations: definitions and related work

1.1.1 Free and open source software projects

In general, projects are endeavors toward goals, such as writing a paper or developing software. When a software project has its source code freely and publicly available online for use and modification, with a license attached specifying that, it may be classified as a free and open source software project [7, 8, 11, 12]. Free software projects (FSP) are the object of interest of this study for their position as key players in the IT industry. Several have become widely known, such as the GNU/Linux operating system, the R statistical package, and the Apache web server. The communities maintaining these systems are large, active and professional, producing first-class applications in their domains and receiving sponsorship from companies such as IBM and Google. Beyond these flagship applications, however, most FSP have not become successful, never attracting the external users and contributors needed to generate a network of peers producing useful, up-to-date public software freely available [12–14].

1.2 The role of attractiveness

One way to understand why some FSP are successful and others are not is through the study of their attractiveness [12], or their "magnetism and stickiness", as some have more informally put it. Attractiveness is a common cause of how many visitors a project website receives, how many users it has (or its number of downloads), and how many contributors it possesses. FSP attractiveness is thus the concept held responsible for the (lack of) flow of market resources, basically time and money, to the project.
Higher attractiveness leads to more intention to adopt (download) and contribute (become a member), motivating and justifying production activities and investments that improve software quality and generate innovation via the "more eyeballs" effect [12, 19, 20]. Given this vital role of FSP attractiveness, it is clearly important to understand what influences, or is associated with, variations in attractiveness.

1.3 The choice of license and FSP success

The choice of license affects FSP success because it defines the scope for doing business with the distribution of the software and its derivatives, perhaps preventing source code hijacking, or shaping the incentive to reuse or "cite" the code, but certainly influencing stakeholders' perception of control and utility over the technology. People and organizations take the license terms into consideration when deciding whether to adopt and use free software and, later, whether it is worth contributing to it or reusing its source code [7, 8, 16, 21]. Figure 1 depicts this thesis's causal chain, from intellectual property choice to attractiveness and then to software quality and project success.

In summary, based on the literature review in which this study is grounded [8, 12], Fig. 1 can be read from left to right: FSP managers select a license that defines the restrictions applied to source code redistribution, which affects the flow of market resources to the project (website visits: visitors; downloads: intention to use; membership: intention to contribute). As project attractiveness increases and more people become interested in the software's quality, more bugs are reported and fixed and new features are requested and developed, directly influencing the project's long-term success.
Accordingly, this causal chain is expected to be "disturbed" by a managerial intervention, a change in the project license, as the interests of relevant stakeholders (sponsors, volunteers, etc.) might no longer be met.

To explore this hypothesis empirically, and building on previous research [8, 12, 21, 22], this study focuses on four types of legal restrictions that may be applied to free and open source code. The first is whether the license is "restrictive", requiring derivative works to be released under the same license in case of redistribution [19]; the second, whether it is "highly restrictive", which, beyond being restrictive, forbids the source code from even being mingled at compilation with software under a different license [19]; the third, whether the code may be relicensed, meaning that "any distributor has the right to grant a license to the software [...] directly to third parties" ([7], p. 88); and the fourth, whether the project is licensed under the Academic Free License, which was written to correct problems of important licenses such as MIT and BSD [7] and remains understudied. Methodologically, project licenses were classified on this basis, including the cases where a project has more than one license. Under this schema, a project might impose no restriction on one group of stakeholders, students for example, yet impose it on corporations. This methodological choice reflects the reality of open source projects more accurately, but has the downside of added complexity, as the results will demonstrate.

The basic sampling idea guiding this research was to look for projects that underwent a change in these legal terms during their life cycle and to verify possible associated variations in their main indicators of attractiveness.
This approach aims to uncover whether FSP managers change the legal restrictions over their projects' life cycles (research question #1, RQ1) and to evaluate whether FSP success is associated with such changes, through a before-and-after statistical analysis of managerial intellectual property interventions (IPI) on project attractiveness (research question #2, RQ2). Together, these aims have not been addressed in previous research with such a methodological approach.

2 Methods: data, sampling and statistical analyses

To obtain data capable of answering whether FSP managers have changed their schemas of licensing over the years (RQ1), and whether these changes are associated with project attractiveness (RQ2), a search for secondary data on free software projects was conducted. A few options surfaced, such as the University of Notre Dame based one, but the seemingly most straightforward was chosen: FLOSSmole [23]. Data obtained and released by FLOSSmole on all projects from the largest free software repository available online [6] at the time of this project's data collection efforts were organized in a database for inspection, covering 44 months of activity. This database was filtered down to contain only those projects that changed their listed licenses over the years covered by the dataset. Had this filtered dataset contained zero projects, the answer to the first research question would have been "no, FSP managers have not changed their license schemas, despite the known effect of that on attractiveness found in previous research". But the empirical answer is yes: FSP managers have made these interventions (aka IPI) hundreds of times in this research sample.
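The filtering step just described can be sketched as follows. This is a minimal illustration, assuming the monthly FLOSSmole-style snapshots are available as ordered (month, set-of-listed-licenses) pairs per project; all names and the sample data are illustrative, not the actual dataset layout.

```python
# Sketch: detecting intellectual property interventions (IPIs) in a
# project's monthly license listings (illustrative data layout).

def detect_ipis(snapshots):
    """Return one (month, before, after) record for each month whose
    listed license set differs from the previous month's (an IPI)."""
    changes = []
    for (_, before), (month, after) in zip(snapshots, snapshots[1:]):
        if before != after:
            changes.append((month, before, after))
    return changes

snapshots = [
    ("2005-10", frozenset({"GPL"})),
    ("2005-11", frozenset({"GPL"})),
    ("2005-12", frozenset({"GPL", "Apache"})),  # dual licensing adopted
]
print(detect_ipis(snapshots))
```

A project with an empty result list made no intervention and would be filtered out of the working sample.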
After obtaining this working sample, a data organization process was performed, classifying the various licenses of each project (many have more than one license at a given point) into the categories described after Fig. 1 above. All information on the project audience (end-user or developer, for example), date of creation, etc. was also kept for sample description, and data on numbers of web hits, downloads, and members were gathered monthly to allow comparisons of these attractiveness indicators anchored on the type of licensing schema intervention. The choice of these specific indicators is aligned with previous research [12], where attractiveness was first directly addressed in the specialized literature. A few more details of this data preparation procedure are described below.

The sampling and filtering procedures were specifically designed to detect the changes in license terms adopted by FSP managers and to explore whether these IPI are associated with variations in FSP attractiveness. The ideal methodological situation, random assignment of projects to license changes, is impossible, as one cannot intervene in other people's projects (this is not an experiment). As an alternative means of controlling for confounding effects, projects whose listing categories or audiences changed during the period covered by this study were excluded as well. Any project with missing data on the number of members was also removed from the sample, as this indicates an "orphan" project. The working sample comprises 756 FSP with monthly data covering 44 months, from October 2005 to June 2009 (one month, July 2008, was missing in FLOSSmole).

For each project, monthly data on its licenses were collected for further classification based on the legal restrictions covered in this paper, as explained before.
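This monthly classification step can be sketched as below. The per-license restriction flags follow the schema examples used throughout this paper (MIT as non-restrictive and relicensable, LGPL as restrictive and non-relicensable, GPL as restrictive, highly restrictive and non-relicensable, and so on), but the mapping here is illustrative rather than authoritative.

```python
# Sketch: classifying a project's full set of listed licenses into the
# restriction-based schema. Per-license flags are illustrative.

LICENSE_FLAGS = {
    # license: (restrictive, highly_restrictive, relicensable)
    "MIT":    (False, False, True),
    "AFL":    (False, False, True),
    "LGPL":   (True,  False, False),
    "MPL":    (True,  False, True),
    "GPL":    (True,  True,  False),
    "Apache": (False, False, True),
}

def classify_schema(licenses):
    """Take the union of the flags over all listed licenses, so a
    dual-licensed project can carry seemingly contradictory labels."""
    flags = [LICENSE_FLAGS[lic] for lic in licenses]
    return {
        "restrictive":        any(f[0] for f in flags),
        "highly_restrictive": any(f[1] for f in flags),
        "relicensable":       any(f[2] for f in flags),
    }

# A GPL + Apache dual license yields "Restrictive, Highly Restrictive
# and Relicensable" at once, the apparent contradiction discussed below:
print(classify_schema({"GPL", "Apache"}))
```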
The classification set forth here builds on previous research, which has treated licenses by their restrictions regarding: 1) compatibility for mingling with software under a different license during compilation (when not compatible, referred to as "highly restrictive"); 2) whether an improved version of the software must be released as free software as well (when yes, referred to as "restrictive"); and 3) whether the software may be relicensed by a third party under a license different from the one originally chosen (when yes, referred to as "relicensable"). However, the empirical fact that projects have more than one license challenges any classification that considers a project on the basis of a single license. Free software projects choose schemas of licensing, for example a "highly restrictive" option for non-payers and a "relicensable" option for those who pay for the software. The classification adopted here takes this into account, obtaining a more accurate, albeit more complex, picture of projects' licensing schemas. All of a project's listed licenses were considered, so a dual-licensed project might indeed be "Restrictive, Highly Restrictive and Relicensable", something that at first sight appears contradictory. This classification was performed per month, and changes in the schema, that is, detected managerial interventions, were flagged for further analysis.

3 Results and Findings: descriptive statistics towards RQ1

Table 1 summarizes all interventions detected, along with the labels given to them (see the schema descriptions in the first column); the number of occurrences of each type of change in legal terms is displayed in the table cells. This table is the detailed answer to RQ1. One can see, for example, that the GPL was involved in managerial interventions 715 times (being the end state of a change 298 times, the sum of column F, and the beginning state 417 times, the sum of row F).
In the description column, one can see that the GPL is restrictive and highly restrictive: redistributed derivative work must be GPL as well, and source code mingled with it during compilation must be GPL as well (a "viral" license). Further, GPL software cannot be relicensed under a different license; the GPL is thus restrictive, highly restrictive and non-relicensable. The GPL motivates the most managerial interventions, probably due to its popularity and the community's mixed feelings about its adoption (loved by those who believe in "free software forever", less so by those primarily guided by competitive motivations). This GPL leadership is followed by the dual-licensing strategy, whereby FSP managers release code under different licenses depending on the interest and profile of the user (e.g., an individual versus a for-profit organization). The ranking of interventions and their numbers of occurrences can be read from Table 1: the columns hold the data for the newly adopted license type, and the rows the data for the license type abandoned by the project (the "From" and "To" indicated in the first cell).

Table 1 Count of license type interventions in the sample

| From\To | A | B | C | D | E | F | G | Sum | Ranking |
|---|---|---|---|---|---|---|---|---|---|
| A None (or "other") | 0 | 22 | 2 | 13 | 3 | 47 | 1 | 88 | 5 |
| B Non-Restrictive and Relicensable (e.g., Public Domain or MIT) | 8 | 0 | 7 | 20 | 16 | 31 | 45 | 127 | 4 |
| C Academic Free License-AFL (Non-Restrictive and Relicensable) | 2 | 5 | 0 | 0 | 0 | 7 | 0 | 14 | 7 |
| D Restrictive and Non-Relicensable (e.g., GNU Lesser General Public License-LGPL) | 6 | 34 | 0 | 0 | 21 | 67 | 6 | 134 | 3 |
| E Restrictive and Relicensable (e.g., Mozilla Public License-MPL) | 3 | 19 | 0 | 12 | 0 | 7 | 8 | 49 | 6 |
| F Restrictive, Highly Restrictive and Non-Relicensable (e.g., GNU General Public License-GPL) | 36 | 81 | 3 | 137 | 5 | 0 | 155 | 417 | 1 |
| G Restrictive, Highly Restrictive and Relicensable (e.g., dual licensed: GPL and Apache) | 0 | 32 | 0 | 6 | 6 | 139 | 0 | 183 | 2 |
| Sum | 55 | 193 | 12 | 188 | 51 | 298 | 215 | 1012 | |
| Rank | 5 | 3 | 7 | 4 | 6 | 1 | 2 | | |

Source: author's own

Additionally, monthly data on web hits (visitors), downloads (intention to install and use the software) and number of members (intention to contribute by reporting bugs or requesting features) were gathered, along with the type of project and its development stage. Table 2 contains the descriptive statistics for the numerical variables, and Table 3 the frequency of projects holding each type of license versus their development status in the first month of the dataset, October 2005. To calculate "attractiveness", a latent construct, the correlation matrix of a previous study [12] was used in a principal component analysis [24], in which a linear combination of the three indicators of attractiveness was identified so as to maximize the explained variance. The first principal component extracted is operationally defined as 0.63 × log(webhits) + 0.64 × log(downloads) + 0.43 × log(members) and explains 65% of the sample variance. This component was used to calculate a new variable named attractiveness, a weighted sum of a project's log-transformed web hits, downloads and number of members in any given month. This measure expresses the ability of a project to attract these market resources from the environment in which it competes with other projects; attractiveness is thus modeled as a common cause of website visits, downloads and membership numbers. Data were organized and statistically analyzed with R.
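The attractiveness score just defined can be sketched as a weighted sum with the reported loadings. The log base and the log(1 + x) guard for the zero-valued months present in the sample are assumptions made here for illustration; the text does not specify them.

```python
import math

# Sketch: attractiveness as the first principal component described in
# the text, i.e. a weighted sum of log-transformed indicators with the
# reported loadings (0.63, 0.64, 0.43). Base-10 log and the 1 + x guard
# are assumptions, not stated in the paper.

W_HITS, W_DOWNLOADS, W_MEMBERS = 0.63, 0.64, 0.43

def attractiveness(webhits, downloads, members):
    return (W_HITS * math.log10(1 + webhits)
            + W_DOWNLOADS * math.log10(1 + downloads)
            + W_MEMBERS * math.log10(1 + members))

# A month near the Table 2 sample means, and a fully inactive month:
print(attractiveness(9267, 378, 3))
print(attractiveness(0, 0, 0))  # scores 0
```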
From Table 2, one can see that, in the sample: 1) projects were founded as early as 1999; 2) on average a project had approximately 378 downloads in October 2005; and 3) at least one project had four different licenses listed at that point. Table 3 paints a different picture, showing that: 1) 48% of the projects, 363, are licensed GPL (restrictive + highly restrictive + non-relicensable), and of these 95 are in beta stage; 2) 11% of the 756 projects have no license specified; and 3) only 7 projects had neither a license nor a development status on file in October 2005. This distribution of projects demonstrates wide variability across the stages of the software life cycle, mitigating once more the limitations of the non-experimental nature of this study and its potential sampling biases.

4 Results and Findings: preparing to answer RQ2

To explore the associations of IPI with attractiveness variations and obtain statistical evidence of variation, if any, FSP were classified every month according to the type of intervention they were subject to, and the working sample was again organized and analyzed in the following fashion.

To allow statistical comparisons with reasonable sample sizes, the dataset was reorganized to display the seven licensing schemas, A to G, on the columns and attractiveness on the rows. In this new dataset, each cell represents the attractiveness of a project in a specific month, broken down by licensing schema across the columns.

Table 2 Descriptive statistics for the sample in October 2005

| Variable | Minimum | Maximum | Mean | Std. deviation |
|---|---|---|---|---|
| Registered | 11/04/1999 | 3/13/2009 | 1/08/2003 | – |
| n_licenses.200510 | 0 | 4 | 1.10 | 0.39 |
| attractiveness.200510 | 0 | 16.12 | 5.4694 | 3.33 |
| downloads.200510 | 0 | 34,514 | 378.07 | 1941.56 |
| webhits.200510 | 0 | 836,740 | 9267.23 | 48,727.7 |
| members.200510 | 1 | 55 | 3.38 | 5.22 |

Source: Author's own

Table 3 Type of license versus development status (October 2005)

| Type of license | | Alpha | Beta | Mature | None | Planning | Prealpha | Stable | Total |
|---|---|---|---|---|---|---|---|---|---|
| A | # | 12 | 18 | 2 | 7 | 7 | 7 | 33 | 86 |
| | % | 1.6% | 2.4% | 0.3% | 0.9% | 0.9% | 0.9% | 4.4% | 11.4% |
| B | # | 25 | 36 | 4 | 2 | 15 | 11 | 33 | 126 |
| | % | 3.3% | 4.8% | 0.5% | 0.3% | 2.0% | 1.5% | 4.4% | 16.7% |
| C | # | 1 | 1 | 0 | 2 | 1 | 0 | 3 | 8 |
| | % | 0.1% | 0.1% | 0% | 0.3% | 0.1% | 0% | 0.4% | 1.1% |
| D | # | 22 | 33 | 2 | 5 | 15 | 13 | 38 | 128 |
| | % | 2.9% | 4.4% | 0.3% | 0.7% | 2.0% | 1.7% | 5.0% | 16.9% |
| E | # | 4 | 10 | 0 | 1 | 1 | 3 | 6 | 25 |
| | % | 0.5% | 1.3% | 0% | 0.1% | 0.1% | 0.4% | 0.8% | 3.3% |
| F | # | 84 | 95 | 9 | 8 | 36 | 38 | 93 | 363 |
| | % | 11.1% | 12.6% | 1.2% | 1.1% | 4.8% | 5.0% | 12.3% | 48.0% |
| G | # | 4 | 7 | 0 | 0 | 2 | 3 | 4 | 20 |
| | % | 0.5% | 0.9% | 0% | 0% | 0.3% | 0.4% | 0.5% | 2.6% |
| TOTAL | # | 152 | 200 | 17 | 25 | 77 | 75 | 210 | 756 |
| | % | 20.1% | 26.5% | 2.2% | 3.3% | 10.2% | 9.9% | 27.8% | 100% |

Source: Author's own

This analytical strategy of treating the licensing schema, rather than each specific change of schema, increased the sample size immensely and permitted the statistical mean comparisons of attractiveness that RQ2 required. The classic t-test, robust to violations of its assumptions with such large samples, was performed using the software SPSS.
The descriptive statistics, variable by variable, for this new dataset are shown in Table 4, where it can be seen that the smallest sample size is 265: of the 33,264 month-projects available (756 projects times 44 months), 265 month-projects were flagged with a C-type schema.

5 Results and Findings: revisiting RQ1 towards RQ2

A project's license, or schema of licensing, imposes restrictions and allowances on the application adopter and on the source code contributor, the creator of a derivative work. For example, a company that customizes a GPL application and distributes it in the market is obliged to make public the source code of the redistributed, improved software. The license choice is a strategic decision with social and economic impacts on the project, as it can block the interests of people related to the software, that is, users, developers and other relevant stakeholders. A major decision like this is not expected to occur very often, as managers avoid changes to the status quo that harm expectations and turn people's attention away from the actual work (e.g., toward politics and disputes). This tendency not to change strategic matters is known in the organizational literature as structural inertia [25].

In conformance with this organizational inertia, of the thousands of free software projects obtained from FLOSSmole and Sourceforge.net and analyzed in this research, only 756 decided to change their license type over the 44 months covered, from October 2005 to June 2009 (missing July 2008). Nevertheless, as Table 1 has already shown, these 756 projects changed licenses 1012 times, a considerable number that validates the theoretical expectation of managerial action through changes in software legal restrictions toward meeting stakeholders' demands and expectations for project success.
Previous research has stated that the license affects the probability of project success and, accordingly, FSP managers have indeed attempted changes in legal restrictions.

In terms of specific results, the managerial decision to leave projects exposed and legally unattended, with no license specified, was detected in both directions: projects left the "none" state 88 times and, surprisingly, moved from having a license to having none 55 times (see Table 1, license type A). In fact, projects with no license specified were found in every month covered by this research. FSP with no license, the A category ("none"), often have lower average attractiveness than restrictive/relicensable and dual-licensed projects, but higher attractiveness than GPL projects (the F schema). Let us now move one step further and analyze the data numerically.

To explore the statistical associations between attractiveness and license initially, the ratios of mean attractiveness after/before interventions were computed over all projects undergoing a given change of licensing schema (summarized in Table 5). For each type of change, the attractiveness component was first calculated for standardization and then pooled over all projects in that state of license change; one ratio was then computed by dividing the projects' mean attractiveness after the change by their mean attractiveness before the change.
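The arithmetic of this ratio can be sketched as below. The per-project values are invented purely to illustrate the computation; they are chosen so the result lands on the 0.94 reported for the A → B cell of Table 5, and are not taken from the actual data.

```python
# Sketch: after/before attractiveness ratio behind Table 5,
# computed over all projects that made a given type of change.

def intervention_ratio(before_means, after_means):
    """Pooled mean attractiveness after the change divided by the
    pooled mean attractiveness before it."""
    mean_after = sum(after_means) / len(after_means)
    mean_before = sum(before_means) / len(before_means)
    return mean_after / mean_before

before = [6.0, 5.5, 7.0]  # hypothetical per-project means before A -> B
after = [5.6, 5.2, 6.6]   # hypothetical per-project means after A -> B
print(round(intervention_ratio(before, after), 2))  # 0.94, a ~6% reduction
```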
To interpret the results in Table 5: the ratio of 0.94 in the first row, for example, indicates that projects changing from license type A to B experienced lower attractiveness after the intervention; that is, moving from a status of having no license (A) to a status of "public domain" license (B) is on average detrimental to attractiveness (specifically, a reduction of 6%). However, that strategic move was detected only 22 times in the sample (see Table 1), limiting any robust statistical analysis of this variation in attractiveness.

Table 5 Ratios of mean attractiveness after/before each type of intervention

| From* \ To | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| A^b | – | 0.94 | 1.07 | 1.06 | 1.14 | 1.09 | 0.87 |
| B^cdf | 0.96 | – | 0.97 | 1.02 | 1.03 | 0.98 | 1.01 |
| C^df | 0.92 | 0.93 | – | – | – | 1.05 | – |
| D^beg | 0.98 | 1.05 | – | – | – | 0.96 | 1.03 |
| E^dg | 0.70 | 0.86 | – | 0.91 | – | 0.89 | 0.89 |
| F^bc | 0.89 | 1.00 | 2.00 | 0.98 | 1.06 | – | 1.01 |
| G^de | – | 0.85 | – | 0.98 | 0.88 | 0.89 | – |

*Superscript letters indicate an asymmetric effect of interventions, that is, pairs for which reversing the direction of the change does not reverse the effect (e.g., it is positive both to leave C for F and to leave F for C)

Source: Author's own

Table 4 Descriptive statistics for mean comparisons by licensing schema

| License Type | Sample Size | Minimum | Maximum | Mean | Std. Dev |
|---|---|---|---|---|---|
| A_attractiveness | 2134 | 0.44 | 14.60 | 6.9037 | 2.81252 |
| B_attractiveness | 5322 | 0.00 | 16.12 | 6.4007 | 2.75651 |
| C_attractiveness | 265 | 0.00 | 11.12 | 6.4196 | 2.09862 |
| D_attractiveness | 5522 | 0.30 | 14.46 | 6.7547 | 2.31416 |
| E_attractiveness | 1073 | 0.30 | 15.97 | 7.4004 | 2.74449 |
| F_attractiveness | 9849 | 0.00 | 16.83 | 6.6443 | 2.88175 |
| G_attractiveness | 1865 | 0.44 | 18.01 | 7.6265 | 3.08157 |

Source: Author's own
This limitation is overcome later in the analysis, with the t-tests described in the methods section.

Moving ahead with this exploratory interpretation, consider the associations between attractiveness and the odd managerial action of moving from having a license specified to having none (changes with "A" as the target). The average attractiveness ratios of projects that underwent this type of change were always below one (column A of Table 5): every time such a change was made, average project attractiveness decreased (a ratio smaller than one indicates that the after/before attractiveness was on average pushed down), suggesting that stakeholders dislike the uncertainty associated with a project that has no license. Additionally, when a project went from none to a restrictive and relicensable choice (A → E), the change was associated with an average increase of 14% in attractiveness.

From a different perspective, interestingly, the interventions from none to non-restrictive and relicensable (e.g., MIT) and from none to restrictive, highly restrictive and relicensable (i.e., dual licensed) led to attractiveness reductions (see A → B and A → G in Table 5). One can only wonder about the actual, case-specific reasons for these findings, but the general theoretical interpretation is that relevant stakeholders' interests were harmed by the license change, affecting the project's consequent attractiveness.

Together, these findings related to the managerial decision of having no license specified can probably be interpreted in several ways, for instance as a sign of a market unwelcoming to unregulated software, which is more exposed to litigation, given that a managerial change to having no license specified is always detrimental.
However, from another perspective, projects with no license can still be considered attractive, suggesting the possibility that the regular user does not take the license into account at all. Perhaps both explanations are valid and complementary, as the attractiveness measure adopted in this research groups the effects on developers and users together (downloads and membership numbers), and only future research can sort this out. Attractiveness is a cause these variables have in common, but most likely not the only one (the first principal component extracted explains about two-thirds of the variance, so the remaining third is not due to this attractiveness measure). Future studies can dig into this line of inquiry by studying these indicators separately as well.

Back to the results interpretation, focusing on the most popular choice, the GPL, or more generically the most restrictive licensing (i.e., restrictive, highly restrictive and non-relicensable, the F schema), abandoning this scheme of source code regulation was found to be beneficial to project attractiveness. Overall, a positive variation in attractiveness accompanied such changes, but the strategic move was detrimental to FSP attractiveness when projects went to "none" (A) or to restrictive and non-relicensable (D), that is, normally, the LGPL option (see the changes involving F in Table 5). In support of these results, becoming GPL was good for FSP attractiveness when the initial state was the absence of a license (A), the Academic Free License (C), or the LGPL (D); these strategic interventions were detected 47, 7 and 67 times, respectively (Table 1). Taken together, these findings suggest that it is good to avoid the GPL, but better to adopt it than to have no license or the LGPL.
The most challenging finding of this kind concerns the interventions from GPL to AFL (F → C) and the opposite (C → F), which are both positive: it is good to change from GPL to the Academic Free License, and it is also positive to change to GPL coming from the Academic Free License. This suggests that any change might be good for a project, as long as it is aligned with FSP stakeholders' demands. The (lack of) symmetry in the effects of interventions can be better observed in the matrix of Table 5 (the superscript letters), a pattern dealt with in detail later in this section.

Analyzing all interventions together, of the 35 types observed in the sample, 13 were positive for attractiveness, 21 negative, and only one neutral. In total, 1012 intellectual property interventions were found (an average of more than one per project). Regarding the initial state, the most common origin of a managerial intervention is F (detected 417 times), with a consistently positive impact on attractiveness, while the least common origin is C (14 times), associated with a negative change in attractiveness. The largest negative impact, 15%, occurs on the abandonment of E, which was found 49 times. The mixed results apparent on visual inspection of Table 5 suggest that interventions on types of licenses do not always turn out well, and that there is always some impact, although here only exploratory rather than statistical, on attractiveness (the only exception being F to B). This reinforces the importance of thinking the decision through carefully and strategically, as its impacts do not seem irrelevant in terms of associated changes in attractiveness.

Moreover, every intervention that targeted A, or originated from E or G, impacted attractiveness negatively.
Also, although changing from C to B does not alter the project's type of license in terms of the restrictions analyzed in this research, it does impact attractiveness, suggesting that stakeholders prefer AFL to MIT, for instance; this makes sense, as the AFL was designed to improve on MIT, which was the reason for including it separately in this study. The actual reasons for this finding should nevertheless be an object of future research, as it suggests there is more to the licensing scheme than this quantitative research captures.

Finally, going from G to B led to a 15% reduction in attractiveness. The dual-license option that G represents signals to a project's stakeholders that the software is suitable for a wider audience, as this intellectual property model can accommodate the interests of various groups, being more market-flexible (a generic strategy). Moving away from this management model appears to push attractiveness down, always, as mentioned before (a focused strategy).

6 Results and Findings: the asymmetry of effects and the statistical answer to RQ2

The lack of symmetry in the effects is interesting and deserves further consideration. None of the licensing schemas analyzed in this research escapes it: all have asymmetric effects with at least one other type of license. The most contradictory type of license is B, which has symmetric effects only with E and G. The least contradictory scheme is A, having the opposite effect on attractiveness only when B is involved (see the superscript letters in Table 5). This finding suggests that a match between licensing scheme and a project's specific stakeholders might exist, or that the direction of the effect of a given license is simply reversed depending on whether it is the source or the destination of the intervention.
The suitability of a license schema is likely to rely on the context of its adoption, that is, on the momentary demands of stakeholders; thus no combination of licenses should be treated as ideal in general, but only in specific contexts, according to stakeholders’ expectations on a project-by-project basis.


Moving now towards the statistically based answer to RQ2, the results reported here were further analyzed. The reorganized dataset with mean month-project attractiveness per licensing schema was subjected to analysis (see Table 4 for descriptive statistics). Before getting into the mean difference comparisons (t-tests), the values for mean attractiveness over the whole period were considered. Taken together, these results signal that less restrictive licenses are more attractive on average, as dual licensing beats the academic unrestricted schema (e.g., MIT), which in turn is more attractive than the highly restrictive GPL choice. The conclusion is that project attractiveness varies consistently according to the license schema. Of course, this analysis is basic in statistical terms, but what is clear is that variations in attractiveness indicators are associated with the licensing schema chosen by the FSP manager. The t-tests reported below give further confidence in the answer to RQ2.


As explained before, for the mean statistical comparisons, the monthly data was aggregated to increase the sample size, and the mean difference between each pair of licensing schemas was calculated, along with the standard deviation of these differences and the resulting confidence intervals for statistical significance determination. The results are presented in Table 6 below, which indicates whether each mean difference is significant at the 0.05 type I error level with the Bonferroni correction procedure applied (marked with +), and the effect size of each pair of licensing schemas based on Cohen’s D (marked with a superscript letter).
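The comparison procedure just described can be sketched numerically: the paired mean difference, its standard deviation, Cohen’s D (mean difference divided by the standard deviation of the differences), and the Bonferroni-corrected significance threshold for the 21 schema pairs. The attractiveness values below are invented for illustration; this is not the paper’s dataset.

```python
# Sketch of the paired-comparison procedure, with made-up attractiveness data.
import math
import statistics

attr_schema_x = [0.8, 1.2, 0.9, 1.5, 1.1, 0.7]   # mean monthly attractiveness, schema X
attr_schema_y = [0.5, 1.0, 0.4, 1.3, 0.9, 0.6]   # same months, schema Y

diffs = [x - y for x, y in zip(attr_schema_x, attr_schema_y)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)                 # std. deviation of the differences
se_diff = sd_diff / math.sqrt(len(diffs))         # std. error of the mean difference

t_stat = mean_diff / se_diff                      # paired t statistic, df = n - 1
cohens_d = mean_diff / sd_diff                    # effect size as defined for Table 6
alpha = 0.05 / 21                                 # Bonferroni threshold, cf. 0.0023 in the text
```

With real data, `t_stat` would be compared against the t distribution with n − 1 degrees of freedom at the `alpha` threshold, which is what the + marks in Table 6 report.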
According to the results shown in Table 6, 11 out of 21 pairs are statistically significantly different, using the most conventional statistical procedure to control for inflated alpha in the context of multiple comparisons (Bonferroni). Of these 11, 4 have effect sizes between small and medium according to Cohen’s well-known suggested interpretation (higher than 0.2). This signals that the licensing schema is indeed associated with the average numbers of web hits, downloads and members a project can attract. The differences in absolute numbers and effect sizes between schemas peak at the C-G pair, with a $-1.35$ mean difference in favor of the dual license schema when a project moves away from the AFL option. The rest of the results for each pair of licensing schemas can be found in Table 6.


Overall, these statistical results and the analysis of the variations in attractiveness allow for a solid answer to the second research question posed in this paper, on whether an intellectual property intervention (a managerial change in licensing schema) is associated with attractiveness. The licensing schema is indeed associated with variations in attractiveness level, not in all, but in many cases, with a meaningful effect size in a few of them. In the next section, the general conclusions are discussed based on the answers found for both research questions, presenting directions for future research and guidelines for free and open source software managers.
----------------------------------------
-------------------------------
Section 304:
7 Conclusions: implications for research and practice


This research focused on intellectual property rights interventions in free and open source software projects (FSP): changes in the licensing schemas that regulate the distribution allowances of the software source code. The underlying hypothesis was that such managerial interventions would affect stakeholders’ perceptions of value, and thus that variations in FSP attractiveness before and after the managerial intervention could be observed.
To validate this theoretical expectation, data on thousands of FSP over almost 4 years was filtered to identify a sample of 756 projects that changed their types of licenses, allowing an empirical study of the various managerial interventions detected in a period of 44 months. These variations were cataloged and organized to allow comparisons of attractiveness changes grouped by intervention type, an analysis so far missing from the free software literature. Moreover, further reorganization of these original datasets allowed comparisons of projects’ attractiveness, to verify whether the licensing schemas adopted by FSP managers were associated with project performance in attracting developers, users and visitors, represented by a linear combination of the numbers of members, downloads and web hits. The classification schema for the licenses adopted by FSP managers developed in this paper also represents a step forward in the literature, as up to now the reality of projects adopting various licenses with apparently contradictory allowances to the source code (GPL together with a public domain license, for example) was not captured in previous research. The result is a more complex but more accurate classification, with, of course, pros and cons.
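The attractiveness measure mentioned above is a linear combination of the numbers of members, downloads and web hits. The weights are not given in this excerpt, so the sketch below assumes equal weights over standardized indicators, purely for illustration of how such a composite can be built.

```python
# Hedged sketch of a composite attractiveness score. Equal weights over
# standardized indicators are an assumption, not the paper's specification.
import statistics

def standardize(values):
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def attractiveness(members, downloads, web_hits, weights=(1/3, 1/3, 1/3)):
    # one standardized column per indicator, combined linearly per project
    cols = [standardize(members), standardize(downloads), standardize(web_hits)]
    return [sum(w * col[i] for w, col in zip(weights, cols))
            for i in range(len(members))]
```

Standardizing first keeps any one raw count (e.g., web hits, typically orders of magnitude larger than members) from dominating the combination.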
As for a general conclusion, the results indicate that the legal terms specified in the license are indeed associated with project attractiveness, as an aggregated measure. This is in line with previous research, which led to the expectation that the various business models possible with open source, expressed through their licensing schemas, are related to success in attracting users and developers [10, 12, 26]. However, moving beyond the previously published literature, the findings suggest that the specifics of this generic hypothesis are not yet well understood.


It has been found that changes in the software rights of distribution cannot be treated generically if they are to be fully understood, as interventions differ in the attractiveness variations associated with them, being beneficial or not depending on much more than what is known from the published literature on free software. This research is the first to point that out, thus providing ground for future (case/qualitative) studies to follow this lead and explore the specific reasons for a license intervention and the consequent increase or reduction in attractiveness based on stakeholders’ perceptions. Both project managers’ and stakeholders’ perceptions should be considered in these future research endeavors.
----------------------------------------
-------------------------------
Section 305:
Table 6: Statistical tests for attractiveness mean differences


| Pair | Schemas | Mean | Std. Deviation | Std. Error Mean | 99% CI Lower | 99% CI Upper | t | df | p-value | Cohen’s D |
|------|---------|------|----------------|-----------------|--------------|--------------|---|----|---------|-----------|
| 1 | A – B | 0.22 | 3.97 | 0.10 | -0.02 | 0.47 | 2.33 | 1735 | 0.02 | 0.06 |
| 2 | A – C + | 0.90 | 3.87 | 0.27 | 0.19 | 1.61 | 3.28 | 200 | 0.00 | 0.23 |
| 3 | A – D | 0.04 | 3.74 | 0.09 | -0.19 | 0.27 | 0.50 | 1751 | 0.62 | 0.01 |
| 4 | A – E | -0.31 | 4.08 | 0.14 | -0.66 | 0.04 | -2.26 | 895 | 0.02 | -0.08 |
| 5 | A – F | 0.03 | 3.91 | 0.09 | -0.21 | 0.27 | 0.34 | 1749 | 0.74 | 0.01 |
| 6 | A – G + | -0.67 | 4.22 | 0.11 | -0.95 | -0.39 | -6.24 | 1553 | 0.00 | -0.16 |
| 7 | B – C | 0.39 | 3.44 | 0.25 | -0.24 | 1.03 | 1.61 | 195 | 0.11 | 0.11 |
| 8 | B – D + | -0.38 | 3.57 | 0.05 | -0.51 | -0.24 | -6.95 | 4356 | 0.00 | -0.11 |
| 9 | B – E + | -0.57 | 3.84 | 0.13 | -0.91 | -0.22 | -4.26 | 829 | 0.00 | -0.15 |
| 10 | B – F + | -0.32 | 3.95 | 0.06 | -0.48 | -0.17 | -5.44 | 4419 | 0.00 | -0.08 |
| 11 | B – G + | -0.90 | 4.19 | 0.11 | -1.18 | -0.62 | -8.22 | 1458 | 0.00 | -0.22 |
| 12 | C – D | -0.06 | 3.44 | 0.25 | -0.72 | 0.59 | -0.26 | 186 | 0.80 | -0.02 |
| 13 | C – E + | -0.90 | 3.60 | 0.24 | -1.53 | -0.28 | -3.77 | 224 | 0.00 | -0.25 |
| 14 | C – F | 0.09 | 3.23 | 0.23 | -0.50 | 0.69 | 0.40 | 197 | 0.69 | 0.03 |
| 15 | C – G + | -1.35 | 3.79 | 0.25 | -2.01 | -0.68 | -5.28 | 220 | 0.00 | -0.36 |
| 16 | D – E + | -0.60 | 3.67 | 0.13 | -0.93 | -0.27 | -4.70 | 817 | 0.00 | -0.16 |
| 17 | D – F | 0.07 | 3.66 | 0.05 | -0.07 | 0.21 | 1.28 | 4614 | 0.20 | 0.02 |
| 18 | D – G + | -0.84 | 3.95 | 0.10 | -1.10 | -0.57 | -8.17 | 1490 | 0.00 | -0.21 |
| 19 | E – F + | 0.65 | 3.85 | 0.13 | 0.31 | 0.99 | 4.88 | 836 | 0.00 | 0.17 |
| 20 | E – G | -0.31 | 4.07 | 0.13 | -0.65 | 0.04 | -2.29 | 927 | 0.02 | -0.08 |
| 21 | F – G + | -0.77 | 4.16 | 0.11 | -1.05 | -0.50 | -7.20 | 1504 | 0.00 | -0.19 |


“Mean” is the paired difference of attractiveness (first schema MINUS the second); Lower and Upper bound the 99% confidence interval of that difference.
+ indicates significance at 0.05 with the Bonferroni correction (p < 0.0023 = 0.05/21)
Cohen’s D calculated as the mean difference divided by its standard deviation. Superscript letter A means an effect size between small and medium
Source: Authors’ own


This future line of inquiry based on case/qualitative studies would also be able to shed light on the asymmetric effects detected in the sample. Quite often, an intervention from one license to another did not have the opposite effect when the reverse change was analyzed (so a simple vice-versa reading is not possible). FSP stakeholders probably hold expectations about occasional changes in the license terms of the free software they intend to adopt or contribute to. This means that, depending on the current license (the anchor), the effects of changing to the same license might be different; and that the specific interests of project stakeholders also matter (e.g., hardware production or service sale). Managers should take that into account when considering a license change.


FSP managers should be aware that the success of their projects is linked with their choice of license, as fewer market resources – the attention of users and the labor of developers – might flow in their direction depending on that. This means that managers must understand who the relevant stakeholders of their application are, what they want out of the software source code, and attempt to meet their expectations, considering a change in the licensing only through a direct negotiation with these stakeholders to avoid unwanted consequences. This research indicates that there is no silver bullet concerning the right licensing schema, or business model, signaling that the general hypothesis explored here needs further elaboration.


Academically speaking, a contingent type of theory, perhaps stakeholder-based, needs to be developed to explain the impacts of the license schema on attractiveness in context.
To help guide future researchers in that direction, it is already possible to highlight that a general strategy (multiple licenses) appears to be superior to a specific license schema, as it perhaps accommodates stakeholders’ conflicting interests better. This would explain the noticeable trend towards the “various licenses” strategy, and demonstrates how important it is to improve the classification schema previously adopted in the literature.


In conclusion, intellectual property interventions are not always beneficial for a free software project, but they are almost invariably associated with attractiveness variations. Accordingly, FSP managers should be aware of the importance of carefully selecting and changing the type of license for an FSP to (continuously) succeed as a result of a growing market interest in the application and its source code. Nevertheless, such an intervention decision should not be made unaware of the specific project under consideration and its stakeholders’ intentions for the software in the future.


Methodologically speaking, future research must persist in pursuing the license-attractiveness relationship, analyzing this longitudinal type of data with more advanced inferential statistical techniques, such as structural equation modeling, to explore and understand the causal relationships better and even more rigorously. The t-tests with the Bonferroni procedure applied here are a basic and reliable choice for the problem at hand, but analytical improvements are possible and welcome for a collective scientific effort towards knowledge accumulation. Another downside of this research is its sample, which was restricted to Sourceforge.com projects. Nowadays there are many other free software repositories that could be considered. Nevertheless, the findings reported here are likely to be consistent across these repositories, a hypothesis that future research can verify as well.
Finally, the measures of attractiveness adopted here are another point for improvement in future research. Only the numbers of web hits, downloads and members were utilized, but various other measures are possible. For example, one could use market share as an alternative, or survey methods to evaluate attractiveness subjectively. Moreover, attractiveness is probably the consequence of many things besides the license chosen by the project manager, and so other factors should be considered in future research. In this paper, this endogeneity issue was dealt with via a sampling procedure that identified projects of various kinds and levels of maturity, thereby controlling for some of those effects. Additionally, the results discussed here appear complex but seem to be a more accurate representation of FSP reality. As such, they are not yet fully understood in themselves, and so future research should use the same dataset, made available along with this paper, with different analytical and theoretical approaches to shed more light on these projects’ behaviour over time.


8 Endnotes
1. http://www.gnu.org/gnu/initial-announcement.en.html
2. http://www.nber.org/papers/w9363
3. http://dl.acm.org/citation.cfm?id=2597116
4. http://flossmole.org/
5. http://www3.nd.edu/~oss/Data/data.html
6. http://thestatsgeek.com/2013/09/28/the-t-test-and-robustness-to-non-normality/
7. http://nrs.harvard.edu/urn-3:HUL.InstRepos:11718205
----------------------------------------
-------------------------------
Section 306:
9 Additional file


Additional file 1: Dataset with the raw data used in the research.
(CSV 1489 kb) + + +Abbreviations +AFL: Academic Free License; FSP: Free and open source software projects; GPL: General Public License; IPI: Intellectual property interventions; MIT: Massachusetts Institute of Technology (the license) +Acknowledgements +I appreciate the comments and guidance provided by Professors Julio Singer (statistics, USP) and Fabio Kon (computer science, USP). Their contributions on initial stages of this research were incredibly helpful. I also thank the Center for Technology Development (CDT) of the University of Brasilia (UnB) for the technical help provided in the work of Raphael Saigg. A previous version of this paper was presented at CSCW 2011. + + +Funding +I thank FAPESP (2009/02046-2) for funding. + + +Authors’ contributions +I am the sole author. + + +Ethics approval and consent to participate +No need. Only secondary and public data used. + + +Competing interests +The authors declare that they have no competing interests. + + +Publisher’s Note +Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. + + +Received: 2 March 2016 Accepted: 12 July 2017 +Published online: 07 August 2017 + + +References +1. McCafferty D. Should code be released? Commun ACM. 2010;53:10. +2. Stone R. Earth-Observation Summit Endorses Global Data Sharing. Science. 2010;330:6006. +3. Sojer M, Henkel J. Code reuse in open source software development: Quantitative evidence, drivers, and impediments. J Assoc Inf Syst. 2010;11(12):868–901. +4. Allen RC. Collective invention. J Econ Behav Organ. 1983;4:1–24. +5. von Hippel E. Cooperation between rivals: Informal know-how trading. Res Policy. 1987;16:291–302. +6. Colazo J, Fang Y. Impact of license choice on Open Source Software development activity. J of the Am Society for Inf Sci Tech. 2009;60:5. +7. Rosen, L. Open Source Licensing: Software Freedom and Intellectual Property Law. Prentice Hall; 2004. +8. Stewart KJ, et al. 
Impacts of License Choice and Organizational Sponsorship on User Interest and Development Activity in Open Source Software Projects. Inf Syst Res. 2006;17(2).
9. Raymond ES. The Cathedral & the Bazaar: Musings on Linux and open source by an accidental revolutionary. O’Reilly; 2001.
10. Fitzgerald B. The transformation of open source software. MIS Q. 2006;30(3).
11. Agerfalk P, Fitzgerald B. Outsourcing to an unknown workforce: Exploring opensourcing as a global sourcing strategy. MIS Q. 2008;32(2).
12. Santos C, Kuk G, Kon F, Pearson J. The attraction of contributors in free and open source software projects. J Strateg Inf Syst. 2013;22(1):26–45.
13. Maillart T, Sornette D, Spaeth S, von Krogh G. Empirical tests of Zipf’s law mechanism in open source Linux distribution. Phys Rev Lett. 2008;101.
14. Wiggins A, Howison J, Crowston K. Heartbeat: measuring active user base and potential user interest in FLOSS projects. In: Proceedings of the Fifth International Conference on Open Source Systems (OSS). 2009. p. 94–104.
15. Crowston K, Howison J, Annabi H. Information systems success in Free and Open Source Software development: Theory and measures. Soft Proc Improv Pract. 2006;11(2):123–48.
16. Vendome C, Linares-Vásquez M, Bavota G, Di Penta M, German DM, Poshyvanyk D. When and why developers adopt and change software licenses. In: Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME ’15). IEEE Computer Society; 2015. p. 31–40. http://dx.doi.org/10.1109/ICSM.2015.7332449.
17. Stewart K, Gosain S. The impact of ideology on effectiveness in open source software development teams. MIS Q. 2006;30(2):291–314.
18. Sen R, et al. Determinants of the choice of open source software license. J Manag Inf Syst. 2008;25.
19. Lerner J, Tirole J. The scope of open source licensing. J Law Econ Org. 2005;21(1).
20. Raymond E. The cathedral and the bazaar.
Knowledge Technol Policy. 1999;12(3):23–49.
21. Singh PV, Phelps C. Networks, social influence, and the choice among competing innovations: Insights from open source software licenses. Inf Syst Res. 2009;24(3):539–60.
22. Wu Y, Manabe Y, Kanda T, German DM, Inoue K. A method to detect license inconsistencies in large-scale open source projects. In: The 12th Working Conference on Mining Software Repositories (MSR 2015), Florence, Italy, May 16–17, 2015. IEEE; 2015.
23. Howison J, Conklin M, Crowston K. FLOSSmole: A collaborative repository for FLOSS research data and analyses. Int J Inform Technol Web Engr. 2006;1(3):17–26.
24. Mardia K, et al. Multivariate Analysis (Probability and Mathematical Statistics). Academic Press; 1980.
25. Hannan M, Freeman J. Structural inertia and organizational change. Am Sociol Rev. 1984;49(2):149–64. http://www.jstor.org/stable/2095567.
26. Watson RT, et al. The business of open source. Commun ACM. 2008;51(4):41–6.
----------------------------------------
-------------------------------
Section 307:
An Exploratory Mixed-Methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software


Lucas Franke
lfranke@vt.edu
Virginia Tech
Blacksburg, Virginia, USA


Huayu Liang
huayu98@vt.edu
Virginia Tech
Blacksburg, Virginia, USA


Sahar Farzanehpour
saharfarza@vt.edu
Virginia Tech
Blacksburg, Virginia, USA


Aaron Brantly
abrantly@vt.edu
Virginia Tech
Blacksburg, Virginia, USA


James C. Davis
davisjam@purdue.edu
Purdue University
West Lafayette, Indiana, USA


Chris Brown
dcbrown@vt.edu
Virginia Tech
Blacksburg, Virginia, USA


ABSTRACT


Background: Governments worldwide are considering data privacy regulations. These laws, such as the European Union’s General Data Protection Regulation (GDPR), require software developers to meet privacy-related requirements when interacting with users’ data.
Prior research describes the impact of such laws on software development, but only for commercial software. Although open-source software is commonly integrated into regulated software, and thus must be engineered or adapted for compliance, we do not know how such laws impact open-source software development. + + +Aims: Understanding how data privacy laws affect open-source software development. We focused on the European Union’s GDPR, as it is the most prominent such law. We specifically investigated how GDPR compliance activities influence OSS developer activity (RQ1), how OSS developers perceive fulfilling GDPR requirements (RQ2), the most challenging GDPR requirements to implement (RQ3), and how OSS developers assess GDPR compliance (RQ4). + + +Method: We distributed an online survey to explore perceptions of GDPR implementations from open-source developers (N=56). To augment this analysis, we further conducted a repository mining study to analyze development metrics on pull requests (N=31,462) submitted to open-source GitHub repositories. + + +Results: Our results suggest GDPR policies complicate open-source development processes and introduce challenges for developers, primarily regarding the management of users’ data, implementation costs and time, and assessments of compliance. Moreover, we observed negative perceptions of GDPR from open-source developers and significant increases in development activity, in particular metrics related to coding and reviewing activity, on GitHub pull requests (PRs) related to GDPR compliance. + + +Conclusions: Our findings provide future research directions and implications for improving data privacy policies, motivating the need for policy-related resources and automated tools to support data privacy regulation implementation and compliance efforts in open-source software. 
+---------------------------------------- +------------------------------- +Section 308: +1 INTRODUCTION + + +Software products collect an increasing amount of data from users to enhance user experiences through personalized, machine learning-enabled [53] application behaviors [33] and marketing [79]. Such practices may benefit users, but also threaten their well-being. For example, in 2013, Facebook allowed the political research firm Cambridge Analytica to access data on ~87 million Facebook users [62]. Cambridge Analytica used this data to influence US elections [114, 115]. + + +To protect their citizens, over 100 governments worldwide are developing data privacy regulations [105]. Their goal is to constrain how their citizens’ personal data is collected, processed, stored, and saved. Some target specific industries, e.g., the United States’s Health Insurance Portability and Accountability Act (HIPAA), which places requirements on healthcare organizations handling medical data [7]. Others cover personal data regardless of context, e.g., the European Union’s General Data Protection Regulation (GDPR), which grants rights to EU citizens and affects entities that handle their data [12]. The penalties for non-compliance with data privacy laws and regulations may be severe [18, 46]. For example, under GDPR, corporations have been fined millions or billions of euros [80]. Most organizations store and manipulate this data electronically through software, and so ensuring the software is in legal compliance is an important software engineering task. + + +Data privacy regulations create challenging software requirements because they entail both technical and legal expertise. Software developers must implement required features, such as obtaining consent from users for data collection, to ensure their organizations’ products are compliant. However, developers may have limited legal knowledge [81, 109] and receive minimal training [21, 55]. 
This can lead to coarse solutions, such as exiting the affected market [88] — hundreds of websites simply banned all European users when GDPR went into effect [97, 103]. Researchers have explored the impact of data privacy regulations on businesses [72, 73, 88], users [22, 32, 68], and observable software product properties such as website cookies [67] and database performance [92]. However, there has been limited study of how such laws affect the software development process. The few existing studies have been of commercial software development [20, 29]; we lack knowledge of the effects of GDPR on open-source software (OSS) development.


The goal of this work is to describe the impact of data privacy regulation compliance on open-source software. Our study is the first on this topic.2 We therefore adopt an exploratory methodology to provide an initial characterization and identify phenomena of interest for further study. Our study draws on two data sources collected in two phases. The first phase examined qualitative data on developers’ experiences with GDPR implementations in OSS, collected via a survey (N=56). To further investigate the impact of GDPR in OSS, the second phase collected and analyzed developers’ activities in open-source projects on GitHub, examining metrics and sentiments on 31,462 pull requests, divided into 15,731 GDPR and 15,731 non-GDPR pull requests (PRs).


2This paper is an extension of our preliminary work, presented as a poster [44].


Our results show GDPR compliance negatively impacts open-source development—incurring complaints from developers and significantly increasing coding and reviewing activities on PRs. In addition, despite the benefits of data privacy regulations for users, we find developers have mostly negative perceptions of the GDPR, reporting challenges with implementing and verifying policy compliance.
We also find that interactions with legal experts hinder development processes, yet developers rarely consult with legal teams—often relying on ad hoc methods to verify GDPR compliance. + + +In sum, our contributions are: + + + + +We survey OSS developers to understand developers’ experiences with GDPR compliance and challenges with implementing and assessing data privacy regulations. + + +We empirically analyze the impact of GDPR-related implementations on development activity metrics. + + +We use natural language processing (NLP) techniques to evaluate the perceptions of GDPR compliance through discussions on OSS repositories. + + + + +Significance: This work contributes an exploratory analysis on the impact of GDPR compliance on open-source software. It identifies interesting phenomena for further research—in particular opportunities to support policy implementation and verification. We also provide recommendations for policymakers and software developers to improve data privacy regulations and their implementation. +---------------------------------------- +------------------------------- +Section 309: +2 BACKGROUND + + +2.1 Software Regulatory Compliance + + +2.1.1 In General. Software requirements are divided into two categories: functional and non-functional [96]. Functional requirements pertain to input/output characteristics, i.e., the functions the software computes. Non-functional requirements cover everything else, such as resource constraints, deployment conditions, and development process. One major class of non-functional requirement is compliance with applicable standards and regulations. These requirements are typically developed and enforced on a per-industry basis in acknowledgment of that industry’s risks and best practices [54]. + + +Complying with standards and regulations has been part of software engineering work for many years. Some standards apply to any manufacturing process, e.g., the ISO 9001 quality standard [11]. 
Others are generic to software development (e.g., ISO/IEC/IEEE 90003 [10]). Still others are contextualized to the risk profile of the usage context, e.g., ISO 26262 [13] or IEC 61508 [9], which describe standards for safety-critical systems [54]; the US HIPAA law (Health Insurance Portability and Accountability Act), which describes privacy standards for handling medical data [7]; and the US FERPA law (Family Educational Rights and Privacy Act), which describes privacy standards for handling educational data [5]. Although these regulations are not new (e.g., FERPA dates to 1974, HIPAA to 1996, and IEC 61508 to 1998), software engineering teams still struggle to comply with them [34, 40, 43, 75].


2.1.2 In Open-Source Software. This study focuses on GDPR compliance in open-source software. The reader may be surprised that regulatory compliance is a factor in open-source software development, as open-source software licenses such as MIT [3], Apache [8], and GNU GPL [6] disclaim legal responsibility. For example, the MIT license, the most common license on GitHub [27], states “the software is provided ‘as is’, without warranty...[authors are not] liable for any claim, damages, or other liability”. However, users and developers of open-source software may desire regulatory compliance. We note three examples. (1) A majority of open-source software is developed for commercial use [47] and may require standards or regulatory compliance [108]. (2) Users with open-source software components in their software supply chains [52, 83] may request compliance-related features, such as web cookie handling. The developers may service these requests. (3) Users may extend open-source software themselves and undertake their own compliance analysis [99]. Standards such as IEC 61508–Part 3 include provisions for doing so [60].


Open-source software is no longer a minor player in commercial software engineering.
Multiple estimates suggest that open-source components comprise the majority of many software applications [47, 82]. In a 2023 survey of ∼1700 codebases across 17 industries, Synopsys found open-source software in 96% of the codebases and reported an average contribution of 75% of the code in the codebase [101]. It is therefore important to understand how open-source software development considers non-functional requirements such as regulatory compliance. + + +2.2 Privacy Regulations, Especially GDPR + + +2.2.1 Consumer Privacy Laws. In §2.1 we discussed standards and regulatory requirements that affect software products based on industry. Recently a new kind of regulation has begun to affect software: consumer privacy laws. The most prominent example of such a law is the European Union’s General Data Protection Regulation (EU GDPR), enacted in 2016 and enforceable beginning in 2018. Examples in the United States include the California Consumer Privacy Act (CCPA, enacted 2018) and the Virginia Consumer Data Protection Act (CDPA, enacted 2021). Similar legislation has been considered by >100 governments [59, 105]. + + +2.2.2 The General Data Protection Regulation (GDPR). The General Data Protection Regulation (GDPR) [12] protects the personal data of European Union (EU) citizens, regardless of whether data collection and processing is based in the EU. The law has implications for entities that interact with the personal data of EU citizens, divided into data subjects, data controllers, and data processors [45]. Data subjects are individuals whose personal data is collected. Data controllers are any entities —organization, company, individual, or otherwise — that own, control, or are responsible for personal data. Data processors are entities that process data for data controllers. The GDPR grants data subjects rights to their personal data, providing guidelines and requirements to data controllers and processors to understand how to properly handle this data. 
GDPR compliance is complex for software engineers and consequential for their organizations. Data controllers and processors commonly use software, e.g., a controller’s mobile app transmits data to its backend service and processors subsequently access and update the database. Software teams must determine appropriate data policies, update their systems to comply, and validate the changes, e.g., incorporating cookie consent notices into websites to provide users with informed consent [106]. Anticipating a lengthy compliance process, the EU enacted the GDPR in 2016 but made it enforceable in 2018, allowing two years for corporations to prepare [1]. Companies in the US and UK alone invested $9 billion in GDPR compliance [110]. As of December 2022, many organizations still relied on manual compliance methods or were not compliant [14]. Non-compliance is costly: thousands of distinct fines have been imposed on non-compliant data controllers and processors, exceeding €2.5 billion [15].

Although GDPR compliance affects any software that processes the data of EU citizens, and open-source software components comprise the majority of many software applications that process such data [47, 82, 101], to the best of our knowledge there is no prior research on the impacts of GDPR compliance in open-source software.

3 METHODOLOGY

3.1 Data Availability and Research Questions

In §2 we described a range of privacy-related standards and regulations. We noted that there has been little study of the effect of these requirements on open-source software engineering practice. To address this gap, we need data. Table 1 estimates the availability of software engineering data associated with these requirements through two common metrics: the number of posts on Stack Overflow and the number of pull requests on GitHub.
Table 1: Estimated availability of software engineering data per privacy law.

| Privacy Law (Year) | Stack Overflow | GitHub PRs |
|-------------------|----------------|------------|
| GDPR (2016) | 2058 | 64 K |
| HIPAA (1996) | 725 | 5 K |
| CCPA (2018) | 96 | 1 K |
| FERPA (1974) | 35 | 254 |
| CDPA (2021) | 7 | 19 |
| PIPEDA (2000) | 5 | 31 |

Based on this data, we scoped our study to the EU’s GDPR, and to open-source software hosted on GitHub, currently the most popular hosting platform for OSS. We answer four research questions:

RQ1: How does GDPR compliance influence development activity on OSS projects?

RQ2: How do OSS developers perceive fulfilling GDPR requirements?

RQ3: What GDPR concepts do OSS developers find most challenging to implement?

RQ4: How do OSS developers assess GDPR compliance?

We analyzed data from quantitative and qualitative sources: surveying open-source developers and mining OSS repositories on GitHub. We describe how we obtained and analyzed each data source next. We integrate both sources in answering RQ1 and RQ2, and use the survey data alone to answer RQ3 and RQ4.

3.2 Data Source 1: Developer Survey

To explore the impact of implementing GDPR policies on OSS development, we distributed an online survey to open-source developers. This data informed our answers to all RQs. We used a four-step approach motivated by the framework analysis methodology [90] for policy research to collect and analyze data in the second phase of our experiment. An overview of this process is presented in Table 2. Our Institutional Review Board (IRB) provided oversight.

3.2.1 Step 1: Pilot Study and Data Familiarization. To formulate an initial thematic framework for our qualitative analysis, we conducted semi-structured pilot interviews with OSS developers (n = 3).
As no prior work has explored the perceptions of GDPR compliance in OSS, the pilot interviews gave us insight into developers’ perceptions and experiences with implementing GDPR concepts in the context of open-source software development. Two subjects had contributed to PRs in our dataset, and the third was a personal contact. They had a wide range of open-source development experience, from < 1 year to > 20 years. Interviews were transcribed using Otter.ai and coded by two researchers to inform our survey.

Thematic analysis of our pilot interviews provided insight that informed our survey questions. The participants highlighted the challenges of implementing GDPR requirements in open-source software. One participant worked at a large corporation and outlined differences between GDPR compliance at their company and in OSS, namely in (1) the approaches used to assess whether compliance is implemented correctly, and (2) access to legal teams. The other two participants discussed the impact of the GDPR, noting its privacy benefits as well as the challenges OSS developers face implementing GDPR requirements and assessing compliance. These findings informed our survey.

3.2.2 Step 2: Survey Design. The survey consisted of open-ended and short-answer questions seeking details about GDPR implementation and experiences in the context of open-source software development. We used the pilot study interview results to identify topics to focus on in the survey. Based on the interviews, we asked about the perceived impact of the GDPR on data privacy, the most difficult concepts to implement, and how respondents assess GDPR compliance. The survey instrument is in the supplemental material.

3.2.3 Step 3: Participant Recruitment. We distributed our survey in three rounds. In the first round, we emailed a sample of 98 developers who authored or commented on GDPR-related pull requests and had publicly available email addresses. We received 5 responses, i.e., a 5% response rate.
In the second round, we made broader calls for participation on Twitter and Reddit. We received 44 responses, 2 of which indicated no experience implementing GDPR compliance. All survey respondents in these rounds were entered in a drawing for two $100 Amazon gift cards. After a few months, we undertook the third round, redistributing our survey to an additional 235 GitHub users with GDPR implementation experience (they authored GDPR-related pull requests in our dataset) and offering individual compensation (a $10 gift card) to encourage participation. We received 9 responses (4% response rate). In total we have data from 56 survey participants (14 from direct GitHub contacts and 42 from Twitter and Reddit).
Table 2: Overview of sample questions from the pilot interview study and survey design/analysis for the framework analysis approach used for Data Source 1. The final column notes the inter-rater agreement for these themes using the $\kappa$ score, prior to reaching agreement.

| Interview Question | Codes | Survey Question | Codes | $\kappa$ |
|--------------------|-------|----------------|-------|---------|
| What meaningful impact, if any, do you believe the GDPR has had on data security and privacy? | data privacy, rights to users, data collection | What impact, if any, do you believe the GDPR and similar data privacy regulations have had on data security and privacy? | data privacy, data processing, data collection, insufficient information, data breach, fines | 0.736 |
| What GDPR concepts do you find the most difficult or frustrating to implement? | None, data minimization, embedded content | What GDPR concepts do you find the most difficult or frustrating to implement?
| privacy by design, data minimization, cost, data processing, user experience, data management, security risks, None, lawfulness and dispute resolution, time, right to erasure | 0.929 |
| Have you had to specifically seek out legal consultation on GDPR-related issues, and if so, how did that affect your development process? | Yes/No; no effect, negative effect (time) | Have you had to specifically seek out legal consultation on GDPR-related issues, and if so, how did that affect your development process? | Yes/No; N/A, no effect, positive effect, negative effect (cost, time, data storage, data processing,...) | 0.514 |
| During your software development projects, do you frequently consult with a legal team, and if so, how does this impact the development processes? If not, how did you assess GDPR compliance for your software projects? | Yes: legal consultation; No: privacy by design, data minimization | During your software development projects, have you consulted with a legal team? If not, how do you assess GDPR compliance for your software projects? | Yes: legal consultation; No: accountability system, online resources, self-assessment, data management, none; N/A | 0.668 |
| — | — | Has implementing GDPR concepts for compliance impacted your development process in any way? (yes/no/maybe) Please explain: | positive impact (logging, privacy by design), negative impact (cost, data management, security,...), no impact | 0.860 |

Our participants have a median of approximately 5 years of OSS development experience (avg = 5.9) and 6 years of general industry experience (avg = 7.7). Participants reported contributing to a variety of OSS projects such as Mozilla, Wordpress, Fedora, Moodle, Ansible, Flask, Django, Kubernetes, PostgreSQL, OpenCV, GitLab, and Microsoft Cognitive Toolkit.

3.2.4 Step 4: Data Analysis. To analyze our survey results, we used an open coding approach.
Two researchers independently performed a manual inspection of responses, highlighting keywords and categorizing responses based on the pre-defined themes derived from our pilot study. If new themes arose, the coders discussed and agreed upon adding the new theme. Then, both coders came together to merge their individual results. Finally, we used Cohen’s kappa ($\kappa$) to calculate inter-rater agreement (see Table 2).

3.3 Data Source 2: GDPR PRs on GitHub

We collected data concerning GDPR compliance by analyzing pull requests on GitHub repositories. Pull requests are the mechanism through which developers collaborate on open-source repositories hosted on GitHub: contributed code is reviewed and then merged into the source code [48].

3.3.1 GDPR and non-GDPR PRs. We used the GitHub REST API to search for GDPR-related pull requests—pull requests returned by the GitHub API’s default search with the query string “GDPR”. Manual inspection suggested the results are typically English-language PRs related to (GDPR) data privacy regulatory compliance.

Using this method, we collected GDPR-related PRs created from April 2016 (when the GDPR was adopted by the European Parliament) to January 2024. To avoid PRs generated by automated systems, we removed content submitted by users with “bot” in their username [16] or designated as a bot account type according to the GitHub API. This resulted in 15,731 GDPR-related pull requests across 6,513 unique GitHub repositories. For comparison, we also collected a random sample of 15,731 pull requests created in these same repositories after April 2016 that did not mention “GDPR”, which we call non-GDPR-related pull requests. The studied repositories had a median of 14 stars (avg = 1,635), 11 forks (avg = 416), 727 commits (avg = 8,997), 172 PRs (avg = 1,425), and 15 contributors (avg = 59), suggesting popular, active repositories.
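The collection step in §3.3.1 can be sketched as follows. This is an illustrative, stdlib-only sketch rather than the authors' tooling: the endpoint, query string, and date window come from the text, but the helper names (`search_params`, `is_bot`, `fetch_page`) are assumptions, and GitHub's search API caps each query at 1,000 results, so a full crawl would additionally need to slice the date range into smaller windows.

```python
"""Sketch of the GDPR-PR collection described in the text (illustrative).
Searches GitHub's REST search API for PRs mentioning "GDPR" and filters
out bot accounts by username and account type."""
import json
import urllib.parse
import urllib.request

API = "https://api.github.com/search/issues"  # issues endpoint also serves PRs

def search_params(query: str = "GDPR", page: int = 1) -> str:
    # PRs mentioning the query string, created between the GDPR's adoption
    # (April 2016) and the end of the collection window (January 2024).
    q = f"{query} is:pr created:2016-04-01..2024-01-31"
    return urllib.parse.urlencode({"q": q, "per_page": 100, "page": page})

def is_bot(user: dict) -> bool:
    # Mirror the filtering in the text: drop accounts with "bot" in the
    # username, or typed as "Bot" by the GitHub API.
    login = (user.get("login") or "").lower()
    return "bot" in login or user.get("type") == "Bot"

def fetch_page(page: int = 1) -> list:
    # One page of search results, with bot-authored PRs removed.
    url = f"{API}?{search_params(page=page)}"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        items = json.load(resp)["items"]
    return [pr for pr in items if not is_bot(pr.get("user") or {})]
```

In practice an authenticated token and rate-limit handling would be needed; the pure helpers above capture the query construction and the bot filter.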
The distribution of PRs across all repositories in our GDPR-related and non-GDPR-related datasets is summarized in Table 3.

3.3.2 Measuring Development Activity. To analyze GDPR’s impacts, we collected development activity metrics [49] per pull request:

• Comments: the total number of comments
• Active time: the amount of time the PR remained active (until merged or closed)
• Commits: the total number of commits
• Additions: the number of lines of code added
• Deletions: the number of lines of code removed
• Changed files: the total number of modified files
• Status: outcome of the PR (merged, closed, or open)

Table 3: Distribution of PRs in Datasets.

| Dataset | min | 50%ile | 75%ile | 90%ile | max |
|---------|-----|--------|--------|--------|-----|
| GDPR | 1 | 1 | 2 | 3 | 956 |
| non-GDPR| 1 | 2 | 10 | 34 | 203 |

³ https://docs.github.com/en/graphql/reference/objects#bot

We selected these metrics to analyze development activity, specifically to derive coding and code review tasks from pull requests. We compared the distributions of these metrics between GDPR-related and non-GDPR-related PRs using a Mann-Whitney U test, which compares nonparametric ordinal data between the datasets [76]. To control for multiple comparisons on the same dataset, we calculate adjusted p-values using Benjamini-Hochberg correction [30]. We measure effect size ($r$) for significant results [39].

3.3.3 Measuring Developer Perception. To augment our survey results, we applied sentiment analysis—a technique to automatically infer sentiment from natural language—on the title, body, commit messages, review comments, and discussion comments from pull requests in our datasets to examine developer perceptions of GDPR compliance.
Prior studies have similarly inferred developer sentiment and emotion from GitHub activity, including PR discussion comments [87], review comments [57], commit messages [50], and bodies [84]. While this technique has produced mixed results in software engineering contexts [64], we use it in our exploratory work as a proxy to obtain preliminary insights into developers’ sentiments regarding GDPR compliance in OSS.

We followed standard NLP preprocessing steps [69]: (1) We removed bot-generated content using the process described in Section 3.3.1. (2) We removed non-sentiment material: hyperlinks and mentions (“@username”). (3) We tokenized text using the Natural Language Toolkit (NLTK) tokenize library. (4) We converted tokens to lowercase and removed punctuation. (5) We removed stopwords such as “but” and “or” (nltk.corpus library). (6) We lemmatized the text, i.e., reduced words to their base form (e.g., “mice” becomes “mouse” [23]), using WordNetLemmatizer from the nltk.stem library. (7) We normalized the data by removing meaningless tokens, such as SHA or hash values for commits, and non-standard English words, such as words that contain numerical values (e.g., “3d”) [98].

After preprocessing the data, we were left with 15,731 titles, 14,515 bodies, 15,217 commit messages, 4,922 review comments, and 4,862 discussion comments across the GDPR-related pull requests. We compared these against non-GDPR-related PRs, for which we had 15,731 titles, 13,718 bodies, 15,652 commit messages, 3,427 review comments, and 3,165 discussion comments.

To perform sentiment analysis, we used three state-of-the-art models: Liu-Hu [56], VADER [58], and SentiArt [63]. We fed the preprocessed textual data to each model, which provided compound sentiment scores. We used a t-test ($t$) to statistically analyze sentiment across our datasets. Moreover, we aimed to assess the impact of the GDPR on developer sentiment over time.
To accomplish this, we divided the GDPR and non-GDPR PRs into 3-month segments based on the creation date of the PR. Then, we performed sentiment analysis on the binned data to observe whether and how developer sentiments manifest in OSS interactions over the lifecycle of the GDPR — from its initial adoption in 2016, through enforcement in 2018, to the present. We combined all preprocessed textual elements (title, body, commit messages, review comments, and discussion comments) to observe the overall trends in PR communications and compared with non-GDPR data as a baseline for sentiment in developer communications for the projects studied.

4 RESULTS

We are interested in understanding the impact of GDPR implementations on open-source software by analyzing development activity and developer perceptions, including challenges with implementation and assessment of compliance. In this work, we answer our research questions using multiple sources—analyzing GitHub repositories and surveying open-source developers. For RQ1 and RQ2, we report views from the survey and the GitHub measurements. For RQ3 and RQ4, we use data only from the survey.

4.1 RQ1: Development Activity

This question was: RQ1: How does GDPR compliance influence development activity on OSS projects?

4.1.1 Survey. We surveyed 56 OSS developers to understand the impact of GDPR implementations on development activity. Most participants ($n = 41, 73\%$) responded “Yes” to a question regarding the impact of implementing GDPR concepts on development processes, indicating data privacy compliance affects open-source development. When asked to elaborate, 23 developers provided examples of development impacts related to the GDPR.
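The preprocessing and longitudinal binning described in §3.3.3 can be illustrated with a stdlib-only stand-in. The paper uses NLTK preprocessing and the Liu-Hu, VADER, and SentiArt models; the toy lexicons and helper names below are assumptions for illustration, lemmatization (step 6) is omitted, and the scorer only mimics the shape of a Liu-Hu-style lexicon polarity count, not the published models:

```python
"""Illustrative stand-in for the sentiment pipeline (not the paper's code).
Shows the preprocessing steps, a lexicon polarity score, and 3-month binning."""
import re
from collections import defaultdict
from datetime import datetime

POSITIVE = {"good", "great", "fix", "improve", "thanks"}    # toy lexicon
NEGATIVE = {"annoying", "bad", "broken", "fatal", "wrong"}  # toy lexicon
STOPWORDS = {"a", "an", "and", "but", "or", "the", "to"}

def preprocess(text: str) -> list:
    text = re.sub(r"https?://\S+", " ", text, flags=re.I)  # (2) strip hyperlinks
    text = re.sub(r"@\w+", " ", text)                      # (2) strip @mentions
    # (3,4,7) tokenize, lowercase; alphabetic-only drops punctuation,
    # hashes, and words containing digits (e.g., "3d")
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]       # (5) stopwords

def polarity(text: str) -> float:
    # Lexicon score in [-1, 1]: (positive - negative) / token count.
    tokens = preprocess(text)
    if not tokens:
        return 0.0
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return score / len(tokens)

def quarterly_bins(prs: list) -> dict:
    # Average polarity per 3-month segment, keyed by (year, quarter),
    # using each PR's creation date.
    bins = defaultdict(list)
    for created_at, text in prs:
        d = datetime.fromisoformat(created_at)
        bins[(d.year, (d.month - 1) // 3 + 1)].append(polarity(text))
    return {k: sum(v) / len(v) for k, v in bins.items()}
```

The per-quarter averages for the GDPR and non-GDPR sets would then be compared (in the paper, with a t-test) to look for sentiment drift over the regulation's lifecycle.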
Data Management: 11 participants mentioned that GDPR requirements related to data management impact development activity, notably increasing development efforts. For instance, responses indicated that handling personal data (P17) and anonymization (P19), managing data controllers (P21) and data recipients (P23), implementing functionality to limit the collection of personal data (P26), and the monitoring of data subjects from the EU (P28) all impacted development processes. P53 also added “we had to separate in a clear way sensitive data from the other data”, exemplifying the effort needed to implement compliant data processing in OSS.

Time and Costs: Five participants mentioned that GDPR compliance increases development time and costs in OSS. For example, regarding time, respondents said “it does slow down our development cycle” (P54) and “we lost a complete year to be ready” (P56). For costs, participants said “budgets have soared” (P5) and “costs of production should not go over the cost of consequence of data breach” (P46).

Design: Three participants also noted the effects of GDPR compliance on the design and structure of software products. For example, P54 responded “we have to check whether we comply with GDPR every time we draft a new design” and P55 added “the design of systems now incorporates the concept of needing to remove PII after the fact”. P21 explained how GDPR compliance reduced the quality of their application’s design—replying “the principle of minimum scope was not observed”—indicating potentially unnecessarily extended scopes of variables in the code [36].

Organization: Three participant responses embodied the negative effects of data privacy regulations on their organization, stating the GDPR has a “major impact” requiring “an overhaul of project management and program priorities” (P1). P45 highlighted that “making sure to follow privacy by design” is challenging for GDPR compliance in OSS development.
One participant also mentioned that additional steps to verify implementations affected their development, stating "we need to make an additional review with the GDPR consultants that functionality that is related to the users’ data" (P53).

Benefits: One participant mentioned benefits to their development team and processes regarding the implementation of GDPR concepts, stating it helped highlight "things we had not considered before", such as ensuring that "logging functionality" and "access restrictions" were in place (P1). However, the majority of responses indicate that GDPR compliance often increases development efforts and incurs negative impacts for open-source developers.

4.1.2 Pull Request Metrics. To further observe the impact of GDPR compliance on OSS, we compared metrics for GDPR and non-GDPR related PRs. Table 4 presents these results. Using a Mann-Whitney U test, we found statistically significant differences between GDPR and non-GDPR PRs in the number of comments, active time, number of commits, lines of code added, lines of code deleted, and number of modified files. We also calculated the effect size for these results.

This indicates that incorporating changes related to the GDPR has a major impact on development work, leading to increased discussions between developers, longer review times, more code commits, and higher code churn. While significant differences exist in pull request metrics between GDPR and non-GDPR PRs, the calculated effect sizes are "small" [71], indicating low practical differences between the groups. Nevertheless, these findings support our survey results, in which open-source developers reported that GDPR compliance efforts affect OSS development.

Finding 1: Developers report implementing GDPR compliance negatively affects development processes—citing cost, time, and data management as concerns.
Finding 2: PRs related to GDPR compliance have significantly more development activity for coding (commits, additions, deletions, files changed) and review (comments, active time) tasks.

Table 4: GDPR (G) vs. Non-GDPR (non-G) GitHub Activity Metrics.

| Characteristic | Type | Median | p-value |
|----------------------|------|--------|---------|
| Comments+ | G | 1 | < 0.0001 (U = 1.4E8, r = 0.09) |
| | non-G | 1 | |
| Active time (days)+ | G | 418.05 | < 0.0001 (U = 1.4E8, r = 0.14) |
| | non-G | 1.78 | |
| Commits+ | G | 2 | < 0.0001 (U = 1.4E8, r = 0.04) |
| | non-G | 1 | |
| Additions+ | G | 57 | < 0.0001 (U = 1.5E8, r = 0.05) |
| | non-G | 19 | |
| Deletions+ | G | 7 | < 0.0001 (U = 1.3E8, r = 0.05) |
| | non-G | 4 | |
| Changed files+ | G | 4 | < 0.0001 (U = 1.4E8, r = 0.03) |
| | non-G | 2 | |

+ denotes statistically significant results (p-value < 0.05)

4.2 RQ2: GDPR Perceptions

This question was: RQ2: How do OSS developers perceive fulfilling GDPR requirements?

4.2.1 Survey. We asked participants their perceptions of the impact of GDPR regulations on privacy. Of the participants who responded to this question (n = 25), most had negative opinions of the GDPR. Three participants were neutral (e.g., "N/A" (P4)). We summarize positive and negative perceptions next.

Negative Perceptions: Despite the utility of data privacy regulations, 22 participants reported negative perceptions of the GDPR. These responses primarily focused on three issues: cost, organizations, and enforcement. For costs, respondents noted that implementing GDPR requirements is expensive and burdensome.
Participants said that compliance is "costly for many companies" (P16), is "too expensive" (P24), and that "the cost of protection should not go over the cost of consequence of data breach...GDPR [isn’t] worth the time" (P46). P55 also highlighted that "in general there have been major costs to companies of all sizes" regarding GDPR implementations. For organizations, participants reported a negative impact of the GDPR on companies and organizations. They mentioned that GDPR compliance "weakens small and medium-sized enterprises" (P15), "threatens innovation" (P18), "fails to meaningfully integrate the role of privacy-enhancing innovation and consumer education in data protection" (P23), and that "in order to be safer than risky useful functionality is removed" (P52). P46 added that the GDPR is "a lot of headache...jobs for lawyers at the expense of people who are trying to solve real problems". For enforcement, one subject said "there is a large gap in GDPR enforcement among member states" (P17) and another observed "the trend...is an increase in the number of times and the amount of fines" (P18). Similarly, P49 described the GDPR as "a big hammer", but was unsure "if it has necessarily increased security and privacy at this point".

Positive Perceptions: Eight participants had positive perceptions of the GDPR, generally stating that the GDPR enhances data privacy for users. For example, participants said that "the risk of incurring and paying out hefty fines has made companies take privacy and security more proactively" (P30), that the GDPR brings "awareness to the importance about privacy" (P45), that "data integrity is ensured" (P47), and that "customers can now delete their data quite easily" (P54). Participants also appreciated the increased accountability for corporations in safeguarding users’ data—for example, one participant stated "Before GDPR data protection was usually considered only as an afterthought if not an outright joke.
Nowadays companies will at least consider what they are doing wrong before violating data protection laws, rather than doing it by accident because no-one even thought about it" (P50). These responses reflect the intentions of the GDPR — to safeguard the rights of users and their data online.

4.2.2 Sentiment Analysis. We investigated the sentiment of developers implementing GDPR concepts by analyzing PR titles, commit messages, review comments, discussion comments, and bodies. Our overall results are in Table 5. We anticipated a higher percentage of negative comments for GDPR-related pull requests. However, we did not find evidence that GDPR-related PRs have less favorable sentiments from developers. In fact, we found they often had more positive sentiments than non-GDPR-related PRs—with two of the three models (Liu-Hu and VADER) indicating a statistically significant difference between the GDPR and non-GDPR sentiment. We speculate two explanations. First, non-GDPR-related PRs represent a broad range of code contributions, which could address a number of issues. Second, we are limited by the capabilities of the sentiment analyzer. For example, the two most negative commit messages for non-GDPR pull requests said “obsolete” and “fatal”, which are common terms of art in software maintenance tasks [89, 113] (e.g., “fix fatal error”). We also observed some variation at the beginning and end of our dataset collection period, but no significant variation in sentiment over time (see Figure 1).

Table 5: GDPR (G) vs Non-GDPR (non-G) Sentiment Analysis.

| Test | Type | Mean | Variance | p-value |
|----------|--------|-------|----------|---------|
| Liu-Hu+ | G | 0.43 | 0.27 | p < 0.0001 (t = -4.05, r = 0.22) |
| | non-G | -0.04 | 0.28 | |
| VADER+ | G | 0.44 | 0.04 | p < 0.0001 (t = -6.47, r = 0.02) |
| | non-G | 0.21 | 0.01 | |
| SentiArt | G | 0.39 | 0.01 | p = 0.1399 (t = -1.10, r = 0.01) |
| | non-G | 0.36 | 0.002 | |

+ denotes statistically significant results (p-value < 0.05)

Figure 1: Longitudinal GDPR (G) and Non-GDPR (non-G) Sentiment Analysis Data. We grouped GDPR and non-GDPR data into 3-month segments and used 3 sentiment models. For each model, GDPR data is plotted in a color with a filled marker, and non-GDPR data in the same color but with a hollow marker. The general trend is that sentiment for GDPR data is moderately positive, and more positive than for non-GDPR data.

Nonetheless, manual inspection of negatively scored content showed OSS developers expressing frustration with GDPR compliance. For instance, one title and commit message described GDPR-related changes to “avoid lawsuits by mentioning cookies thing” [91]. Another title states adding “just enough EULA [end user license agreement] not to get banned” [31]. Similar frustrations were shared in a PR body for “GDPR stuff” adding changes to “display the annoying cookies banner” [104]. Discussion comments, such as “will this conflict with GDPR?” [24], also highlight OSS developers’ confusion with GDPR requirements.

Finding 3: Despite its nominal advantages, most developers had negative perceptions of the GDPR and its implementation.

Finding 4: We found developers did not express more negative sentiments about GDPR compliance in PR discussions.

Finding 5: Sentiment related to GDPR compliance appears to be stable over time.

4.3 RQ3: Implementation Challenges

This question was: RQ3: What GDPR concepts do OSS developers find most challenging to implement? In the survey data, we observed three common challenges: data management, data protection, and vague requirements.
Data Management: 11 developers responded that processing and storing users’ data according to GDPR requirements is the most challenging concept to implement. For example, participants mentioned challenges implementing “data protection” (P24), handling “personal data” (P34), the “exchange of documents containing personal data” (P32), the “improper storage” (P30) of user data, and “knowing what info can or cannot be accessed or saved” (P49). In particular, four participants mentioned users’ right to erasure—the obligation for data controllers to delete users’ data upon request “without undue delay” [4]—as the most complicated requirement to implement. For example, P53 responded, “it’s not always easy enough to implement data processing in a way, that it’s anonymized, and if the user would like their data to be erased, be able to continue processing of the results based on user data in an anonymous way”—describing the complexity of this requirement for their project.

Data Protection: Five participants mentioned security factors as a challenge for GDPR compliance. For instance, participants were concerned with “data protection” and “other security concerns” (P24), “leaks” (P27), and the fact that other entities have “the ability to steal data” (P28). P55 noted challenges with handling and securing data in “central databases, where that data may be relied on by many loosely connected applications and systems”. These responses highlight the difficulties of implementing mechanisms to safeguard users’ data.

Vague Requirements: 10 survey respondents highlighted a lack of clear requirements as the biggest challenge with GDPR compliance in OSS. For example, one participant mentioned that the GDPR “is pretty vague” with a lack of “standard format” (P54). Another described confusion in knowing “how long can data be retained” and “what is Personal[sic] Identifiable Information”—adding, the “lack of clarity in the regulations[sic] leads to confusion” (P52).
Moreover, P48 highlighted that a lack of organizational understanding of GDPR requirements makes compliance difficult.

Beyond these clear categories, we also received a wide range of other responses, including “lawfulness and dispute resolution” (P47), the conflict between “individual privacy and the public’s right to know” (P21), and being in a “rush to regulate” (P28). P27 mentioned challenges with user experiences, stating “users endure invasive pop-ups”. Further, P1 noted the challenges evolve during the lifetime of a project, stating “At the beginning of a project, privacy by design and default. In the middle or the end, data minimization and transparency” are the main challenges. Based on the challenges of implementation, participants described difficulties limiting functionality—e.g., “knowing when interacting with EU citizens” (P49) and “more than 1,000 news websites in the European Union have gone dark” (P15). Meanwhile, P17 mentioned difficulties implementing GDPR requirements for data-intensive programming domains: “many of the GDPR’s requirements are essentially incompatible with big data, artificial intelligence, blockchain, and machine learning”. These challenges motivate new resources to help developers overcome problems related to GDPR implementation and compliance.

Finding 6: The management and protection of user data and vague requirements are key challenges open-source developers face when implementing GDPR requirements.

4.4 RQ4: Compliance Assessment

This question was: RQ4: How do OSS developers assess GDPR compliance? We found three kinds of responses related to compliance assessment: consulting with legal counsel, referencing other compliance resources, and self-assessment.

Compliance Through Legal Counsel: In our survey results, 15 OSS developers reported consulting with legal teams for GDPR compliance.
We were also interested in exploring the impact of seeking legal counsel for GDPR compliance on OSS development processes. Seven participants with experience seeking legal consultations noted that it did have a positive impact on development activity (P6, P13, P14, P45, P53, P55, P56). Participants noted the benefits of consulting legal experts, stressing the importance of “consulting with lawyers on the team who have a seat at the table” (P45), observing that it “clarifies requirements and prevents misinterpretations” (P55), and reporting that it allowed GDPR compliance to be “implemented rather easily” (P56). + + +However, most participants (n = 9) with experience seeking legal counsel lamented the impact, stating that it decreased development productivity: “it slows things down as code has to be reviewed and objectives revised” and “it impacted our approach to the SDLC” (P1), “it’s a bit of a headache” (P24), “it slowed us down...was mostly a box ticking exercise” (P51), and “it interrupted the development but it is required” (P49). Respondents also bemoaned the costs of working with legal teams, stating “for a global project open source project any legal advice would be extremely expensive” (P52) and “open-source projects can’t afford even to sustain maintainers, not even speaking about legal team...Legal teams are consulted with some corps want to kill the project” (P47). P54 also noted that legal experts found difficulties with the vagueness of GDPR compliance, replying that the “legal team struggles to interpret how to comply with GDPR, there are a lot of back-and-forth. We have to change our design many times”. + + +In sum, legal experts can provide valuable insight into data privacy regulations and compliance, but developers often find these interactions negatively impact development processes. + + +Compliance Resources: + To assess GDPR compliance, three participants mentioned a variety of other resources.
One participant described formal training on regulatory compliance, with a “special training on GDPR within the company” (P16). Another participant responded that their team uses an “accountability system” (P24) to assess compliance. Finally, P15 noted using online resources to help, but highlighted their ineffectiveness, stating, “many of the articles on the Internet about GDPR are incomplete or even wrong”. + + +Self-assessment: + Other developers mentioned that they were largely responsible for evaluating the “legality” (P18) and “integrity and confidentiality” (P23) of the processing and storage of user data in their system on their own. P24 responded that developers have to “consider whether you really need all the data you collect” while P38 advised to “get your consent in order”. P53 noted the impact on development teams, stating that GDPR implementations “took us significant amount of time due to several rounds of architecture review”. P18 added that there is “really no good way” to evaluate compliance. + + +Finding 7: + Developers often do not consult legal experts to validate GDPR compliance, relying on other resources such as compliance training, accountability systems, online resources, and self-assessed data management. + + +Finding 8: + Participants with experience interacting with legal teams provided mixed perceptions, feeling they provided valuable insight but hindered development processes. +---------------------------------------- +------------------------------- +Section 321: +5 DISCUSSION AND FUTURE WORK + + +Our results demonstrate that GDPR-related code changes have a major impact on OSS development, significantly increasing development activity with regard to the number of lines of code added and the number of commits included in PRs—indicating increased effort in code contributions and code review activities for developers (§4.1.2).
Further, we found that GDPR compliance poses a wide range of challenges for OSS development (§4.3) and that developers often assess compliance without the help of legal and policy experts (§4.4). These findings suggest that implementing GDPR compliance is a challenging activity for OSS developers. + + +We recognize that many stakeholders are involved in adhering to data privacy legislation. For instance, policymakers also play a role in data privacy compliance [112]. Data privacy regulations, such as the GDPR, are beneficial for protecting the rights and data of users online. However, we observed developers complaining about the burden of providing privacy to users—holding negative perceptions of the GDPR policy in general and its implementation. To that end, we provide guidelines to enhance data privacy regulations and software development processes to reduce the negative effects of policy compliance in OSS. +---------------------------------------- +------------------------------- +Section 322: +5.1 Improving Data Privacy Regulations + + +5.1.1 Provide Clear Requirements. + We found that developers struggled to implement GDPR concepts (§4.3). Moreover, few respondents reported consulting with legal experts for insight into policies and to assess the compliance of projects (§4.4). Thus, most development teams are forced to evaluate the system themselves. Yet, participants complained that understanding compliance is difficult due to the ambiguity of GDPR concepts: for instance, “the procedure for obtaining user consent and the information provided are unclear” (P25). Prior work suggests that ambiguity is a main challenge in requirements engineering [28]. Further, incomplete requirements can increase development costs and the probability of project failure [38]. + + +To improve program specifications, researchers have explored a variety of techniques. For instance, Wang et al.
explored using natural language processing to automatically detect ambiguous terminology in software requirements [111]. Similar techniques could be applied to regulations such as the GDPR to notify policymakers of unclear language and clarify requirements for software engineers. Another way to improve the clarity of requirements is to involve software developers in the policy-making process. Verdon argues that a good policy must be “understandable to [its] audience” [109, p. 48], yet our results show that developers are confused by GDPR requirements. Prior work shows that collaboration between policymakers and practitioners improves policies in domains such as public health [37] and education [61]. Thus, developers should be incorporated into the policy-making process to provide input on the impact of implementing and complying with policies concerning software development, such as data privacy regulations. + + +5.1.2 Policy Resources. Our survey results show that OSS developers face challenges implementing GDPR-related changes (§4.3). Participants also found that legal consultations negatively affect development processes (§4.4), and report that existing resources are largely ineffective, primarily relying on self-assessment within the development team. Only one participant mentioned receiving formal training on GDPR compliance (P16). To that end, OSS developers largely resort to implementing and evaluating compliance through their own efforts with “insufficient information” (P26). Prior work also outlines issues with software developers and security policies, noting a lack of understanding from programmers [109]. + + +Based on our findings, we posit that OSS development can benefit from novel resources to educate developers on policies and their implementation. To further support compliance, policymakers can offer resources, such as guides or online forums, that present information on data privacy-related concepts in an accessible manner.
These guidelines can also reduce the effects of GDPR compliance on code review tasks by providing specialized expertise and correct understanding for reviewers [85]. Yet, there are limited online developer communities focused on seeking help in data privacy policy implementation. Popular programming-related Q&A websites, e.g., Stack Overflow, are frequently used by developers to ask questions and seek information online [86]—and are used for discussions on data privacy policy implementation (see Table 1). However, developers have no way to verify the correctness of responses, which can also become obsolete over time. Zhang et al. recommend automated tools to identify outdated information in responses for development concepts, such as API libraries and programming languages [116]. A similar approach can be used to keep responses regarding GDPR compliance up-to-date and accurate. + + +5.2 Improving Development Processes + + +5.2.1 Privacy by Design. Participants reported challenges implementing GDPR compliance (§4.3) and negative effects on development practices (§4.1.1). Moreover, our GitHub analysis found that GDPR-related changes necessitated significantly more time and effort (e.g., comments and commits) for developers to implement and review in PRs (see Table 4). However, compliance is required for organizations to avoid “paying out hefty fines” (P30). Researchers have investigated techniques to streamline the incorporation of privacy in development processes. For instance, Privacy by Design (PBD) is a software development approach to make privacy the “default mode of operation” [35]. P50 mentioned that cultivating “a privacy-respecting mindset long before GDPR came about” avoided negative impacts on development processes and made the effort required “quite minimal”.
However, numerous participants noted the burden of implementing GDPR requirements, with one survey participant in particular (P1) highlighting that prioritizing privacy in software development processes “requires an overhaul”. Additionally, while PBD can benefit GDPR compliance efforts, Kurtz et al. note a scarcity of research in this area and highlight particular challenges with PBD for GDPR implementations, such as ensuring third-party libraries also adhere to privacy principles [70]. + + +PBD can be effective for new projects starting from scratch [102], yet may be ill-equipped for existing projects complying with new and changing data privacy regulations. Anthonysamy et al. outline limitations with current privacy requirements that solve present issues, which may differ from regulations and policies in the future [25]. More work is needed to explore tools and processes to support data privacy in mature software projects. One solution could be a partial or gradual approach to compliance. For instance, some programming languages (e.g., TypeScript) support gradual typing to selectively check for type errors in code [93]. Similarly, research in formal methods has explored supporting gradual verification of programs [26]. Thus, gradually introducing privacy into OSS can help reduce efforts related to GDPR compliance as opposed to overhauling development processes to prioritize privacy. + + +5.2.2 Automated Tools. We found that GDPR compliance has a major impact on OSS development, significantly increasing coding and reviewing tasks for PRs in GitHub repositories (see Table 4). Developers who responded to our survey also indicated the impact of GDPR compliance on their project source code, noting that data privacy regulations always need more software (P4) and violate the principle of minimum scope (P21). This indicates further difficulty for developers to validate their projects for the GDPR, with one participant responding that there is “no good way” to assess compliance (P18).
These findings point to an increased burden and effort on OSS developers to implement and review GDPR requirements to comply with data privacy regulations and avoid penalties for non-compliance (e.g., losing market share). + + +To that end, we posit that automated tools can reduce the burden of GDPR implementation efforts. One participant mentioned using a tool, an “accountability system” (P24), to help assess compliance—however, they did not provide any details about this system. Our findings for RQ1 (§4.1) show that GDPR-related pull requests involve significantly more coding, consisting of more commits and lines of code added in code contributions, as well as requiring significantly more comments and time in reviewing processes. Thus, systems to support data privacy implementation and tools to review policy-relevant code are needed to streamline compliance. Ferrara and colleagues present static analysis techniques to support GDPR compliance [42]. Further tools can support review processes for assessing implementation changes. Prior work suggests that static analysis tools can reduce time and effort in code reviews [94]. Future systems could also provide automated feedback to developers and reviewers on data privacy regulation compliance—for instance, using NLP techniques [17] or rule-based machine learning approaches [51] to automatically summarize requirements and verify compliance. + + +5.3 Other Directions + + +Based on our results, we observe several other avenues of future work. First, we plan to investigate other data sources to further explore GDPR compliance in open-source projects. For example, we plan to mine relevant queries from Stack Overflow to gain insight into challenges and information needs developers have for implementing GDPR policies. We will also examine answers to observe how developers respond.
For instance, online discussions between developers regarding policies often use disclaimers, such as the acronyms “IANAL” or “NAL” to indicate “I am not a lawyer”, before offering advice or answering questions related to legal frameworks. Without legal expertise, we anticipate that it is difficult for OSS developers to offer guidance or seek help complying with data privacy regulations—motivating the need for novel approaches to support regulation adherence and compliance assessment. + + +Moreover, we aim to engage with policymakers to understand their perspectives on data privacy policies and the challenges developers face implementing them. We will collect qualitative insights from politicians and individuals with authority to develop policies to further explore methods to support the implementation of privacy laws. Finally, we aim to extend this work to investigate the impact of broader technology-related policies on open-source software development practices—for instance, investigating the impact of alternative data privacy regulations (e.g., the CCPA or CDPA) as well as other legal frameworks that will impact software development and maintenance, such as current and imminent legislation regarding artificial intelligence governance. +---------------------------------------- +------------------------------- +Section 323: +6 RELATED WORK + + +We note two lines of related work: characterizations of stakeholder perspectives on data privacy regulations, and technical and methodological approaches for regulatory compliance. + + +Stakeholder perspectives: Research has investigated perspectives on the GDPR for stakeholders in data privacy regulation compliance. Sirur and colleagues examined organizational perceptions of the feasibility of implementing GDPR concepts, finding that larger organizations were confident in their ability to comply while smaller companies struggled with the breadth and ambiguity in GDPR requirements [95]. Earp et al.
surveyed software users to show that the privacy protection goals and policies of online websites do not meet users’ expectations for privacy [41]. Similarly, Strycharz et al. surveyed consumers to uncover frustrations and negative attitudes related to the GDPR [100]. Our work focuses on the perceptions of developers, who are responsible for implementing code changes to comply with data privacy regulations. + + +On the perspective of software engineers as regulatory stakeholders, van Dijk and colleagues provide an overview of the transition of privacy policies from self-imposed developer guidelines to legal frameworks and legislation [107]. Alhazmi and Arachchilage interviewed software developers to uncover barriers to adopting GDPR principles—finding a lack of familiarity, established techniques, useful help resources, and prioritization from employers. The paper also found that developers generally do not prioritize privacy features in their projects, focusing instead on functional requirements, which prevents compliance [20]. Similarly, researchers interviewed senior engineers to understand the challenges of implementing general privacy guidelines, indicating a frustration with legal interactions and the non-technical aspects of requirements [29]. Finally, Klymenko et al. interviewed technical and legal professionals to investigate measures for data privacy compliance in GDPR implementation—noting a lack of understanding and a need for interdisciplinary solutions [66]. While these papers take similar approaches to our research, ultimately our goals and questions are distinct, since we are specifically interested in the perspective of open-source developers. + + +Implementing and verifying GDPR compliance: Prior work has explored approaches to implement and verify GDPR compliance. For instance, Martín et al. recommend Privacy by Design methods and tools for GDPR compliance [78]. Shastri and colleagues introduce GDPRBench, a benchmark to assess the GDPR compliance of databases [92].
Li et al. investigated automated GDPR compliance as part of continuous integration workflows [77]. Al-Slais conducted a literature review to develop a taxonomy of privacy implementation approaches to guide GDPR compliance [19]. Finally, Mahindrakar et al. proposed the use of blockchain technologies to validate personal data compliance [74]. Rather than proposing new software engineering methods, measures, and tools related to GDPR, our work takes an empirical perspective to understand current practices. +---------------------------------------- +------------------------------- +Section 324: +7 THREATS TO VALIDITY + + +We discuss three types of threats to validity. + + +Construct: In mining OSS repositories, we defined the construct of “GDPR-related pull requests” based on the presence of the string “GDPR”. Some PRs may incorrectly refer to GDPR (false positives), while others may perform GDPR-relevant changes without using the acronym (false negatives). This is also biased towards English speakers, as this acronym differs in other languages. To mitigate non-English GDPR-related PRs polluting the non-GDPR-related dataset, we manually inspected PR titles for variations of the GDPR acronym in other languages, including “RGPD” (French, Spanish, and Italian), “DSGVO” (German), and “AVG” (Dutch). However, these were not included in our GDPR-related dataset since we only focus on PRs in English for our analysis. We used off-the-shelf NLP techniques to assess sentiment, inheriting biases from these methods (e.g., misinterpreted connotations of homonyms such as “mock”). In addition, lexicon-based models for sentiment analysis rely on predefined dictionary values and cannot detect certain aspects of human communication, such as sarcasm. Prior work also suggests that sentiment analysis tools can be inaccurate in software engineering contexts [64]. However, we use this to gain preliminary insights into developers’ perceptions of GDPR compliance in OSS.
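The acronym-based PR screening described in the construct discussion can be sketched as a simple title filter. This is a minimal illustration under our own assumptions (the function names and the word-boundary regular expression are ours, not the study's actual scripts); note that the Dutch acronym "AVG" invites false positives, since "avg" is also a common abbreviation for "average".

```python
import re

# GDPR acronyms named in the threats discussion: "GDPR" (English),
# "RGPD" (French, Spanish, Italian), "DSGVO" (German), "AVG" (Dutch).
GDPR_ACRONYMS = ("GDPR", "RGPD", "DSGVO", "AVG")

# Word-boundary, case-insensitive match: "gdpr-fix" matches, "upgrade" does not.
_PATTERN = re.compile(r"\b(?:" + "|".join(GDPR_ACRONYMS) + r")\b", re.IGNORECASE)

def is_gdpr_related(pr_title: str) -> bool:
    """Flag a PR title that mentions a GDPR acronym in any tracked language."""
    return _PATTERN.search(pr_title) is not None

def partition(titles):
    """Split PR titles into (gdpr_related, other), mirroring the screening step."""
    gdpr, other = [], []
    for title in titles:
        (gdpr if is_gdpr_related(title) else other).append(title)
    return gdpr, other
```

Even with word-boundary matching, a title like "show avg response time" is flagged (a false positive of the kind the construct threat discusses), and titles performing GDPR-relevant changes without any acronym remain false negatives.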
+ + +Internal: We perceive no internal threats. This study provides characterizations rather than cause-effect measurements. + + +External: There are several threats to the generalizability of our findings. We inherit the standard perils of mining open-source software [65]. We focus on open-source software available on GitHub, omitting other code hosting platforms, such as GitLab, that may be used by different populations of developers. We doubt our results generalize to commercial software, since those development organizations directly face the consequences of GDPR non-compliance. We only consider the effect of the GDPR because it is the most prominent privacy law, and hence has the most available data. Other regulations may have different effects. Specifically, we conjecture differences in the software engineering impact between general data privacy regulations, such as the GDPR and CCPA, and industry-specific data privacy regulations, such as HIPAA and FERPA: general regulations may, by necessity, be more ambiguous. +---------------------------------------- +------------------------------- +Section 325: +8 CONCLUSIONS + + +Data privacy regulations are being introduced to prevent data controllers from misusing users’ information and to protect individuals. To adhere to these regulations, developers are charged with the complex task of understanding policies and making modifications to the source code of applications to implement privacy-related requirements. This work examines the impact of data privacy regulations on software development processes by investigating code contributions and developer perceptions of GDPR compliance in open-source software. Our results show that complying with data privacy regulations significantly impacts development activities on GitHub, evoking negative perceptions and frustrations from developers.
Our findings provide implications for developers and policymakers to support the implementation of data privacy regulations that protect the rights of human users in digital environments. +---------------------------------------- +------------------------------- +Section 326: +9 DATA AVAILABILITY + + +We have uploaded the survey, datasets, and data collection and analysis scripts as supplementary materials [2]. Our IRB protocol does not allow us to share individual survey responses. +---------------------------------------- +------------------------------- +Section 327: +10 ACKNOWLEDGMENTS + + +Brown and Brantly acknowledge support from the Virginia Commonwealth Cyber Initiative (CCI). +REFERENCES + + +[1] [n. d.]. https://edps.europa.eu/data-protection/data-protection/legislation/history-general-data-protection-regulation_en + + +[2] [n. d.]. https://anonymous.4open.science/r/GDPR-OSS-Impact-D77B + + +[3] [n. d.]. MIT License. https://opensource.org/licenses/MIT. Accessed July 2023. + + +[4] [n. d.]. Right to erasure (‘right to be forgotten’). https://gdpr-info.eu/art-17-gdpr/ + + +[5] 1974. Family Educational Rights and Privacy Act of 1974. 20 U.S.C. § 1232g; 34 CFR Part 99. https://www2.ed.gov/policy/gen/guid/fpco/ferpa/index.html + + +[6] 1991. GNU General Public License, version 2. Free Software Foundation. https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html + + +[7] 1996. Health Insurance Portability and Accountability Act of 1996. Pub. L. No. 104-191, 110 Stat. 1936. https://www.govinfo.gov/content/pkg/PLAW-104publ191/pdf/PLAW-104publ191.pdf + + +[8] 2004. Apache License, Version 2.0. Apache Software Foundation. https://www.apache.org/licenses/LICENSE-2.0 + + +[9] 2010. IEC 61508-1:2010 - Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 1: General requirements. International Electrotechnical Commission. https://webstore.iec.ch/publication/5512 + + +[10] 2014.
ISO 90003:2014 - Software engineering – Guidelines for the application of ISO 9001:2015 to computer software. International Organization for Standardization. https://www.iso.org/standard/59149.html + + +[11] 2015. ISO 9001:2015 - Quality management systems – Requirements. International Organization for Standardization. https://www.iso.org/standard/62085.html + + +[12] 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal of the European Union. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32016R0679 + + +[13] 2018. ISO 26262-1:2018 - Road vehicles – Functional safety – Part 1: Vocabulary. International Organization for Standardization. https://www.iso.org/standard/68383.html + + +[14] 2023. 5th State of CCPA & GDPR Privacy Rights Compliance Research Report – Q4 2022. Cytrio. https://cytrio.com/wp-content/uploads/2023/02/5th-State-of-CCPA-GDPR-Compliance-Report_FNL2.pdf + + +[15] 2023. GDPR Enforcement Tracker – list of GDPR fines. Enforcement Tracker. https://www.enforcementtracker.com + + +[16] Ahmad Abdellatif, Mairieli Wessel, Igor Steinmacher, et al. 2022. BotHunter: an approach to detect software bots in GitHub. In Proceedings of the 19th International Conference on Mining Software Repositories. 6–17. + + +[17] Abdel-Jaouad Aberkane, Geert Poels, and Seppe Vanden Broucke. 2021. Exploring automated GDPR-compliance in requirements engineering: A systematic mapping study. IEEE Access 9 (2021), 66542–66559. + + +[18] Saeed Akhlaghpour, Farkhondeh Hassandoust, et al. 2021. Learning from enforcement cases to manage GDPR risks. MIS Quarterly Executive 20, 3 (2021). + + +[19] Yaqoob Al-Slais. 2020. Privacy Engineering Methodologies: A survey.
In 2020 International Conference on Innovation and Intelligence for Informatics, Computing and Technologies (3ICT). 1–6. https://doi.org/10.1109/3ICT51146.2020.9311949 + + +[20] Abdulrahman Alhazmi and Nalin Asanka Arachchilage. 2021. I’m all ears! Listening to software developers on putting GDPR principles into software development practice. Personal and Ubiquitous Computing 25, 5 (2021), 879–892. + + +[21] Reni Allan. 2007. Reskilling for compliance. Inf. Professional 4, 1 (2007), 20–23. + + +[22] Fernando Almeida and José Augusto Monteiro. 2021. Exploring the effects of GDPR on the user experience. Journal of Information Systems Engineering and Management 6, 3 (2021). + + +[23] Murugan Anandarajan, Chelsey Hill, and Thomas Nolan. 2019. Text preprocessing. Practical text analytics: Maximizing the value of text data (2019), 45–59. + + +[24] Maythee Anegboonlap. 2018. Will this conflict with GDPR? https://github.com/ReferaCandy/woocommerce-refera-candy/pull/24#discussion_r238153546. Github repository: ReferaCandy/woocommerce-refera-candy. + + +[25] Pauline Anthonysamy, Awais Rashid, and Ruzanna Chitchyan. 2017. Privacy requirements: present & future. In IEEE/ACM International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS). IEEE, 13–22. + + +[26] Johannes Bader, Jonathan Aldrich, and Éric Tanter. 2018. Gradual program verification. In Verification, Model Checking, and Abstract Interpretation (VMCAI). Springer, 25–46. + + +[27] Ben Balter. 2015. Open source license usage on Github.com. Github Blog. https://github.blog/2015-03-09-open-source-license-usage-on-github-com/ + + +[28] Muneera Bano. 2015. Addressing the challenges of requirements ambiguity: A review of empirical literature. In 2015 IEEE Fifth International Workshop on Empirical Requirements Engineering (EmpiRE). IEEE, 21–24. + + +[29] Kathrin Bednar, Sarah Spiekermann, and Marc Langheinrich. 2019.
Engineering Privacy by Design: Are engineers ready to live up to the challenge? The Information Society 35, 3 (2019), 122–142. + + +[30] Yoav Benjamini and Yosef Hochberg. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 1 (1995), 289–300. http://www.jstor.org/stable/2346181 + + +[31] Ani Betts. 2021. Just enough EULA to not get banned. https://github.com/anaisbets/sirene/pull/37. Github repository: anaisbets/sirene. + + +[32] Alex Bowyer, Jack Holt, Josephine Go Jefferies, Rob Wilson, David Kirk, and Jan David Smeddinck. 2022. Human-GDPR interaction: Practical experiences of accessing personal data. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–19. + + +[33] Randolph E. Bucklin and Catarina Sismeiro. 2009. Click here for Internet insight: Advances in clickstream data analysis in marketing. Journal of Interactive Marketing 23, 1 (2009), 35–48. + + +[34] Noel Carroll and Ita Richardson. 2016. Software-as-a-medical device: demystifying connected health regulations. Journal of Systems and Information Technology 18, 2 (2016), 186–215. + + +[35] Ann Cavoukian. 2009. Privacy by design. (2009). + + +[36] David Chisnall. 2012. The Go programming language phrasebook. Addison-Wesley. + + +[37] Bernard CK Choi, Tikki Pang, Vivian Lin, et al. 2005. Can scientists and policy makers work together? Journal of Epidemiology & Community Health 59, 8 (2005), 632–637. + + +[38] Tom Clancy. 1995. The chaos report. The Standish Group (1995). + + +[39] Jacob Cohen. 2013. Statistical power analysis for the behavioral sciences. Routledge. + + +[40] Jose Luis de la Vara, Markus Borg, Krzysztof Wnuk, and Leon Moonen. 2016. An industrial survey of safety evidence change impact analysis practice. IEEE Transactions on Software Engineering 42, 12 (2016), 1095–1117. + + +[41] J.B. Earp, A.I. Anton, L.
Aiman-Smith, and W.H. Stufflebeam. 2005. Examining Internet privacy policies within the context of user privacy values. IEEE Transactions on Engineering Management 52, 2 (2005), 227–237. + + +[42] Pietro Ferrara and Fausto Spoto. 2018. Static analysis for GDPR compliance. In CEUR Workshop Proceedings. 1–10. + + +[43] Aaron J Fischer, Brandon K Schultz, Melissa A Collier-Meek, et al. 2018. A critical review of videoconferencing software to support school consultation. International Journal of School & Educational Psychology 6, 1 (2018), 12–22. + + +[44] Lucas Franke, Huayu Liang, Aaron Brantly, James C. Davis, and Chris Brown. 2024. A First Look at the General Data Protection Regulation (GDPR) in Open-Source Software. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (Lisbon, Portugal) (ICSE Companion ’24). Association for Computing Machinery, New York, NY, USA, 268–269. https://doi.org/10.1145/3639478.3643077 + + +[45] GDPR. 2018. Art. 4 GDPR: Definitions. https://gdpr.eu/article-4-definitions/ + + +[46] GDPR. 2018. Art. 83 GDPR: General conditions for imposing administrative fines. https://gdpr.eu/article-83-conditions-for-imposing-administrative-fines/ + + +[47] Github. 2022. Octoverse 2022: The state of open source software. https://octoverse.github.com + + +[48] Github. 2023. Creating a pull request. https://help.github.com/en/articles/creating-a-pull-request. Github Help. + + +[49] Georgios Gousios and Andy Zaidman. 2014. A dataset for pull-based development research. In Conference on Mining Software Repositories. 368–371. + + +[50] Emitza Guzman, David Azócar, and Yang Li. 2014. Sentiment analysis of commit comments in GitHub: an empirical study. In Mining Software Repositories (MSR). + + +[51] Rajaa El Hamdani et al. 2021. A combined rule-based and machine learning approach for automated GDPR compliance checking.
In Eighteenth International Conference on Artificial Intelligence and Law. 40–49. + + +[52] Nikolay Harutyunyan. 2020. Managing your open source supply chain—why and how? Computer 53, 6 (2020), 77–81. + + +[53] Paul Hitlin, Lee Rainie, and Kenneth Olmstead. 2019. Facebook Algorithms and Personal Data. Pew Research Center. https://www.pewresearch.org/internet/2019/01/16/facebook-algorithms-and-personal-data/ + + +[54] Chris Hobbs. 2019. Embedded software development for safety-critical systems. CRC Press. + + +[55] Sebastian Holst. 2017. GDPR liability: software development and the new law. LinkedIn (2017). https://www.linkedin.com/pulse/gdpr-liability-software-development-new-law-sebastian-holst/ + + +[56] Minqing Hu and Bing Liu. 2004. Mining opinion features in customer reviews. In AAAI. Vol. 4. 755–760. + + +[57] Syed Fatiul Huq, Ali Zafar Sadiq, and Kazi Sakib. 2019. Understanding the effect of developer sentiment on fix-inducing changes: An exploratory study on GitHub pull requests. In 2019 26th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 514–521. + + +[58] Clayton Hutto and Eric Gilbert. 2014. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the international AAAI conference on web and social media, Vol. 8. 216–225. + + +[59] International Association of Privacy Professionals. Accessed 2023. Global Comprehensive Privacy Law Mapping Chart. https://iapp.org/resources/article/global-comprehensive-privacy-law-mapping-chart/ + + +[60] International Electrotechnical Commission. 2010. Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 3: Software requirements. https://webstore.iec.ch/publication/9277 + + +[61] Chongtao Jia, Mihai Stănescu, and Elham Marin. 2019. How can researchers facilitate the utilisation of research by policy-makers and practitioners in education? Research Papers in Education 34, 4 (2019), 483–498.
+ + +[62] Onnisaak and M. Henna. 2018. User Data Privacy: Facebook, Cambridge Analytica, and Privacy Protection. Computer 51, 8 (2018), 56–59. + + +[63] Arthur M. Jacobs. 2019. Sentiment analysis for words and fiction characters from the perspective of computational (neuro-) poetics. Frontiers in Robotics and AI 6 (2019), 53. + + +[64] Robbert Jongeling, Proshanta Sarkar, Subhajit Datta, and Alexander Serebrenik. 2017. On negative results when using sentiment analysis tools for software engineering research. Empirical Software Engineering 22 (2017), 2543–2584. + + +[65] Eirini Kallianvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, and Daniela Damian. 2014. The promises and perils of mining github. In 11th Working Conference on Mining Software Repositories (MSR). 92–101. + + +[66] Oleksandra Klymenko, Oleksandr Kosenkov, Stephen Meisenbacher, Parisa Elahidoost, Daniel Mendez, and Florian Matthes. 2022. Understanding the implementation of technical measures in the process of data privacy compliance: A qualitative study. In Proceedings of the 16th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 261–271. + + +[67] Michael Kretschmer, Jan Pennekamp, and Klaus Weber. 2021. Cookie banners and privacy policies: measuring the impact of the gdpr on the web. ACM Transactions on the Web (TWEB) 15, 4 (2021), 1–42. + + +[68] Oksana Kulyk, Nina Gerber, Annika Hilt, et al. 2020. Has the gdpr hype affected users’ reaction to cookie disclaimers? Journal of Cybersecurity - 1. 88–95. + + +[69] Aman Kumar, Manish Khare, and Saurabh Tiwari. 2022. Sensitivity Analysis of Developers’ Comments on GitHub Repository: A Study. In International Conference on Advanced Computational Intelligence (ICACI). IEEE, 91–98. + + +[70] Christian Kurtz, Martin Semmann, and Tilo Bohman. 2018. Privacy by design to comply with GDPR: a review on third-party data processors. (2018). + + +[71] Daniël Lakens. 2013. 
Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in psychology 4 (2013), 6267. + + +[72] Roslyn Layton and Silvia Elaluf-Calderwood. 2019. A social economic analysis of the impact of GDPR on security and privacy practices. In 2019 12th CMI Conference on Cybersecurity and Privacy (CMI). IEEE, 1–6. + + +[73] Thomas W MacFarland, Jan M Yates, Thomas W MacFarland, and Jan M Yates. 2016. Mann–whitney u test. Introduction to nonparametric statistics for the biological sciences using R (2016), 103–132. + + +[74] Abhishek Mahindrakar and Karuna Pande Joshi. 2020. Automating GDPR Compliance using Policy Integrated Blockchain. In IEEE Intl Conf on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conf on High Performance and Smart Computing (HPSC) and IEEE Intl Conf on Intelligent Data and Security (IDS). 86–93. https://doi.org/10.1109/BigDataSecurity-HPSC-IDS49724.2020.00026 + + +[75] MH Lloyd and PJ Reeve. 2009. IEC 61508 and IEC 61511 assessments-some lessons learned. (2009). + + +[76] Thomas W MacFarland, Jan M Yates, Thomas W MacFarland, and Jan M Yates. 2018. 2023–2026. The impact of GDPR on global technology development. Journal of Global Information Technology Management 22, 1 (2019). + + +[77] Ze Shi Li, Colin Werner, and Neil Ernst. 2019. Continuous Requirements: An Example Using GDPR. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW). 144–149. https://doi.org/10.1109/REW.2019.00031 + + +[78] MH Lloyd and PJ Reeve. 2009. IEC 61508 and IEC 61511 assessments-some lessons learned. (2009). + + +[79] Thomas W MacFarland, Jan M Yates, Thomas W MacFarland, and Jan M Yates. 2016. Mann–whitney u test. Introduction to nonparametric statistics for the biological sciences using R (2016), 103–132. + + +[80] Abhishek Mahindrakar and Karuna Pande Joshi. 2020. Automating GDPR Compliance using Policy Integrated Blockchain. 
In IEEE Intl Conf on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conf on High Performance and Smart Computing (HPSC) and IEEE Intl Conf on Intelligent Data and Security (IDS). 86–93. https://doi.org/10.1109/BigDataSecurity-HPSC-IDS49724.2020.00026 + + +[81] Yod-Samuel Martad and Anu Kung. 2018. Methods and tools for GDPR compliance through privacy and data protection engineering. In IEEE European Symposium on Security and Privacy–Workshops. IEEE, 108–111. + + +[82] J M Valdez Mendia and J J A. Flores-Cuatle. 2022. Toward customer hyper-personalization experience — A Data-driven approach. Cogent Business & Management 9, 1 (2022), 2041384. https://doi.org/10.1080/23311975.2022.2041384 + + +[83] Dan Milmo and Lisa O’Carroll. 2023. Facebook owner Meta fined €1.2bn for mishandling user information. The Guardian. https://www.theguardian.com/technology/2023/may/22/facebook-fined-mishandling-user-information-ireland-eu-meta + + +[84] Rene Moquin and Robin L Wakefield. 2016. The roles of awareness, sanctions, and ethics in software compliance. Journal of Computer Info. Sys. 56, 3 (2016). + + +[85] Frank Nagle, James Dana, Jennifer Hoffman, Steven Randazoo, and Xanou Zhou. 2022. Census II of Free and Open Source Software—Application Libraries. Linux Foundation, Harvard Laboratory for Innovation Science (LISH) and Open Source Security Foundation (OpenSSF) 80 (2022). + + +[86] Chinenye Okafor et al. 2022. Sok: Analysis of software supply chain security by establishing secure design properties. In ACM SCORED Workshop. 15–24. + + +[87] Kang-il Park and Bonita Sharif. 2021. Assessing perceived sentiment in pull requests with emojis: evidence from tool and developer eye movements. In 2021 IEEE/ACM Sixth International Workshop on Emotion Awareness in Software Engineering (SEmotion). IEEE, 1–6. + + +[88] Luca Pascarella, Davide Spadini, et al. 2018. Information needs in contemporary code review. Proc. of the ACM on Human-Computer Interaction: CSCW (2018). 
+ + +[89] Cole S Peterson, Jonathan A Saddler, Natalie M Halavick, and Bonita Sharif. 2019. A gaze-based exploratory study on the information seeking behavior of developers on stack overflow. In CI 1–6. + + +[90] Daniel Pletea, Bogdan Vasilescu, and Alexander Serebrenik. 2014. Security and emotion: sentiment analysis of security discussions on github. In Proceedings of the 11th working conference on mining software repositories. 348–351. + + +[91] Supreeth Shastri et al. 2020. Understanding and benchmarking the impact of GDPR on database systems. VLDB 13, 7 (2020), 1064–1077. + + +[92] Jeremy Sirk and Walid Tabu. 2007. Gradual typing for objects. In European Conference on Object-oriented Programming. Springer, 2–27. + + +[93] Devarshi Singh et al. 2017. Evaluating how static analysis tools can reduce code review effort. In 2017 IEEE symposium on visual languages and human-centric computing (VL/HCC). IEEE, 191–105. + + +[94] Sean Sirur, Jason R.C. Nurse, and Helena Webb. 2018. Are We There Yet? Understanding the Challenges Faced in Complying with the General Data Protection Regulation (GDPR). In 2nd International Workshop on Multimedia and Security (MMSec). Springer, 1–16. + + +[95] Ian Sommerville. 2011. Software Engineering, 9/E. Pearson Education India. + + +[96] Jeff South. 2018. More than 1,000 U.S. news sites are still unavailable in Europe, two months after GDPR took effect. Nieman Lab. https://www.niemanlab.org/2018/08/more-than-1000-us-news-sites-are-still-unavailable-in-europe-two-months-after-gdpr-took-effect/ + + +[97] Richard Sproat, Alan W Black, Stanley Chen, Shankar Kumar, Mari Ostendorf, and Christina D Richards. 2001. Normalization of non-standard words. Computer speech & language 15, 3 (2001), 287–333. + + +[98] David Stokes. 2012. 21 - Validation and regulatory compliance of free/open source software. In Open Source Software in Life Science Research, Lee Harland and Mark Forster (Eds.). Woodhead Publishing, 481–504. 
+ + +[99] Joanna Stryczew, Jef Audouls, and Natali Helberger. 2020. Data protection or data frustration? Individual perceptions and attitudes towards the GDPR. Eur. Data Prot. L. Rev. 6 (2020), 407. + + +[100] Synopsys. 2023. Open Source Security and Risk Analysis Report. https://www.pwc.com/us/en/services/consulting/library/gdpr-readiness.html + + +[101] Aurelia Tamò-Larrieux and Aurelia Tamò-Larrieux. 2018. Privacy by Design for the Internet of Things: A Startup Scenario. Designing for Privacy and its Legal Framework: Data Protection by Design and Default for the Internet of Things (2018), 203–226. + + +[102] Neil Thurman. 2020. Many EU visitors shut out of US sites in response to GDPR never came back. Reuters Institute for the Study of Journalism. https://reutersinstitute.politics.ox.ac.uk/news/many-eu-visitors-shut-out-us-sites- + + +[103] Serj Tubin. 2023. GDPR stuff. https://github.com/2beens/serj-tubin-vue/pull/71. GitHub repository: 2beens/serj-tubin-vue. + + +[104] UNCTAD. 2021. Data Protection and Privacy Legislation Worldwide. United Nations Conference on Trade and Development (2021). https://unctad.org/page/data-protection-and-privacy-legislation-worldwide + + +[105] Christine Utz, Martin Degeling, Sascha Fahl, et al. 2019. (Un) informed consent: Studying GDPR consent notices in the field. In ACM SIGSAC Conference on Computer and Communications Security (CCS). 973–990. + + +[106] N. van Dijk, A. Tanas, K. Rommetveit, and C. Raab. 2018. Right engineering? the redesign of privacy and Personal Data Protection. International Review of Law, Computers & Technology 32, 2–3 (Apr 2018), 230–256. https://doi.org/10.1080/13600069.2014.1575022 + + +[107] Ana Vazão, Leonel Santos, Maria Beatriz Piedade, and Carlos Rabadao. 2019. SIEM open source solutions: a comparative study. In 2019 14th Iberian Conference on Information Systems and Technologies (CISTE). IEEE, 1–5. + + +[108] Denis Verdon. 2006. Security policies and the software developer. 
IEEE Security & Privacy 4, 4 (2006), 42–49. + + +[109] Branka Vuleta. 2023. 10 unbelievable GDPR statistics in 2023. https://legaljobs.io/blog/gdpr-statistics/ + + +[110] Yue Wang, Irene L Manotas Gutièrrez, Kristina Winbladh, and Hui Fang. 2013. Automatic detection of ambiguous terminology for software requirements. In 18th International Conference on Applications of Natural Language to Information Retrieval (NAACL-HLT). Association for Computational Linguistics, 75–85. + + +[111] R Kent Weaver. 2015. Getting people to behave: Research lessons for policy makers. Public Administration Review 75, 6 (2015), 806–816. + + +[112] Krzysztof Wnuk, Tony Gorschek, and Showary Zahda. 2013. Obsolete software requirements. Information and Software Technology 55, 6 (2013), 921–940. +[114] Christopher Wylie. 2019. How I Helped Hack Democracy. New York Magazine. https://nymag.com/intelligencer/2019/10/book-excerpt-mindf-ck-by-christopher-wylie.html + + +[115] Christopher Wylie. 2019. I Made Steve Bannon’s Psychological Warfare Tool: Meet the Cambridge Analytica Whistle-blower. New York Magazine. https://nymag.com/intelligencer/2019/10/book-excerpt-mindf-ck-by-christopher-wylie.html + + +[116] Haoxiang Zhang, Shaowei Wang, Tse-Hsun Chen, Ying Zou, and Ahmed E Hassan. 2019. An empirical study of obsolete answers on stack overflow. IEEE Transactions on Software Engineering 47, 4 (2019), 850–862. +---------------------------------------- +------------------------------- +Section 328: +“They Can Only Ever Guide:” How an Open Source Software Community Uses Roadmaps to Coordinate Effort + + +DANIEL KLUG, CHRISTOPHER BOGART, and JAMES D. 
HERBSLEB, Carnegie Mellon University, USA

Unlike in commercial software development, open source software (OSS) projects do not generally have managers with direct control over how developers spend their time, yet for projects with large, diverse sets of contributors, the need exists to focus and steer development in a particular direction in a coordinated way. This is especially important for “infrastructure” projects, such as critical libraries and programming languages that many other people depend on. Some projects have taken the approach of borrowing planning tools that originated in commercial development, despite the fact that these techniques were designed for very different contexts, e.g. strong top-down control and profit motives. Little research has been done to understand how these practices are adapted to a new context. In this paper, we examine the Rust project’s use of roadmaps: how has an important OSS infrastructure project adapted an inherently top-down tool to the freewheeling world of OSS? We find that because Rust’s roadmaps are built in part by summarizing what motivated developers most prefer to work on, they are in some ways more a description of the motivated labor available than they are a directive that the community move in a particular direction. They allow the community to avoid wasting time on unpopular proposals by revealing that there will be little help in building them, and encourage work on popular features by making visible the amount of consensus behind those features. Roadmaps generate a collective focus without limiting the full scope of what developers work on: roadmap issues consume proportionally more effort than other issues, but constitute a minority of the work done (i.e., issues and pull requests made) by both central and peripheral participants.
They also create transparency among and beyond the community into what central contributors’ plans are, and allow more rational decision-making by providing a way for evidence about community needs to be linked to decision-making.

CCS Concepts: • Human-centered computing → Open source software; • Social and professional topics → Sustainability.

Additional Key Words and Phrases: collaboration; common pool resources; open source; Rust language

ACM Reference Format:
Daniel Klug, Christopher Bogart, and James D. Herbsleb. 2021. “They Can Only Ever Guide:” How an Open Source Software Community Uses Roadmaps to Coordinate Effort. Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 158 (April 2021), 28 pages. https://doi.org/10.1145/3449232
----------------------------------------
-------------------------------
Section 329:
1 INTRODUCTION

Open source software (OSS) has come to fulfill an infrastructure role in the economy. Eghbal [26] highlights OSS projects such as MySQL and Ruby that both OSS and industrial projects depend on heavily, but are themselves non-profit OSS projects. To fulfill an infrastructural role, there needs to be careful coordination among maintainers and users of the infrastructure [68], who are doing the work on behalf of different companies or foundations, or perhaps as volunteers. Good coordination is especially important for infrastructure projects, since by definition the project is an essential underpinning of many other projects: poorly-considered changes can damage these stakeholders more than they would if the project were merely an incidental dependency of other projects that they could simply swap out for an alternative. Coordination of work in self-organizing systems [27] poses a difficult and important problem in CSCW.

How can software infrastructure projects ensure that they will not only be maintained in the future, but will preserve values that their users depend on?
Unlike in commercial software development, in OSS “developer community” software projects [78] there is no manager who has direct control over which features or attributes developers choose to spend their time on, yet these projects still need to somehow coordinate, stabilize, and make visible their development priorities. Open source projects do have governance, but governance models do not generally dictate what features will be added and when. For example, even in the highly orchestrated work on the Linux kernel, there are multiple coordination processes, driven by the open source norm that contributors self-select their tasks [75].

Much preexisting work in CSCW has focused on the tensions between infrastructure contributors’ work on infrastructure and their own priorities, often driven by the primary work they do that the infrastructure is intended to support. For example, in scientific software written by academic collaborators, short-term paper deadlines can lead people to focus on needed new features over long-term maintainability [68]; on the other hand, infrastructure development can offer contributors new opportunities leading them to realign their own priorities [11], perhaps helping build consensus. Researchers have identified a broad spectrum of ways that OSS communities can organize themselves to coordinate development and avoid tragedy-of-the-commons problems [50], but in some cases preexisting social networks among contributors drive much of the work done [46]. Some OSS projects have taken the approach of borrowing planning tools that originated in commercial development, such as milestones and issue tracking (e.g. Scala 1), beta testing (e.g. PostgreSQL 2), or roadmaps (e.g. Rust), despite the fact that these techniques were originally conceived for very different contexts, i.e.
strong top-down control and profit motives, in which executives and managers make final decisions about goals and timelines, and rank-and-file developers are responsible for carrying out these plans. Developers in open source communities, in contrast, are often free to choose their own tasks, so this bottom-up power may have an impact on how planning tools work in the open-source world.

Investigating how diverse OSS projects attempt to shape collaboration in a stable, visible way requires considering the bottom-up forces at work: developers’ motivations for whether and how to contribute, users’ motivations to choose, support, or influence development, and factors that make one project survive while another fails [98], as well as the top-down techniques leadership employs in projects despite the relative lack of power that OSS leaders have over their communities [75].

In this research we investigate how consensus around a community’s direction is constructed, maintained, and evaluated. We approach this by considering how roadmaps, as an originally top-down technique from industry, are adapted and reconfigured to work for an OSS project. Roadmaps can be understood as a layout of existing plans to make future decisions. They are usually a visualization of further steps [97] intended to be open to later revision [79]. We do not investigate what effect the choice of roadmapping had over some other method of coordination the community could have chosen, or the process of deciding on the use of roadmaps in the first place, but rather how the community carried out the particular method it did choose, and its immediate effects during one iteration. We look at an OSS roadmap’s creation, how it is applied, and how the community evaluates its impact, by addressing the following research questions:

1 https://github.com/scala/scala/milestones
2 https://www.postgresql.org/developer/beta/

RQ1. What functions does a roadmap serve in an open source community?

RQ2.
How does an open source community use a roadmap to fulfill those functions?

Our results show that although a roadmap appears superficially to be an edict from project leaders specifying where resources should be applied, it in fact reflects a consensus among active developers about where they wish to apply their efforts. Its power derives not just from the core developers’ ability to accept or reject changes, but also from the fact that it reassures a would-be contributor that productive developers are already motivated to collaborate with them, if they stick to roadmap-related topics. The roadmap-building process helps these developers reach consensus, and community members use the roadmap throughout the year as a rhetorical resource to cut off digressions and to signal intention to cooperate with community goals.
----------------------------------------
-------------------------------
Section 330:
2 BACKGROUND

These research questions address an apparent mismatch between the idea of volunteers coming together to do work that motivates them, and roadmaps as plans that on their surface appear to be telling people what to do. Prior research has only partly explained how open source collaborators set directions, and literature on roadmapping in corporate settings appears to reveal little about how roadmapping applies to volunteer projects. In this section we describe prior research in both of these areas.
----------------------------------------
-------------------------------
Section 331:
2.1 The Problem of Coordinating Developer Effort in Open Source Software

In recent years, the use of OSS has become pervasive [35] among software developers, resulting in great economic value of OSS [20, 34] which is, however, largely invisible to the public. Although OSS is often critical infrastructure [26], it is managed very differently from traditional infrastructure.
Its users can freely distribute, access, adapt, modify, and redistribute source code for their own and for community use. Analyses of OSS projects from various social and organizational perspectives have shown that managing such a project requires taking into account developers’ distinct motivations for contributing [5, 15, 38], benefits and rewards of contributing [13, 44, 54], preferred levels of involvement [4, 62], building and managing social capital [66, 80], networking [60, 76, 77], and differing communication and interaction strategies [6, 19, 33].

These varying motivations and characteristics raise the question of how OSS communities coordinate to agree on and work towards common goals. We define “coordination” as many individuals deciding how to work together effectively; that is, how to choose tasks that amount to collective progress in a mutually agreeable direction as opposed to working at cross-purposes. OSS contributors and maintainers often work in a distributed and decentralized way, with very little hierarchy or institutional structure [22, 99], and are more likely to engage in projects and tasks based on personal interests [5]. Coordinating and organizing work in OSS projects therefore involves matching the demand for effort (desired features and known bugs that will take time and specialized skills to fix) with the supply of effort (volunteers and paid developers who have their own motivations and priorities).
----------------------------------------
-------------------------------
Section 332:
2.1.1 Supply of and Demand for Development Work.

Like any software, OSS requires maintenance “to correct faults, improve performance or other attributes, or adapt to a changed environment” [48]. Unfulfilled demand for maintenance may render regular software obsolete. But for infrastructure, the ramifications of insufficient maintenance are magnified because other projects and their users
rely on the infrastructure; thus the demand for development effort is greater, coming from a large dependent pool of projects and users. Prior research shows the demand for maintenance work, such as issue fixes, testing, and documentation, may depend on many factors: for example, the size of the user base for a particular feature [56], or the extent of upstream or interdependent projects [12]. Research on managing OSS requirements [73, 103] shows how demand is discovered, analyzed, prioritized, and validated within discussions and issue requests. Popular projects need help triaging user-reported issues [2, 104]. Infrastructures typically also need coordinators [65] who ensure that individual projects have features needed for an infrastructure-wide release.

Skilled volunteers are motivated by factors such as the strength of their identification with the community [38], internal (e.g., self-use) and external (e.g., reputation) motivations [36, 45], a desire to learn [102, 105], or long-term “hobbyist” status, in which developers become more deeply involved and play a critical role in long-term viability [74]. Developers hired by industry also play an increasing role in OSS development [28]. Firms are more likely to pay developers to participate as a way of sharing the cost of innovation, creating demand for their complementary products or services, establishing their technology as a de facto standard, or attracting improvements or complements for their products [100]. However, industry support for OSS projects carries some risk of discouraging volunteers, though this can be mitigated by transparency in decision-making [101] and negotiation of governance, membership, ownership, and control over production [58].

2.1.2 Matches and Mismatches in Effort Supply and Demand. Participants in OSS infrastructure are generally free to contribute anywhere. These individual decisions bring about an emergent allocation of effort across projects.
But besides the decision-making of individual participants, it is unclear what mechanism influences participants to apply effort where there is the greatest need. In contrast, it is clear that in commercial firms participating in markets, the forces of supply and demand determine price, a strong signal guiding the allocation of resources [9]. Economists Dalle & David [21] were puzzled about “how, in the absence of directly discernible market links between the producing entities and ‘customers,’ the output mix of the OSS sector of the software industry is determined. Yet, to date, the question does not appear to have attracted any significant research attention”. We were unable to find research that addresses this issue in the years since then. The study of requirements management points out the difficulties of discovering, articulating, and implementing needed features even when development effort is plentiful [103]. The lack of development effort has been documented in the highly publicized Heartbleed bug [23], but we are not aware of systematic studies of under-supply or how to recognize and address it. In total, the research seems to support the conclusion that there is currently no general mechanism closing the gap between demand for and supply of effort, except for the perceptions and decision-making of individual developers. Yet infrastructure and effort mismatches are difficult for participants to see [68].

2.1.3 Organizing and Allocating Work in Open Source Software Projects. OSS project leaders face tradeoffs between openness and fostering a productive collaboration. Decision-making behind closed doors can cause conflict that discourages volunteers, since they may feel their preferences are not being considered [40]. But too much visibility into disagreements among leadership can also lead to uncertainty among volunteers that decisions may not be firm and their contributions may not end up being used [82].
With often only partial control and limited means of enforcement, OSS project leaders may rely on social factors such as their technical reputation and community traditions to promote a vision of the project’s direction [55, 64]. Publishing schedules and roadmaps can help get developers to identify with and take responsibility for community goals [64]. Leaders may develop formal policies and guidelines for collaboration to give structure to developers’ work [40], and may assert the authority to reject additions in a given software release [55].

Prior research has identified implicit ways that core members influence newcomers and peripheral members to adopt cultural norms and practices. Hemetsberger and Reinhardt [37] describe a number of mechanisms that core members of open-source projects such as KDE use to enculturate peripheral members: for example, that project’s manifesto(^3) may discourage non-like-minded contributors, and KDE’s leaders enforce norms through mailing list discussions and code review processes. Crowston and Shamshurin [18] showed that core members of successful Apache incubator projects were more communicative than those in unsuccessful projects, and were more likely to use pronouns in a way that suggested inclusiveness of the peripheral community. Gallivan [32], however, argues that rigorous control, standardization, and measurability (“McDonaldization”) help open source projects achieve common objectives in virtual, distributed environments where trust relationships are difficult to form; in particular, despite potentially many mutual trust relationships in open-source communities, control is a one-directional relationship from core to periphery.

2.2 Roadmaps in Commercial and OSS Development

Roadmaps are plans for the use of resources over time, often created in iterative and reflective processes [61] and intended to be open for changes [79].
The goal is to lay out existing plans and future decisions, and to visualize further steps [79, 97] that may be revised based on project results [41]. In commercial contexts, developer resources and needs are coordinated explicitly by management, and roadmaps are a tool to create, implement, and manage software in alignment with company strategies, product life-cycles, and audiences [24, 30, 96].

In Software Product Management (SPM), roadmaps are a communicative tool for knowledge sharing [81], consensus-reaching, and individual interpretation of goals by people involved in development processes [47]. For example, product roadmaps present features to manage product stages [49, 96], select and assign requirements [25], and connect teams to ensure the success of a product within a larger time frame [30, 96]. To create roadmaps, information about audiences, their characteristics, and needs is usually collected beforehand [7]. As a communicative tool, roadmaps describe what will be (or should be) achieved in which way in a project, and how it will meet business objectives [57].

Many OSS projects generate roadmap documents, including large OSS communities such as React [67], Facebook Libra [84], Scala [85], and Qt [95], as well as industry-produced OSS such as AWS CloudFormation [14] and industrial coalitions like Open Service Broker [3]. These roadmaps appear to have varying roles in their communities. Some seem to have multiple versions, as if they are being maintained and revisited, while others are one-time descriptions of envisioned future features. However, it is difficult for a casual observer to tell what importance these roadmap documents play. In this research we choose the Rust language community as a particular example to examine its use of roadmaps.
----------------------------------------
-------------------------------
Section 333:
3 CASE STUDY: ROADMAPS IN THE RUST LANGUAGE PROJECT

Based on our theoretical propositions, we selected the Rust programming language as a single-case study. It is appropriate both because of its popularity and its openness. Its popularity as infrastructure means that there are many users who may pressure participants to make and implement good choices about features and priorities. Its openness means that a rich variety of data about the Rust compiler community’s working and decision-making processes is available from blogs, forums, and GitHub repositories. Thus we have the opportunity to study in great detail a community making and implementing consequential choices together. This constitutes what Yin [106] calls a “revelatory case” as it provides “an opportunity to observe and analyze a phenomenon previously inaccessible to social science inquiry.”

(^3) https://manifesto.kde.org/

The Rust programming language has been growing into the role of a popular and important part of the software infrastructure [59]: many individuals, subteams, and outside organizations have a stake in its future. Rust is promoted as being suitable for infrastructural code where performance and reliability are important, such as web browser engines or hardware devices with limited resources; for that reason it is used by numerous big tech companies [70], such as Facebook and Mozilla. The Rust community is organized into teams [69] and work groups. It has a large and active social community, with a variety of blogs, chat rooms, forums, GitHub discussion threads, and in-person conferences and meetings worldwide.

The Rust community adopted, then evolved, a roadmapping process, adding to the purposes that the roadmap serves over time.
After the release of version 1.0 in 2015, the Rust core team initiated a process to organize and prioritize future work and to define future goals in all areas of Rust, citing a need to sequence feature additions to avoid later rework, and to prioritize features that would solve many problems or benefit many users [51]. In 2016, the Rust team refined their RFC (request for comments) process; RFCs are documents proposing significant changes to the project [93]. An overarching roadmap process was added to define initiatives as rallying points with concrete goals, fixed time frames, and clear commitments from individuals. This process involves building consensus in the community on project-wide goals, then proposing these goals for community discussion through an RFC, and finally advertising and publishing the agreed-upon goals as a yearly roadmap.

The Rust core team [69] released the first Rust roadmap in February 2017 [94]. To create the roadmap, the core team gathered priorities through a Rust community survey [92] and a commercial user survey with companies using Rust [91]. For the 2018 roadmap, in addition to the annual survey [83], the core team asked the Rust community to blog and post ideas for Rust in the next year [87]. The Rust community submitted 100 blog posts with suggestions for the roadmap. The core team then collected and incorporated the suggestions into an RFC for discussion and review [71], and released the roadmap in March 2018 [88]. The 2019 roadmap followed a similar process [86]: building on 73 community blog posts [72], a survey [90], and the RFC discussion, the core team created the roadmap and released it in March 2019 [89]. Unlike previous years, the 2019 roadmap was explicitly organized around Rust's team structure, and made explicit mention of those teams having their own roadmaps.
The process thus has evolved over four years to more thoughtfully sequence development, to prioritize the worst problems and the most users, to elicit both broad (survey) and deep (narrative blog post) input from the community, to devolve some planning to the separate teams in the form of team-specific roadmaps, and, finally, to ensure that chosen initiatives are not only needed, but actually supported by people willing to commit to working on them.

3.1 The 2018 Rust roadmap

The 2018 roadmap, available at https://github.com/rust-lang/rfcs/blob/master/text/2314-roadmap-2018.md, lays out four major goals: shipping a 'Rust 2018' edition of the language, creating more documentation support for intermediate-level Rust programmers, encouraging global spread of Rust by adding internationalization support and links with local Rust groups, and finally, strengthening the compiler's work teams and their leadership. The document goes on to identify several concrete things that need to be done to support those areas. The 2018 compiler release that is the roadmap's first goal focuses on support for four identified use cases for the language: network services, WebAssembly (i.e. use in web browsers), command line applications, and use in embedded devices.

The document also specifies a rough schedule for the year: design work in February and March 2018, focusing on RFCs; "buckling down" in April through July, focusing on development work; "Fun!" in August through November, focused on forward-looking, exploratory features; and "Reflection" in December.

The document ends with a brief discussion of 'rationale, drawbacks, and alternatives'.

3.2 Other Rust documents

The Rust project publishes a great many documents defining their product, their community, and its governance.
Documents that are somewhat standard for open source projects, available at the project's GitHub site at https://github.com/rust-lang/rust, include a "README.md" telling users what Rust is and how to install it, copyright and license files positioning the work legally, "CONTRIBUTING.md" and "CODE_OF_CONDUCT.md" files laying out high-level community norms for how developers can contribute and how they are expected to interact, and "RELEASES.md" describing the change history of the project at a high level. Beyond that, the project provides a wealth of deeper information, including "The Rust Programming Language"(^4), which teaches the language itself, and the "Guide to Rustc Development"(^5), which explains how the compiler works and goes into great depth about contribution norms and governance.

As of September 2019, beyond the compiler project itself, the Rust community had 147 other GitHub repositories under its organizational umbrella, including the collection of RFCs and the discussions around them (https://github.com/rust-lang/rfcs); the annual roadmaps are found among these RFCs. The other repositories hold auxiliary tools, bots, websites, and documents.

4 METHODOLOGY

Understanding how communities work is often a complex research matter that requires large-scale data collection. Our research benefits from the Rust community being very open and communicative; they produce many publicly accessible artifacts that document community and software related activities. Therefore, a high volume of data is available to researchers about how the community builds, maintains, and evaluates consensus about its direction.

4.1 Data Collection

To analyze what functions a roadmap serves for the Rust community and how they use it to fulfill those functions, we collected the following publicly available data produced by the Rust community.

4.1.1 Yearly Rust Roadmaps.
We focused on the community-wide 2018 Rust roadmap and collected the official roadmap document [88]. Because the Rust community introduced its first roadmap for 2017, analyzing the 2018 roadmap allows us to look at both the previous and the following year's roadmaps to include the community's own reflection on how the roadmap was used. We collected 97 of the 100 blog posts [71] (3 were no longer retrievable) submitted by Rust community members during the process of creating the 2018 roadmap, written in response to the core team's call for goals and directions for Rust in 2018.

4.1.2 Direct records of Rust compiler project work. The Rust community uses the RFC process to find consensus on proposed substantial changes to the language, the standard libraries, and also to community standards. Issues and PRs (pull requests: i.e. proposed specific edits to code) are often linked to RFCs and show where the actual coding work of all contributors happens and to what Rust contributors allocate their time and effort. Comments in these RFCs, issues, and PRs involve discussions among contributors and teams. We scraped all code and discussion contents of GitHub repositories associated with the Rust compiler project from Jan 1, 2018 to Dec 31, 2018, the time frame for the 2018 roadmap.

(^4)https://doc.rust-lang.org/book/index.html
(^5)https://rustc-dev-guide.rust-lang.org/

Fig. 1. We gathered software engineering artifacts, GitHub comments, blog posts, and email interviews. We analyzed software engineering artifacts and a set of pre-roadmap blog posts for roadmap-relevant content. We analyzed GitHub comments, chats, blog posts, and interview text through qualitative coding, and statistically analyzed Likert-scale answers in the email interviews. We describe the functions and mechanisms of the roadmap by drawing on all three types of analysis.
This data allowed us to analyze how much of which kind of work (coding work and discussion work) by which people (core or peripheral people) adhered to the topics called for in the roadmap.

4.1.3 Records of argumentation and discussion. To understand how participants used the roadmap as a resource for argumentation during the year to affect decisions and priorities, we collected excerpts from across several communication channels used by the Rust community (Table 1) in which people explicitly mentioned the roadmap (i.e. explicit mentions of the word "roadmap" or "road map"):

Compiler project work. We extracted roadmap mentions from the corpus of RFC, issue, and PR discussions described above, excluding any mentions in the roadmap's own RFC#2314 (https://github.com/rust-lang/rfcs/pull/2314).

Posts in Rust blogs and forums. Some participants in the Rust project, as in many OSS communities, maintain personal and official community blogs to post about updates, goals, ideas, or critical thoughts. To gather samples of participants explicitly using the existence and content of the roadmap as a resource in argumentation about the project direction, we searched for roadmap mentions in posts of the main publicly accessible Rust blogs (Rust Blog (https://blog.rust-lang.org), Inside Rust Blog (https://blog.rust-lang.org/inside-rust), Read Rust (https://readrust.net), This Week in Rust (https://this-week-in-rust.org)) and the Rust Internals forum (https://internals.rust-lang.org) from Jan 1, 2018 to Apr 23, 2019. This time period was extended past the end of the year specifically to include posts advocating for content for the 2019 roadmap, since they might contain reflections about the 2018 roadmap and its content. The 2019 call for roadmap blog posts explicitly asked Rust contributors to reflect on Rust in 2018 [86].
Online team meetings. As an OSS community, Rust contributors are characteristically distributed all over the world, which is why meetings are mainly held online. The Rust compiler team holds weekly meetings on the collaborative chat software Zulip (https://rust-lang.zulipchat.com) to update, manage, monitor, and plan work, in working groups and throughout the larger community. Zulip conversations are semi-public: members need to create a free account and log in to participate, thus setting a low barrier to read or contribute to the discussions. Anticipating that team members and contributors might use these online meetings to discuss matters related to roadmaps and roadmap processes, we searched for roadmap mentions in Rust team meetings held on Zulip starting from Jan 1, 2018 and extending a few months beyond the end of 2018 to Apr 23, 2019, so as to also include reflection on the 2018 roadmap that happened in early 2019.

Table 1. Total amount of collected data and number of excerpts from each source that mention "roadmap"

| | RFC, issue, and PR comments on GitHub | Blog and forum posts | Blog posts reflecting on roadmap | Messages in Zulip chat threads | Total |
|-----------------------|---------|-------|----|--------|---------|
| data collected | 135,234 | 3,394 | 73 | 58,901 | 197,602 |
| mentions of "roadmap" | 59 | 110 | 28 | 144 | 341 |

In the textual data collected from GitHub comments, online meetings, and blog posts, we identified a total of 118 participants by name and username who made at least one comment regarding roadmaps. We anonymized participants (P001, P002, ..., P118) chronologically by appearance in the different data sources. Five participants were core team members, 28 were members of other teams, 85 were non-team members, and five were identified as working group members (see Fig. 2).
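The mention-extraction step described in Section 4.1.3 amounts to a case-insensitive search for "roadmap" or "road map" over the collected comments. A minimal sketch follows; the function name and sample records are hypothetical illustrations, not the study's actual pipeline:

```python
import re

# Case-insensitive pattern covering both "roadmap" and "road map"
MENTION = re.compile(r"road\s?map", re.IGNORECASE)

def extract_mentions(comments):
    """Return the comment records whose text mentions the roadmap."""
    return [c for c in comments if MENTION.search(c["text"])]

# Hypothetical sample records standing in for scraped GitHub/Zulip/blog data
comments = [
    {"author": "P001", "source": "github", "text": "The road map says this lands in the 2018 edition."},
    {"author": "P002", "source": "zulip",  "text": "Let's fix this lint first."},
    {"author": "P003", "source": "blog",   "text": "My Roadmap wishlist for Rust in 2018..."},
]
excerpts = extract_mentions(comments)  # keeps the first and third records
```

Tracking the author and source alongside each matching record, as in the dictionaries above, supports the per-participant counts reported in Table 1.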
4.1.4 Email Interviews. In addition to our data mining, we conducted short, structured email interviews with Rust contributors to contextualize some of our findings about the two research questions. We generated a sample of community members stratified by level of community involvement. To find highly involved members, we collected a list of all Rust team members and all blog post authors for the 2018 roadmap (99 people at the time of the sampling). For the less-involved members we chose a random sample of the same size from all other committers to the compiler project who listed emails on their GitHub profiles. After later data cleaning (people with multiple or invalid emails), we ended up with a list of 190 candidates. We mailed the interview to those candidates, and 39 people responded (20.5% response rate). 24 of those identified themselves as belonging to a Rust team, and 15 said they did not (see Fig. 2). As the email interviews were conducted anonymously, we could not match participants with our existing list of participants in Rust forums. Therefore, interview participants were anonymized and numbered separately (PS001, PS002, ..., PS039). The interview questions asked Rust contributors about their experience with and opinions on all Rust roadmaps of any year. The questions are shown in Appendix A.

4.2 Data Description and Analysis

Our case study includes data collected from GitHub to reconstruct the allocation of effort in code work, textual data from several Rust community sources to analyze the communicative aspects of creating and using roadmap documents and discussing work effort related to roadmap topics, and answers from structured email interviews with Rust community members to triangulate results obtained from the collected textual data.
To analyze how the Rust community creates, uses, and evaluates roadmaps, we decided to follow a mixed-method approach, as quantitative or qualitative methods by themselves could not sufficiently address our research questions [16]. We simultaneously used quantitative and qualitative data collection methods and followed a convergent approach, separately analyzing the data sets and then combining results in the interpretation. Following this methodological approach, our goal was to generate a complete and deep understanding [17] of how roadmaps are used to discuss and allocate effort.

We used a quantitative technique to estimate the proportion of work done during the year that was relevant to the community-wide 2018 roadmap. We developed a roadmap topic heuristic for determining whether a given piece of text was relevant to topics mentioned in the roadmap. The purpose of the heuristic was to give us an objective way of saying whether a unit of discussion or coding was part of the roadmap or not, and secondarily, which part of the roadmap it pertained to. The heuristic starts with hand-written regular expressions built around topics we found in the 2018 roadmap; it identifies text by applying those regular expressions, and also by making inferences about topics of "related" items, for example inferring that an issue that claims to track an RFC probably addresses the same topic as the RFC does. Its output is a list of all issues, pull requests, RFCs, and commits, tagged as "in roadmap" or "not in roadmap". The algorithm is described in detail in Appendix B. We applied this heuristic to create two datasets:

To identify where ideas in the roadmap came from, we applied the heuristic to the 97 retrievable blog posts that responded to the 2018 Rust call for roadmap blogs, generating a mapping between 2018 roadmap topics and the blogs which the core team drew on in preparing the roadmap.
We also identified whether each post was written by a member of a Rust team.

To estimate the influence of the roadmap on work done throughout 2018, we applied the heuristic to all Rust project issues and PRs, creating a data set consisting of one record per PR or issue, tagged with: a (possibly empty) set of roadmap topics, the context of discussion (issue or PR), the type of contributor (Rust team member or not), and two measures of work effort: discussion work and coding work. Discussion work was operationalized as the number of characters of English text in PR and issue discussion threads (after removing code snippets); coding work was operationalized as lines of code added or removed in the Rust project commits associated with PRs.

These datasets distinguish individual participants as "team" vs. "non-team": we defined these by scraping the membership of all Rust teams (Figure 2) from the project's governance page.(^6)

Fig. 2. We classify Rust community members as "team" (191 people) or "non-team" (other participants, whether contributing code or other effort), depending on whether they were listed on some team at https://www.rust-lang.org/governance on January 3, 2019. Although organizational literature often refers to "core" and "peripheral" members, to avoid confusion we use the word "core" for the 9-person team the Rust governance page identified as the "core team", "team" to refer to the 191 members of teams (including core), and "non-team" for the larger community periphery.

As a supplemental check on sources of ideas in the roadmap, we manually inspected the ten commits to the 2018 Rust roadmap document in the GitHub Rust RFC project and summarized the changes, looking for introductions of new topics (none were found).
This was a small, relatively informal effort, since a cursory check showed that little substantial change to the RFC had been made during the discussion.

(^6)https://www.rust-lang.org/governance, in January 2019, as retrieved from https://web.archive.org/web/20190103220022/https://www.rust-lang.org/governance

Table 2. Examples of applying codes to excerpts and sorting them into categories

| Excerpt | Code | Category |
|---------|------|----------|
| "a key step in any successful WG is going to be forming a *roadmap*" | point out need for a roadmap | creating a roadmap |
| "it's not the kind of change that's targeted for the roadmap this year" | rejecting an RFC | using roadmaps to decline allocating effort |

Complementing this quantitative technique, we also created a dataset of hand-coded roadmap mentions from project work, team meetings, and blogs. Table 1 shows the amount of data collected from each source. We extracted 341 excerpts that mentioned "roadmap" or "road map" from the collected data, tracking for each excerpt its author and source.

In our case study of textual data collected from GitHub comments, online meetings, and blog posts, we followed a qualitative content analysis approach [42, 52] to characterize what people said about roadmaps in the excerpts of these sampled Rust online artifacts.

We decided to use qualitative content analysis for our case study because the method is rooted in social research but is not linked to any particular science or concepts [43]. This makes it a very useful approach for studying documents and artifacts across various data sources [8].
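The tagging core of the roadmap topic heuristic described in Section 4.2 can be sketched as follows. The topic names and regular expressions here are illustrative assumptions (the study's actual patterns are given in Appendix B), and the inference step for "related" items, such as a tracking issue inheriting an RFC's topics, is likewise simplified:

```python
import re

# Hypothetical topic patterns; the study's actual regexes are listed in Appendix B.
TOPIC_PATTERNS = {
    "rust-2018-edition": re.compile(r"rust\s*2018|edition", re.IGNORECASE),
    "webassembly":       re.compile(r"webassembly|\bwasm\b", re.IGNORECASE),
    "embedded":          re.compile(r"embedded", re.IGNORECASE),
}

def tag_topics(text):
    """Return the set of roadmap topics whose pattern matches the text."""
    return {name for name, pat in TOPIC_PATTERNS.items() if pat.search(text)}

def classify(item, topics_of_related=()):
    """Tag an issue/PR/RFC record as "in roadmap" or "not in roadmap".

    An item also inherits topics from related items, e.g. an issue that
    tracks an RFC is assumed to share the RFC's topics.
    """
    topics = tag_topics(item["title"] + " " + item["body"])
    for related_topics in topics_of_related:
        topics |= related_topics
    return {"id": item["id"], "topics": topics, "in_roadmap": bool(topics)}

issue = {"id": 101, "title": "Tracking issue for wasm target",
         "body": "Work remaining for WebAssembly support"}
record = classify(issue)  # tagged "in roadmap" with topic "webassembly"
```

Running `classify` over every issue and PR yields the per-record tags (roadmap topics, context, contributor type) that the two datasets above are built from.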
Content analysis is profitable for mixed-method research as it comprises quantitative and qualitative methodology, and qualitative content analysis in particular allows the researcher to extract manifest and latent information from different textual data [10].

We used a data-driven open coding approach across all collected excerpts from the text-based data sources (GitHub comments, online meetings, blog posts) [52]. We performed inductive coding and created preliminary codes to construct a coding scheme while working through the qualitative data. Open codes from all data sources were then combined into larger categories. In total, we generated 91 codes (see Table 2 for code examples), which were then sorted into eight categories (see Table 3).

Throughout the open coding process, the research team ensured shared agreement on generated and applied codes. The coding of each textual data set (GitHub comments, online meetings, blog posts) was based on the consistent use of codes by one researcher and the subsequent review of generated and applied codes by a second researcher. Little disagreement was found in this process; in such cases, the two researchers met to review, discuss, and refine the disputed codes in relation to the data source and to the research question to which the coded excerpt related most. Through this discussion and refinement, all disagreements were resolved and codes were mutually validated. This way of ensuring validity in qualitative research through agreement is an established approach in the CSCW community [53] and matches our inductive coding approach for a qualitative case study across varying textual data sources.

In addition to analyzing blog posts and online meetings, the structured email interviews served to collect additional data to triangulate the results we observed [29].
Interview questions asked about how roadmaps influence decision-making, how helpful roadmaps are for the community, and how roadmaps match personal work priorities (see Appendix A). We analyzed the numeric responses, shown in Table 4. To identify themes in the textual responses, one researcher grouped responses to each question into categories, and another researcher reviewed and challenged the categorizations.

5 RESULTS

In the following two subsections we answer the research questions: what does the roadmap accomplish for the Rust community (RQ1), and how does it do so (RQ2)?

Table 3. Number of excerpts and number of codes applied per category

| Category | Num. excerpts | Num. codes applied |
|---------------------------------------------|-----|----|
| Creating a roadmap | 134 | 17 |
| Using roadmap to decline allocating effort | 33 | 13 |
| Pointing effort to roadmap topics | 26 | 12 |
| Executing a roadmap | 81 | 28 |
| Asking about a roadmap | 11 | 4 |
| Linking to roadmap documents | 28 | 2 |
| Praising the use of a roadmap | 13 | 6 |
| Criticizing the use of a roadmap | 15 | 9 |
| total | 341 | 91 |

Table 4. Summary of responses to email interview. Q1-3 asked for textual explanations accompanied by a Likert-style question on a five-point scale, where 3 is a neutral answer, and 5 means the roadmap is high in influence on the respondent's activities, helpfulness to them, and alignment with the respondent's priorities. * = team and non-team differ (t-test, p<0.05).
Questions are given in Appendix A.

| Question | Likert mean (overall) | Likert mean (team) | Likert mean (non-team) | Text answers (team) | Text answers (non-team) |
|---------------------------|------|--------|--------|----|----|
| Q1 influence (1-5 scale) | 2.8 | 3.2 | 2.3* | 10 | 6 |
| Q2 helpful (1-5 scale) | 4.1 | 4.2 | 4.0 | 11 | 7 |
| Q3 priorities (1-5 scale) | 3.5 | 3.7 | 3.1* | 11 | 3 |
| Q4 improve (text) | - | - | - | 15 | 4 |
| Q5 years (numeric) | 3.7 | 3.8 | 3.5 | - | - |
| Q6 team (yes/no) | - | 24 yes | 15 no | - | - |

5.1 Functions of the Roadmap

Building and using the roadmap appeared to serve neither the extreme of forcing team members' agenda on a wider community, nor that of letting the broader user community choose a direction. Rather, it allowed team members and others to identify areas of consensus around project goals, and keep focus on those goals through the year.

5.1.1 Reaching consensus of purpose among team members. The Rust team put out a call at the beginning of 2018 asking the community to submit "blogposts reflecting on Rust in 2017 and proposing goals and directions for Rust in 2018". An analysis of those posts and the eventual 2018 roadmap suggests that the Rust team indeed succeeded at soliciting input from people outside their team structure: only 18 of the 97 retrievable blog posts collected were authored by people listed as team members or alumni.

However, the blog posts responding to the solicitation did not seem to be a major source of novel ideas from outside the central community; the resulting roadmap document was a synthesis of shared ideas from many sources. Most (23 of 30) of the roadmap topics we could find in the blog posts were mentioned by both team and non-team blog posts. Only three topics were mentioned only by team members, and four only by non-team members.
No single blog post contained more than 12 of the topics, suggesting that the roadmap really is a synthesis of many perspectives, not simply a codification of an existing consensus. Nor did the RFC-style process for accepting the roadmap after the core team had created it elicit completely new ideas from the community; rather, discussion consisted mostly of clarification and acceptance. The roadmap changed little between the core team's proposal on Jan 29, 2018, and its adoption on March 5. Discussion (51 general comments and 20 comments linked to lines in the document) led to little change during that time. Besides typos, formatting, and clarifications, the main substantive change was a rewording to more strongly emphasize compiler performance. In short, the process did not appear to generate innovative new directions, but rather a consolidation of ideas that already had support but had not previously been gathered together.

Table 5. This table quantifies two types of effort (discussion and code contribution) applied by the Rust community, broken down by roadmap-relatedness and type of effort. The "total" figures show most discussion and coding was about non-roadmap items; however, the bytes-per-issue and lines-per-PR figures show that there was more effort per item on roadmap items.

| | Total issue text | ÷ # issues = Bytes per issue | Total lines of code | ÷ # PRs = lines per PR |
|-------------|---------|-------------------|-------|------------------|
| Roadmap | 31.6 MB | ÷ 2899 = 10915.0 | 246K | ÷ 680 = 362.2 |
| Non-roadmap | 78.8 MB | ÷ 9092 = 8662.2 | 923K | ÷ 3320 = 277.9 |

5.1.2 Focusing work during the year. Analysis of effort expended by the Rust community during 2018 demonstrates that the 2018 roadmap was neither followed religiously nor ignored completely.
Rather, it represented a community focus, in the sense that its initiatives attracted proportionally more coding and discussion per issue than issues not on the roadmap. Table 5 quantifies two types of effort applied by the Rust community, broken down by type of effort (contributing to discussion in GitHub threads, or writing code).

Fig. 3. Volume of discussion (left) and coding (right), broken down by team (n=191)/non-team (n=2392) members, and by roadmap/non-roadmap issues. The left figure measures discussion in megabytes; the right figure measures lines of code in pull requests in thousands of lines of code. Non-roadmap matters dominated in volume, for both discussion and code. Non-team members did most discussion; team members did somewhat more coding overall. Note that team members do more work per person, but there are vastly more non-team members, and that roadmap issues involve more work per item than non-roadmap issues (see Table 5).

Roadmap matters constitute a minority of the work, but receive outsize attention. In the Rust compiler project's issue and PR threads, the Rust community generated 121,457 comments across 11,991 different discussion threads during 2018 discussing proposed and ongoing development on the Rust compiler. The hottest threads (i.e. Rust project issues or pull requests with the most bytes of discussion) were more likely to be roadmap topics: 6 of the top 10 largest issue threads were roadmap topics, but overall only 2899 out of 11991 (24%) of issues related directly to the roadmap, as measured by our heuristic ($\chi^2 = 6.989$, p=.0082). In other words, roadmap topics were a community focus, but the long tail of smaller efforts actually constituted most of the discussion. Roadmap-related issues contained on average 27.8% more text per issue than non-roadmap issues (p=.0092, 2-tailed t-test of log-transformed byte counts of issue discussions).
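The two tests reported above, a chi-square test on the share of roadmap items among threads and a t-test on log-transformed discussion sizes, can be illustrated with a pure-Python sketch. The byte counts below are toy values, not the study's data, and only the test statistics are computed, not the p-values:

```python
import math

def chi2_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for a 2x2 table
    [[a, b], [c, d]], e.g. roadmap/non-roadmap vs. hot/other threads."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def welch_t(xs, ys):
    """Welch's two-sample t statistic, applied here to log byte counts."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (len(xs) - 1)
    vy = sum((y - my) ** 2 for y in ys) / (len(ys) - 1)
    return (mx - my) / math.sqrt(vx / len(xs) + vy / len(ys))

# Toy per-issue discussion sizes in bytes (hypothetical, right-skewed)
roadmap     = [12000, 9000, 15000, 11000, 8000]
non_roadmap = [7000, 9500, 6000, 8800, 7200]

# The log-transform tames the heavy right tail of discussion sizes
# before comparing group means, as in the test reported above.
t = welch_t([math.log(x) for x in roadmap],
            [math.log(x) for x in non_roadmap])
```

A uniform 2x2 table such as `chi2_2x2(10, 10, 10, 10)` yields a statistic of 0, while a positive `t` indicates larger roadmap discussions on the log scale.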
Although 21.1% of the lines of code added and deleted were roadmap-related, the same focus relationship applied; only about 17.0% of PRs worked on were associated with the roadmap, but these roadmap PRs were more substantial changes, averaging 30.4% more lines of code per PR (p<.0001, 2-tailed t-test of log-transformed lines of code per pull request).

Thus although the majority of issues discussed and code changes proposed are not envisioned in the roadmap, the ones that are in the roadmap consume proportionally more effort per issue, especially from frequent contributors. The roadmap appears to serve as a focus of attention while still allowing for a great deal of work outside its boundaries. Not everything the community agrees on requires consensus-building or needs to be in the roadmap; some priorities, such as bug fixing, are obvious.

When asked whether they followed the roadmap personally, twelve of the interviewees (PS002, PS006, PS007, PS008, PS016, PS018, PS020, PS021, PS023, PS027, PS036, PS039) replied that Rust roadmaps set a common direction for the community. Some emphasized common focus ("I think they give a clear focus point for the year, what the community wants to work on next, (...) see if we accomplished our goals and what our next ones can or should be" –PS008, email interview), while others emphasized an open, non-prescriptive attitude ("Ostensibly, they should not be called roadmaps, but they are helpful in the sense that they set *general* priorities. Of course, a lot of other things outside the roadmap will be worked on as you cannot command volunteers to do otherwise" –PS016, email interview). Another said: "Roadmaps are independent of the actual work that we can invest, so they can only ever guide" (PS021, email interview).

5.1.3 Prioritizing work for the core. Team members pay more heed to roadmap priorities than non-team members do.
Although the roadmap is pitched as a description of general community priorities, there is evidence that some people, both team and non-team, perceive the roadmap as especially relevant to the activities of team developers, and less important or binding for non-team participants. Four of the 16 people who answered our interview question about how roadmaps influenced their decisions about what to work on indicated that the roadmap applied most to highly-involved people. One respondent, who claimed a fairly low (2/5) influence from the roadmap, said: "I started contributing for my own learning and experience, roadmaps didn't influence me to start contributing but do influence what I contribute now that I'm more involved" (PS023, email interview). Another, who claimed high influence (5/5) from the roadmap, said, "I'm on the core team and work on subteams so the roadmap is directly related to the work I do" (PS039, email interview).

The amounts of text and code generated by participants support the idea that team members were more likely to pay attention to roadmap issues: 87 of the 108 team members who contributed code in 2018 (81%) added a comment to at least one roadmap-related issue, while only 39% of non-team contributors did so (1065 out of 2757); this difference in proportions was significant ($\chi^2 = 75.98$, p<.00001). Still, the bulk of the work they did, regardless of role, was on non-roadmap matters. 34.1% of the text team members wrote in issue comments was in roadmap-related issues (6.9MB out of 20.3MB in Figure 3), and 21.7% of the lines of code they wrote were in roadmap-related PRs. Contributors not in teams had a similar proportion of roadmap work, with 27.4% of issue comment text and 20.0% of code lines written being roadmap-relevant.
It seems that teams' proportionally greater preference to work on roadmap issues at the individual-issue level does not result in a vastly greater proportion of roadmap work done, by volume; this might be explained, for example, by team members "touching" many issues in which they do not do the bulk of the work.

Although some developers have very particular issues that they prefer to work on, others, especially team members, took cues from the roadmap when setting their own priorities. In the interviews, people gave equivocal answers to the question of whether Rust roadmaps influence their decision of what work to contribute: the average choice was 2.8 on a 1-5 scale, slightly closer to "not at all" than the scale's midpoint of 3. People who said they were on a team rated this higher (3.2) than non-team respondents (2.3) (t-test, p<.01). Four of the people who elaborated on this question said that they felt the roadmap was mostly relevant to important issues addressed by team developers. Two specifically indicated that the presence of a feature in the roadmap gave developers confidence to work on that feature, knowing that some change they wanted to work on would be taken up by others in the community. One said they "only contribute drive-by [i.e. as a one-off edit without much community engagement] when an itch needs scratching; roadmaps do influence where I see a chance of scratching actually result in usable changes to the language" (PS001, email interview). In short, the roadmap provides encouragement to work on certain issues, for certain people, but most developers do not feel constrained to work on roadmap initiatives.

Influence between individual priorities and the roadmap ran both ways among interviewees. People rated agreement with the roadmap's priorities slightly positively: an average of 3.5 on the 1-5 scale, with team members significantly higher at 3.7 than non-team members at 3.1 (t-test, p<.05).
Among the 14 who chose to elaborate, causality ran both ways: two said their priorities matched the roadmap’s because they helped write it, and three said they just happened to agree with its priorities; on the other hand, five said they pursued roadmap initiatives because they didn’t have strong priorities of their own, and three said they disagreed with the priorities but valued the importance of having a shared goal more than getting their own way. One person said the roadmap priorities were too vague to resolve the disagreements that were relevant in their working team.

5.1.4 Creating external visibility. Some saw the roadmap as also serving to communicate the intentions of the Rust community to those outside it, making the community’s trajectory more predictable. When first proposing the roadmap process, the author of the proposal listed among its goals “Advertise our goals as a published roadmap” and “Celebrate our achievements with an informative publicity-bomb” [1].

In our interviews, four of the 14 people (PS001, PS005, PS006, PS038) who answered our question about why roadmaps are valuable indicated that they helped the project communicate its vision and intentions outside the project. One said the roadmap helps users plan by giving them “(...) a sense of which unstable features are OK to use in a project that’s planning to switch to stable in a reasonable time frame” (PS001, email interview). Another respondent found them helpful as a way to judge their own plans to use the language: “I consider Rust to still be a young language that is not yet finalized, depending on the direction it goes it could be a deal-breaker for me” (PS038, email interview).

5.1.5 Building a sense of group identity. In online team meetings on Zulip, the largest number of roadmap mentions concerned creating roadmaps.
The majority of mentions (63%, 91/144) were by a single participant, P008, a core team member who championed both the roadmap and the formation and strengthening of Rust’s team structure in 2018. P008’s rhetorical use of the roadmap included emphasizing the need to start a separate roadmap, e.g. for a subproject, and suggesting and collecting roadmap topics for existing roadmaps. P008 emphasized benefits of having roadmaps, such as successful collaborative work (“a key step in any successful WG is going to be forming a roadmap” –P008, core team member, online meeting), structuring work processes (“I think encouraging people to outline a roadmap with specific steps is a good idea” –P008, core team member, online meeting), and reaching bigger, shared goals. They argued, for example, that creating roadmaps is worth the effort put into it (“it’s worth taking the time to make the roadmap” –P008, core team member, online meeting) and that work time is needed to create roadmaps.

A few non-team members also mentioned a need for roadmaps to organize work effort (“We need to open issues first, and to have some kind of roadmap” –P040, non-team member, online meeting), but were overall less committed to making decisions about how to create and manage roadmaps (“not sure if we want to wait and collect all the appropriate tool/subteam roadmaps and publish one collectively?” –P038, non-team member, online meeting). In online meetings, non-team members instead tended to comment in strong support of roadmap creation in reaction to suggestions made by core team members (“I think a roadmap is definitely a good idea, something to get working groups working towards a goal could be helpful in keeping them active” –P045, non-team member, online meeting) or to praise the effort made by team members to create and apply roadmaps (“I applaud all this, can’t agree more on everything :)” –P048, non-team member, online meeting).

Team members understood roadmaps as a useful planning tool for ongoing and future work, a means of managing working groups, and a way to attract more contributors by presenting work areas and goals. Roadmaps served to make concrete the topics that working groups should focus on over a certain period, which is why team members gently pushed towards creating roadmaps, for example by suggesting that a new group begin with a very lightweight alternative to the complex community-wide process (“I’m not imagining very long ‘roadmaps’, just some bullets” –P008, core team member, online meeting). The team members’ effort to have contributors and working groups start roadmaps illustrates the need to organize and record work in written form, and how the core team in particular tries to manage larger, general goals for the distributed Rust community.

5.1.6 Summary. The Rust community’s team members began with a diverse set of priorities as individuals: the roadmap process was a way for team members to decide on a consensus focus of attention, and to commit to applying themselves to those things during the year. It gave them a way to define themselves more strongly as a group by knowing that they had a shared purpose, and there is some evidence that it gave peripheral participants a way to assert their identification with the group, and gave in-group members a way to gently channel outside contributions away from distracting alternate paths. Although the process explicitly listened to input from outside the community of Rust team membership, it did not in practice bring significant new ideas from outsiders into the conversation.

5.2 Mechanisms of the Roadmap

A roadmap written and never referred to again might simply gather dust and bear no relation to subsequent activity. The Rust community, however, appears to take the roadmap seriously after it is written.
Individuals used it to gauge whether their own ideas were likely to be supported by others, to strengthen the formation of teams, to discuss and argue with each other to encourage or discourage proposed efforts, and to reflect on progress.

5.2.1 Assembling work groups. Although the roadmap, during its creation phase, helps the whole community build consensus about its overall goals, developers also use it to find each other and form collaborations to do more particular tasks.

In blog posts, team and non-team members alike mentioned personal or project roadmaps as a way to inform each other about work activities and promote plans of action. For example, they referred to detailed goals in project roadmaps ("There’s a bit more detail on the project roadmap" –P091, non-team member, blog post) or pointed out roadmap goals for work groups ("Embedded is one of the four target domains in the Rust 2018 Roadmap (...)" –P084, non-team member, blog post).

In one issue comment, a contributor motivated others to contribute ideas to the roadmap call for blog posts to influence the Rust roadmap ("Please write a Rust 2019 blog post and express this concern. I think if enough of us do that, we can influence the roadmap" –P021, team member, issue comment).

Core team members in early 2018 pushed for the creation of formal working groups for domains that were defined as focus areas in the roadmap. In blog posts, team members emphasized that work effort would be aimed at domain working groups ("the primary focus of this year’s project is (...) the domain working groups that we kicked off with our 2018 Roadmap" –P076, core team member, blog post), and team leaders urged the community to allocate their resources to domain working groups.
Blog posts at the time announced new working groups for a domain or argued for reorganizing existing working groups to better meet roadmap goals ("The dev-tools team should be reorganised to continue to scale and to support the goals in this roadmap" –P077, team member, blog post).

Conversely, although the roadmaps are not promoted as a complete list of things to work on, they also serve to pre-warn developers that some things they might work on would not likely attract much support or collaboration. In some RFC, issue, and PR comments, team members used the roadmap to refer to the overall direction Rust should take. Even without definite future goals, the mere existence of a roadmap process served to reject proposals not matching potential goals. This included explanations that a proposal was not the right timing or trend ("While the details of roadmap is still in play, (...) this seems like a clear expansion with insufficiently strong motivation" –P008, core team member, RFC comment), or not the right perspective ("I don’t think that major rework of enums currently aligns well with our current priorities or those priorities we are likely to set in the upcoming roadmap" –P008, core team member, RFC comment).

5.2.2 Discouraging non-roadmap RFCs and providing a basis for rejecting proposals. Team membership appears to affect how people talk about the roadmap. Roadmap mentions by team members in RFC, issue, and PR comments were intended to point contributors to roadmap topics and away from the RFC proposal ("I’d like to draw attention to our 2018 roadmap" –P012, core team member, RFC comment). However, team members often still valued developers’ ideas and encouraged future work. For example, they presented the prospect that a feature could make it onto the upcoming roadmap ("could be an interesting thing to consider for next year’s roadmap" –P002, team member, RFC comment).

The roadmap gave a justification for team members, and especially for core team members, to dismiss proposals that did not fit well with the community’s vision for Rust, or that would take too much effort away from current initiatives. In comments on GitHub, the roadmap was mostly mentioned as an argument in discussions for team members to decline proposed RFCs when they did not seem to fit roadmap goals ("it’s not the kind of change that’s targeted for the roadmap this year" –P002, team member, RFC comment). This argumentative strategy seems to go against the perception of the roadmap as a mere guideline, instead posing roadmap goals as boundaries delimiting where work and effort should be allocated. Only some comments gave additional explanations for declining such RFCs in relation to the roadmap. For example, the roadmap was treated as a strict work plan when proposals were a possible threat to achieving roadmap goals ("I am pretty worried if we delay now we will have a hard time delivering on our roadmap for the year" –P007, team member, issue comment). Team members also used the roadmap to reinforce reasons that, while true, were on their own insufficient to end RFC or issue discussions, for example when a proposal did not generate enough community interest ("There hasn’t been a lot of activity on this RFC (...) it also doesn’t particularly fit the roadmap" –P008, core team member, RFC comment). Comments also measured the adequacy of RFC discussions against the roadmap goals ("I also don’t think this RFC is of high enough priority to the Rust roadmap to devote a lot of attention to reaching consensus" –P018, core team member, RFC comment). In other words, features that did not match the roadmap were not worth the effort to find consensus within the community.

Although non-team members rarely used the roadmap to argue against features, one contributor mentioned the roadmap to speak out against an issue ("Finally, ‘abstract type’s are not close on the roadmap" –P011, non-team member, RFC comment). Beyond its role in consolidating a consensus when it was created, then, the roadmap was also used as an argumentative resource for encouraging work on shared goals, and for discouraging work (and even extended discussion of work) that risked becoming a distraction.

5.2.3 Reason to promote particular issues and PRs. We found that in issue and PR comments, non-team members mostly mentioned the roadmap by referring to, supporting, or emphasizing roadmap goals in issue discussions, or when asking for clarification or about the status of roadmap goals. They often argued in favor of features that were on or related to the roadmap ("Using build systems other than/in addition to Cargo is explicitly a goal in the 2018 roadmap" –P028, non-team member, issue comment). They often mentioned the roadmap as a strong reference to argue for working on or implementing features, sometimes even with reference to previous roadmap topics ("Cargo being able to integrate into larger build systems was I think on the 2017 roadmap" –P009, non-team member, RFC comment). In discussing work effort in issues and PRs, contributors also pointed roadmap goals out to others ("Note for those who haven’t seen yet: macros 2.0 is apparently slated to be stable later this year, according to the proposed roadmap" –P021, team member, issue comment).

5.2.4 Shared basis for later reflection. The Rust roadmap process promises a retrospective reflection at the end of each year [1]. As part of that, the Rust core team asked people to reflect on 2018’s roadmap when proposing ideas for the 2019 Roadmap. The reflections within these posts mostly evaluated progress on the roadmap’s particular initiatives.
For example, posters praised progress on WebAssembly ("2018 has been a really cool year for WASM and Rust" –P116, team member, blog post reflecting) or on futures and async/await ("A lot of progress was made on Futures async/await in 2018" –P110, team member, blog post reflecting). People also criticized the lack of progress on unfinished tooling ("Tooling was a large part of the goal for Rust 2018. If one gets lucky, tooling around editor and IDE support can “just work”, but many times it doesn’t." –P071, non-team member, blog post reflecting) or missing libraries. Other posts commented on the features themselves, claiming that changes made had no actual benefit for users or were mistimed.

Reflections about the process itself were relatively rare. Developers mentioned that community collaborative work processes had not yet improved as planned and that the community still needed to better manage exhaustion and time spent on topics in general ("many of the key contributors to rustc (...) were put under an enormous amount of pressure to get their changes shipped by the deadline" –P086, non-team member, blog post reflecting). Moving into 2019, as the effort of reflecting on 2018 waned, blog posts mentioning roadmaps mostly highlighted work group achievements, such as developments in the Rust package manager, cargo; WebAssembly goals and stabilization; and the growth and increased productivity of Rust teams. This seems consistent with the 2019 roadmap’s shift in emphasis towards team-specific roadmaps.

In our email interviews, 19 people (PS002, PS005, PS006, PS007, PS008, PS013, PS016, PS018, PS021, PS023, PS025, PS028, PS029, PS030, PS032, PS035, PS036, PS037, PS039) responded to our question about how roadmaps could be improved; all but two of these were people on teams. Most of the suggestions seemed aimed at reinforcing the roadmap’s role as a commitment to achieve goals.
The most common suggestion (7 respondents: PS006, PS007, PS008, PS028, PS030, PS032, PS035) was better reflection on the process, in most cases at the end of the year during preparation of the next roadmap. One respondent said: “It’d be nice to have a retrospective that examines how much work for the year kept to plan, and to give a summary of how the language advanced in the desired direction” (PS007, email interview). Seven respondents were satisfied with the process (PS002, PS036, PS037) or said they had no opinion (PS005, PS013, PS023, PS029), but the rest had ideas for improvements. Other suggestions were: less ambitious goals, more specific/concrete goals, and better estimation of effort levels. Only two non-team members responded to this question; one of these called for more stakeholder involvement, saying: “Figuring out low threshold way of bringing library stakeholders into the projects where minimal time commitment is paramount” (PS018, email interview).

5.2.5 Summary. The intention and process for creating a roadmap gave the community an opportunity, and a shared artifact, around which to talk about and balance priorities, and to define boundaries and shared purpose when forming teams. During the year it was in effect, community members used it in online discourse as justification for discouraging off-topic work and for encouraging on-topic work. It also tipped the balance in individual decision-making about work allocation by providing evidence that on-topic efforts would be supported by other community members. Afterwards it served as a standard against which to evaluate progress over the year.

6 DISCUSSION

Rust’s roadmap process strikes a balance between openness to new ideas and people, and unifying around common goals.
Because Rust is a popular programming language, there are many potential contributors who could be welcomed and encouraged to help; but as mentioned above in Subsection 2.1.3, eliciting help from the peripheries of a community requires a balance between welcoming openness and predictable direction. Rust’s process seems to strike that balance by creating some ceremony around the transition from openness to direction: the community welcomes input when building the roadmap, then visibly commits to one direction when the roadmap is released. Although few new ideas from outsiders appear to enter the roadmap through this process, they are enumerated, summarized, and listened to. The fact that new ideas from outsiders have a non-zero chance of being heeded may well be important for encouraging participation, just as the infinitesimal but non-zero chance of winning a lottery is effective in encouraging broad participation.

Another advantage of the transparent roadmap creation process is that it confers legitimacy on the governing process [31]. A document with no visible grounding in such a process might be distrusted as out of date, as one individual’s interpretation of the community’s goals, or even as the intentions of a sponsoring organization like Mozilla. In contrast, by offering prospective contributors the ability to gain knowledge and trust of a community’s true intentions, Rust might be allowing them to more quickly gain a sense of belongingness to the community, a well-studied motivator for contribution [38]. The fact that we observed non-team participants encouraging others to work on PRs relevant to the roadmap suggests that they may be visibly signalling their commitment to the community by demonstrating their familiarity with the roadmap.

When individual contributors can trust that planned work will be done by others in a known timeframe, “divide and conquer” approaches to coordination may become more viable.
Howison and Crowston [39] found concurrent development of dependent contributions to be rare in open source. When studying how open source projects performed complex multi-person tasks, Howison and Crowston only observed developers either immediately adding contributions when the necessary supporting code was already in place, or deferring contributions in the hope that someday that support would become available. They did not observe a pattern of multi-person interdependent work, in which one developer proceeded on a feature, trusting that another developer would be writing supporting code at the same time. We hypothesize that such co-work may be more common in projects that provide some trustable signal about others’ intentions. Searching for such examples in Rust would be fruitful future work.

Team members, particularly the core team itself, play an important role in curating suggestions and articulating a common vision. The core team influences the consensus built and maintained by the roadmap process by:

- Framing community survey questions and requests for pre-roadmap blog posts, then choosing among the answers to build a coherent set of initiatives.

- Using their visibility and respect to argue for their vision publicly, in blog posts, RFC and issue discussions, forums, and team meetings.

- Holding voting privileges over RFCs and merge rights for PRs; as mentioned earlier, while most accepted RFCs do not align with the roadmap, the roadmap is sometimes used as a way to frame rejection of RFCs, usually ones that are problematic for other reasons.

A roadmap allows core team members to take a role similar to a manager; this can be seen, for example, in P008’s strategy of steering team and contributor effort by using the roadmap as an agreed-upon source of validation.

7 IMPLICATIONS FOR OTHER PROJECTS

A case study is useful for providing a deep example of how a process has played out in the real world: as such it can provide experiences that other projects can learn from, but projects considering roadmapping need to weigh how those experiences apply to their own context.

A project may want to consider a roadmapping process if it is struggling to balance diverging priorities and wants to strengthen a sense of shared direction. Based on our observation of a single case, we suggest the following guidance:

- Actively solicit input from the larger community of developers as well as the core team. As we saw in this case, the overlap in ideas can be very helpful in identifying areas of consensus that already exist, and in letting those harboring ideas that lack consensus know that significant effort, in the aggregate, is unlikely to be applied to their ideas.

- Adopt a non-zero number of ideas from the community. It seems likely that in order to keep the larger community engaged and interested, a few of the ideas from beyond the core team should make it into the roadmap.

- Keep the evaluation process open and fair. As with any form of governance, fairness and openness convey a sense of legitimacy around the decision-making and enhance the likelihood that the community will accept and act on the roadmap.

- Don’t expect all, or even most, of the development work and discussion to focus on roadmap items. Nevertheless, significant progress on these items can be made, especially by the most frequent contributors.

- Reflect on the community’s progress against the roadmap, and on the process by which the roadmap was constructed; this can be helpful in creating future versions.

As we caution in the next section, however, this paper describes Rust’s experience building a roadmap process for its own particular needs.
It is not clear how this process would need to differ for a community building different software, with different developers, for different users.

8 THREATS TO VALIDITY

Our results rely in part on detailed qualitative analysis. Qualitative studies mostly do not aim at generalizability but at providing “a rich, contextualized understanding of human experience through the intensive study of particular cases” [63]. We looked at the Rust community as a case study example of how OSS communities use roadmaps as organizational tools to manage and allocate work effort toward shared work goals. Interviewees may not have been representative of the entire community; although our response rate was fairly high, there is a long tail of contributors, and there may be some self-selection bias, especially among low-volume contributors.

We do not know how typical Rust is of OSS communities with regard to its roadmap, so we only speculate about how our findings might apply beyond Rust.

We identified a specific list of roadmap topics, and classified issues, PRs, and RFCs according to those topics using a heuristic, described in Appendix B, that may miscount which work is or is not from the roadmap. The boundaries of these topics are not well-defined, since features interact, and work on a non-roadmap feature may be needed where it interacts with a roadmap feature, or vice versa. However, we relied on titles and labels assigned by the community themselves, and our mapping from roadmap topics to labels in many cases had a great deal of face validity.

We do not attempt to tease out the effectiveness of roadmaps as a coordination mechanism, as compared to other ways of governing. Our focus was on understanding how this community constructed and used roadmaps.
Future work could address questions of effectiveness by, for example, comparing quality, productivity, or community satisfaction before and after roadmap adoption.

9 CONCLUSIONS

In this work we set out to understand the functions of roadmaps for the Rust community, and how the community used them to fulfill those functions. To do this, we qualitatively examined the creation of, management of, and reflection on consensus through the roadmap process, and estimated the proportions of roadmap-related work done throughout the planned year.

We have shown that the roadmap’s purposes included building and legitimizing consensus, focusing and prioritizing collective attention (particularly for team members), building group identity, and creating external visibility for the community’s plans.

The community accomplishes these purposes by assembling work groups around the roadmap’s structure, using roadmap goals as justification for directing people towards roadmap-related work, and using the roadmap to ground reflection at the end of the year when planning for the next year.

The power the roadmap has to influence contributors’ choices during the year comes from the fact that it comprises exactly those initiatives where collaborators are willing to help. Its transparent process provides evidence of that willingness to other developers who are deciding where to contribute their effort. During the roadmapped year, rather than strictly constraining activity, the roadmap functioned to nudge contributors toward collectively agreed-upon topics when their focus might otherwise wander to other, individually motivated, topics. In this way, the roadmap enables the community to guide itself to areas of mutual interest, rather than commanding effort on shared goals.

It thus guides the community, without the need to exert hierarchical power, and provides a useful prediction about future development for people working on dependent projects.

REFERENCES

[1] Brian Anderson. 2016. Feature: north-star. https://github.com/brson/rfcs/blob/north-star/text/0000-north-star.md Last accessed 13 January 2020.

[2] John Anvik, Lyndon Hiew, and Gail C Murphy. 2006. Who Should Fix This Bug?. In Proc. International Conference on Software Engineering (Shanghai, China) (ICSE ’06). ACM, New York, NY, USA, 361–370.

[3] Open Service Broker API. 2019. Roadmap & Release Planning. https://github.com/openservicebrokerapi/servicebroker/projects/1 Last accessed 13 January 2020.

[4] A Barcomb, A Kaufmann, D Riehle, K Stol, and B Fitzgerald. 2018. Uncovering the Periphery: A Qualitative Survey of Episodic Volunteering in Free/Libre and Open Source Software Communities. IEEE Trans. Software Eng. (2018), 1–1.

[5] Hoda Baytiyeh and Jay Pfaffman. 2010. Open source software: A community of altruists. Comput. Human Behav. 26, 6 (Nov. 2010), 1345–1354.

[6] Stefan Kambiz Behfar, Ekaterina Turkina, and Thierry Burger-Helmchen. 2018. Knowledge management in OSS communities: Relationship between dense and sparse network structures. Int. J. Inf. Manage. 38, 1 (Feb. 2018), 167–174.

[7] Willem Bekkers, Inge van de Weerd, Marco Spruit, and Sjaak Brinkkemper. 2010. A Framework for Process Improvement in Software Product Management. In Systems, Software and Services Process Improvement. Springer Berlin Heidelberg, 1–12.

[8] Mariette Bengtsson. 2016. How to plan and perform a qualitative study using content analysis. NursingPlus Open 2 (2016), 8–14.

[9] Yochai Benkler. 2002. Coase’s Penguin, or, Linux and “The Nature of the Firm”. Yale Law J. (2002), 369–446.

[10] Bruce Lawrence Berg, Howard Lune, and Howard Lune. 2004. Qualitative research methods for the social sciences. Vol. 5. Pearson Boston, MA.

[11] Matthew J Bietz, Eric P S Baumer, and Charlotte P Lee. 2010. Synergizing in Cyberinfrastructure Development. Comput. Support. Coop. Work 19, 3-4 (July 2010), 245–281.

[12] Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. 2016. How to Break an API: Cost Negotiation and Community Values in Three Software Ecosystems. In Proc. International Symposium on Foundations of Software Engineering (Seattle, WA, USA) (FSE 2016). ACM, New York, NY, USA, 109–120.

[13] Yuanfeng Cai and Dan Zhu. 2016. Reputation in an open source software community: Antecedents and impacts. Decis. Support Syst. 91 (Nov. 2016), 103–112.

[14] AWS Cloudformation. 2018. CloudFormation Public Coverage Roadmap. https://github.com/aws-cloudformation/aws-cloudformation-coverage-roadmap Last accessed 13 January 2020.

[15] J Coelho, M T Valente, L L Silva, and A Hora. 2018. Why We Engage in FLOSS: Answers from Core Developers. In Intl. Workshop on Cooperative and Human Aspects of Software Engineering (CHASE). 114–121.

[16] John W Creswell and Vicki L Plano Clark. 2017. Designing and conducting mixed methods research. Sage publications.

[17] John W Creswell and Cheryl N Poth. 2016. Qualitative inquiry and research design: Choosing among five approaches. Sage publications.

[18] Kevin Crowston and Ivan Shamshurin. 2016. Core-Periphery Communication and the success of free/libre open source software projects. IFIP Advances in Information and Communication Technology 472 (2016), 45–56. https://doi.org/10.1007/978-3-319-39225-7

[19] Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. 2012. Social Coding in GitHub: Transparency and Collaboration in an Open Software Repository. In Proc. Conference on Computer Supported Cooperative Work (Seattle, Washington, USA) (CSCW ’12). ACM, New York, NY, USA, 1277–1286.

[20] Carlo Daffara. 2012.
Estimating the economic contribution of open source software to the European economy. In The First Openforum Academy Conference Proceedings. books.google.com.

[21] Jean-Michel Dalle, Paul A David, and Others. 2003. The allocation of software development resources in ‘open source’ production mode. SIEPR-Project NOSTRA Working Paper (15th February) [Accepted for publication in Joe Feller, Brian Fitzgerald, Scott Hissam, Karim Lakhani, eds., Making Sense of the Bazaar, forthcoming from MIT Press in 2004] (2003).

[22] Premkumar Devanbu, Pallavi Kudigrama, Cindy Rubio-González, and Bogdan Vasilescu. 2017. Timezone and Time-of-day Variance in GitHub Teams: An Empirical Method and Study. In Proc. International Workshop on Software Analytics (Paderborn, Germany) (SWAN 2017). ACM, New York, NY, USA, 19–22.

[23] Zakir Durumeric, Frank Li, James Kasten, Johanna Amann, Jethro Beekman, Mathias Payer, Nicolas Weaver, David Adrian, Vern Paxson, Michael Bailey, and J. Alex Halderman. 2014. The Matter of Heartbleed. In Proc. Internet Measurement Conference (Vancouver, BC, Canada) (IMC ’14). Association for Computing Machinery, New York, NY, USA, 475–488. https://doi.org/10.1145/2663716.2663755

[24] Christof Ebert. 2007. The impacts of software product management. J. Syst. Softw. 80, 6 (June 2007), 850–861.

[25] Christof Ebert and Sjaak Brinkkemper. 2014. Software product management: An industry evaluation. J. Syst. Softw. 95 (2014), 10–18.

[26] Nadia Eghbal. 2016. Roads and Bridges: The unseen labor behind our digital infrastructure. Technical Report. Ford Foundation.

[27] Anna Filippova and Hichang Cho. 2016. The Effects and Antecedents of Conflict in Free and Open Source Software Development. Proc. Conf. on Computer Supported Cooperative Work & Social Computing (CSCW) (2016), 705–716.

[28] Brian Fitzgerald. 2006. The Transformation of Open Source Software. MIS Quarterly 30, 3 (2006), 587–598.

[29] Uwe Flick.
2018. +An introduction to qualitative research +. Sage Publications Limited. +[30] Samuel A Fricker. 2012. Software product management. In Software for People. Springer, 53–81. + + +[31] Archon Fung. 2006. Varieties of Participation in Complex Governance. Public Administration Review 66, s1 (2006), 66–75. + + +[32] Michael J. Gallivan. 2001. Striking a balance between trust and control in a virtual organization: A content analysis of open source software case studies. Information Systems Journal 11, 4 (2001), 277–304. https://doi.org/10.1046/j.1365-2575.2001.00108.x + + +[33] Mohammad Gharehyazie, Daryl Posnett, Bogdan Vasilescu, and Vladimir Filkov. 2015. Developer initiation and social interactions in OSS: A case study of the Apache Software Foundation. Empirical Software Engineering 20, 5 (Oct. 2015), 1318–1353. + + +[34] Shane Greenstein and Frank Nagle. 2014. Digital dark matter and the economic contribution of Apache. Research Policy 43, 4 (May 2014), 623–631. + + +[35] Gordon Haff. 2018. How Open Source Ate Software: Understand the Open Source Movement and So Much More. Apress. + + +[36] A. Hars and Shaosong Ou. 2001. Working for free? Motivations of participating in open source projects. In Proc. Hawaii International Conference on System Sciences. 9 pp.–. + + +[37] Andrea Hemetsberger and Christian Reinhardt. 2009. Collective development in open-source communities: An activity theoretical perspective on successful online collaboration. Organization Studies 30, 9 (2009), 987–1008. https://doi.org/10.1177/0170840609339241 + + +[38] Guido Hertel, Sven Niedner, and Stefanie Herrmann. 2003. Motivation of software developers in Open Source projects: an Internet-based survey of contributors to the Linux kernel. Research Policy 32, 7 (July 2003), 1159–1177. + + +[39] James Howison and Kevin Crowston. 2014. Collaboration through open superposition: a theory of the open source way. Miss. Q. 38, 1 (2014), 29–50. + + +[40] Chris Jensen and Walt Scacchi. 2010. 
Governance in open source software development projects: A comparative multi-level analysis. In IFIP International Conference on Open Source Systems. Springer, 130–142. + + +[41] Hans-Bernd Kittlaus and Samuel A Fricker. 2017. Software Product Management: The ISPMA-Compliant Study Guide and Handbook. Springer. + + +[42] Florian Kohlbacher. 2006. The use of qualitative content analysis in case study research. In Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, Vol. 7. Institut für Qualitative Forschung, 1–30. + + +[43] Klaus Krippendorff. 2018. Content analysis: An introduction to its methodology. Sage publications. + + +[44] Sandeep Krishnamurthy, Shaosong Ou, and Arvind K Tripathi. 2014. Acceptance of monetary rewards in open source software development. Research Policy 43, 4 (2014), 632–644. + + +[45] K Lakhani. 2005. Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects. Perspectives on Free and Open Source Software (2005), 3–21. + + +[46] Charlotte P Lee, Paul Dourish, and Gloria Mark. 2006. The human infrastructure of cyberinfrastructure. Comput. Support. Coop. Work (2006), 483–492. + + +[47] Jung Hoon Lee, Hyung-Il Kim, and Robert Phaal. 2012. An analysis of factors improving technology roadmap credibility: A communications theory assessment of roadmapping processes. Technol. Forecast. Soc. Change 79, 2 (Feb. 2012), 263–280. + + +[48] M M Lehman, J F Ramil, P D Wernick, D E Perry, and W M Turski. 1997. Metrics and laws of software evolution-the nineties view. In Proceedings Fourth International Software Metrics Symposium. IEEE, 20–32. + + +[49] Andrey Maglyas, Uolevi Nikula, and Kari Smolander. 2013. What are the roles of software product managers? An empirical investigation. J. Syst. Softw. 86, 12 (Dec. 2013), 3071–3090. + + +[50] M Lynne Markus. 2007. The governance of free/open source software projects: Monolithic, multidimensional, or configurational? 
Journal of Management and Governance 11, 2 (2007), 151–163. + + +[51] Niko Matsakis. 2015. Priorities after 1.0. https://internals.rust-lang.org/t/priorities-after-1-0/1901 Last accessed 13 January 2020. + + +[52] Philipp Mayring. 2004. Qualitative content analysis. A companion to qualitative research 1 (2004), 159–176. + + +[53] Nora McDonald, Sarita Schoenebeck, and Andrea Forte. 2019. Reliability and inter-rater reliability in qualitative research: Norms and guidelines for CSCW and HCI practice. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–23. + + +[54] Rebeca Méndez-Durón. 2013. Do the allocation and quality of intellectual assets affect the reputation of open source software projects? Information & Management 50, 7 (Nov. 2013), 357–368. + + +[55] Martin Michlmayr, Francis Hunt, and David Probert. 2007. Release management in free software projects: Practices and problems. IFIP Int. Fed. Inf. Process. 234, December 2006 (2007), 295–300. + + +[56] A Mockus, D M Weiss, and Ping Zhang. 2003. Understanding and predicting effort in software projects. In 25th International Conference on Software Engineering. 2003. Proceedings. IEEE, 274–284. +[57] Jürgen Münch, Stefan Trieflinger, and Dominic Lang. 2019. Product roadmap–from vision to reality: a systematic literature review. In 2019 IEEE International Conference on Engineering, Technology and Innovation (ICE/ITMC). IEEE, 1–8. + + +[58] Siobhán O’Mahony and Beth A Bechky. 2008. Boundary organizations: Enabling collaboration among unexpected allies. Administrative science quarterly 53, 3 (2008), 422–459. + + +[59] Stack Overflow. 2019. Most Loved, Dreaded, and Wanted Languages. https://insights.stackoverflow.com/survey/2019#technology-_-most-loved-dreaded-and-wanted-languages Last accessed 13 January 2020. + + +[60] Gang Peng, Yun Wan, and Peter Woodlock. 2013. Network ties and the success of open source software development. The Journal of Strategic Information Systems 22, 4 (Dec. 
2013), 269–281. + + +[61] Robert Phaal and Gerrit Muller. 2009. An architectural framework for roadmapping: Towards visual strategy. Technol. Forecast. Soc. Change 76, 1 (Jan. 2009), 39–49. + + +[62] Gustavo Pinto, Luiz Felipe Dias, and Igor Steinmacher. 2018. Who Gets a Patch Accepted First?: Comparing the Contributions of Employees and Volunteers. In Proceedings of the 11th International Workshop on Cooperative and Human Aspects of Software Engineering (Gothenburg, Sweden) (CHASE ’18). ACM, New York, NY, USA, 110–113. + + +[63] Denise F Polit and Cheryl Tatano Beck. 2010. Generalization in quantitative and qualitative research: Myths and strategies. International journal of nursing studies 47, 11 (2010), 1451–1458. + + +[64] Germán Poo-Caamaño, Eric Knauss, Leif Singer, and Daniel M German. 2017. Herding cats in a FOSS ecosystem: a tale of communication and coordination for release management. Journal of Internet Services and Applications 8, 1 (2017). + + +[65] Germán Poo-Caamaño, Leif Singer, Eric Knauss, and Daniel M German. 2016. Herding cats: A case study of release management in an open collaboration ecosystem. IFIP Adv. Inf. Commun. Technol. 472 (2016), 147–162. + + +[66] Huilian Sophie Qiu, Alexander Nolte, Anita Brown, Alexander Serebrenik, and Bogdan Vasilescu. 2019. Going Farther Together: The Impact of Social Capital on Sustained Participation in Open Source. + + +[67] Hector Ramos. 2018. Open Source Roadmap. https://facebook.github.io/react-native/blog/2018/11/01/oss-roadmap Last accessed 13 January 2020. + + +[68] David Ribes and Thomas A Finholt. 2009. The long now of infrastructure: Articulating tensions in development. Journal of the Association for Information Systems (JAIS) (2009). + + +[69] Rust. 2019. Governance. https://www.rust-lang.org/governance Last accessed 13 January 2020. + + +[70] Rust. 2019. Production users. https://www.rust-lang.org/production/users Last accessed 13 January 2020. + + +[71] Read Rust. 2018. 
Rust 2018: Hopes and dreams for Rust in 2018. https://readrust.net/rust-2018 Last accessed 13 January 2020. + + +[72] Read Rust. 2019. Rust 2019: Ideas from the community for Rust in 2019, and the next edition. https://readrust.net/rust-2019 Last accessed 13 January 2020. + + +[73] W Scacchi. 2002. Understanding the requirements for developing open source software systems. IEEE Proceedings - Software 149, 1 (Feb. 2002), 24–39. + + +[74] Sonali K Shah. 2006. Motivation, Governance, and the Viability of Hybrid Forms in Open Source Software Development. Manage. Sci. 52, 7 (July 2006), 1000–1014. + + +[75] Maha Shaikh and Ola Henfridsson. 2017. Governing open source software through coordination processes. Information and Organization 27, 2 (2017), 116–135. + + +[76] Cuihua Shen and Peter Monge. 2011. Who connects with whom? A social network analysis of an online open source software community. First Monday 16, 6 (June 2011). + + +[77] Param Vir Singh, Yong Tan, and Vijay Mookerjee. 2011. Network Effects: The Influence of Structural Capital on Open Source Project Success. MIS Quarterly 35, 4 (2011), 813–829. + + +[78] Matthias Stürmer. 2013. Four types of open source communities. https://opensource.com/business/13/6/four-types-organizational-structures-within-open-source-communities. Accessed: 2020-1-5. + + +[79] Tanja Suomalainen, Outi Salo, Pekka Abrahamsson, and Jouni Similä. 2011. Software product roadmapping in a volatile business environment. Journal of Systems and Software 84, 6, 958–975. + + +[80] Yong Tan, Vijay Mookerjee, and Param Singh. 2007. Social capital, structural holes and team composition: Collaborative networks of the open source software community. Proc. International Conference on Information Systems (2007), 155. + + +[81] Antony Tang, Taco de Boer, and Hans van Vliet. 2011. Building roadmaps: a knowledge sharing perspective. In Proc. International Workshop on SHAring and Reusing Architectural Knowledge. 13–20. + + +[82] Niels C Taubert. 2008. 
Balancing requirements of decision and action: Decision-making and implementation in free/open source software projects. Science, Technology & Innovation Studies 4, 1 (2008), 69–88. + + +[83] Jonathan Taylor. 2017. Rust 2017 Survey Results. https://blog.rust-lang.org/2017/09/05/Rust-2017-Survey-Results.html Last accessed 13 January 2020. +[84] Libra Engineering Team. 2019. Libra Core Roadmap #2. https://developers.libra.org/blog/2019/12/17/libra-core-roadmap-2 Last accessed 13 January 2020. + + +[85] Scala Team. 2017. Scala 2.13 Roadmap. https://www.scala-lang.org/news/roadmap-2.13.html Last accessed 13 January 2020. + + +[86] The Rust Core Team. 2018. A call for Rust 2019 Roadmap blog posts. https://blog.rust-lang.org/2018/12/06/call-for-rust-2019-roadmap-blogposts.html Last accessed 13 January 2020. + + +[87] The Rust Core Team. 2018. New Year’s Rust: A Call for Community Blogposts. https://blog.rust-lang.org/2018/01/03/new-years-rust-a-call-for-community-blogposts.html Last accessed 13 January 2020. + + +[88] The Rust Core Team. 2018. Rust’s 2018 roadmap. https://blog.rust-lang.org/2018/03/12/roadmap.html Last accessed 13 January 2020. + + +[89] The Rust Core Team. 2019. Rust’s 2019 Roadmap. https://blog.rust-lang.org/2019/04/23/roadmap.html Last accessed 13 January 2020. + + +[90] The Rust Survey Team. 2018. Rust Survey 2018 Results. https://blog.rust-lang.org/2018/11/27/Rust-survey-2018.html Last accessed 13 January 2020. + + +[91] Jonathan Turner. 2016. 2016 Rust Commercial User Survey Results. https://internals.rust-lang.org/t/2016-rust-commercial-user-survey-results/4317 Last accessed 13 January 2020. + + +[92] Jonathan Turner. 2016. State of Rust Survey 2016. https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2016.html Last accessed 13 January 2020. + + +[93] Aaron Turon. 2016. Refining Rust’s RFCs. http://aturon.github.io/blog/2016/07/05/rfc-refinement/ Last accessed 13 January 2020. + + +[94] Aaron Turon. 2017. Rust’s 2017 Roadmap. 
https://blog.rust-lang.org/2017/02/06/roadmap.html Last accessed 13 January 2020. + + +[95] Tuukka Turunen. 2018. QT Roadmap for 2018. https://www.qt.io/blog/2018/02/22/qt-roadmap-2018 Last accessed 13 January 2020. + + +[96] I van de Weerd, S Brinkkemper, R Nieuwenhuis, J Versendaal, and L Bijlsma. 2006. Towards a Reference Framework for Software Product Management. In International Requirements Engineering Conference (RE’06). 319–322. + + +[97] Konstantin Vishnevskiy, Oleg Karasev, and Dirk Meissner. 2015. Integrated roadmaps and corporate foresight as tools of innovation management: The case of Russian companies. Technol. Forecast. Soc. Change 90 (Jan. 2015), 433–443. + + +[98] Georg Von Krogh, Stefan Haefliger, Sebastian Spaeth, and Martin W Wallin. 2012. Carrots and rainbows: Motivation and social practice in open source software development. MIS Quarterly (2012), 649–676. + + +[99] Kangning Wei, Kevin Crowston, U Yeliz Eseryel, and Robert Heckman. 2017. Roles and politeness behavior in community-based free/libre open source software development. Information & Management 54, 5 (July 2017), 573–582. + + +[100] Joel West and Scott Gallagher. 2006. Challenges of open innovation: the paradox of firm investment in open-source software. R&D Management 36, 3 (2006), 319–331. + + +[101] Joel West and Siobhán O’Mahony. 2008. The Role of Participation Architecture in Growing Sponsored Open Source Communities. Industry and Innovation 15, 2 (April 2008), 145–168. + + +[102] Chorng-Guang Wu, James H Gerlach, and Clifford E Young. 2007. An empirical analysis of open source software developers’ motivations and continuance intentions. Information & Management 44, 3 (2007), 253–262. + + +[103] Xuan Xiao, Aron Lindberg, Sean Hansen, and Kalle Lyytinen. 2018. “Computing” Requirements for Open Source Software: A Distributed Cognitive Approach. Journal of the Association for Information Systems 19, 12 (2018), 1217–1252. + + +[104] J Xie, M Zhou, and A Mockus. 2013. 
Impact of Triage: A Study of Mozilla and Gnome. In International Symposium on Empirical Software Engineering and Measurement. IEEE, 247–250. + + +[105] Yunwen Ye and Kouichi Kishida. 2003. Toward an Understanding of the Motivation Open Source Software Developers. In Proc. International Conference on Software Engineering (Portland, Oregon) (ICSE ’03). IEEE Computer Society, Washington, DC, USA, 419–429. + + +[106] Robert K Yin. 2017. Case study research and applications: Design and methods. Sage publications. +A EMAIL INTERVIEW QUESTIONS + + +• Q1. How much do Rust roadmaps influence your decision about what work you contribute to the Rust project? + No influence at all 1 2 3 4 5 A lot of influence + Explain (optional) + + +• Q2. In your opinion, how helpful are roadmaps for the Rust community? + Not at all helpful 1 2 3 4 5 Very helpful + Can you explain in what way they are helpful or unhelpful? (optional) + + +• Q3. How much do Rust roadmaps (e.g. for working groups or projects) match your own priorities for Rust? + Do not at all represent my priorities 1 2 3 4 5 Represent my priorities very well + Explain (optional) + + +• Q4. How could the use of roadmaps in Rust be improved in the future? + + +• Q5. How many years have you been involved with Rust? + + +• Q6. Have you been on any official Rust team or working group? + Yes No + + +B ROADMAP TOPIC HEURISTICS + + +We began by manually extracting a list of topics from the 2018 roadmap. To assign topics to particular issues, PRs, and RFCs, we used the following method: + + +• Two researchers independently compiled a list of topics from this document, identifying bullet points or lists in the text that appeared to identify specific features. One researcher’s list was strictly longer (36 items) than the other’s (23 items), so the two discussed each of the additional topics and included all but two of them, resulting in 34 topics. 
• Using the generated list of topics, one researcher proposed search keywords for each topic, using acronyms, distinctive terms, or word sequences found in that part of the roadmap that the researcher judged would have high selectivity for distinguishing text about that topic from general Rust discussion. The final list is shown in Table 6.

• Labels (short strings used by GitHub to tag issues, RFCs, and pull requests) were assigned to roadmap topics by applying the keywords to the labels’ descriptions as shown at https://github.com/rust-lang/rust/labels; for example, the label A-net was assigned to the topic “network services” because it matched the search term “networking”. Both researchers checked through this list of labels and their descriptions, and agreed that they matched the topics.

• This mapping was used to assign topics to all issues, PRs, and RFCs in the Rust repositories (excluding so-called “Rollup” PRs). An issue, PR, or RFC was assigned to a topic if it was tagged with a label that mapped to that topic.

• Topics were also assigned to RFCs and tracking issues (a subset of issues formally tied to certain RFCs) if the search terms matched the item’s title.

• We then spread activation from RFCs to issues, issues to PRs and RFCs, and PRs to issues: that is, an issue inherits the topic of an RFC if the RFC lists the issue as an official tracking issue. A PR inherits the topic of an issue if the PR mentions the issue ID in its initial description. This was not done recursively.

• We assign a commit to a topic if it was part of a non-Rollup PR of that topic that was eventually merged into the mainline. We omitted commits with multiple parents (to avoid double counting merges of commits) and commits touching more than 100 files (to avoid commits that were mass moves of files).
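The assignment and spreading steps above can be sketched as follows. This is an illustrative reconstruction, not the authors’ actual tooling: the label map, title patterns, and item structure are hypothetical stand-ins for the mapping described in Table 6.

```python
import re

# Hypothetical excerpt of the label -> topic mapping described above
# (the full mapping covered every label whose description matched a
# topic's search terms, e.g. A-net -> "network services").
LABEL_TOPICS = {
    "A-net": "network services",
    "A-registry": "cargo custom registries",
}

# Hypothetical per-topic title patterns standing in for the search terms.
TITLE_PATTERNS = {
    "web assembly": re.compile(r"webassembly|wasm|web assembly", re.IGNORECASE),
    "non-lexical lifetimes": re.compile(r"\bNLL\b|non-?lexical lifetime", re.IGNORECASE),
}

def assign_topics(item):
    """Directly assign topics to one issue/PR/RFC from its labels and title."""
    topics = {LABEL_TOPICS[label] for label in item["labels"] if label in LABEL_TOPICS}
    topics |= {topic for topic, pattern in TITLE_PATTERNS.items()
               if pattern.search(item["title"])}
    return topics

def spread_topics(items, links):
    """One non-recursive pass of topic inheritance along (source, target)
    links, e.g. RFC -> official tracking issue, issue -> mentioning PR."""
    direct = {item["id"]: assign_topics(item) for item in items}
    spread = {item_id: set(topics) for item_id, topics in direct.items()}
    for source, target in links:
        # Targets inherit only the *directly* assigned topics of the
        # source, so inheritance does not chain recursively.
        spread[target] |= direct[source]
    return spread
```

For instance, a PR with no matching labels or title terms still inherits “web assembly” if it links to a tracking issue whose title matches the wasm search terms.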
• “Discussion effort” was operationalized as characters of text in the header and comment thread of each RFC discussion, issue, or PR, excluding code embedded in those comments (which is delimited by triple backticks).

• “Coding effort” was operationalized as lines of code deleted plus lines of code added.

• “Team contributors” were operationalized as anyone who was a member of one of the teams listed on Rust’s governance page at the beginning of 2018.


Also note that some development happened outside these repositories (for example, in the rust-lang/cargo repository); we only capture aspects of development that affect the main compiler project.


Table 6. Search terms for identifying 2018 roadmap topics in labels and text. The left and middle columns are used as search terms within the descriptions of labels; the right column shows the labels that matched.


| 2018 Topic | Search Terms | Labels |
|---|---|---|
| add edition flag to rustfix | (edition AND rustfix) OR (2018 AND lint + AND rustfix) | |
| async/await | (async AND await) OR (async/await) | A-async-await, AsyncAwait-Triaged, AsyncAwait-Focus, AsyncAwait-OnDeck, F-async_await |
| build system integration | | |
| cargo custom registries | (Cargo AND registry) OR (Cargo AND registries) | A-registry |
| Cargo/Xargo integration | cargo AND xargo | |
| CLI apps | (CLI app +) OR (CLI application +) OR (command AND line AND app +) OR (command AND line AND application +) | |
| Clippy | (Clippy AND rustup) OR (Clippy AND 1.0) OR (Clippy AND 1 AND 0) | A-lint |
| compiler optimizations | (optimization +) OR (optimisation +) OR (optimize) OR (optimise) | A-optimization, A-LLVM, A-mir-opt |
| compiler parallelization | (parallelization) OR (parallelisation) | |
| compiler-driven code completion for RLS | (auto-complete AND RLS) OR (completion AND RLS) | |
| const generics | | A-const-generics, F-const_generics |
| custom allocator | custom AND allocator + | A-allocators |
| custom test frameworks | custom AND test AND framework + | F-custom_test_frameworks |
| embedded device | embedded | WG-embedded |
| GATs | (generic AND associated AND type +) OR (associated AND type AND constructor +) | F-generic_associated_types |
| generator | | A-generators, F-generators |
| improve compiler error message | error + AND message + | A-diagnostics, F-on_unimplemented |
| incremental compilation | incremental AND compilation | A-incremental, A-incr-comp, WG-compiler-incr |
| internationalization | (internationalization) OR (internationalisation) | |
| macros 2.0 hygiene | (macro + AND hygiene) OR (macro + AND 2.0) OR (macro + AND 2 AND 0) OR (hygiene) | A-hygiene, A-macros-2.0 |
| MIR-only rlibs | MIR AND rlib + | |
| modules revamp | modules | A-modules |
| network services | networking | A-net |
| non-lexical lifetimes | (NLL) OR (non AND lexical AND lifetime +) OR (non-lexical AND lifetime +) | A-NLL, NLL-complete, NLL-diagnostics, NLL-fixed-by-NLL, NLL-performant, NLL-polonius, NLL-reference, NLL-sound |
| public dependencies in cargo | (cargo AND libstd) OR (cargo AND std) OR (cargo AND xargo) | |
| revise cargo profiles | cargo AND profile + | A-profile |
| RLS 1.0 | RLS | A-language-server, A-rls |
| rustdoc RLS-based edition | RLS AND rustdoc | |
| rustfmt | rustfmt | |
| Ship or drop ergonomics RFCs | (ergonomics AND rfc) OR (ergonomics AND initiative) | Ergonomics Initiative |
| SIMD | | A-simd, F-simd_ffi |
| stabilize impl Trait | impl Trait | A-impl-trait, F-impl_trait_in_bindings, F-type_alias_impl_trait |
| tokio | | |
| web assembly | (webassembly) OR (wasm) OR (web assembly) | O-wasm |


Received June 2020; revised October 2020; accepted December 2020


On the Extent and Nature of Software Reuse in Open Source Java Projects


Lars Heinemann, Florian Deissenboeck, Mario Gleirscher, Benjamin Hummel, and Maximilian Irlbeck

Institut für Informatik, Technische Universität München, Germany
{heineman,deissenb,gleirsch,hummelb,irlbeck}@in.tum.de


Abstract. Code repositories on the Internet provide a tremendous amount of freely available open source code that can be reused for building new software. It has been argued that only software reuse can bring the gain of productivity in software construction demanded by the market. However, knowledge about the extent of reuse in software projects is sparse. To remedy this, we report on an empirical study of software reuse in 20 open source Java projects with a total of 3.3 MLOC. The study investigates (1) whether open source projects reuse third party code and (2) how much white-box and black-box reuse occurs. To answer these questions, we utilize static dependency analysis for quantifying black-box reuse and code clone detection for detecting white-box reuse from a corpus with 6.1 MLOC of reusable Java libraries. Our results indicate that software reuse is common among open source Java projects and that black-box reuse is the predominant form of reuse.


1 Introduction


Software reuse involves the use of existing software artifacts for the construction of new software [9]. Reuse has multiple positive effects on the competitiveness of a development organization. By reusing mature software components, the overall quality of the resulting software product is increased. Moreover, the development costs as well as the time to market are reduced [7, 11].
Finally, maintenance costs are reduced, since maintenance tasks concerning the reused parts are “outsourced” to other organizations. It has even been stated that there are few alternatives to software reuse that are capable of providing the gain of productivity and quality in software projects demanded by the industry [15].

Today, practitioners and researchers alike fret about the failure of reuse in the form of a software components subindustry as imagined by McIlroy over 40 years ago [13]. Newer approaches, such as software product lines [2] or the development of product specific modeling languages and code generation [8], typically focus on reuse within a single product family and a single development organization. However, reuse of existing third party code is, from our observation, a common practice in almost all software projects of significant size. Software repositories on the Internet provide a tremendous amount of freely reusable source code, frameworks and libraries for many recurring problems. Popular examples are the frameworks for web applications provided by the Apache Foundation and the Eclipse platform for the development of rich client applications. Due to this ubiquitous availability in software development, the Internet itself has become an interesting reuse repository for software projects [3, 6]. Search engines like Google Code Search (http://www.google.com/codesearch) provide powerful search capabilities and direct access to millions of source code files written in a multitude of programming languages. Open source software repositories like Sourceforge (http://sourceforge.net), which currently hosts almost a quarter million projects, offer the possibility for open source software projects to conveniently share their code with a world-wide audience.


Research problem.
Despite the widely recognized importance of software reuse and its proven positive effects on quality, productivity and time to market, it remains largely unknown to what extent current software projects make use of the extensive reuse opportunities provided by code repositories on the Internet. Literature is scarce on how much software reuse occurs in software projects. It is also unclear how much code is reused in a black-box or white-box fashion. We consider this lack of empirical knowledge about the extent and nature of software reuse in practice problematic and argue that a solid basis of data is required in order to assess the success of software reuse.


Contribution. This paper extends the empirical knowledge about the extent and nature of code reuse in open source projects. Concretely, we present quantitative data on reuse in 20 open source projects that was acquired with different types of static analysis techniques. The data describes the reuse rate of each project and the relation between white-box and black-box reuse. The provided data helps to substantiate the academic discussion about the success or failure of software reuse and supports practitioners by providing them with a benchmark for software reuse in 20 successful open source projects.


2 Terms


This section briefly introduces the fundamental terms this study is based on.

Software reuse. In this paper, we use a rather simple notion of software reuse: software reuse is considered the utilization of code developed by third parties beyond the functionality provided by the operating system and the programming platform.

We distinguish between two reuse strategies, namely black-box and white-box reuse. Our definitions of these strategies follow the notions from [17].


White-box reuse.
We consider the reuse of code to be of the white-box type if it is incorporated in the project files in source form, i.e., the internals of the reused code are exposed to the developers of the software. This implies that the code may potentially be modified. The reuse rate for white-box reuse is defined as the ratio between the amount of reused lines of code and the total amount of lines of code (incl. reused source code).

Black-box reuse. We consider the reuse of code to be of the black-box type if it is incorporated in the project in binary form, i.e., the internals of the reused code are hidden from the developers and maintainers of the software. This implies that the code is reused as is, i.e., without modifications. For black-box reuse the reuse rate is given by the ratio between the size of the reused binary code and the size of the binary code of the whole software system (incl. reused binary code).


3 Methodology


This section describes the empirical study that was performed to analyze the extent and nature of software reuse in open source projects.


3.1 Study Design


We use the Goal-Question-Metric template from [20] for defining this study:

We analyze open source projects for the purpose of understanding the state of the practice in software reuse with respect to its extent and nature from the viewpoint of the developers and maintainers in the context of Java open source software.

To achieve this, we investigate the following three research questions.

RQ 1 Do open source projects reuse software? The first question of the study asks whether open source projects reuse software at all, according to our definition.

RQ 2 How much white-box reuse occurs?
For those projects that do reuse existing software, we ask how much of the code is reused in a white-box fashion as defined in Section 2. We use as metrics the number of copied lines of code from external sources as well as the reuse rate for white-box reuse.

RQ 3 How much black-box reuse occurs? We further ask how much of the code is reused in a black-box fashion according to our definition. For this question we use as metrics the aggregated byte code size of the reused classes from external libraries and the reuse rate for black-box reuse. Although not covered by our definition of software reuse, we separately measure the numbers for black-box reuse of the Java API, since one could argue that this is also a form of software reuse.


3.2 Study Objects


This section describes how we selected the projects that were analyzed in the study and how they were preprocessed prior to the reuse analyses.

Table 1. The 20 studied Java applications

| System | Version | Description | LOC | Size (KB) |
|---|---|---|---|---|
| Azureus/Vuze | 4.504 | P2P File Sharing Client | 786,865 | 22,761 |
| Buddi | 3.4.0.3 | Budgeting Program | 27,690 | 1,149 |
| DavMail | 3.8.5-1480 | Mail Gateway | 29,545 | 932 |
| DrJava | stable-20100913-r5387 | Java Programming Env. | 160,256 | 6,199 |
| FreeMind | 0.9.0 RC 9 | Mind Mapper | 71,133 | 2,352 |
| HSQLDB | 1.8.1.3 | Relational Database Engine | 144,394 | 2,032 |
| iReport-Designer | 3.7.5 | Visual Reporting Tool | 338,819 | 10,783 |
| JabRef | 2.6 | BibTeX Reference Manager | 109,373 | 3,598 |
| JEdit | 4.3.2 | Text Editor | 176,672 | 4,010 |
| MediathekView | 2.2.0 | Media Center Management | 23,789 | 933 |
| Mobile Atlas Creator | 1.8 beta 2 | Atlas Creation Tool | 36,701 | 1,259 |
| OpenProj | 1.4 | Project Management | 151,910 | 3,885 |
| PDF Split and Merge | 0.0.6 | PDF Manipulation Tool | 411 | 17 |
| RODIN | 2.0 RC 1 | Service Development | 273,080 | 8,834 |
| soapUI | 3.6 | Web Service Testing Tool | 238,375 | 9,712 |
| SQuirreL SQL Client | Snapshot-20100918-1811 | Graphical SQL Client | 328,156 | 10,918 |
| subsonic | 4.1 | Web-based Music Streamer | 30,641 | 1,050 |
| Sweet Home 3D | 2.6 | Interior Design Application | 77,336 | 3,498 |
| TV-Browser | 3.0 RC 1 | TV Guide | 187,216 | 6,064 |
| YouTube Downloader | 1.9 | Video Download Utility | 2,969 | 99 |
| Overall | | | 3,195,331 | 100,085 |


Selection Process. We chose 20 projects from the open source software repository Sourceforge as study objects. Sourceforge is the largest repository of open source applications on the Internet. It currently hosts 240,000 software projects and has 2.6 million users.(^3)

We used the following procedure for selecting the study objects.(^4) We searched for Java projects with the development status Production/Stable. We then sorted the resulting list in descending order by number of weekly downloads. We stepped through the list beginning from the top and selected each project that was a standalone application, purely implemented in Java, based on the Java SE Platform, and had a source download. All of the 20 study objects selected by this procedure were among the 50 most downloaded projects.
We thereby obtained a set of successful projects in terms of user acceptance. The application domains of the projects were diverse and included accounting, file sharing, e-mail, software development and visualization. The size of the downloaded packages (zipped files) varied broadly, ranging from 40 KB to 53 MB.

Table 1 shows overview information about the study objects. The LOC column denotes the total number of lines in Java source files in the downloaded and preprocessed source package as described below. The Size column shows the bytecode sizes of the study objects.

Preprocessing. We deleted test code from the projects following a set of simple heuristics (e.g., folders named test/tests). In a few cases, we had to remove code that was not compilable. For one project we omitted code that referenced a commercial library.

(^3) http://sourceforge.net/about

(^4) The project selection was performed on October 5th, 2010.

Table 2. The 22 libraries used as potential sources for white-box reuse

| Library | Description | Version | LOC |
|--------------------------|--------------------------------------|---------|---------|
| ANTLR | Parser Generator | 3.2 | 66,864 |
| Apache Ant | Build Support | 1.8.1 | 251,315 |
| Apache Commons | Utility Methods | 5/Oct/2010 | 1,221,669 |
| log4j | Logging | 1.2.16 | 68,612 |
| ASM | Byte-Code Analysis | 3.3 | 3,710 |
| Batik | SVG Rendering and Manipulation | 1.7 | 366,507 |
| BCEL | Byte-Code Analysis | 5.2 | 48,166 |
| Eclipse | Rich Platform Framework | 3.5 | 1,404,122 |
| HSQLDB | Database | 1.8.1.3 | 157,935 |
| Jaxen | XML Parsing | 1.1.3 | 48,451 |
| JCommon | Utility Methods | 1.0.16 | 67,807 |
| JDOM | XML Parsing | 1.1.1 | 32,575 |
| Berkeley DB Java Edition | Database | 4.0.103 | 367,715 |
| JFreeChart | Chart Rendering | 1.0.13 | 313,268 |
| JGraphT | Graph Algorithms and Layout | 0.8.1 | 41,887 |
| JUNG | Graph Algorithms and Layout | 2.0.1 | 67,024 |
| Jython | Scripting Language | 2.5.1 | 252,062 |
| Lucene | Text Indexing | 3.0.2 | 274,270 |
| Spring Framework | J2EE Framework | 3.0.3 | 619,334 |
| SVNKit | Subversion Access | 1.3.4 | 178,953 |
| Velocity Engine | Template Engine | 1.6.4 | 70,804 |
| Xerces-J | XML Parsing | 2.9.0 | 226,389 |
| Overall | | | 6,149,439 |

To make the source code compilable, we also added missing libraries that we downloaded separately. We obtained the libraries either from the binary package of the project or from the library’s website. In the latter case we chose the latest version of the library.

3.3 Study Implementation and Execution

This section details how the study was implemented and executed on the study objects. All automated analyses were implemented in Java on top of our open source quality analysis framework ConQAT(^5), which provides, among other things, clone detection algorithms and basic functionality for static code analysis.

Detecting White-Box Reuse. As white-box reuse involves copying external source code into the project’s code, the sources of reuse are not limited to libraries available at compile time, but can virtually span all existing Java source code. The best approximation of all existing Java source code is probably provided by the indices of the large code search engines, such as Google Code Search or Koders. Unfortunately, access to these engines is typically limited and does not allow searching for large amounts of code, such as the 3 MLOC of our study objects. Consequently, we only considered a selection of commonly used Java libraries and frameworks as potential sources for white-box reuse. We selected 22 libraries that are commonly reused, based on our experience with both our own development projects and systems we analyzed in earlier studies. The libraries are listed in Table 2 and comprise more than 6 MLOC.

(^5) http://www.conqat.org
For the sake of presentation, we treated the Apache Commons as a single library, although it consists of 39 individual libraries that are developed and versioned independently. The same holds for Eclipse, where we chose a selection of its plug-ins.

To find potentially copied code, we used our clone detection algorithm presented in [5] to find duplications between the selected libraries and the study objects. We computed all clones consisting of at least 15 statements with normalization of formatting and identifiers (type-2 clones), which allowed us to also find partially copied files (or files that are not fully identical due to further independent evolution), while keeping the rate of false positives low. All clones reported by our tool were also inspected manually to remove any remaining false positives.

We complemented the clone detection approach by manual inspection of the source code of all study objects. The size of the study objects allowed only a very shallow inspection, based on the names of files and directories (which correspond to Java packages). For this we scanned the directory trees of the projects for files residing in separate source folders or in packages whose names differed significantly from the package names used for the project itself. The files found this way were then inspected and their source identified based on header comments or a web search. Of course, this step can only find large-scale reuse, where multiple files are copied into a project and the original package names are preserved (which are typically different from the project’s package names). However, during this inspection we were not limited to the 22 selected libraries, but could potentially find other reused code as well.

Detecting Black-Box Reuse. The primary way of black-box reuse in Java programs is the inclusion of libraries. Technically, these are Java Archive (JAR) files, i.e., zipped files containing the byte code of the Java types.
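The idea behind the type-2 normalization used in the white-box detection above can be illustrated with a small sketch. This is a toy illustration, not the index-based algorithm of [5]: statements are normalized by mapping identifiers and literals to placeholders, then matched in fixed-length windows. The minimum clone length is lowered to 3 statements for the toy data (the study used 15), and the statement lists are hypothetical.

```python
import re

def normalize(stmt):
    """Type-2 normalization: map identifiers/keywords and numeric literals
    to fixed placeholders so that renamed copies still compare equal."""
    stmt = re.sub(r'\b[A-Za-z_]\w*\b', 'id', stmt)  # identifiers and keywords
    return re.sub(r'\b\d+\b', '0', stmt)            # numeric literals

def clone_pairs(a, b, min_len=3):
    """Return (i, j) positions where min_len consecutive normalized
    statements of list a match those of list b."""
    na = [normalize(s) for s in a]
    nb = [normalize(s) for s in b]
    index = {}
    for i in range(len(na) - min_len + 1):
        index.setdefault(tuple(na[i:i + min_len]), []).append(i)
    return [(i, j) for j in range(len(nb) - min_len + 1)
            for i in index.get(tuple(nb[j:j + min_len]), [])]

# Hypothetical copied snippet with renamed identifiers and changed literals:
lib = ['int x = 0;', 'x = x + 1;', 'return x;']
app = ['int cnt = 0;', 'cnt = cnt + 2;', 'return cnt;']
# clone_pairs(lib, app) reports the match at position (0, 0)
```

A real detector additionally normalizes formatting and works on a token stream rather than raw statement strings, but the window-hashing principle is the same.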
Ideally, one would measure the reuse rate based on the source code of the libraries. However, obtaining the source code of such libraries is error-prone, as many projects do not document the exact versions of the libraries used. In certain cases, the source code of libraries is not available at all. To avoid these problems and prevent measurement inaccuracies, we performed the analysis of black-box reuse directly on the Java byte code stored in the JAR files.

While JAR files are the standard way of packaging reusable functionality in Java, the JAR files themselves are not directly reused.(^6) They merely represent a container for Java types (classes, interfaces, enumerations and annotations) that are referenced by other types. Hence, the type is the main entity of reuse in Java. Our black-box reuse analysis determines which types from libraries are referenced from the types of the project code. The dependencies are defined by the Java Constant Pool [12], a part of the Java class file that holds information about all referenced types. References are method calls and all type usages, induced, e.g., by local variables or inheritance. Our analysis transitively traverses the dependency graph, i.e., types that are only indirectly referenced by reused types are also included in the resulting set of reused types. This ensures that, in contrast to counting the whole library as reused code, only the subset that is actually referenced by the project is considered. The rationale for this is that a project can incorporate a large library but use only a small fraction of it. To quantify black-box reuse, the analysis measures the size of the reused types by computing their aggregated byte code size.

(^6) In addition to JAR files, Java provides a package concept that resembles a logical modularization concept. Packages, however, cannot directly be reused.
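The transitive traversal described above can be sketched as a breadth-first search over the type dependency graph. The dependency graph, type names and byte code sizes below are hypothetical toy data, not the actual Constant Pool extraction:

```python
from collections import deque

def reused_types(project_types, deps, type_sizes):
    """Transitively collect all library types referenced from project code.

    deps: maps a type name to the type names it references
    type_sizes: byte code size per library type
    Returns (set of reused library types, aggregated byte code size).
    """
    seen = set(project_types)
    queue = deque(project_types)
    while queue:
        t = queue.popleft()
        for ref in deps.get(t, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    reused = seen - set(project_types)
    return reused, sum(type_sizes.get(t, 0) for t in reused)

# Hypothetical graph: project class A uses library type L1, which in turn
# references L2; L3 sits in the library JAR but is never referenced.
deps = {"A": ["L1"], "L1": ["L2"]}
sizes = {"L1": 4096, "L2": 1024, "L3": 8192}
reused, total = reused_types(["A"], deps, sizes)
# reused == {"L1", "L2"}, total == 5120: only the referenced subset counts
```

Note how L3 does not contribute to the reuse measure, which is exactly the difference between this approach and counting whole libraries as reused.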
The black-box analysis is based on the BCEL library(^7), which provides byte code processing functionality.

Our analysis can lead to an overestimation of reuse, as we always include whole types although only specific methods of a type may actually be reused. Moreover, a method may reference certain types while the method itself is unreachable. On the other hand, our approach can lead to an underestimation of reuse, as the implementations of interfaces are not considered as reused unless they are discovered on another path of the dependency search. Details regarding this potential error can be found in the discussion of the threats to validity (Section 6).

Although reuse of the Java API is not covered by our definition of software reuse, we also measured it, since potential variations in the reuse rates of the Java API are worthwhile to investigate. Since every Java class inherits from java.lang.Object and thereby (transitively) references a significant part of the Java API classes, even a trivial Java program exhibits, according to our analysis, a certain amount of black-box reuse. To determine this baseline, we performed the analysis for an artificial minimal Java program that consists only of an empty main method. This baseline of black-box reuse of the Java API consisted of 2,082 types and accounted for about 5 MB of byte code. We investigated the reason for this rather large baseline and found that Object has a reference to Class, which in turn references ClassLoader and SecurityManager. These classes belong to the core functionality for running Java applications. Other referenced parts include the Reflection API and the Collection API. Due to the special role of the Java API, we captured the numbers for black-box reuse of the Java API separately.

(^7) http://jakarta.apache.org/bcel
All black-box reuse analyses were performed with a Sun Java Runtime Environment for Linux 64 Bit in version 1.6.0.20.
----------------------------------------
-------------------------------
Section 346:
4 Results

This section contains the results of the study in the order of the research questions.

4.1 RQ 1: Do Open Source Projects Reuse Software?

The reuse analyses revealed that 18 of the 20 projects reuse software from third parties, i.e., 90% of the analyzed projects reuse code. HSQLDB and YouTube Downloader were the only projects for which no reuse, neither black-box nor white-box, was found.

4.2 RQ 2: How Much White-Box Reuse Occurs?

We answer this question by a combination of automatic techniques (clone detection) and manual inspections. The clone detection between the code of the study objects and the libraries from Table 2 reported 337 clone classes (i.e., groups of clones) with 791 clone instances altogether. These numbers only include clones between a study object and one or more libraries; clones within the study objects or within the libraries were not considered. As HSQLDB was both in our set of study objects and among the libraries used, we discarded all clones between these two.

Manual inspection of these clones led to the observation that, typically, all clones are concentrated in just a few file pairs, which are nearly completely covered by clones. So the unit of reuse (as far as we found it) is the file/class level; single methods (or sets of methods) were not copied. Most of the copied files were not completely identical. The differences are caused either by minor modifications to the files after copying them to the study objects or, more likely, by different versions of the libraries used. As the differences between the files were minor, we counted the entire file as copied if the major part of it was covered by clones.
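The file-level decision above amounts to computing how much of a file is covered by the union of its clone ranges. A minimal sketch, assuming a 50% coverage threshold (the paper only says "the major part", so the exact threshold is an assumption, as are the line ranges below):

```python
def clone_coverage(file_loc, clone_ranges):
    """Fraction of a file's lines covered by the union of clone line ranges
    (inclusive start/end line numbers; overlaps are counted once)."""
    covered = set()
    for start, end in clone_ranges:
        covered.update(range(start, end + 1))
    return len(covered) / file_loc

def count_as_copied(file_loc, clone_ranges, threshold=0.5):
    # threshold is an assumption; the study states no exact value
    return clone_coverage(file_loc, clone_ranges) >= threshold

# Hypothetical 100-line file with two overlapping clones covering lines 1-80:
ranges = [(1, 40), (30, 80)]
# clone_coverage(100, ranges) == 0.8, so the file counts as copied
```

Using a set union (rather than summing range lengths) avoids double-counting overlapping clone instances in the same file.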
By manual inspection of the study objects we found entire libraries copied into four of the study objects. These libraries were either less well-known (GNU ritopt), no longer available as an individual project (microstar XML parser), or never released as an individual project but rather extracted from another project (OSM JMapViewer). None of these could be found by the clone detection, as the corresponding libraries were not part of our original set.

The results for the duplicated code found by clone detection and the code found during manual inspection are summarized in Table 3. The last column gives the overall amount of white-box reused code relative to the project’s size.

Table 3. White-box reuse found by clone detection and manual inspection, in LOC

| System | Clone Detection (LOC) | Manual Inspection (LOC) | Overall Percent |
|-------------------------|-----------------------|-------------------------|-----------------|
| Azureus/Vuze | 1040 | 57,086 | 7.39% |
| Buddi | — | — | — |
| DavMail | — | — | — |
| DrJava | — | — | — |
| FreeMind | — | — | — |
| HSQLDB | — | — | — |
| iReport-Designer | 298 | — | 0.09% |
| JabRef | — | 7,725 | 7.06% |
| JEdit | 7,261 | 9,333 | 9.39% |
| MediathekView | — | — | — |
| Mobile Atlas Creator | — | 2,577 | 7.02% |
| OpenProj | — | 87 | 0.06% |
| PDF Split and Merge | — | — | — |
| RODIN | — | 382 | 0.14% |
| soapUI | — | 2,120 | 0.89% |
| SQuirreL SQL Client | — | — | — |
| subsonic | — | — | — |
| Sweet Home 3D | — | — | — |
| TV-Browser | — | 513 | 0.27% |
| YouTube Downloader | — | — | — |
| Overall | 11,701 | 76,721 | n.a. |

For 11 of the 20 study objects, no white-box reuse whatsoever could be found. For another 5 of them, reuse is below 1%. However, there are also 4 projects with white-box reuse in the range of 7% to 10%. The overall LOC numbers shown in the last row indicate that the amount of code that results from copying entire libraries far outnumbers the code reused by more selective copy&paste.
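The Overall Percent column of Table 3 can be reproduced from its first two columns and the project sizes in Table 1; the figures used below are taken directly from those tables:

```python
def white_box_percent(clone_loc, manual_loc, project_loc):
    """White-box reuse rate: copied LOC relative to the project's total LOC."""
    return 100.0 * (clone_loc + manual_loc) / project_loc

# Cross-checking two rows against Tables 1 and 3:
assert round(white_box_percent(1040, 57086, 786865), 2) == 7.39  # Azureus/Vuze
assert round(white_box_percent(7261, 9333, 176672), 2) == 9.39   # JEdit
```

The assertions pass, confirming that the percentages are computed against the LOC column of Table 1.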
+ + +4.3 RQ 3: How Much Black-Box Reuse Occurs? + + +Figure 1 illustrates the absolute bytecode size distributions between the project code (own), the reused parts of the libraries (3rd party) and the Java API ordered descending by the total amount of bytecode. The horizontal line indicates the baseline usage of the Java API. The reuse of third party libraries ranged between 0 MB and 42.2 MB. The amount of reuse of the Java API was similar among the analyzed projects and ranged between 12.9 MB and 16.6 MB. The median was 2.4 MB for third party libraries and 13.3 MB for the Java API. The project iReport-Designer reused the most functionality in a black-box fashion both from libraries and from the Java API. The project with the smallest extent of black-box reuse was YouTube Downloader. + + +Figure 2 is based on the same data but shows the relative distributions of the bytecode size. The projects are ordered descending by the total amount of relative reuse. The relative reuse from third party libraries was 0% to 61.7% with a median of 11.8%. The relative amount of reused code from the Java API ranged between 23.0% and 99.3% with a median of 73.0%. Overall (third party and Java API combined), the relative amount of reused code ranged between 41.3% and 99.9% with a median of 85.4%. The project iReport-Designer had the highest black-box reuse rate. YouTube Downloader used the most code from the Java API relative to its own code size. For 19 of the 20 projects, the amount of reused code was larger than the amount of own code. Of the overall amount of reused code in the sample projects, 34% stemmed from third party libraries and 66% from the Java API. +Figure 3 illustrates the relative byte code size distributions between the own code and third party libraries, i.e., without considering the Java API as a reused library. The projects are ordered descending by reuse rate. The relative amount of reused library code ranged from 0% to 98.9% with a median of 45.1%. 
For 9 of the 20 projects, the amount of code reused from third party libraries was larger than the amount of own code.
----------------------------------------
-------------------------------
Section 347:
5 Discussion

The data presented in the previous sections lead to interesting insights into the current state of open source Java development, but also open new questions that were not part of our study setup. We discuss both in the following sections.

5.1 Extent of Reuse

Our study reveals that software reuse is common among open source Java projects, with black-box reuse as the predominant form. None of the 20 projects analyzed has less than 40% black-box reuse when the Java API is included. Even when the Java API is not considered, the median reuse rate is still above 40% and only 4 projects are below the 10% threshold. In contrast, white-box reuse is found in only about half of the projects and never exceeds 10% of the code.

This difference can probably be explained by the increased maintenance effort commonly associated with white-box reuse, as described by Jacobson et al. [7] and Mili et al. [14]. The detailed results of RQ 2 also revealed that larger parts consisting of multiple files were mostly copied if either the originating library was no longer maintained or the files were never released as an individual library. In both cases the project’s developers have to maintain the reused code anyway, which removes the major criticism of white-box reuse.

It also seems that the amount of code reused from third party libraries seldom exceeds the amount of code reused from the Java API. The only projects for which this is not the case are iReport-Designer, RODIN and soapUI, of which the first two build upon NetBeans and Eclipse, respectively, which provide rich platforms on top of the Java API.
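The two reuse-rate variants used throughout Section 4.3 (with and without counting the Java API) can be written down explicitly. This is one plausible formalization consistent with the reported figures, not the study's tooling, and the byte code sizes below are hypothetical:

```python
def reuse_rates(own, third_party, java_api):
    """Relative black-box reuse, measured on byte code sizes.

    Returns the reuse rate excluding the Java API (third party code only)
    and the overall rate including the Java API.
    """
    return {
        "third_party": third_party / (own + third_party),
        "incl_api": (third_party + java_api) / (own + third_party + java_api),
    }

# Hypothetical project: 10 MB own code, 20 MB reused libraries, 30 MB Java API
rates = reuse_rates(10, 20, 30)
# rates["third_party"] == 2/3, rates["incl_api"] == 5/6
```

Under this reading, a project is counted as having "more reused code than own code" whenever the relevant rate exceeds 0.5.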
Based on our data, it is obvious that the early visions of reusable components, which would only have to be connected by small amounts of glue code and would lead to reuse rates beyond 90%, are not realistic today. On the other hand, the reuse rates we found are high enough to have a significant impact on development effort. We would expect that reuse of software, as fostered by the open source movement, contributes substantially to the rich set of applications available today.

5.2 Influence of Project Size on Reuse Rate

The amount of reuse varies considerably across projects. While PDF Split and Merge is just a very thin wrapper around existing libraries, there are also large projects with (relatively) small reuse rates (e.g., less than 10% for Azureus when the Java API is not counted).

Motivated by a study by Lee and Litecky [10], we investigated a possible correlation between code size and reuse rate in our data set. Their study was based on a survey of 73 samples in the domain of commercial Ada development and found a negative influence of software size on the rate of reuse. For the reuse rate without the Java API (only third party code) we found a Spearman correlation coefficient of 0.05 with the size of the project’s own code (two-tailed p-value: 0.83). Thus, we cannot infer any dependence between these values. If we use the overall reuse rate (including the Java API), the Spearman coefficient is -0.93 (p-value < 0.0001), which indicates a strong and significant negative correlation. This confirms the results of [10] that project size typically reduces the reuse rate.

5.3 Types of Reused Functionality

It is interesting to investigate what kind of functionality is actually reused. We therefore tried to categorize all reused libraries into groups of common functionality.
To this end, we analyzed the purpose of each reused library and divided the libraries into seven categories (e.g., Networking, Text/XML, Rich Client Platforms or Graphics/UI). To determine to which extent a certain type of functionality is reused, we employed our black-box reuse detection algorithm presented in Section 3.3 to calculate the amount of bytecode of each library that is reused inside a project.

We observed that there is no predominant type of reused functionality and that nearly all projects reuse functionality belonging to more than one category. We believe that there is no significant insight we can report, except that reuse seems to be diverse among the categories and is not concentrated on a single purpose.
----------------------------------------
-------------------------------
Section 348:
6 Threats to Validity

This section discusses potential threats to the internal and external validity of the results presented in this paper.

6.1 Internal Validity

The amount of reuse measured fundamentally depends on the definition of software reuse and the techniques used to measure it. We discuss possible flaws that can lead to an overestimation of the actual reuse, an underestimation, or otherwise threaten our results.

Overestimation of reuse. The measurement of white-box reuse used the results of a clone detection, which could contain false positives. Thus, not all reported clones indicate actual reuse. To mitigate this, we manually inspected the clones found. Additionally, for both the automatically and the manually found duplicates, it is not a priori known whether the code was copied into the study objects or rather from them. We therefore manually verified all findings, for example by checking the header comments, to ensure that the code was actually copied from the library into the study object.

Our estimation of black-box reuse is based on static references in the bytecode.
We consider a class as completely reused if it is referenced, which may overestimate actual usage; for example, the method holding the reference to another class might never be called. An alternative would be to use dynamic analysis and execution traces to determine the amount of reused functionality. However, this approach has the disadvantage that only a finite subset of all execution traces can be considered, leading to a potentially large underestimation of reuse.

Underestimation of reuse. The application of clone detection was limited to a fixed set of libraries. Thus, copied code could be missed because the source it was taken from was not included in our comparison set. Additionally, the detector might miss actual clones (low recall) due to weak normalization settings. To address this, we chose settings that yield higher recall (at the cost of precision). The manual inspection of the study objects’ code for further white-box reuse is inherently incomplete; due to the large amounts of code, only the most obvious copied parts could be found.

The static analysis used to determine black-box reuse misses certain dependencies, such as method calls performed via Java’s reflection mechanism or classes that are loaded based on configuration information. Additionally, our analysis cannot penetrate the boundaries created by Java interfaces. The actual implementations used at run-time (and their dependencies) might not be included in our reuse estimate. To mitigate this, one could search for an implementing class and include the first match in the further dependency search and the result set. However, preliminary experiments showed that this approach leads to a large overestimation. For example, a command-line program that references an interface that is also implemented by a UI class could lead us to the false conclusion that the program reuses UI code.

There are many other forms of software reuse that are not covered by our approach.
One example is reusable generators. If a project uses a code generator to generate source code from models, this is not detected as a form of reuse by our approach. Moreover, there are many other ways in which software components can interact with each other besides use dependencies in the source code. Examples are inter-process communication, web services that utilize other services via SOAP calls, or the integration of a database via an SQL interface.

6.2 External Validity

While we tried to use a comprehensible way of sampling the study objects, it is not clear to what extent they are representative of the class of open source Java programs. First, the choice of Sourceforge as the source of the study objects could bias our selection, as a certain kind of open source developers could prefer other project repositories (such as Google Code). Second, we selected the projects from the 50 most downloaded ones, which could bias our results.

As the scope of the study is open source Java programs, the transferability of the results to other programming languages or commercially developed software is unclear. The programming language in particular is expected to have a strong impact on reuse, as the availability of both open source and commercial reusable code heavily depends on the language used.
----------------------------------------
-------------------------------
Section 349:
7 Related Work

Software reuse is a research field with an extensive body of literature. An overview of different reuse approaches can be found in the survey by Krueger [9]. In the following, we focus on empirical work that aims at quantifying the extent of software reuse in real software projects.

In [18], Sojer and Henkel investigate the usage of existing open source code for the development of new open source software by conducting a survey among 686 open source developers. They analyze the degree of code reuse with respect to developer and project characteristics.
They report that software reuse plays an important role in open source development. Their study reveals that a mean of 30% of the implemented functionality in the projects of the survey participants is based on reused code. Since Sojer and Henkel use a survey to analyze the extent of code reuse, the results may be subject to inaccurate estimates by the respondents. Our approach analyzes the source code of the projects and therefore avoids this potential inaccuracy. Our results are confirmed by their study, since they also report that software reuse is common in open source projects.

Haefliger et al. [4] analyzed code reuse within six open source projects by performing interviews with developers as well as inspecting source code, code modification comments, mailing lists and project web pages. Their study revealed that all sample projects reuse software. Moreover, the authors found that by far the dominant form of reuse within their sample was black-box reuse. In the sample of 6 MLOC, 55 components accounting for 16.9 MLOC in total were reused. Of the 6 MLOC, only about 38 kLOC were reused in a white-box fashion. The developers also confirmed that this form of reuse occurs only infrequently and in small quantities. Their study is related to ours; however, the granularity of the black-box analysis was different. While they treated whole components as reusable entities, we measured the fraction of a library that is actually used. Since they use code repository commit comments for identifying white-box reuse, their results are sensitive with regard to the accuracy of these comments. In contrast, our method utilizes clone detection and is therefore not dependent on correct commit comments. Their study confirms our finding that black-box reuse is by far the predominant form.

In [16], Mockus investigates large-scale code reuse in open source projects by identifying components that are reused among several projects.
The approach looks for directories in the projects that share a certain fraction of files with equal names. He investigates how many of the files are reused among the sample projects and identifies which types of components are reused most. In the studied projects, about 50% of the files were used in more than one project. Libraries reused in a black-box fashion are not considered by his approach, and neither are reused entities smaller than a group of files. While Mockus’ work quantifies how often code entities are reused, our work quantifies the fraction of reused code compared to the own code within projects. However, his results are in line with our findings regarding the observation that code reuse is commonly practiced in open source projects.

In [10], Lee and Litecky report on an empirical study that investigates how organizations employ reuse technologies and how different criteria influence the reuse rate in organizations using Ada technologies. They surveyed 500 Ada professionals from the ACM Special Interest Group on Ada with a one-page questionnaire. The authors determine the amount of reuse with a survey; their results may therefore be inaccurate due to the subjective judgement of the respondents. Again, our approach mitigates this risk by analyzing the source code of the projects.

In [19], von Krogh et al. report on an exploratory study that analyzes knowledge reuse in open source software. The authors surveyed the developers of 15 open source projects to find out whether knowledge is reused among the projects and to identify conceptual categories of reuse. They analyze commit comments from the code repository to identify accredited lines of code as a direct form of knowledge reuse. Their study reveals that all the considered projects do reuse software components. Our observation that software reuse is common in open source development is therefore confirmed by their study. Like Haefliger et al., von Krogh et al.
rely on the commit comments of the code repository, with the potential drawbacks already mentioned.

Basili et al. [1] investigated the influence of reuse on productivity and quality in object-oriented systems. Within their study, they determined the reuse rate for 8 projects developed by students, with sizes ranging from about 5 kSLOC to 14 kSLOC. While they report reuse rates in a range similar to our results, they analyzed rather small programs written by students in the context of the study. In contrast, we analyzed open source projects.
----------------------------------------
-------------------------------
Section 350:
8 Conclusions and Future Work

Software reuse, often called the holy grail of software engineering, has certainly not been found in the form of reusable components that simply need to be plugged together. However, our study shows not only that reuse is common in almost all open source Java projects but also that significant amounts of software are reused: of the 20 analyzed projects, 9 have reuse rates of more than 50%, even when reuse of the Java API is not considered. Reassuringly, these reuse rates are to a great extent realized through black-box reuse and not by copy&pasting source code.

We conclude that in the world of open-source Java development, high reuse rates are not a theoretical option but are achieved in practice. In particular, the availability of reusable functionality, which is a necessary prerequisite for reuse to occur, is well established for the Java platform.

As a next step, we plan to extend our studies to other programming ecosystems and other development models. In particular, we are interested in the extent and nature of reuse for projects implemented in legacy languages like COBOL and PL/1 on the one hand and currently hyped languages like Python and Scala on the other hand.
Moreover, our future studies will include commercial software systems to investigate to what extent the open-source development model promotes reuse. + + +Acknowledgment + + +The authors want to thank Elmar Juergens for inspiring discussions and helpful comments on the paper. +References + + + + +Basili, V., Briand, L., Melo, W.: How reuse influences productivity in object-oriented systems. Communications of the ACM 39(10), 116 (1996) + + +Clements, P., Northrop, L.M.: Software Product Lines: Practices and Patterns, 6th edn. Addison-Wesley, Reading (2007) + + +Frakes, W., Kang, K.: Software reuse research: Status and future. IEEE Transactions on Software Engineering 31(7), 529–536 (2005) + + +Haefliger, S., Von Krogh, G., Spaeth, S.: Code Reuse in Open Source Software. Management Science 54(1), 180–193 (2008) + + +Hummel, B., Juergens, E., Heinemann, L., Conradt, M.: Index-Based Code Clone Detection: Incremental, Distributed, Scalable. In: ICSM 2010 (2010) + + +Hummel, O., Atkinson, C.: Using the web as a reuse repository. In: Morisio, M. (ed.) ICSR 2006. LNCS, vol. 4039, pp. 298–311. Springer, Heidelberg (2006) + + +Jacobson, I., Griss, M., Jonsson, P.: Software reuse: architecture, process and organization for business success. Addison-Wesley, Reading (1997) + + +Kelly, S., Tolvanen, J.-P.: Domain-Specific Modeling. Wiley, Chichester (2008) + + +Krueger, C.: Software reuse. ACM Comput. Surv. 24(2), 131–183 (1992) + + +Lee, N., Litecky, C.: An empirical study of software reuse with special attention to Ada. IEEE Transactions on Software Engineering 23(9), 537–549 (1997) + + +Lim, W.: Effects of reuse on quality, productivity, and economics. IEEE Software 11(5), 23–30 (2002) + + +Lindholm, T., Yellin, F.: Java virtual machine specification. Addison-Wesley Longman Publishing Co., Inc., Boston (1999) + + +McIlroy, M., Buxton, J., Naur, P., Randell, B.: Mass produced software components. In: Software Engineering Concepts and Techniques, pp. 
88–98 (1969) + + +Mili, H., Mili, A., Yacoub, S., Addy, E.: Reuse-Based Software Engineering: Techniques, Organizations, and Controls. Wiley Interscience, Hoboken (2001) + + +Mili, H., Mili, F., Mili, A.: Reusing software: Issues and research directions. IEEE Transactions on Software Engineering 21(6), 528–562 (1995) + + +Mockus, A.: Large-scale code reuse in open source software. In: FLOSS 2007 (2007) + + +Ravichandran, T., Rothenberger, M.: Software reuse strategies and component markets. Communications of the ACM 46(8), 109–114 (2003) + + +Sojer, M., Henkel, J.: Code Reuse in Open Source Software Development: Quantitative Evidence, Drivers, and Impediments. JAIS (to appear, 2011) + + +von Krogh, G., Spaeth, S., Haefliger, S.: Knowledge Reuse in Open Source Software: An Exploratory Study of 15 Open Source Projects. In: HICSS 2005 (2005) + + +Wohlin, C., Runeson, P., Höst, M.: Experimentation in software engineering: An introduction. Kluwer Academic, Dordrecht (2000) +---------------------------------------- diff --git a/ocr_studies_text/results/output_1ddb7d6cfc87a125093aafd029637d8c893f24f6.jsonl b/ocr_studies_text/results/output_1ddb7d6cfc87a125093aafd029637d8c893f24f6.jsonl new file mode 100644 index 0000000..3226463 --- /dev/null +++ b/ocr_studies_text/results/output_1ddb7d6cfc87a125093aafd029637d8c893f24f6.jsonl @@ -0,0 +1 @@ +{"id": "1c4c6b128a2d59da386c81ba2055ba27bd5c7a87", "text": "Deliberate change without hierarchical influence?\n\nThe case of collaborative OSS communities\n\nAbstract\n\nPurpose \u2013 Deliberate change is strongly associated with formal structures and top-down influence. Hierarchical configurations have been used to structure processes, overcome resistance and get things done. But is deliberate change also possible without formal structures and hierarchical influence?\n\nDesign/Methodology/Approach \u2013 This longitudinal, qualitative study investigates an open-source software (OSS) community named TYPO3. 
This case exhibits no formal hierarchical attributes. The study is based on mailing lists, interviews, and observations.\n\nFindings \u2013 The study reveals that deliberate change is indeed achievable in a non-hierarchical collaborative OSS community context. However, it presupposes the presence and active involvement of informal change agents. The paper identifies and specifies four key drivers for change agents\u2019 influence.\n\nOriginality/value \u2013 The findings contribute to organizational analysis by providing a deeper understanding of the importance of leadership in making deliberate change possible in non-hierarchical settings. It points to the importance of \u2018change-by-conviction\u2019, essentially based on voluntary behaviour. This can open the door to reducing the negative side effects of deliberate change also for hierarchical organizations.\n\nKeywords\n\nOpen-source communities, deliberate change, change agents, change by conviction, hierarchical influence\nIntroduction\n\nThere is widespread agreement in research as well as in management practice that deliberate change is key for an organisation\u2019s success, if not for its long-term survival (By, 2005; Teece, Pisano, & Shuen, 1997). On the other hand, it is also generally acknowledged that deliberate change challenges organisations and potentially stresses their members. It disturbs existing structures and causes disorder (Schumpeter, 1934), violates the truce of existing routines (Nelson & Winter, 1982), drives people out of their comfort zones, and evokes resistance (Hon, Bloom, & Crant, 2011; Waddell & Sohal, 1998). Therefore, deliberate change is also typically associated with strong leaders and execution power (Kotter, 2007). Thus, there is general agreement that hierarchical influence is particularly needed during the implementation stage in order to get things done and overcome resistance (Somech, 2006). 
Strong leaders are also needed to promote change in organisations and create a sense of urgency (Higgs & Rowland, 2011; Yates, 2000).\n\nBut what happens if there are only informal leaders with no formal and positional power and organisational members are basically left doing whatever they want? This is exactly the situation for many collaborative communities such as open-source software (OSS) communities. In many of these communities, participation is voluntary, so leaders have only very limited formal power known from hierarchical organizations. How do these communities handle the challenges of deliberate change without formal power successfully? How do they secure efficient and consistent planning procedures? How do they overcome resistance and get things done? Are collaborative communities able to change at all or are they doomed to fail in the long term? Differently put, what does it mean for OSS communities to change deliberately?\nOrganisational scholars have already shown extensive interest in OSS communities and collaborative communities in general (Martinez-Torres & Diaz-Fernandez, 2014). Key topics of interest include the motivation to participate in and contribute to collaborative communities (Cromie & Ewing, 2009; Hars & Ou, 2002; Lerner & Tirole, 2002), structures and the division of labour (Mockus, Fielding, & Herbsleb, 2002), governance structures and processes in communities (Demil & Lecocq, 2006; Markus, 2007), and coordination and communication mechanisms (Lee & Cole, 2003). While extant research thus provides a detailed picture of how OSS communities work, no studies have yet examined deliberate change in OSS communities. 
The few studies that address change have found that most change in OSS communities is fluid, tacit, and emergent because task execution is typically dependent on the informal structures and the voluntary contributions of members (Sharma, Sugumaran, & Rajagopalan, 2002).\n\nThe aim of this study is to investigate how deliberate change is accomplished in OSS communities. More specifically, the empirical foundation for this research is a longitudinal single-case study. Data have been collected about one OSS community, called TYPO3, during 2006\u20132010. We refer to deliberate change as change that is intended and planned. Change is therefore not the residual outcome of a multitude of processes, even though there might be disparities between plans and outcomes (Burnes, 1996, 2009; Kanter, Stein, & Jick, 1992). In our data collection we observed various deliberate change initiatives in TYPO3 at the strategic as well as at the organisational level. The focus of this paper is on one strategic change initiative carried out in order to redirect the project\u2019s focus towards more product usability. Our results show that deliberate change is possible in OSS communities and that change agents play an essential role in change processes. We summarise our findings in a model structuring the success factors of change agents.\nTwo main contributions are offered. First, our paper advances knowledge about change processes in non-hierarchical structures, such as OSS communities. Because of their increasing relevance for economic activity, it is important to know whether informal and non-hierarchical organisations are able to execute deliberate change; if they are not, such organisations are unlikely to survive in the long run. Second, and more importantly, our investigation of changes in OSS communities gives new insights into how deliberate change in non-hierarchical organisational settings is possible. 
It shows how organisations can master \u2018change by conviction\u2019, i.e., when organisational members are not being forced to change but accept and adapt to change voluntarily. We will discuss how the insights of this study may be used to reduce tensions and frictions of change in traditional business organisations as well.\n\n**Structure and governance of OSS communities**\n\nAn OSS community consists of individuals who voluntarily contribute to the development of open-source software (Martinez-Torres & Diaz-Fernandez, 2014). Open-source software is freely available to the public under an open license and is based on unrestricted access to source code (Bonaccorsi & Rossi, 2003). Well-known examples of OSS are Linux, Firefox, and Apache (Lakhani & von Hippel, 2003). OSS communities typically demonstrate classic textbook principles of organisations in that they (i) form an entity distinguishable from its environment (Lawrence & Lorsch, 1967), (ii) have specific goals (Etzioni, 1964), (iii) have purposive actions to realise these goals (Mooney & Reiley, 1939), and (iv) are dependent on and affected by the external environment (Scott, 1981). However, at the same time, OSS communities distinguish themselves from traditional business organisations in that they are basically open to anyone to\nparticipate, participation is voluntary, there is a high degree of self-assignment, and they don\u2019t have a physical location like a headquarters. This is enabled by modularization of the software and by distributed activities allowing for rather loosely managed and structured development processes that leave the developers free to choose which tasks to execute (Vujovic & Ulh\u00f8i, 2008). Demil and Lecocq (2006) argue open license is indeed a unique contractual framework that has generated a new type of governance structure distinct from the familiar governance modes of hierarchy, network, and market. 
Although OSS communities differ in terms of structure, size, and formalisation, there appears to be an \u2018ideal type ground architecture\u2019 that has been identified for many of these communities. The main characteristics of this architecture also apply to TYPO3.\n\nOSS communities are often managed through a two-layer task structure, containing a core and a peripheral layer (Lee & Cole, 2003). The core consists of project leaders and maintainers. While leadership in some projects (e.g., Linux) is more centralised and there is one undisputed project leader, in other projects (e.g., Apache) a committee solves particular leadership tasks, such as disagreements and conflicts, through voting or consensus (Lerner & Tirole, 2002). On the one hand, these communities align with the definition of shared leadership\u2014\u201cdistributed phenomenon in which there can be several (formally appointed and/or emergent) leaders within a group\u201d\u2014and which generally focuses on the emergence of such leaders (Mehra, Smith, Dixon, & Robertson, 2006, p. 233). On the other hand, investigations of shared leadership stem mainly from the context of organizational teams and emphasize the importance of formal leaders to set the stage for informal leadership roles to arise and create the conditions which will maximize the successful outcome of shared leadership in teams (Denis, Langley & Sergi, 2012). This stands in contrast to OSS communities, which are not based on formal leadership in the traditional sense. Such leadership is in fact not required for informal leaders to emerge in OSS communities.\nIn OSS, informal leadership positions emerge through reputational gains based on \u201ctechnical acumen and managerial skill\u201d (Fleming & Waguespack, 2007, p. 165). In addition, trust is a requirement for leaders to be selected by the community (O\u2019Mahony & Ferraro, 2007). 
Usually, the founders act as project leaders, having earned the credibility to lead by contributing the initial source code and demonstrating their expertise. Project leaders typically act as visionaries, providing recommendations, work tasks, milestones, etc., to the community. Another important leadership task is to attract new members by posing challenging programming problems for potential contributors (Lerner & Tirole, 2002, p. 220). The nature of leadership in OSS communities changes as communities grow and mature (O\u2019Mahony & Ferraro, 2007). Over time, project leaders perform fewer technical tasks, such as programming, and more organisation-building tasks (ibid.). The periphery of an OSS community is often structured by the development and bug-fixing team (Lee & Cole, 2003). Members of the periphery are more loosely connected with the community. Task assignment here is almost completely voluntary (ibid.).\n\nParticipation in OSS communities is driven by intrinsic (e.g., fun and enjoyment) and extrinsic (e.g., peer recognition, signalling of skills for career benefits) rewards (Lerner & Tirole, 2002). Lakhani and von Hippel (2003, p. 923) emphasize three motivations for participation in OSS communities: need-driven participation (e.g., the need for software), enjoyment-driven participation, and reputation enhancement. Reputation is a low-ranking incentive to join and contribute to an OSS community (ibid.). However, once a reputation is achieved, the member\u2019s desire to maintain his or her reputation encourages the member to continue to provide quality contributions (Sharma et al., 2002).\nThis structure is supported by a number of governance mechanisms that help direct, control, and coordinate individual efforts in OSS communities (Markus, 2007). 
These mechanisms include the self-assignment of tasks (Crowston, Li, Wei, Eseryel, & Howison, 2007), peer review (Lee & Cole, 2003), bug reporting, voting procedures, and the process of determining software requirements (Scacchi, 2002). Collaboration is enabled through software platforms, which provide infrastructure for sharing solutions, asking for help, etc. Services and tools, such as mailing lists, discussion forums, archives, and blogs, are the key infrastructures that enable communication and collaboration in OSS communities (Fjeldstad, Snow, Miles, & Lettl, 2012; O'Mahony & Ferraro, 2007).\n\nTo sum up, OSS communities have well-developed structures, resembling project structures in traditional business organisations. They also have leaders involved in organising and structuring processes. The major difference is that such leaders have no formal authority and thus no execution power. Participation in OSS communities is voluntary, and tasks are self-assigned. Leaders cannot therefore exert hierarchical influence but can only lead based on expertise, persuasion power, and reputation among peers. The literature has called this type of influence informal leadership (De Souza & Klein, 1995; Hongseok, Labianca, & Myung-Ho, 2006). Lakhani and von Hippel (2003, p. 923) found the informal leaders of OSS communities are capable of organising the \u201cmundane but necessary\u201d tasks in the day-to-day business. But are they also capable of mastering the challenges of change that are already difficult to master in formal companies and for which leadership and power are needed?\n\nDeliberate change in organisations\nLike in other organisations, in OSS communities change concerns the \u201corganisation\u2019s direction, structure, and capabilities\u201d (Moran & Brightman, 2001). In this sense, there is nothing unusual about the basic nature and substance of change in OSS communities. 
It resembles the basic structure and demands of other organisational change processes.\n\nMany researchers have emphasised the process character of organisational change (Bullock & Batten, 1985; Hayes, 2010; Lewin, 1951). Van de Ven and Poole (1995) identified 20 models that structure change processes in different ways. However, the vast majority of these models identify three key tasks with which deliberate change processes have to deal. First, the need for change has to be recognised and the change process initiated (Kirzner, 1997). This need typically results from opportunities or threats that can be addressed by change. Further, the change initiative has to be put on the organisation\u2019s agenda in order to secure action is taken (Kotter, 2012). Organisational change on the strategic level is a genuine management task. The recognition of change needs might come from \u2018ordinary\u2019 employees, but it is the exclusive right of the management to acknowledge these initiatives and put them on the agenda (Kesting & Ulh\u00f8i, 2010), at least in traditional business organisations. The main rationale behind such a governance structure is to secure consistency\u2014between the different initiatives and organisational activities but also with shareholder and stakeholder interests.\n\nSecond, deliberate change tends to be based on some planning and decision-making activities (By, 2005). Goals have to be defined and information has to be acquired and analysed. The results of this process are management decisions and documents like road maps or business plans. In traditional business organisations, leaders have to drive and structure this process by creating a sense of urgency, involving organisational members and keeping track of the process (Kotter, 2012).\nA distinction between deliberate and emergent change is acknowledged both in the strategy literature (Mintzberg & Waters, 1985) and in the change management literature (Liebhart & Garcia-Lorenzo, 2010). 
Other aspects like contingency and choice have also been included in this discussion. The review of By (2005) shows how complex, heterogeneous, and inconsistent this distinction is. In this paper we do not intend to contribute to this discussion. For the argumentation of this paper, it is sufficient to specify the substance of deliberate change by the two above attributes: purpose and reason. In our understanding, deliberate change neither implies that everything goes according to plan nor that goals are realised exactly in the planned way. As Dunphy and Stace (1993) argue, organizational change takes place in a dynamic environment and organizations have to adapt their plans accordingly. Against this background, we posit that deliberate change does not rule out the emergent element. Rather, it implies change is grounded in the intention to change. This view corresponds to Mintzberg\u2019s (1994) view of change as an element of the strategy process. In contrast, change is (completely) emergent if it is simply the accumulated result of a series of unrelated decisions and events that have no change or strategic perspective.\n\nThird, change has to be executed and decisions implemented. This means organisation members have to make an effort to bring about the change. Also, routines have to be altered in order to adapt to change. The literature on conflict and resistance caused by change (del Val, 2003; Huy, Corley, & Kraatz, 2014) emphasises leadership and execution power as particularly necessary to get things done and overcome resistance and resolve conflicts.\n\nLeadership power is thus required for all three tasks, most of all, however, for the implementation. Change often burdens organisations and stresses people. Leadership power is needed to change behaviour and overcome resistance. Traditional business organisations\ntherefore often rely on a top-down implementation of planned change (Howell & Avolio, 1993). 
Leadership vision is needed to motivate organisational members.\n\nBut how can these challenges be handled by informal leaders? How can resistance be overcome without the use of any formal power? How does the governance structure of OSS communities handle deliberate organisational change? Currently, there is no research addressing these questions systematically. However, there is one concept of change leadership that offers some theoretical grounding for an answer that will also be important for the analysis of this article: the concept of the change agent.\n\nBased on Caldwell\u2019s findings (2003), we define change agents as individuals who initiate, direct, manage, and/or implement specific change initiatives. Like many other concepts, the concept of change agents is also used heterogeneously (Wylie, Sturdy, & Wright, 2014) and there are closely related concepts like (product) champions in the literature (Ginsberg & Abrahamson, 1991). The key point for our study is that change agents are individuals that drive change initiatives, i.e., create momentum and ensure decisions are made and actions are taken. In doing so, change agents can assume complex sensemaking (Brown, Colville, & Pye, 2015) and sensegiving (Petkova, Rindova, & Gupta, 2013) roles that can be essential to attract collective attention and gain legitimacy for their change initiatives. Change agents do not have to be assigned leaders with formal given responsibilities. They can even be outsiders like consultants (Volberda, Van Den Bosch, & Mihalache, 2014). However, in traditional business organisations they have to be authorised and supported by formal leaders. Therefore, the activity of change agents is also based on hierarchical influence, even though mostly indirectly. While change agents thus might not have the power to order change, the supporting formal leaders do possess such power. In this case, sensegiving, i.e. 
\u201cthe processes by which strategic change is framed and\ndisseminated to an organization\u2019s constituents\u201d (Fiss & Zajac, 2006, p. 1173) can be particularly relevant for change agents to attract management attention and promote initiatives.\n\nAs outlined above, deliberate change cannot be decided and enforced by management in OSS communities like in traditional business organisations. Even when initiatives come from the core, they have to be based on initiative and promoted in the community. Here, sensegiving may be particularly relevant for change agents as a way to attract the attention of the community and/or even attract media attention in order to promote change initiatives. Sensegiving can support positions in the \u201csymbolic struggles over the purpose and direction of an organization\u201d (Fiss & Zajac, 2006, p. 1173). When coming from the periphery, it requires even more initiative to change an OSS community deliberately. Therefore, it can be expected that change agents play an important role here. However, conditions are fundamentally different because in OSS communities there is no management support or hierarchical influence upon which to draw. So, how can change agents realise change initiatives here?\n\nMethods\n\nTwo main criteria guided the selection of our focal case. First, our case had to be a representative example of an OSS community. Second, the community had to be a mature case that had already established and formalised work procedures, guidelines, and rules. Studying change in a developed, growing community would hold promises for providing an intensive and rich case that would \u201cmanifest the phenomenon of interest intensely (but not extremely)\u201d because extreme cases may distort the manifestation of the phenomenon (Patton, 2002, p. 234). Accordingly, we selected an OSS community named TYPO3 for this study.\nIn line with the research objective, we first identified deliberate changes at their various stages. 
Then, we followed the process underlying those changes before tracing the mechanisms used to address the changes. The unit of analysis is the community, i.e., the focus is on the intraorganisational level.\n\n**Study setting**\n\nTYPO3 has been public since 2000. At the time of the study, this community was experiencing continuous growth (see Figure 1). The TYPO3 system is an enterprise-class content management system (CMS) offering out-of-the-box operation with standard modules (http://typo3.org/). The system is aimed at two different groups: (i) authors and (ii) administrators and content managers. TYPO3\u2019s core team members play a central role in the community because they contribute most of the source code and manage the design and development of the project on a voluntary basis. When the study started, approximately half of the core team members (i.e., nine individuals) comprised the project\u2019s R&D committee, the members of which also belonged to the project\u2019s other teams and working groups. Moreover, the members of this committee could be described as the project\u2019s central coordination body, as their responsibilities included (i) supervising and coordinating the development of the software; (ii) providing knowledge, contacts, and financial support; and (iii) supervising and supporting the community-driven teams. We chose the committee as a point of departure for the study because of these responsibilities. With 85.5% of their discussions focussing on governance issues (Table 1), the relevance of the R&D committee members as informants was undeniable. In addition to interviewing seven R&D committee members, two core team members were interviewed because\nthey were directly involved with specific organisational changes before joining the core team (i.e., when they still belonged only to the community\u2019s periphery). 
As the study unfolded, hundreds of other informants pertaining to the community\u2019s periphery became involved through observations of relevant mailing lists on the TYPO3 website (Table 2).\n\n--- Table 1 ---\n\n--- Figure 1 ---\n\nStarting in the year 2003, TYPO3 began to grow fast, and the number of registered developers doubled each year from 2003 to 2005. This continuous growth trend set the stage for the community changes that are the focus of this study. The time lag between the growth registered from 2003 to 2005 (Figure 1) and the start of the data collection process in 2006 was necessary to see how the community would respond to this growth.\n\nData sources\n\nMultiple sources of data (Table 2) were employed to strengthen the design of the study and to capture the complexities of the case in question. These data sources allowed us to triangulate the data and validate the theoretical constructs. The data were collected on several occasions between 2006 and 2010. When the study began, TYPO3 was addressing organisational issues that had surfaced because of the growing size of the community. However, we soon discovered TYPO3 had experienced other organisational challenges in the past. Therefore, learning about the project\u2019s history and its prior development was just as important as illuminating its current development.\nWe collected our data through interviews, observations of face-to-face R&D committee meetings, three relevant community mailing lists, and archival data. An introductory interview with the project founder, who also acted as the project leader from 2000 to 2007, provided a deeper understanding of the community, its history, its development up to that point, its structure, its internal work processes, its products, and its current and future strategies. 
The rest of the interviews with the community manager of the TYPO3 Association, the R&D committee, and core team members\u2014some of whom had only recently made a move from the periphery to the core of the community\u2014were focussed on managing deliberate changes in TYPO3. The interviews addressed the following main themes: (i) change initiatives; (ii) activities, roles, and practices related to the identified change initiatives; (iii) motivation; and (iv) background. The same interview guide was used throughout the process, but as new relevant information emerged about specific community changes, additional questions were incorporated into the following interviews. The interviews, which lasted about 60 minutes on average, were recorded and transcribed.\n\nFurthermore, over a two-day period in 2006, more than 18 hours were spent observing face-to-face meetings among R&D committee members. This method yielded insights into a range of organisational issues related to the community\u2019s development and the background for the deliberate change initiatives.\n\nA review of 235 posts from the R&D committee mailing list gave access to the content and type of discussions, the contributions and roles of various individuals, and work coordination and delegation. In particular, this source of information allowed us to obtain a deeper\nunderstanding of the organisational challenges facing the community during that time period and how those challenges were resolved.\n\nThe interviews, the observations of the R&D committee\u2019s meetings, and the R&D committee mailing list together led to the uncovering of a number of change processes in the TYPO3 community. Additional relevant mailing list data (namely, the human-computer interaction (HCI) team\u2019s mailing list and the core team\u2019s mailing list) were included in the data collection. 
Using archival data allowed us to cross-check some of the facts uncovered during the observation activities and interviews.\n\nData analysis\n\nSince we were interested in *if*, *how*, and *why* deliberate changes are possible in a specific context, a case study design was deemed appropriate. More specifically, when studying contemporary activities and/or events over which the researcher has no (or very limited) control, case study research is the obvious choice (Yin, 1994). Qualitative techniques were used to analyse the data (Eisenhardt, 1989; Miles & Huberman, 1984; Strauss & Corbin, 1998). Overall, the analysis focussed on organisational practices, change, and structuring while paying specific attention to grounded concepts, and proceeded in three steps. First, we constructed case studies (Eisenhardt, 1989) for each identified organisational change initiative. We focussed on major change initiatives that affected the entire community. At the time of the study, four change initiatives were ongoing: (i) reorganisation of product development, (ii) establishment of a non-profit organisation called the TYPO3 Association (a central hub from which to support active developers), (iii) installation of usability as a mindset (thus replacing the strong technical mindset in the community), and (iv) restructuring of the entire community to create more efficiency through a more transparent structure with clear responsibilities and increased team autonomy. Although the general character of three of the initiatives was structural and one of them was cultural (the usability initiative), all of the changes involved changes in both structures and practices.\n\nSecond, we divided the coding process into open, axial, and selective coding and employed a constant comparative method within each coding phase to identify the concepts and relationships relevant to each type of change (Locke, 2001; Strauss & Corbin, 1998). 
Third, a cross-case analysis (Eisenhardt, 1989; Miles & Huberman, 1984) was used to identify any similarities and differences across the three change types. This process was repeated several times. Each time, the resulting conceptual insights were refined and further developed. The analysis generated four core categories that represent the mechanisms employed by TYPO3 to address deliberate changes (Table 4).\n\nThe interviews, the observations from the R&D committee meetings, and the data from the three mailing lists enabled us to determine precisely the timing and order of deliberate changes and their intended effects. The same data sources were used to trace the unintended, emergent effects of the identified deliberate changes. However, the three mailing lists, which documented the reactions (or lack of reactions) of the entire community, played a central role. The interviews played a central role in establishing the timeline for the parts of the change processes (e.g., decision making) that took place offline. The preliminary findings were presented and discussed with the project leader and two core team members, who provided valuable comments that confirmed and elaborated upon the uncovered theoretical constructs.\nFindings\n\nWe observed multiple change initiatives in the community, some of them successful, some less successful. The most significant of these are summarised in Table 3. Change agents played a decisive role in all key tasks of the observed change management processes: recognition, decision making, and implementation. In the observed initiatives, all but one change agent originated from the community\u2019s core. 
One reason for the prevalence of core member change agents might be that the identified initiatives were major and, as such, expected to have a wide-scale effect on the community.

--- Table 3 ---

Below, we sketch the four change initiatives (Table 3) by elaborating (i) the aims of each initiative, (ii) what made it deliberate, (iii) who the change agents were, and (iv) whether the implementation was successful.

The first change initiative, "Reorganization of product development", was launched because the product development process was inefficient. It was characterized by a lack of release discussions between the core and the community, the community's failure to test enough different software versions, failure to read existing instructions about different project contributions (i.e., release management procedures, testing instructions), and poor planning of subprojects (e.g., too many postponements, unrealistic deadlines). For its part, the Core Team did not have the capacity to respond to all of the inquiries, project proposals, and general input. A meeting was arranged where potential solutions were discussed, demonstrating explicit intent to plan and execute the needed change. A Core Team and R&D Committee member, who was in charge of the software release process at that time, proposed a solution, which was subsequently adopted. Release management was consequently improved by introducing a rotating release manager function in July 2007. During this change process, the R&D Committee's tasks were taken over by the Core Team and one hierarchical layer was removed. This created more flexibility and readiness for the Core Team, and easier access for new contributions. Additionally, the core development mailing list was opened, creating a direct communication channel between the core and the periphery.
The activity level on the mailing list increased drastically, and this initiative more than doubled the number of incoming patches to the core list, thus freeing the Core Team members to pursue larger projects to a much greater extent than before. The initiative was thus successfully implemented.

The second change initiative, "Founding of a non-profit organization called the TYPO3 Association", was intended to create a committee structure resembling a functional organizational structure. It consisted of establishing a non-profit organization called the TYPO3 Association and was initiated by the project founder. This complex task demanded deliberate action and took many discussions, especially during the Core Team meetings and TYPO3 conferences. The main goals of the Association were to support core development on a steadier basis and improve the efficiency of the project by "providing a central hub from which to support active developers as well as to concentrate its members into a pool of regular contributors" (mailing list). The TYPO3 Association was meant to support core development by providing funds for the development that was not taken care of by commercial interests. One way was through donations, i.e., individuals who earn their income (or part of it) by using this open source software choose to give some of this income back to the community in the form of donations. Another way was membership, i.e., firms and individuals could become members of the Association by paying an annual fee, which was used to sponsor software development in TYPO3. Furthermore, the Association was able to create transparency regarding decision making, roles, and activities.
The change initiative was thus successfully implemented, and the Association created a period of growth under the goal-oriented and integrative leadership of the board, whose chairman was the project leader.

The third change initiative, "New team structure", was a deliberate and direct response to the rapid community growth. The project founder was the change agent behind this initiative, which sought to make particular responsibilities and tasks explicit in order to create more transparency in project activities (and not only at the upper echelons of the Association). At the team level, therefore, it was determined that the following should apply to team leaders' tasks: (i) leaders are solely responsible for the team; (ii) members are appointed/accepted by the leader; (iii) decisions are made by the leader (however, agreement is sought with the team members as far as possible); (iv) delegation of tasks is encouraged; and (v) a minimum timeframe is set for the leader's response to team members' requests. By defining responsibilities, the community attempted to introduce a measure of accountability in team performance, which was considered vital in this virtual context due to the voluntary nature of participation. To formalize responsibilities and tasks, the project founder thus introduced "team contracts". These contracts served the purpose of creating synergy between the already existing teams through the elaboration of a written mission statement, which, as a minimum, contained the following team information: the team's position in the organizational structure (i.e., to which committee or project does the team belong?), a description of the team's mission, a specification of the team's responsibilities, the name of the team leader, and the rules for becoming a team member. Although these contracts were introduced, tasks were still taken on by self-assignment.
The motive underlying the team contracts was to define two aspects: responsibility and authority. However, team contracts never really gained momentum, and attempts at introducing formal authority at the team level did not succeed either. The initiative failed because the attempted structure left too few degrees of freedom to the project contributors. The type of authority exercised resembled that of hierarchy (Demil & Lecocq, 2006; Powell, 1990) and unintentionally led to authority erosion. This accentuated the need for more autonomy with regard to following one's own "personal itch".

Finally, the aim of the fourth initiative, "Installing usability as a mindset", was to redirect the project's focus towards product usability. At the time, the project's focus was almost entirely technical in nature, which limited the product's appeal to those customer segments with low technical skills, e.g., a secretary who edits the content on a company website: "A lot of OSS is created by technicians for technicians. […] And then there are those [users] who use [the software] every third week. They don't demand that many functions; they demand that they don't need to remember how [the software] works because they are only using it every third week" (interview, project founder).

The wish to introduce a greater degree of product usability was put forward by a newcomer to the TYPO3 community in 2001. This newcomer, i.e., a periphery member of the community, became the change agent who made an explicit decision to launch a process of change, making this initiative a case of deliberate change. He was a software designer by profession and realized the need for TYPO3 to improve its design.
The idea remained in the background until 2006, when the project leader established the human-computer interaction (HCI) team and an accompanying mailing list, which was intended to act as "the melting pot for ideas about usability improvements" (the HCI team mailing list). However, progress was slow. A breakthrough first came about when the change agent started making a more focussed effort to implement the usability idea. In the end, the change initiative was successfully implemented.

While our findings are based on the analysis of all the observed initiatives in the community, we selected the fourth initiative, "Installing usability as a mindset", as a representative initiative to illustrate the general traits of the organisational change mechanisms that drove the success of the change initiatives. By focusing the presentation of the study's results on one particular change initiative, our intention was to promote the clarity and comprehensibility of the findings.

In the following, we present our findings, which consist of the four mechanisms that our analysis revealed as central drivers of successful, deliberate change management in the community (Table 4).

--- Table 4 ---

**Individual initiative**

Our data first of all reveal that the community cannot be expected to embrace a change initiative—regardless of its inherent value to the community—unless there is a persistent change agent who will bring the initiative from the point of inception to successful implementation. This is a direct consequence of the absence of formal power and hierarchical influence in OSS communities. Since community members cannot be ordered to do something, they have to be persuaded to become active. The change agent of the HCI project expressed the difficulties in doing so by saying, "You can find developers that are interested in [design] topics, but you don't really get very far.
And that's what we experienced with the HCI team…a lot" (interview, change agent).

Even if a change agent has the right idea and engages with the right community members, this is not enough to set the change in motion. As a consequence, the change agent persevered for four years before the concept of usability penetrated the prevailing mindset and culture of the community. Persistence involves a high dose of patience, primarily because the community also needs time to adapt to organisational changes. This need was pointed out by one core member of TYPO3: "There is a gap between the design of the organisation and letting the organisation accumulate around the design…giving time to people to flock to the teams" (R&D committee meeting, core member).

We found clear indications that it is individual effort and achievement, rather than organisational planning and decision making, that motivate community members to contribute to a change initiative.

Do decisions matter in OS [communities]? No. The one thing that matters is what is actually done. Post factum situation. By doing things, people make decisions. If we make a decision, it doesn't mean that people will be motivated to implement it, work by it. The only thing that matters is action. Consult people, hook them up with knowledge and resources, and hope that they do what you would like, what you expect… we should think of ourselves as service providers. (R&D committee meeting, core member)

This was one of the key statements of our investigation, outlining the structure of individual initiative as clearly as possible.
This view was also supported by the project founder of TYPO3 with the short statement, "First you have to do things yourself, and then others will follow" (interview, project founder).

Before taking action, the change agent of the HCI project reflected upon what motivated him and other developers to do work for the project leader. He found that a key driver was the project leader's "front guy and guru status" and the fact that "he usually keeps his promises and is able to do huge workloads" (interview, change agent). Based on this insight, the change agent tried to motivate others to participate in the HCI team: "I tried to find guys who were motivated by my work and then do work for me" (interview, change agent). The success of this approach was evident already in 2007, when the change agent became the HCI team leader. This success was also recognised by other community members:

Someone from the usability mailing list comes up with a nifty and good-looking screenshot and proposes his usability changes to the core developers. They are fascinated and go implement it because it seems like a really great idea to them. Especially [the change agent] has been very successful with this way of getting his suggestions implemented, and now he's the HCI team leader. (interview, core member)

And even:

I don't know how many have seen the PDF [the change agent] produced, but I saw it and also met him in Frankfurt before the PHP conference ([core team member name] and I joined a meeting of him and [project leader])—and there is hard and impressive work being done. (core team mailing list, core member)

In the end, we found that the role of change agents in communities is similar to that of product champions, who experience progress over time only through persistent and enthusiastic effort (Tushman & Anderson, 1986).
Persistence and leading by example are traits that define a change agent's degree of individual initiative. Persistent change agents who are able to self-motivate and self-direct their performance, i.e., to exercise self-leadership (Manz, 1986), are an essential part of any organisational change initiative in OSS communities because it takes a great deal of time and persuasion to garner acceptance and support for any organisational change.

A change agent demonstrating high levels of commitment (personal motivation and skills) may develop mutual, cognition-based trust, which, in turn, may strengthen the community members' readiness to engage and collaborate (Chowdhury, 2005; McAllister, 1995). Thus, we put forward the following proposition, which is grounded in the above and similar behaviours observed in the other three change initiatives (Table 3):

**Proposition 1:** The individual initiative of change agents is positively related to a successful implementation of deliberate organisational change initiatives in communities.

**Reputation and reputation lending**

Power struggles were visible during the change process for each initiative. For instance, during the observed R&D committee meeting, one member left the room because he was frustrated that the rest of the group did not support his views. He was arguing against an excessively predetermined team structure, which was about to be implemented. However, he lost the debate because he was arguing against the stance of the change agent responsible for the particular change initiative, who had a higher status within the community. It was later revealed that the opposing member was actually right and the team structure was, in fact, too prescriptive. This example shows how difficult it is to accomplish anything without the support of community members with higher social status.
This difficulty exists even when the difference in social status between the change agent and the supporting high-status member is rather small (e.g., when both are members of the core team).

We find that, by lending their reputations to lower-status members, high-status members can share their influence. This was clearly recognised by the project founder: "And then, it is clear that for those individuals who have that kind of naturally given power, as I for example have, it is natural that other individuals whom we appoint and those close to us easily gain influence" (interview, project founder).

In situations where a change agent has a rather low status in the community, as was the case in the early days of the HCI team, the change agent can gain influence by teaming up with one or more community members who enjoy a high-status reputation.

In the case of the HCI team, the change agent "did a lot of work for [the project founder]" to establish himself as a worthy community member. Eventually, he was invited to a TYPO3 Board meeting to discuss usability issues: "With [the project founder] at [the] T3 Board we talked about why Drupal is easier than TYPO3 or why WordPress is easier than TYPO3". By linking to high-status members in this way, the change agent gained respect and support from the high-status core members. They addressed the change agent in complimentary terms and praised his work: "As the usability guru, please give me your feedback on the description of the two mentioned features in the page tree below…" (core team mailing list, core member).

But after he was appointed HCI team leader, it was evident that he had not yet gained the same respect from other members, as they were systematically circumventing the HCI team and instead discussing usability issues on the core team's mailing list.
An effort was made to redirect attention towards the HCI team, and in particular towards the role of the change agent, endorsing him and building his authority. Some examples include:

[By the way], this is [user interface] change, so it can be committed only if you get approval from [the change agent]. (core team mailing list, core member)

I agree with all this but we do not have anyone else properly educated in these questions. I do not trust anyone else in [the] HCI field for TYPO3 because no one showed good HCI skills so far. [The change agent] is the only one who did. (core team mailing list, core member)

You might also have watched the podcast issue [2] where [the change agent] demonstrates some great ideas about usability improvements in TYPO3 or have seen the PDF [3]. (core team mailing list, core member)

In the subsequent period, the activity levels in the HCI team increased significantly. However, there seemed to be no obvious relationship between the content of the change initiatives and the skills of the high-status members supporting the initiatives. This finding implies a potential spillover effect between reputations rooted in technical contributions and reputations rooted in organisational contributions.

There were also instances when high-status members (e.g., project and team leaders, core team members, and other respected members) met the change agents halfway. Our data show that the leaders in TYPO3 work with the community's initiatives through a process of mutual adjustment. The leaders notice promising initiatives, assess them, and try to provide them with the necessary resources:

I tried to motivate him to build a team around that. I just noticed him. In this way, I try to enable people to work. It's a bit intuitive also. I [have been] working already for ten years on this system, so the foundation for something like this was probably already laid a couple of years back.
(interview, community manager)

This type of leadership emphasises intuition and alertness. The main task consists of providing support for change initiatives in the form of knowledge and resources without making decisions on behalf of the community members. Rather, the leaders establish the infrastructure and framework that will hopefully assist the community change agents in paving the way for the intended improvements and changes.

High-status members lend their lateral authority and reputation to a change agent by providing any type of visible support, even if it is only verbal in nature. One reason this method works is that high-status members' support provides the change agent with credibility, which is crucial if the initiative is to stand a chance of being implemented (Markus & Benjamin, 1996). This finding further suggests that community leadership is shared via reputation lending, which also facilitates organisational changes in communities. Therefore, based on the above and similar behaviours observed in the other three initiatives (Table 3), we make the following prediction:

**Proposition 2:** Reputation lending (from high-status to lower-status members) is positively related to a successful implementation of deliberate organisational change initiatives in communities.

**Change-oriented communication**

We found that communication about change initiatives was essential to their successful implementation. Through meetings and presentations to small and large target audiences at various community events, change agents in TYPO3 communicated the rationales and arguments behind the initiatives. Still, it took the change agent behind the HCI initiative a long time to realise that communicating the idea about usability was vital to its success.
The change agent attracted support for the usability initiative by communicating (in a change-oriented fashion) the basic ideas behind the concept in several rounds of presentations to the developer community: "This is why [the project founder] and I decided that maybe we just need to find out how we can change that point of view to guide developers in a different direction—so a typical marketing and communication thing" (interview, change agent).

From 2007 to 2008, the change agent tried to motivate the community by communicating the relevance of usability to TYPO3 through presentations at the community's main yearly events.

The first presentation was just about usability flaws, ten major usability flaws […] at the Developer Days in 2007. Then, in 2008, at T3Con, I held a presentation about what can be done in a positive way with usability, solutions and future interfaces like, for example, the interfaces in "Minority Report" […]. If I look back, that was the second phase to motivate [people], saying, "Look, that's possible if we work together", and "Wouldn't it be fun to have some amazing interfaces in there?" (interview, change agent)

In all observed projects, the presentations helped change agents to gain the community's trust in them and their capabilities.

After I showed them [through presentations] that it could really get done, they kind of trusted in the words I said. Because usually it's a very inner circle, only developers with developers, so they could trust each other. They have the same language. But now, there comes this strange design guy and he says, "You are doing everything wrong; you have to change everything, and you don't even have the knowledge to understand what you are doing wrong." That doesn't really end in trust.
(interview, change agent)

In addition to establishing the trustworthiness of the change agent (Gurtman, 1992), the change-oriented communication process in TYPO3 also helped stimulate the community members to participate because the process also aimed to educate the target audience about the attempted changes. The community developers were the target: "Then through the Usability Week, we started, in some way, to educate [people]" (interview, change agent).

This facilitation of community participation resembles a particular dimension of shared leadership, called voice, which is known to increase a person's social influence among the members of a community (Carson, Tesluk, & Marrone, 2007). During the change initiatives that had a successful outcome, the change agents excelled at initiating and facilitating constructive, change-oriented dialogue and debates around how the community should achieve the needed changes. Thus, voice boosted the change agents' level of social influence by increasing immersion and participation through various means, such as opening the core team's mailing list (under a set of rules) to the rest of the community, implementing rotating release managers, presenting ideas at community events, and establishing Usability Week. Voice in the form of change-oriented communication may be associated with successful change implementations because voice is based on interpersonal events that promote communication and feedback, which, according to Ryan and Deci (1985), catalyse feelings of competence and thereby stimulate intrinsic motivation.
Based on the above and on similar behaviours exhibited in the other three initiatives (Table 3), we make the following prediction:

**Proposition 3:** Change-oriented communication is positively related to a successful implementation of deliberate organisational change initiatives in communities.

**Motivation through challenging tasks**

Because of the self-assignment principle (Crowston et al., 2007), one of the major challenges in open-source communities is motivating developers to work on tasks that are uninteresting but necessary to complete (Lakhani & von Hippel, 2003). We can see that this problem extends to organisational change initiatives. This was also recognised by the change agent of the HCI project: "[…] usability topics are not really challenging for developers usually. It's about removing stuff, making stuff simple, and that's usually not the challenge for developers. It's a challenge for me as designer" (interview, change agent). The resulting challenge was put more generally by one member of the core team: "We were uncertain how to get people to do some of the more boring and time-consuming, but essential, tasks" (interview, core team member).

Working with usability demanded that the developers overcome three fundamental tasks. First, the developers needed to become motivated to work on usability issues. Second, the TYPO3 community had to attract skilled software designers who possessed the necessary knowledge regarding usability. Third, the change agent had to find a way to stimulate the developers to follow the designers' recommendations.

To motivate developers to work on usability issues, the change agent came up with the idea of creating "fake challenges […] to motivate them to finish the goals" (interview, change agent).
His approach was based on the idea that developers would be more willing to work on their tasks if they perceived them to be challenging.

After a while I came up with the idea to have a 'Usability Week'. The concept was pretty simple. I rented a castle for one week, and I locked 30 developers in that castle, and they had a certain task they needed to solve within that one week. So, the challenge was there in some way because they needed to solve the problem in one week, which is kind of tough because the problems I took [on] were too huge to solve in one week. So, there was a challenge even if the task was simple because they had time pressure. (interview, change agent)

During Usability Week, five mixed teams were created. Each team consisted of three developers, one core developer, one manager, and one designer. Each day of the event, three meetings took place. The meetings were designed to streamline the tasks and motivate the teams.

To attract designers to the TYPO3 community and the usability project, the change agent used a different set of tools. He created an entrance barrier that the designers needed to overcome before they could join the community.

My major wish through that Usability Week wasn't to solve those tasks but to find more designers who [were] able and motivated to join the TYPO3 community. My idea to make it more interesting to them was, again, to make it a little bit more complicated because they had to apply to the Usability Week. So, we had about 60 or 70 applications and only 30 places. In the end, only five designers out of 50 could join, and they were somehow charmed because they could attend and others couldn't. It really worked out and they really stuck to the project and until today [are] doing some design work. (interview, change agent)

Finally, to motivate the developers, the change agent needed to make the tasks related to usability issues more challenging.
He achieved this by incorporating into simple problems (i) novel task structure and content and (ii) the freedom to execute the tasks in a different way than usual. By doing so, the change agent successfully motivated the developers to solve those problems.

For example, to structure a website we have something called a 'page tree', which looks like the tree in Explorer on your Windows machine, and that's kind of very old style, how it is done […]. However, there is a framework called XJS, written in Java Script, and that is interesting for developers because it's a new technology in some way and a new framework, and it's hard to implement, and they need to change a lot. So, I decided that they should use XJS for that page tree, even if we don't need it, but then I would be sure that in the end I would have the page tree I wished to have and they would have a challenging task to actually do it instead of writing some lines by themselves to change [the page tree]. (interview, change agent)

We really had the freedom to totally change the core… Actually, the way […] we worked… we [were] taking the beta version of 3.9 back in time, and we just coded anything we liked inside the core. Usually, someone who creates an extension is [told] "never touch any core file", [but here] we could really go deeply inside and delete files, replace files totally, and we did not have to focus on keeping [it] compatible with the old code and being compatible with the old […] extensions. (interview, developer)

In the case of the HCI project, Usability Week turned out to be quite successful:

They were challenged by whether they could reach the goals. This really moved the project hugely forward in one week […] In the end, I have to say, we didn't reach any of our goals […] But they got pretty far, and it really gave the whole [usability] project a new motivation.
(interview, change agent)

The self-assignment of tasks, which is the prime mechanism for work division and task allocation in OSS communities, is obviously an issue if the tasks do not attract enough interest and, consequently, remain undone. Task challenge here refers to a continuum ranging from low- to high-stimulation tasks (e.g., highly routinized tasks versus non-standardized, original tasks). The case of TYPO3 shows that increases in task challenge due to, for example, entrance barriers, competition, level of within-task stimulation, task novelty, or freedom to execute a task in a new way, can compensate for an initial lack of personal desire, which would normally drive the self-assignment of tasks. Our analysis shows that, in the case of tasks related to the implementation of organisational change initiatives, the change agent needs to increase the perceived task challenge in accordance with the skills and interests of the targeted members. Thus, task challenge should be seen as a dynamic factor dependent on the person-task interaction (Campbell, 1988). Task challenge is associated with increased participation because it appeals to intrinsic motivation, the primary motivational factor in open-source communities (Lakhani & Wolf, 2005). In turn, increased participation improves performance (Hackman & Oldham, 1976; Herzberg, 1959). Furthermore, creating entrance barriers to team membership proved effective at activating a sense of achievement and recognition as stimuli (Herzberg, 1959). Hence, based on the above and the other three observed change initiatives (Table 3), we make the following prediction:

**Proposition 4:** Increased task challenge is positively related to a successful implementation of deliberate organisational change initiatives in communities.

Discussion

This study offers the first comprehensive investigation of deliberate change in OSS communities.
It presents clear indications that OSS communities are indeed capable of changing deliberately and are, therefore, not doomed to fail in the long run. A change is deliberate when it is desired by a community member—the change agent—and then supported by a sufficient coalition within the community; in the observed HCI project, the change initiative was carried out with the clear goal of improving the usability of TYPO3.

Our study also shows that in OSS communities deliberate change is highly dependent on change agents, who play an essential role in managing the key tasks of change processes: (i) change agents recognise the need for change and translate it into organisational goals; (ii) they create a sense of urgency and convince community members to make decisions in this matter; and (iii) they push the change process and ensure things get done—often by doing things on their own. This is a clear contrast to hierarchical business organisations, where change is mostly driven by leaders with positional power and/or special functions, and change agents only play a secondary role. Against this background, this study of deliberate change in OSS communities focuses on the investigation of change agents and the success drivers of their initiatives. The insights of this study can be summarised in a simple model:

--- Figure 2 ---

These findings are first of all relevant for research on non-hierarchical organizational settings such as OSS communities. They provide insights into an area that has so far been vastly under-researched. In addition, knowledge of change is as important for collaborative communities as it is for traditional business organisations because (i) it allows change processes to be designed more purposefully and (ii) it provides insights into the long-term behaviour of collaborative communities in relation to their (competitive) environment.
As long as they are based on a similar governance structure, there is good reason to assume these findings also apply to other types of communities of practice not related to software development (Bridwell-Mitchell, 2015). This gives a broader relevance to our findings since the importance of communities is increasing in an information- and knowledge-based economy (O'Mahony & Ferraro, 2007).

However, this study also yields some quite interesting and relevant findings that go beyond communities and also concern change processes in traditional business organisations. In this way, our paper can also contribute to the broader change literature. The elements of the above change model are not all completely new. We already know about change agents, informal power, and leadership from investigations of other contexts. What is new and important, however, is that the complete absence of formal power does not prevent the execution of deliberate change, and that change agents play a critical role in driving the process. OSS project leaders and core team members do not have formal command authority to enforce decisions (von Hippel & von Krogh, 2003). This is clearly illustrated especially by the third change initiative "New team structure" (Table 3), in which the project leader and founder was the change agent. Although he kept the team contracts on the agenda for two years, he was unable to implement this initiative. Had he had any kind of formal fiat in the community, this initiative would probably have led to a different outcome. But OSS communities "do not rely on employment contracts and so are unable to be governed by formal authority, as is the case in a hierarchy" (Demil & Lecocq, 2006, p. 1454). This allows for some quite interesting perspectives and insights.

The first important finding is the apparent irrelevance of decision making in a hierarchical sense, as expressed by community members.
This point needs some clarification. It does not mean there is no deliberate planning or decision making taking place in OSS communities. Instead, these statements relate to their power structure. In his article, Finkelstein (1992) distinguished various forms of management power. As outlined above, OSS communities are characterised by the inherent absence of formal power ('structural power' in the terminology of Finkelstein, 1992, p. 509, i.e., the "legislative right to exert influence" over others). Other forms of informal power, like 'expert power' and 'prestige power', not only exist in OSS communities, but they play an important role in the informal leadership that provides the foundation for the significance of the community's core team (Fleming & Waguespack, 2007; O'Mahony & Ferraro, 2007). Individual initiative (proposition 1) as a mechanism of change resembles some change factors observed in 'traditional' organizations with formal leadership (i.e. hierarchies, Demil & Lecocq, 2006). Similarly to community change agents, agents in hierarchies make use of exemplary change or leading by example (Kotter, 2012). Also, individual initiative bears resemblance to the tasks performed by change champions (Ulrich, 1997) and product champions (Day, 1994), such as providing impetus for and strongly promoting the change initiative. However, the apparent irrelevance of decision making in community change points to a structural power deficit of change agents with regard to change initiatives. Change agents are able to convince relevant community members, decisions are made, and tasks are distributed, but this often does not result in action. In these situations, decisions are only relevant to legitimise the activities of change agents, not to trigger action. Often, change agents have to keep pushing to get things done; in other cases, they have to complete the tasks themselves.
Against this background, individual initiative is a strategy to exert influence without formal power. Yet, it has to be noted that this strategy only works locally, and informal power is still needed by change agents at other points. Individual initiative might even result in the acquisition of expert and prestige power because it makes change agents and their abilities visible. To date, the meaning of individual initiative and the structure of low-power contexts are not very well understood. It might be expected that individual initiative also plays a role in high-power contexts as a strategy to exert influence without power. However, more research is needed in this regard.

Another interesting point is the observation of what we have named 'reputation lending' (proposition 2). There is already some research on reputation and advancement in communities and other organisations without vertical lines of authority (Fleming & Waguespack, 2007). Much is known about (i) what authority means for flat hierarchies and (ii) how authority is acquired there (Dahlander & O'Mahony, 2011). In the context of hierarchies, reputation lending parallels coalition formation, support building, and gaining sponsorship from individuals with organizational clout, formal authority, and access to resources (Connor, 1998; Day, 1994; Kanter, 1994; Kotter, 2012). Such actions help legitimize the change initiative and the change agent as well as create acceptance of change by those affected (Buchanan & Boddy, 1992). Conceptually, reputation lending is also somewhat close to leader support in hierarchies (Amabile, Schatzel, Moneta, & Kramer, 2004). Leader support means using the formal power of managers to support activities by less-powerful organisational members, often in relation to innovation and change activities. This support can include resources and time, autonomy, and support in organisational decision making (Mumford, Scott, Gaddis, & Strange, 2002).
In contrast, reputation lending implies using the informal power of community leaders to support change agents in their activities, mostly by giving them recognition, letting them participate in board meetings and decision-making procedures, and making them and their initiatives more visible in the community. This informal form of support has not been described so far in the literature. Still, this is interesting because the elements of visibility and acceptance play only a minor role in leader support. This finding indirectly confirms the research showing the importance of informal networks and policy systems for change agent success (Battilana & Casciaro, 2012).

We also discovered interesting findings with regard to the motivation of community members to carry out change-related tasks. As discussed in the conceptual section above, motivation has already been the focus of previous research. Lakhani and von Hippel (2003) found that participation in OSS communities is quite rewarding since "98% of the effort expended by information providers in fact returns direct learning benefits to those providers" (p. 923). However, we observed there are change-related tasks that are not rewarding and that it is rather challenging to motivate community members to work on them. In this regard, we observed the strategy of so-called 'fake challenges' (proposition 4). The underlying approach is to combine unattractive tasks with motivating elements like competitions or social gatherings. There is an interesting early description of the principle: the fence episode in the novel *The Adventures of Tom Sawyer* by Mark Twain (1876). Most readers perhaps remember: Tom had to paint Aunt Polly's fence as a punishment after he dirtied his clothes in a fight. He hated this work; however, when one of his friends came to the spot, Tom was able to create the impression that it was a privilege and a pleasure to paint the fence.
After a while, he was even able to sell painting permissions to his fellows. In this sense, the change agent was successful in creating a sense of exclusivity by restricting spaces at the challenge, transforming boring work into a socially attractive event. To our knowledge, this strategy has not been described by research on OSS communities so far. Ultimately, the strategy of creating challenging tasks is expected to improve the community members' understanding and sense of ownership of the change initiative, and eventually enhance their motivation to participate in executing change. In that sense, this approach has the same objective as, for instance, empowerment of organizational members, which is an important element in the change leadership literature within the context of hierarchies (Caldwell, 2003; Gill, 2003; Goffee & Scase, 1992). While both strategies thus seek to remove obstacles to change, they are in fact each other's opposites. One strategy uses task design to deal with the downsides of an innate characteristic of OSS communities, i.e. member autonomy. The other, however, seeks to increase member autonomy in a hierarchical setting, where strong administrative controls provide formal powers to supervise and regulate the behaviour of organizational members (Demil & Lecocq, 2006).

Although change processes have been theorized about and practiced in a variety of ways, the one aspect that deliberate change in OSS communities has most in common with change in hierarchies is change-oriented communication (proposition 3). Through frequent communication change agents create opportunities for organizational members to understand and give input to the change process (Kotter, 2012).
Practicing openness and widespread communication (Buchanan & Boddy, 1992) during a change process increases the chance of successful implementation because organizational communication plays a central role in eroding existing path dependencies (Cohen & Levinthal, 1990), thus paving the way for organizational change.

Yet, the most important finding of this study is perhaps the very observation that OSS communities succeed in handling deliberate change processes without any formal or pre-assigned power. Certainly, informal power, persuasion, and group pressure are relevant to manage deliberate change in OSS communities to a certain extent. Situations can arise in which organisational members are faced with the decision to accept change or leave the community. Still, no community member can be ordered to accept change like in traditional business organisations. Nobody can be laid off, and sanctioning possibilities are generally very limited. If community members comply with change, they do so because they believe in it or at least accept the majority decision. If a change project is not supported by a critical mass of the community, it will not be successful. We call this type of deliberate change 'change by conviction'. Why is that relevant? If people comply with change voluntarily, there is a good chance negative side effects, resulting from enforcement, will be reduced (even though not completely eliminated because group members might submit to change unwillingly or leave the community). Indeed, we found some indications for that in our data, even though we were not directly looking for it. We are convinced these findings may also be applicable to hierarchical business organisations and that the latter can learn a lot from OSS communities to reduce the level of enforcement in change processes, thereby decreasing the levels of demotivation, insecurity, and resistance.
Consequently, the relevance of our findings is much broader and does not only concern non-hierarchical settings such as OSS communities but helps shed additional light on deliberate organisational change in general. More research is, however, needed to substantiate these findings, clarify the impact of different elements of change on negative side effects, and explore possibilities for traditional business organisations.

Managerial implications

The most obvious managerial implication is that communities need to be aware of the central role of change agents in deliberate change to organise change processes accordingly. This study emphasizes the role and importance of individuals taking initiatives and responsibilities by outlining some critical success factors for realizing deliberate change in non-hierarchical settings such as OSS communities.

Another implication is that hierarchical organizations also need to reconsider their use and appreciation of change agents, including self-appointed ones. Change agents are already being used in hierarchical business organisations but often in an unsystematic way. However, the results of this study suggest it would be useful to base all major change projects on change agents here as well. After decisions have been made, change agents can simply be assigned and endowed with the necessary power or supported by top managers. Contrary to the non-hierarchical case analysed in this study, there is no specific individual initiative needed at this point in hierarchical organisations. Still, it might be important for change agents to care more than usual about the second driver in our model and build, among all organisational members involved, a reputation for being the right person to organise the change process. The last two drivers point to communication and education, as well as to motivation.
We are convinced a lot can be done to smooth change projects in hierarchical business organisations, and it might even be possible to establish a regime of change by conviction there.

Limitations and future research

The first limitation of this study is of a theoretical nature. When investigating deliberate change in OSS communities, we are touching on a variety of different themes, including leadership, reputation building, informal power, motivation, innovation, and others. Each of these themes can be further developed, and many of them might potentially offer new insights. For the sake of rigour, we decided to focus on change, the meaning of change agents, and the drivers of change agent success. We have targeted this study primarily toward the research conversations on communities and on change. This is a decision that was made to keep the study focused and detailed.

Second, in this study we were not looking at organisational context factors that mediate the effect of the success drivers of change agent activities, such as the cultural context, size and age of the community, degree of formalisation, or others. We also did not look at the antecedents of change agent activities. This means our study is far from offering a complete model of change agent activity in communities. Still, we think our propositions can be useful stepping stones towards a more holistic model.

Analysing classic concepts and/or phenomena such as deliberate change under entirely different and new(er) organizational regimes is important as it not only helps to clarify how such organizational settings work, it also sheds new light on the phenomenon under investigation. In our study, the phenomenon manifested itself in the form of the self-appointment of change agents.
While this was necessary for the phenomenon to exist in a completely different and non-hierarchical organizational setting, it also holds potential for being applied in hierarchical settings.

**Conclusion**

This study provides evidence that it is indeed possible to change complex organisations deliberately without formal power and hierarchical influence. All change initiatives we observed were grounded in the individual commitment of change agents. However, we also found the success of change agents' initiatives depended on their ability to get sufficient support within the organisation. Key drivers of this are individual initiative, reputation and reputation lending, change-oriented communication and education, and motivation through challenging tasks. There is reason to assume these insights also hold for a broader range of organisations, including hierarchical business organisations. This is relevant because there are indications that change by conviction reduces the negative side effects of deliberate change.

References

Amabile, T. M., Schatzel, E. A., Moneta, G. B., & Kramer, S. J. (2004). Leader behaviors and the work environment for creativity: Perceived leader support. *Leadership Quarterly, 15*(1), 5-32.

Battilana, J., & Casciaro, T. (2012). Change Agents, Networks, and Institutions: A Contingency Theory of Organizational Change. *Academy of Management Journal, 55*(2), 381-398.

Bonaccorsi, A., & Rossi, C. (2003). Why Open Source Software Can Succeed. *Research Policy, 32*, 1243-1258.

Bridwell-Mitchell, E. N. (2015). Collaborative Institutional Agency: How Peer Learning in Communities of Practice Enables and Inhibits Micro-Institutional Change. *Organization Studies*.

Brown, A. D., Colville, I., & Pye, A. (2015). Making sense of sensemaking in organization studies. *Organization Studies, 36*(2), 265-277.

Buchanan, D., & Boddy, D. (1992). *The expertise of the change agent*. London: Prentice Hall.

Bullock, R.
J., & Batten, D. (1985). It's Just a Phase We're Going Through: A Review and Synthesis of OD Phase Analysis. *Group & Organization Studies, 10*(4), 383-412.

Burnes, B. (1996). No such thing as ... a "one best way" to manage organizational change. *Management Decision, 34*(10), 11.

Burnes, B. (2009). *Managing change: a strategic approach to organisational dynamics* (5th ed.). Harlow, England; New York: Prentice Hall/Financial Times.

By, R. T. (2005). Organisational change management: A critical review. *Journal of Change Management, 5*(4), 369-380.

Caldwell, R. (2003). Models of change agency: A fourfold classification. *British Journal of Management, 14*, 131-142.

Campbell, D. J. (1988). Task complexity: A review and analysis. *The Academy of Management Review, 13*(1), 40-52.

Carson, J. B., Tesluk, P. E., & Marrone, J. A. (2007). Shared leadership in teams: An investigation of antecedent conditions and performance. *Academy of Management Journal, 50*(5), 1217-1234.

Chowdhury, S. (2005). The role of affect- and cognition-based trust in complex knowledge sharing. *Journal of Managerial Issues, 17*(3), 310-327.

Cohen, W. M., & Levinthal, D. A. (1990). Absorptive capacity: a new perspective on learning and innovation. *Administrative Science Quarterly, 35*(1), 128-152.

Connor, D. R. (1998). *Managing at the speed of change*. Chichester, UK: John Wiley & Sons.

Cromie, J. G., & Ewing, M. T. (2009). The rejection of brand hegemony. *Journal of Business Research, 62*, 218-230.

Crowston, K., Li, Q., Wei, K., Eseryel, U. Y., & Howison, J. (2007). Self-organization of teams for free/libre open source software development. *Information and Software Technology, 49*, 564–575.

Dahlander, L., & O'Mahony, S. (2011). Progressing to the Center: Coordinating Project Work. *Organization Science, 22*(4), 961-979. doi: 10.1287/orsc.1100.0571

Day, D. (1994). Raising radicals: Different processes for championing innovative corporate ventures.
*Organization Science, 5*, 148-173.

De Souza, G., & Klein, H. J. (1995). Emergent leadership in the group goal-setting process. *Small Group Research, 26*(4), 475-496.

del Val, M. P. (2003). Resistance to change: a literature review and empirical study. *Management Decision, 41*(2), 148.

Demil, B., & Lecocq, X. (2006). Neither market nor hierarchy nor network: The emergence of bazaar governance. *Organization Studies, 27*(10), 1447-1466.

Dunphy, D., & Stace, D. (1993). The strategic management of corporate change. *Human Relations, 46*(8), 905-920.

Eisenhardt, K. M. (1989). Building theories from case study research. *Academy of Management Review, 14*(4), 532.

Etzioni, A. (1964). *Modern organization*. Englewood Cliffs, N.J.: Prentice-Hall, Inc.

Finkelstein, S. (1992). Power in top management teams: dimensions, measurement, and validation. *Academy of Management Journal, 35*(3), 505-538.

Fiss, P. C., & Zajac, E. J. (2006). The symbolic management of strategic change: sensegiving via framing and decoupling. *Academy of Management Journal, 49*(6), 1173-1193.

Fjeldstad, Ø. D., Snow, C. C., Miles, R. E., & Lettl, C. (2012). The architecture of collaboration. *Strategic Management Journal, 33*, 734-750.

Fleming, L., & Waguespack, D. M. (2007). Brokerage, Boundary Spanning, and Leadership in Open Innovation Communities. *Organization Science, 18*(2), 165-180.

Gill, R. (2003). Change management — or change leadership? *Journal of Change Management, 3*(4), 307-318.

Ginsberg, A., & Abrahamson, E. (1991). Champions of change and strategic shifts: The role of internal and external change advocates. *Journal of Management Studies, 28*(2), 173-190.

Goffee, R., & Scase, R. (1992). Organizational change and the corporate career: the restructuring of managers' job aspirations. *Human Relations, 45*(4), 363-384.

Gurtman, M. B. (1992). Trust, distrust, and interpersonal problems: a circumplex analysis.
*Journal of Personality & Social Psychology, 62*, 989-1002.

Hackman, J. R., & Oldham, G. R. (1976). Motivation through the design of work: Test of a theory. *Organizational Behavior and Human Performance, 16*(2), 250-279.

Hars, A., & Ou, S. (2002). Working for free? Motivations for participating in open-source projects. *International Journal of Electronic Commerce, 6*(3), 25–39.

Hayes, J. (2010). *The theory and practice of change management* (3rd ed.). New York: Palgrave Macmillan.

Herzberg, F. (1959). *The motivation to work*. New York: John Wiley and Sons.

Higgs, M., & Rowland, D. (2011). What Does It Take to Implement Change Successfully? A Study of the Behaviors of Successful Change Leaders. *Journal of Applied Behavioral Science, 47*(3), 309-335.

Hon, A. H. Y., Bloom, M., & Crant, J. M. (2011). Overcoming Resistance to Change and Enhancing Creative Performance. *Journal of Management, 40*(3), 919-941.

Hongseok, O., Labianca, G., & Myung-Ho, C. (2006). A multilevel model of group social capital. *Academy of Management Review, 31*(3), 569-582.

Howell, J. M., & Avolio, B. J. (1993). Transformational leadership, transactional leadership, locus of control, and support for innovation: key predictors of consolidated-business-unit performance. *Journal of Applied Psychology, 78*(6), 891-902.

Huy, Q. N., Corley, K. G., & Kraatz, M. S. (2014). From Support to Mutiny: Shifting Legitimacy Judgments and Emotional Reactions Impacting the Implementation of Radical Change. *Academy of Management Journal, 57*(6), 1650-1680.

Kanter, R. M. (1994). *The change masters*. London: Allen & Unwin.

Kanter, R. M., Stein, B., & Jick, T. (1992). *The challenge of organizational change: How companies experience it and leaders guide it*. New York: Free Press.

Kesting, P., & Ulhøi, J. P. (2010). Employee-driven innovation: extending the license to foster innovation. *Management Decision, 48*(1), 65-84.

Kirzner, I. M. (1997).
Entrepreneurial Discovery and the Competitive Market Process: An Austrian Approach. *Journal of Economic Literature, 35*(1), 60-85.

Kotter, J. P. (2007). Leading Change: Why Transformation Efforts Fail. *Harvard Business Review, 85*(1), 96-103.

Kotter, J. P. (2012). *Leading change*. Boston, Mass.: Harvard Business Review Press.

Lakhani, K. R., & von Hippel, E. (2003). How open source software works: "free" user-to-user assistance. *Research Policy, 32*, 923-943.

Lakhani, K. R., & Wolf, R. G. (2005). Why hackers do what they do: Understanding motivation and effort in free/open source software projects. In S. A. Hissam, B. Fitzgerald, J. Feller & K. R. Lakhani (Eds.), *Perspectives in free and open source software* (pp. 3-21). Cambridge, MA: MIT Press.

Lawrence, P. R., & Lorsch, J. W. (1967). *Organization and environment: Managing differentiation and integration*. Cambridge, MA: Harvard University Press.

Lee, G. K., & Cole, R. E. (2003). From a firm-based to a community-based model of knowledge creation: The case of the Linux Kernel Development. *Organization Science, 14*(6), 633-649.

Lerner, J., & Tirole, J. (2002). Some Simple Economics of Open Source. *The Journal of Industrial Economics, 50*(2), 197-234.

Lewin, K. (1951). *Field theory in social science: selected theoretical papers* (1st ed.). New York: Harper.

Liebhart, M., & Garcia-Lorenzo, L. (2010). Between planned and emergent change: decision maker's perceptions of managing change in organisations. *International Journal of Knowledge, Culture and Change Management, 10*(5), 214-225.

Locke, K. (2001). *Grounded theory in management research*. London: Sage Publications.

Manz, C. C. (1986). Self-leadership: toward an expanded theory of self-influence processes in organizations. *Academy of Management Review, 11*, 585-600.

Markus, M. L. (2007). The governance of free/open source software projects: Monolithic, multidimensional, or configurational?
*Journal of Management and Governance, 11*(2), 151-163.

Markus, M. L., & Benjamin, R. I. (1996). Change agentry - The next IS frontier. *MIS Quarterly, 20*(4), 385-407.

Martinez-Torres, M. R., & Diaz-Fernandez, M. C. (2014). Current issues and research trends on open-source software communities. *Technology Analysis & Strategic Management, 26*(1), 55-68.

McAllister, D. J. (1995). Affect- and cognition-based trust as foundations for interpersonal cooperation in organizations. *Academy of Management Journal, 38*(1), 24-59.

Mehra, A., Smith, B., Dixon, A., & Robertson, B. (2006). Distributed leadership in teams: The network of leadership perceptions and team performance. *Leadership Quarterly, 17*, 232–245.

Miles, M. B., & Huberman, A. M. (1984). *Qualitative data analysis: A sourcebook of new methods*. Beverly Hills, CA: Sage Publications.

Mintzberg, H. (1994). *The rise and fall of strategic planning*. New York, NY: The Free Press.

Mintzberg, H., & Waters, J. A. (1985). Of strategies, deliberate and emergent. *Strategic Management Journal, 6*(3), 257-273.

Mockus, A., Fielding, R. T., & Herbsleb, J. (2002). Two case studies of open source software development: Apache and Mozilla. *ACM Transactions on Software Engineering and Methodology, 11*(3), 309–346.

Mooney, J. D., & Reiley, A. C. (1939). *The principles of organization*. New York: Harper and Brothers.

Moran, J. W., & Brightman, B. K. (2001). Leading organizational change. *Career Development International, 6*(2), 111-118.

Mumford, M. D., Scott, G. M., Gaddis, B., & Strange, J. M. (2002). Leading creative people: Orchestrating expertise and relationships. *Leadership Quarterly, 13*(6), 705.

Nelson, R. R., & Winter, S. G. (1982). *An evolutionary theory of economic change*. Cambridge, Mass.: Belknap Press of Harvard University Press.

O'Mahony, S., & Ferraro, F. (2007). The emergence of governance in an open source community.
*Academy of Management Journal, 50*(5), 1079-1106.

Patton, M. Q. (2002). *Qualitative research and evaluation methods* (3rd ed.). Thousand Oaks, CA: Sage Publications.

Petkova, A. P., Rindova, V. P., & Gupta, A. K. (2013). No news is bad news: sensegiving activities, media attention, and venture capital funding of new technology organizations. *Organization Science, 24*(3), 865-888.

Powell, W. W. (1990). Neither market nor hierarchy: network forms of organization. *Research in Organizational Behavior, 12*, 295-336.

Ryan, R. M., & Deci, E. L. (2000). Intrinsic and extrinsic motivations: Classic definitions and new directions. *Contemporary Educational Psychology, 25*, 54-67.

Scacchi, W. (2002). Understanding the requirements for developing open source software systems. *IEE Proceedings--Software, 149*(1), 24-39.

Schumpeter, J. A. (1934). *The theory of economic development: an inquiry into profits, capital, credit, interest, and the business cycle*. Cambridge, Mass.: Harvard University Press.

Scott, W. R. (1981). *Organizations: rational, natural and open systems*. Englewood Cliffs, NJ: Prentice Hall.

Sharma, S., Sugumaran, V., & Rajagopalan, B. (2002). A framework for creating hybrid-open source software communities. *Information Systems Journal, 12*, 7-25.

Somech, A. (2006). The Effects of Leadership Style and Team Process on Performance and Innovation in Functionally Heterogeneous Teams. *Journal of Management, 32*(1), 132-157.

Strauss, A., & Corbin, J. (1998). *Basics of qualitative research: techniques and procedures for developing grounded theory* (2nd ed.). London: SAGE Publications.

Teece, D. J., Pisano, G., & Shuen, A. (1997). Dynamic capabilities and strategic management. *Strategic Management Journal, 18*(7), 509-533.

Tushman, M. L., & Anderson, P. (1986). Technological Discontinuities and Organizational Environments. *Administrative Science Quarterly, 31*(3), 439-466.

Twain, M. (1876).
*The adventures of Tom Sawyer*. Toronto: Belford Bros.

Ulrich, D. (1997). *Human resource champions*. Cambridge, MA: Harvard University Press.

Van De Ven, A. H., & Poole, M. S. (1995). Explaining development and change in organizations. *Academy of Management Review, 20*(3), 510-540.

Volberda, H. W., Van Den Bosch, F. A. J., & Mihalache, O. R. (2014). Advancing Management Innovation: Synthesizing Processes, Levels of Analysis, and Change Agents. *Organization Studies, 35*(9), 1245-1264.

von Hippel, E., & von Krogh, G. (2003). Open Source Software and the 'Private-Collective' Innovation Model: Issues for Organization Science. *Organization Science, 14*(2), 209-223.

Vujovic, S., & Ulhøi, J. P. (2008). Online innovation: the case of open source software development. *European Journal of Innovation Management, 11*(1), 142-156.

Waddell, D., & Sohal, A. S. (1998). Resistance: a constructive tool for change management. *Management Decision, 36*(7/8), 543.

Wylie, N., Sturdy, A., & Wright, C. (2014). Change agency in occupational context: lessons for HRM. *Human Resource Management Journal, 24*(1), 95-110.

Yates, M. (2000). Developing leaders in a global landscape. In D. J. Giber, L. Carter & M. Goldsmith (Eds.), *Linkage Inc.'s best practices in leadership development handbook: Case studies, instruments, training* (1st ed.). San Francisco, CA: Jossey-Bass/Pfeiffer.

Yin, R. K. (1994). *Case study research: design and methods* (2nd ed.). Thousand Oaks, CA: Sage Publications.

Biographies

Sladjana Nørskov is an External Lecturer at the Department of Management, Aarhus University. She received her Ph.D. from Aarhus School of Business. Her research interests include organizational development, user-centered innovation processes, community governance, and new organizational forms.

Peter Kesting is an Associate Professor of Management at Aarhus University, Denmark.
His research interests primarily concern innovation management, the cognitive and conceptual foundations of routine and decision-making, negotiations, and the life and work of Joseph A. Schumpeter.

John Parm Ulhøi is a Professor of Organization and Management Theory at Aarhus University. His research interests include organisational development, new forms of organising, human and social capital, and innovation and entrepreneurship. Over the years, he has served as TIM-Division Board Member of the Academy of Management and as Editorial Board member of various journals. He has served as a member of various international expert boards such as, for example, Directorate-General Research, The European Commission; Israel Science Foundation; European Science Foundation; The Belgian Office for Scientific, Technical and Cultural Affairs; and The Research Council of Norway.

Figure 1. The growth of TYPO3 depicted as the number of registered developers, references, and extensions (2003-2005).¹ Source: http://typo3.com/

¹ The graph shows the number of registered developers from 2003 to 2005. Unfortunately, reliable statistics for the ensuing years could not be obtained.

Figure 2. Model of the moderators of change initiatives in OSS communities

Table 1.
Topics discussed in the R&D Committee's mailing list

| | Governance-related postings | Technical postings | Other | Sum |
|---|---|---|---|---|
| Number, # | 201 | 21 | 13 | 235 |
| Percent, % | 85.5 | 9.0 | 5.5 | 100 |

Table 2. Data sources

| Data source | Description | Purpose | Time |
|---|---|---|---|
| Mailing list I | 235 postings from the R&D Committee mailing list | Insight into the contributions and role of each Committee member; an in-depth understanding of the organizational tasks and issues and how they were addressed | 2006 |
| Mailing list II | 1,088 postings from the HCI Team mailing list | Understanding organizational developments within the HCI (Usability) Team. Related to a particular change initiative. | 2006-2009 |
| Mailing list III | 1,191 postings (selected for their relevance from a total of 13,587 postings) from the Core Team mailing list | Understanding the interactions between the core and the periphery and how the interactions developed over time. Actions and reactions related to the identified change processes. | 2006-2008 |
| Interviews | 11 interviews: 1 interview with the project founder; 1 interview with the community manager; 9 interviews with Core Team members, out of whom 7 were also members of the R&D Committee | Understanding of the community, its history and development, and change in TYPO3. Managing change in TYPO3; follow-up on specific developments and change initiatives. | 2006-2010 |
| Observation | 18 hours (a two-day R&D Committee face-to-face meeting) | Insight into issues regularly addressed by the R&D Committee. The observations revealed a range of organizational issues and | 2006 |
| Archival documentation | Project description, bylaws, videos of conferences and meetings, summaries of meetings, and news | Learning about the formal regulations and structures of the community. Crosschecking some of the facts uncovered during the observation activities and interviews. | 2006-2010 |

Table 3. The four change initiatives

| Change initiative | Components of the change initiative | Rationale behind changes | Change agent | Outcome |
|---|---|---|---|---|
| Reorganization of product development | New work processes; feedback; gate keeping; closer interactions; release management | Motivate contributors via feedback, gate keeping, and closer interactions, which were expected to act as rewards and retention mechanisms. Release management improved after setting up strict development phases. | Core member | Successfully implemented |
| Founding of a non-profit organization called the TYPO3 Association | Create a committee structure (similar to a functional structure) | Support core development on a steadier basis. Improve the efficiency of the project by "providing a central hub from which to support active developers as well as to concentrate its members into a pool of regular contributors." | Project founder | Successfully implemented |
| New team structure | Establishing 'Team Contracts' for each team; implementing a more transparent structure with clear responsibilities, increased team autonomy, and an elaborate structure | Ensure responsibility and accountability for each task and role. | Project founder | Unsuccessful |
| Installing usability as a mindset | Usability as a mindset; changing the mindset of developers; bringing software developers and designers together | Create a team that would work to increase the usability of the TYPO3 system. Developers usually lack the user perspective. Designers are needed to create more user-friendly software. | Periphery member | Successfully implemented |

| Individual initiative | |
|---|---|
| Persistence | You need to be extremely enthusiastic and not afraid of setbacks because you will experience many, and it will take a long time to make changes happen. (Interview, core member) |
| Leading by example (creating credibility and merit in the community to gain followers for the change initiative) | But what didn't work out is that I couldn't motivate persons just to follow the guidance of my changes. So I created about, I would say, 200 mock-ups. And about 10 percent have been realized in TYPO3 until today. (Interview, change agent) So you need to prove to them that you have the skills and that you are able to assess their solutions. (Interview, change agent) |

| Reputation and reputation lending | |
|---|---|
| Endorsement by high-status members to the change agents | I also realized that [change agent's name] – who is one of the most active participants in here – has been continuously working on a lot of TYPO3 HCI topics: [...] New Installer 2.0; backend interface improvements for TYPO3 4.2; TemplaVoilá 2 (together with [name]); starting to work on Extension Manager 2 (with [name]); and finally, [change agent's name] is also an active member of the TYPO3.org redesign group. (Core Team mailing list, core member) |
| Redirecting attention and work efforts towards the initiative | > Could you tell us a bit more about this? Maybe in the [developer list]? Answer: Or HCI, that is. Please continue the discussion there. (...) can you re-send your mail in the HCI list please, once you feel like you want to continue the discussion. (Core Team mailing list) |
| Proactive recognition and support of initiatives by high-status members | It's more of keeping this big overview and picking the cherries. It is a dynamic system. I never have an idea all of a sudden. (...) It's mostly about things that are already under way. (Interview, core member) You work mostly with the things that are going on and try to find little suggestions or ask someone else: "What do you think about this idea, about this project? Do you have anything to add to that?" (...) It's mostly that there are already ongoing projects. As a community manager I see, okay, this guy is working on it and this guy is working on it, and I try to connect them. (Interview, community manager) |

| Change-oriented communication | |
|---|---|
| Inform and educate the community about the rationale and arguments behind the initiatives | The breakthrough was the presentation for 5.0 with a guy called [name]. After that presentation, the spirit in the community changed because they saw that it is really possible to do this. [...] (Interview, change agent) I just watched the HCI podcast and was really impressed. Once we get there, we can all be very proud of not only a flexible product but a user-friendly product as well! As an 'outsider' to the HCI team, it produced two random thoughts I would like to share with you. [...] After viewing the presentation I was overwhelmed when thinking about what it would mean to achieve all this. To really get a consistent look and feel, it would require rewriting a lot of code and adapting tons of extensions. Some things like the installer might be easier since it is better modularized. But to achieve major changes, I strongly feel that it would be best to focus on the 5.0 development. (HCI mailing list, developer) |

| Motivation through challenging tasks | |
|---|---|
| Novel task structure and content | It was exciting for the developers to use a framework that is so powerful, so new, that has so many functions already inside. By just using the framework, we could use a lot of things out of the box that we could never just pluck into the old system. (Interview, core member) |
| Freedom to work in new ways | So removing everything and replacing them with totally new components for the whole frame and for the page tree, this was really [going] to bring something totally new in there. Our coding was driven by the huge set of features that were there. Every one of us was coding in the past and was in a position of coding extensions for a customer [...] and to create new menu items was never possible in the past [...] So we really at some point had the freedom to drop compatibility and this was quite helpful to go fast forward to say, ok, let's delete everything and create new. (Interview, core developer) |

diff --git a/ocr_studies_text/work_index_list.csv.zstd b/ocr_studies_text/work_index_list.csv.zstd
new file mode 100644
index 0000000..3a4f25f
Binary files and
b/ocr_studies_text/work_index_list.csv.zstd differ
diff --git a/061925_notes.txt b/snowballing/061925_notes.txt
similarity index 100%
rename from 061925_notes.txt
rename to snowballing/061925_notes.txt
diff --git a/studies/014-norskov.md b/studies/014-norskov.md
new file mode 100644
index 0000000..ce66d9f
--- /dev/null
+++ b/studies/014-norskov.md
@@ -0,0 +1,548 @@
+Deliberate change without hierarchical influence?
+
+The case of collaborative OSS communities
+
+Abstract
+
+Purpose – Deliberate change is strongly associated with formal structures and top-down influence. Hierarchical configurations have been used to structure processes, overcome resistance, and get things done. But is deliberate change also possible without formal structures and hierarchical influence?
+
+Design/Methodology/Approach – This longitudinal, qualitative study investigates an open-source software (OSS) community named TYPO3. This case exhibits no formal hierarchical attributes. The study is based on mailing lists, interviews, and observations.
+
+Findings – The study reveals that deliberate change is indeed achievable in a non-hierarchical collaborative OSS community context. However, it presupposes the presence and active involvement of informal change agents. The paper identifies and specifies four key drivers for change agents' influence.
+
+Originality/value – The findings contribute to organizational analysis by providing a deeper understanding of the importance of leadership in making deliberate change possible in non-hierarchical settings. It points to the importance of 'change-by-conviction', essentially based on voluntary behaviour. This can open the door to reducing the negative side effects of deliberate change in hierarchical organizations as well.
+ +Keywords + +Open-source communities, deliberate change, change agents, change by conviction, hierarchical influence +Introduction + +There is widespread agreement in research as well as in management practice that deliberate change is key for an organisation's success, if not for its long-term survival (By, 2005; Teece, Pisano, & Shuen, 1997). On the other hand, it is also generally acknowledged that deliberate change challenges organisations and potentially stresses their members. It disturbs existing structures and causes disorder (Schumpeter, 1934), violates the truce of existing routines (Nelson & Winter, 1982), drives people out of their comfort zones, and evokes resistance (Hon, Bloom, & Crant, 2011; Waddell & Sohal, 1998). Therefore, deliberate change is also typically associated with strong leaders and execution power (Kotter, 2007). Thus, there is general agreement that hierarchical influence is particularly needed during the implementation stage in order to get things done and overcome resistance (Somech, 2006). Strong leaders are also needed to promote change in organisations and create a sense of urgency (Higgs & Rowland, 2011; Yates, 2000). + +But what happens if there are only informal leaders with no formal and positional power and organisational members are basically left doing whatever they want? This is exactly the situation for many collaborative communities such as open-source software (OSS) communities. In many of these communities, participation is voluntary, so leaders have only very limited formal power of the kind known from hierarchical organizations. How do these communities successfully handle the challenges of deliberate change without formal power? How do they secure efficient and consistent planning procedures? How do they overcome resistance and get things done? Are collaborative communities able to change at all, or are they doomed to fail in the long term? Put differently, what does it mean for OSS communities to change deliberately?
+Organisational scholars have already shown extensive interest in OSS communities and collaborative communities in general (Martinez-Torres & Diaz-Fernandez, 2014). Key topics of interest include the motivation to participate in and contribute to collaborative communities (Cromie & Ewing, 2009; Hars & Ou, 2002; Lerner & Tirole, 2002), structures and the division of labour (Mockus, Fielding, & Herbsleb, 2002), governance structures and processes in communities (Demil & Lecocq, 2006; Markus, 2007), and coordination and communication mechanisms (Lee & Cole, 2003). While extant research thus provides a detailed picture of how OSS communities work, no studies have yet examined deliberate change in OSS communities. The few studies that address change have found that most change in OSS communities is fluid, tacit, and emergent because task execution is typically dependent on the informal structures and the voluntary contributions of members (Sharma, Sugumaran, & Rajagopalan, 2002). + +The aim of this study is to investigate how deliberate change is accomplished in OSS communities. More specifically, the empirical foundation for this research has been based on a longitudinal single-case study. Data have been collected about one OSS community, called TYPO3, during 2006–2010. We refer to deliberate change as change that is intended and planned. Change is therefore not the residual outcome of a multitude of processes, even though there might be disparities between plans and outcomes (Burnes, 1996, 2009; Kanter, Stein, & Jick, 1992). In our data collection we observed various deliberate change initiatives in TYPO3 at the strategic as well as at the organisational level. The focus of this paper is on one strategic change initiative carried out in order to redirect the project’s focus towards more product usability. Our results show deliberate change is possible in OSS communities and that change agents play an essential role in change processes. 
We summarise our findings in a model, structuring the success factors of change agents. +Two main contributions are offered. First, our paper advances knowledge about change processes in non-hierarchical structures, such as OSS communities. Given their increasing relevance for economic activity, it is important to know whether informal and non-hierarchical organisations allow for executing deliberate change. If they do not, such organisations are unlikely to survive in the long run. Second, and more importantly, our investigation of changes in OSS communities gives new insights into how deliberate change in non-hierarchical organisational settings is possible. It shows how organisations can master 'change by conviction', i.e., when organisational members are not being forced to change but accept and adapt to change voluntarily. We will discuss how the insights of this study may be used to reduce tensions and frictions of change in traditional business organisations as well. + +**Structure and governance of OSS communities** + +An OSS community consists of individuals who voluntarily contribute to the development of open-source software (Martinez-Torres & Diaz-Fernandez, 2014). Open-source software is freely available to the public under an open license and is based on unrestricted access to source code (Bonaccorsi & Rossi, 2003). Well-known examples of OSS are Linux, Firefox, and Apache (Lakhani & von Hippel, 2003). OSS communities typically demonstrate classic textbook principles of organisations in that they (i) form an entity distinguishable from its environment (Lawrence & Lorsch, 1967), (ii) have specific goals (Etzioni, 1964), (iii) have purposive actions to realise these goals (Mooney & Reiley, 1939), and (iv) are dependent on and affected by the external environment (Scott, 1981).
However, at the same time, OSS communities distinguish themselves from traditional business organisations in that they are basically open to anyone to +participate, participation is voluntary, there is a high degree of self-assignment, and they don’t have a physical location like a headquarters. This is enabled by modularization of the software and by distributed activities allowing for rather loosely managed and structured development processes that leave the developers free to choose which tasks to execute (Vujovic & Ulhøi, 2008). Demil and Lecocq (2006) argue open license is indeed a unique contractual framework that has generated a new type of governance structure distinct from the familiar governance modes of hierarchy, network, and market. Although OSS communities differ in terms of structure, size, and formalisation, there appears to be an ‘ideal type ground architecture’ that has been identified for many of these communities. The main characteristics of this architecture also apply to TYPO3. + +OSS communities are often managed through a two-layer task structure, containing a core and a peripheral layer (Lee & Cole, 2003). The core consists of project leaders and maintainers. While leadership in some projects (e.g., Linux) is more centralised and there is one undisputed project leader, in other projects (e.g., Apache) a committee solves particular leadership tasks, such as disagreements and conflicts, through voting or consensus (Lerner & Tirole, 2002). On the one hand, these communities align with the definition of shared leadership—“distributed phenomenon in which there can be several (formally appointed and/or emergent) leaders within a group”—and which generally focuses on the emergence of such leaders (Mehra, Smith, Dixon, & Robertson, 2006, p. 233). 
On the other hand, investigations of shared leadership stem mainly from the context of organizational teams and emphasize the importance of formal leaders to set the stage for informal leadership roles to arise and create the conditions which will maximize the successful outcome of shared leadership in teams (Denis, Langley & Sergi, 2012). This stands in contrast to OSS communities, which are not based on formal leadership in the traditional sense. Such leadership is in fact not required for informal leaders to emerge in OSS communities. +In OSS, informal leadership positions emerge through reputational gains based on "technical acumen and managerial skill" (Fleming & Waguespack, 2007, p. 165). In addition, trust is a requirement for leaders to be selected by the community (O'Mahony & Ferraro, 2007). Usually, the founders count on the project leaders having earned credibility to act as leaders by contributing the initial source code and demonstrating their expertise. Project leaders typically act as visionaries, providing recommendations, work tasks, milestones, etc., to the community. Another important leadership task is to attract new members by posing challenging programming problems for potential contributors (Lerner & Tirole, 2002, p. 220). The nature of leadership in OSS communities changes as communities grow and mature (O'Mahony & Ferraro, 2007). Over time, project leaders will perform fewer technical tasks, such as programming, and more organisation-building tasks (ibid.). The periphery of an OSS community is often structured by the development and bug-fixing team (Lee & Cole, 2003). Members of the periphery are more loosely connected with the community. Task assignment here is almost completely voluntary (ibid.). + +Participation in OSS communities is driven by intrinsic (e.g., fun and enjoyment) and extrinsic (e.g., peer recognition, signalling of skills for career benefits) rewards (Lerner & Tirole, 2002). Lakhani and von Hippel (2003, p.
923) emphasize three motivations for participation in OSS communities: need-driven participation (e.g., the need for software), enjoyment-driven participation, and reputation enhancement. Reputation is a low-ranking incentive to join and contribute to an OSS community (ibid.). However, once a reputation is achieved, the member’s desire to maintain his or her reputation encourages the member to continue to provide quality contributions (Sharma et al., 2002). +This structure is supported by a number of governance mechanisms that help direct, control, and coordinate individual efforts in OSS communities (Markus, 2007). These mechanisms include the self-assignment of tasks (Crowston, Li, Wei, Eseryel, & Howison, 2007), peer review (Lee & Cole, 2003), bug reporting, voting procedures, and the process of determining software requirements (Scacchi, 2002). Collaboration is enabled through software platforms, which provide infrastructure for sharing solutions, asking for help, etc. Services and tools, such as mailing lists, discussion forums, archives, and blogs, are the key infrastructures that enable communication and collaboration in OSS communities (Fjeldstad, Snow, Miles, & Lettl, 2012; O'Mahony & Ferraro, 2007). + +To sum up, OSS communities have well-developed structures, resembling project structures in traditional business organisations. They also have leaders involved in organising and structuring processes. The major difference is that such leaders have no formal authority and thus no execution power. Participation in OSS communities is voluntary, and tasks are self-assigned. Leaders cannot therefore exert hierarchical influence but can only lead based on expertise, persuasion power, and reputation among peers. The literature has called this type of influence informal leadership (De Souza & Klein, 1995; Hongseok, Labianca, & Myung-Ho, 2006). Lakhani and von Hippel (2003, p. 
923) found that the informal leaders of OSS communities are capable of organising the "mundane but necessary" tasks in the day-to-day business. But are they also capable of mastering the challenges of change, which are difficult even in formal companies and for which leadership and power are needed? + +Deliberate change in organisations +As in other organisations, change in OSS communities concerns the "organisation's direction, structure, and capabilities" (Moran & Brightman, 2001). In this sense, there is nothing unusual about the basic nature and substance of change in OSS communities. It resembles the basic structure and demands of other organisational change processes. + +Many researchers have emphasised the process character of organisational change (Bullock & Batten, 1985; Hayes, 2010; Lewin, 1951). Van de Ven and Poole (1995) identified 20 models that structure change processes in different ways. However, the vast majority of these models identify three key tasks with which deliberate change processes have to deal. First, the need for change has to be recognised and the change process initiated (Kirzner, 1997). This need typically results from opportunities or threats that can be addressed by change. Further, the change initiative has to be put on the organisation's agenda in order to ensure that action is taken (Kotter, 2012). Organisational change on the strategic level is a genuine management task. The recognition of change needs might come from 'ordinary' employees, but it is the exclusive right of the management to acknowledge these initiatives and put them on the agenda (Kesting & Ulhøi, 2010), at least in traditional business organisations. The main rationale behind such a governance structure is to secure consistency—between the different initiatives and organisational activities but also with shareholder and stakeholder interests. + +Second, deliberate change tends to be based on some planning and decision-making activities (By, 2005).
Goals have to be defined and information has to be acquired and analysed. The results of this process are management decisions and documents like road maps or business plans. In traditional business organisations, leaders have to drive and structure this process by creating a sense of urgency, involving organisational members and keeping track of the process (Kotter, 2012). +A distinction between deliberate and emergent change is acknowledged both in the strategy literature (Mintzberg & Waters, 1985) and in the change management literature (Liebhart & Garcia-Lorenzo, 2010). Other aspects like contingency and choice have also been included in this discussion. The review of By (2005) shows how complex, heterogeneous, and inconsistent this distinction is. In this paper we do not intend to contribute to this discussion. For the argumentation of this paper, it is sufficient to specify the substance of deliberate change by the two above attributes: purpose and reason. In our understanding, deliberate change neither implies that everything goes according to plan nor that goals are realised exactly in the planned way. As Dunphy and Stace (1993) argue, organizational change takes place in a dynamic environment and organizations have to adapt their plans accordingly. Against this background, we posit that deliberate change does not rule out the emergent element. Rather, it implies change is grounded in the intention to change. This view corresponds to Mintzberg’s (1994) view of change as an element of the strategy process. In contrast, change is (completely) emergent if it is simply the accumulated result of a series of unrelated decisions and events that have no change or strategic perspective. + +Third, change has to be executed and decisions implemented. This means organisation members have to make an effort to bring about the change. Also, routines have to be altered in order to adapt to change. 
The literature on conflict and resistance caused by change (del Val, 2003; Huy, Corley, & Kraatz, 2014) emphasises leadership and execution power as particularly necessary to get things done and overcome resistance and resolve conflicts. + +Leadership power is thus required for all three tasks, most of all, however, for the implementation. Change often burdens organisations and stresses people. Leadership power is needed to change behaviour and overcome resistance. Traditional business organisations +therefore often rely on a top-down implementation of planned change (Howell & Avolio, 1993). Leadership vision is needed to motivate organisational members. + +But how can these challenges be handled by informal leaders? How can resistance be overcome without the use of any formal power? How does the governance structure of OSS communities handle deliberate organisational change? Currently, there is no research addressing these questions systematically. However, there is one concept of change leadership that offers some theoretical grounding for an answer that will also be important for the analysis of this article: the concept of the change agent. + +Based on Caldwell’s findings (2003), we define change agents as individuals who initiate, direct, manage, and/or implement specific change initiatives. Like many other concepts, the concept of change agents is also used heterogeneously (Wylie, Sturdy, & Wright, 2014) and there are closely related concepts like (product) champions in the literature (Ginsberg & Abrahamson, 1991). The key point for our study is that change agents are individuals that drive change initiatives, i.e., create momentum and ensure decisions are made and actions are taken. In doing so, change agents can assume complex sensemaking (Brown, Colville, & Pye, 2015) and sensegiving (Petkova, Rindova, & Gupta, 2013) roles that can be essential to attract collective attention and gain legitimacy for their change initiatives. 
Change agents do not have to be assigned leaders with formal given responsibilities. They can even be outsiders like consultants (Volberda, Van Den Bosch, & Mihalache, 2014). However, in traditional business organisations they have to be authorised and supported by formal leaders. Therefore, the activity of change agents is also based on hierarchical influence, even though mostly indirectly. While change agents thus might not have the power to order change, the supporting formal leaders do possess such power. In this case, sensegiving, i.e. “the processes by which strategic change is framed and +disseminated to an organization’s constituents” (Fiss & Zajac, 2006, p. 1173) can be particularly relevant for change agents to attract management attention and promote initiatives. + +As outlined above, deliberate change cannot be decided and enforced by management in OSS communities like in traditional business organisations. Even when initiatives come from the core, they have to be based on initiative and promoted in the community. Here, sensegiving may be particularly relevant for change agents as a way to attract the attention of the community and/or even attract media attention in order to promote change initiatives. Sensegiving can support positions in the “symbolic struggles over the purpose and direction of an organization” (Fiss & Zajac, 2006, p. 1173). When coming from the periphery, it requires even more initiative to change an OSS community deliberately. Therefore, it can be expected that change agents play an important role here. However, conditions are fundamentally different because in OSS communities there is no management support or hierarchical influence upon which to draw. So, how can change agents realise change initiatives here? + +Methods + +Two main criteria guided the selection of our focal case. First, our case had to be a representative example of an OSS community. 
Second, the community had to be a mature case that had already established and formalised work procedures, guidelines, and rules. Studying change in a developed, growing community held promise of providing an intensive and rich case that would "manifest the phenomenon of interest intensely (but not extremely)" because extreme cases may distort the manifestation of the phenomenon (Patton, 2002, p. 234). Accordingly, we selected an OSS community named TYPO3 for this study. +In line with the research objective, we first identified deliberate changes at their various stages. Then, we followed the process underlying those changes before tracing the mechanisms used to address the changes. The unit of analysis is the community, i.e., the focus is on the intraorganisational level. + +**Study setting** + +TYPO3 has been public since 2000. At the time of the study, this community was experiencing continuous growth (see Figure 1). The TYPO3 system is an enterprise-class content management system (CMS) offering out-of-the-box operation with standard modules (http://typo3.org/). The system is aimed at two different groups: (i) authors and (ii) administrators and content managers. TYPO3's core team members play a central role in the community because they contribute most of the source code and manage the design and development of the project on a voluntary basis. When the study started, approximately half of the core team members (i.e., nine individuals) comprised the project's R&D committee, the members of which also belonged to the project's other teams and working groups. Moreover, the members of this committee could be described as the project's central coordination body, as their responsibilities included (i) supervising and coordinating the development of the software; (ii) providing knowledge, contacts, and financial support; and (iii) supervising and supporting the community-driven teams.
We chose the committee as a point of departure for the study because of these responsibilities. With 85.5% of their discussions focussing on governance issues (Table 1), the relevance of the R&D committee members as informants was undeniable. In addition to interviewing seven R&D committee members, we interviewed two core team members because they were directly involved with specific organisational changes before joining the core team (i.e., when they still belonged only to the community’s periphery). As the study unfolded, hundreds of other informants belonging to the community’s periphery became involved through observations of relevant mailing lists on the TYPO3 website (Table 2).

--- Table 1 ---

--- Figure 1 ---

Starting in the year 2003, TYPO3 began to grow fast, and the number of registered developers doubled each year from 2003 to 2005. This continuous growth trend set the stage for the community changes that are the focus of this study. The time lag between the growth registered from 2003 to 2005 (Figure 1) and the start of the data collection process in 2006 was necessary to see how the community would respond to this growth.

Data sources

Multiple sources of data (Table 2) were employed to strengthen the design of the study and to capture the complexities of the case in question. These data sources allowed us to triangulate the data and validate the theoretical constructs. The data were collected on several occasions between 2006 and 2010. When the study began, TYPO3 was addressing organisational issues that had surfaced because of the growing size of the community. However, we soon discovered that TYPO3 had experienced other organisational challenges in the past. Therefore, learning about the project’s history and its prior development was just as important as illuminating its current development.
+We collected our data through interviews, observations of face-to-face R&D committee meetings, three relevant community mailing lists, and archival data. An introductory interview with the project founder, who also acted as the project leader from 2000 to 2007, provided a deeper understanding of the community, its history, its development up to that point, its structure, its internal work processes, its products, and its current and future strategies. The rest of the interviews with the community manager of the TYPO3 Association, the R&D committee, and core team members—some of whom had only recently made a move from the periphery to the core of the community—were focussed on managing deliberate changes in TYPO3. The interviews addressed the following main themes: (i) change initiatives; (ii) activities, roles, and practices related to the identified change initiatives; (iii) motivation; and (iv) background. The same interview guide was used throughout the process, but as new relevant information emerged about specific community changes, additional questions were incorporated into the following interviews. The interviews, which lasted about 60 minutes on average, were recorded and transcribed. + +Furthermore, over a two-day period in 2006, more than 18 hours were spent observing face-to-face meetings among R&D committee members. This method yielded insights into a range of organisational issues related to the community’s development and the background for the deliberate change initiatives. + +A review of 235 posts from the R&D committee mailing list gave access to the content and type of discussions, the contributions and roles of various individuals, and work coordination and delegation. In particular, this source of information allowed us to obtain a deeper +understanding of the organisational challenges facing the community during that time period and how those challenges were resolved. 
The interviews, the observations of the R&D committee’s meetings, and the R&D committee mailing list together led to the uncovering of a number of change processes in the TYPO3 community. Additional relevant mailing list data (namely, the human-computer interaction (HCI) team’s mailing list and the core team’s mailing list) were included in the data collection. Using archival data allowed us to cross-check some of the facts uncovered during the observation activities and interviews.

Data analysis

Since we were interested in *whether*, *how*, and *why* deliberate changes are possible in a specific context, a case study design was deemed appropriate. More specifically, when studying contemporary activities and/or events over which the researcher has no (or very limited) control, case study research is the obvious choice (Yin, 1994). Qualitative techniques were used to analyse the data (Eisenhardt, 1989; Miles & Huberman, 1984; Strauss & Corbin, 1998). Overall, the analysis focussed on organisational practices, change, and structuring while paying specific attention to grounded concepts and proceeded in three steps. First, we constructed case studies (Eisenhardt, 1989) for each identified organisational change initiative. We focussed on major change initiatives that affected the entire community. At the time of the study, four change initiatives were ongoing: (i) reorganisation of product development, (ii) establishment of a non-profit organisation called the TYPO3 Association (a central hub from which to support active developers), (iii) installation of usability as a mindset (thus replacing the strong technical mindset in the community), and (iv) restructuring of the entire community to create more efficiency through a more transparent structure with clear responsibilities and increased team autonomy.
Although the general character of three of the initiatives was structural and one of them was cultural (the usability initiative), all of the changes involved changes in both structures and practices. + +Second, we divided the coding process into open, axial, and selective coding and employed a constant comparative method within each coding phase to identify the concepts and relationships relevant to each type of change (Locke, 2001; Strauss & Corbin, 1998). Third, a cross-case analysis (Eisenhardt, 1989; Miles & Huberman, 1984) was used to identify any similarities and differences across the three change types. This process was repeated several times. Each time, the resulting conceptual insights were refined and further developed. The analysis generated four core categories that represent the mechanisms employed by TYPO3 to address deliberate changes (Table 4). + +The interviews, the observations from the R&D committee meetings, and the data from the three mailing lists enabled us to determine precisely the timing and order of deliberate changes and their intended effects. The same data sources were used to trace the unintended, emergent effects of the identified deliberate changes. However, the three mailing lists, which documented the reactions (or lack of reactions) of the entire community, played a central role. The interviews played a central role in establishing the timeline for the parts of the change processes (e.g., decision making) that took place offline. The preliminary findings were presented and discussed with the project leader and two core team members, who provided valuable comments that confirmed and elaborated upon the uncovered theoretical constructs. +Findings + +We observed multiple change initiatives in the community, some of them successful, some less successful. The most significant of these are summarised in Table 3. 
Change agents played a decisive role in all key tasks of the observed change management processes: recognition, decision making, and implementation. In the observed initiatives, all but one change agent originated from the community’s core. One reason for the prevalence of core member change agents might be that the identified initiatives were major and, as such, expected to have a wide-scale effect on the community.

--- Table 3 ---

Below, we sketch the four change initiatives (Table 3) by elaborating (i) the aims of each initiative, (ii) what made them deliberate, (iii) the change agents involved, and (iv) whether the implementation was successful.

The first change initiative, “Reorganization of product development”, was launched because the product development process was inefficient. It was characterized by a lack of release discussions between the core and the community, the community’s failure to test enough different software versions, a failure to read existing instructions about different project contributions (i.e. release management procedures, testing instructions), and poor planning of subprojects (e.g. too many postponements, unrealistic deadlines). For its part, the Core Team did not have the capacity to respond to all of the inquiries, project proposals and general input. A meeting was arranged where potential solutions were discussed, demonstrating explicit intent to plan and execute the needed change. A Core Team and R&D Committee member, who was in charge of the software release process at that time, proposed a solution, which was subsequently adopted. Release management was consequently improved by introducing a rotating release manager function in July 2007. During this change process, the R&D Committee’s tasks were taken over by the Core Team and one hierarchical layer was removed. This created more flexibility and readiness for the Core Team, and easier access for new contributions.
Additionally, the core development mailing list was opened, creating a direct communication channel between the core and the periphery. The activity level on the mailing list increased drastically, and the initiative more than doubled the number of incoming patches to the core list, freeing the Core Team members to pursue larger projects to a much greater extent than before. The initiative was thus successfully implemented.

The second change initiative, “Founding of a non-profit organization called the TYPO3 Association”, intended to create a committee structure that resembled a functional organizational structure. It consisted of establishing a non-profit organization called the TYPO3 Association and was initiated by the project founder. This complex task demanded deliberate action and took many discussions, especially during the Core Team meetings and TYPO3 conferences. The main goals of the Association were to support core development on a steadier basis and improve the efficiency of the project by “providing a central hub from which to support active developers as well as to concentrate its members into a pool of regular contributors” (mailing list). The TYPO3 Association was meant to support core development by providing funds for development that was not covered by commercial interests. One way was through donations, i.e. individuals who earn their income (or part of it) by using this open source software choose to give some of this income back to the community in the form of donations. Another way was membership, i.e. firms and individuals could become members of the Association by paying an annual fee, which was used to sponsor software development in TYPO3. Furthermore, the Association was able to create transparency regarding decision-making, roles and activities.
The change initiative was thus successfully implemented, and the Association created a period of growth under the goal-oriented and integrative leadership of the board, whose chairman was the project leader.

The third change initiative, “New team structure”, was a deliberate and direct response to the rapid community growth. The project founder was the change agent behind this initiative, which sought to make particular responsibilities and tasks explicit in order to create more transparency in project activities (and not only at the upper echelons of the Association). At the team level, therefore, it was determined that the following should apply to team leaders’ tasks: (i) leaders are solely responsible for the team; (ii) members are appointed/accepted by the leader; (iii) decisions are made by the leader (however, agreement is sought with the team members as far as possible); (iv) delegation of tasks is encouraged; and (v) a minimum timeframe is set for the leader’s response to team members’ requests. By defining responsibilities, the community attempted to introduce a measure of accountability in team performance, which was considered vital in this virtual context due to the voluntary nature of participation. To formalize responsibilities and tasks, the project founder introduced “team contracts”. These contracts served the purpose of creating synergy between the already existing teams through the elaboration of a written mission statement, which, as a minimum, contained the following team information: the team’s position in the organizational structure (i.e. to which committee or project does the team belong?), a description of the team’s mission, a specification of the team’s responsibilities, the name of the team leader, and the rules for becoming a team member. Although these contracts were introduced, tasks were still taken on by self-assignment. The motive underlying the team contracts was to define two aspects: responsibility and authority.
However, team contracts never really gained momentum, and attempts at introducing formal authority at the team level did not succeed either. The initiative failed because the attempted structure left too few degrees of freedom to the project contributors. The type of executed authority resembled that of hierarchy (Demil & Lecocq, 2006; Powell, 1990) and unintentionally led to authority erosion. This accentuated the need for more autonomy with regard to following one’s own “personal itch”.

Finally, the aim of the fourth initiative, “Installing usability as a mindset”, was to redirect the project’s focus towards product usability. At the time, the project’s focus was almost entirely technical in nature, which limited the product’s appeal to those customer segments with low technical skills, e.g., a secretary who edits the content on a company website: “A lot of OSS is created by technicians for technicians. […] And then there are those [users] who use [the software] every third week. They don’t demand that many functions; they demand that they don’t need to remember how [the software] works because they are only using it every third week” (interview, project founder).

The wish to introduce a greater degree of product usability was put forward by a newcomer to the TYPO3 community in 2001. This newcomer, i.e. a periphery member of the community, became the change agent who made an explicit decision to launch a process of change, making this initiative a case of deliberate change. He was a software designer by profession and realized the need for TYPO3 to improve its design. The idea remained in the background until 2006, when the project leader established the human-computer interaction (HCI) team and an appertaining mailing list, which was intended to act as “the melting pot for ideas about usability improvements” (the HCI team mailing list). However, progress was slow.
The breakthrough came when the change agent started making a more focussed effort to implement the usability idea. In the end, the change initiative was successfully implemented.

While our findings are based on the analysis of all the observed initiatives in the community, we selected the fourth initiative, “Installing usability as a mindset”, as a representative initiative to illustrate the general traits of the organisational change mechanisms that drove the success of the change initiatives. By focussing the presentation of the study’s results on one particular change initiative, our intention was to promote the clarity and comprehensibility of the findings.

In the following, we present our findings, which consist of the four mechanisms that our analysis revealed as central drivers of successful, deliberate change management in the community (Table 4).

--- Table 4 ---

**Individual initiative**

Our data first of all reveal that the community cannot be expected to embrace a change initiative—regardless of its inherent value to the community—unless there is a persistent change agent who will bring the initiative from the point of inception to successful implementation. This is a direct consequence of the absence of formal power and hierarchical influence in OSS communities. Since community members cannot be ordered to do something, they have to be persuaded to become active. The change agent of the HCI project expressed the difficulties in doing so by saying, “You can find developers that are interested in [design] topics, but you don’t really get very far. And that’s what we experienced with the HCI team…a lot” (interview, change agent).

Even if a change agent has the right idea and engages with the right community members, this is not enough to set the change in motion. As a consequence, the change agent persevered for four years before the concept of usability penetrated the prevailing mindset and culture of the community.
Persistence involves a high dose of patience, primarily because the community also needs time to adapt to organisational changes. This need was pointed out by one core member of TYPO3: “There is a gap between the design of the organisation and letting the organisation accumulate around the design…giving time to people to flock to the teams” (R&D committee meeting, core member).

We found clear indications that it is individual effort and achievement, rather than organisational planning and decision making, that motivate community members to contribute to a change initiative.

Do decisions matter in OS [communities]? No. The one thing that matters is what is actually done. Post factum situation. By doing things, people make decisions. If we make a decision, it doesn’t mean that people will be motivated to implement it, work by it. The only thing that matters is action. Consult people, hook them up with knowledge and resources, and hope that they do what you would like, what you expect… we should think of ourselves as service providers. (R&D committee meeting, core member)

This was one of the key statements of our investigation, outlining the structure of an individual initiative as clearly as possible. This view was also supported by the project founder of TYPO3 with the short statement, “First you have to do things yourself, and then others will follow” (interview, project founder).

Before taking action, the change agent of the HCI project reflected upon what motivated him and other developers to do work for the project leader. He found that a key driver was the project leader’s “front guy and guru status” and the fact that “he usually keeps his promises and is able to do huge workloads” (interview, change agent). Based on this insight, the change agent tried to motivate others to participate in the HCI team: “I tried to find guys who were motivated by my work and then do work for me” (interview, change agent).
The success of this approach was evident already in 2007, when the change agent became the HCI team leader. This success was also recognised by other community members:

Someone from the usability mailing list comes up with a nifty and good-looking screenshot and proposes his usability changes to the core developers. They are fascinated and go implement it because it seems like a really great idea to them. Especially [the change agent] has been very successful with this way of getting his suggestions implemented, and now he’s the HCI team leader. (interview, core member)

And even:

I don’t know how many have seen the PDF [the change agent] produced, but I saw it and also met him in Frankfurt before the PHP conference ([core team member name] and I joined a meeting of him and [project leader])—and there is hard and impressive work being done. (core team mailing list, core member)

In the end, we found that the role of change agents in communities is similar to that of product champions, who experience progress over time only through persistent and enthusiastic effort (Tushman & Anderson, 1986). Persistence and leading by example are traits that define a change agent’s degree of individual initiative. Persistent change agents who are able to self-motivate and self-direct their performance, i.e., to exercise self-leadership (Manz, 1986), are an essential part of any organisational change initiative in OSS communities because it takes a great deal of time and persuasion to garner acceptance and support for any organisational change.

A change agent demonstrating high levels of commitment (personal motivation and skills) may develop mutual, cognitive-based trust, which, in turn, may strengthen the community members’ readiness to engage and collaborate (Chowdhury, 2005; McAllister, 1995).
Thus, we put forward the following proposition, which is grounded in the above and in similar behaviours observed in the other three change initiatives (Table 3):

**Proposition 1:** The individual initiative of change agents is positively related to a successful implementation of deliberate organisational change initiatives in communities.

**Reputation and reputation lending**

Power struggles were visible during the change process for each initiative. For instance, during the observed R&D committee meeting, one member left the room because he was frustrated that the rest of the group did not support his views. He was arguing against an excessively predetermined team structure, which was about to be implemented. However, he lost the debate because he was arguing against the stance of the change agent responsible for the particular change initiative, who had a higher status within the community. It was later revealed that the opposing member was actually right and that the team structure was, in fact, too prescriptive. This example shows how difficult it is to accomplish anything without the support of community members with higher social statuses. This difficulty exists even when the difference in social status between the change agent and the supporting high-status member is rather low (e.g., when both are members of the core team).

We find that, by lending their reputations to lower-status members, high-status members can share their influence. This was clearly recognised by the project founder: “And then, it is clear that for those individuals who have that kind of naturally given power, as I for example have, it is natural that other individuals whom we appoint and those close to us easily gain influence” (interview, project founder).
In situations where a change agent has a rather low status in the community, as was the case in the early days of the HCI team, the change agent can gain influence by teaming up with one or more community members who enjoy a high-status reputation.

In the case of the HCI team, the change agent “did a lot of work for [the project founder]” to establish himself as a worthy community member. Eventually, he was invited to a TYPO3 Board meeting to discuss usability issues: “With [the project founder] at [the] T3 Board we talked about why Drupal is easier than TYPO3 or why WordPress is easier than TYPO3”. By linking to high-status members in this way, the change agent gained respect and support from the high-status core members. They addressed the change agent in complimentary terms and praised his work: “As the usability guru, please give me your feedback on the description of the two mentioned features in the page tree below…” (core team mailing list, core member).

But after he was appointed HCI team leader, it was evident that he had not yet gained the same respect from other members, as they systematically circumvented the HCI team and instead discussed usability issues on the core team’s mailing list. An effort was made to redirect the attention towards the HCI team, in particular towards the role of the change agent, endorsing him and building his authority. Some examples of that include:

[By the way], this is [user interface] change, so it can be committed only if you get approval from [the change agent]. (core team mailing list, core member)
I agree with all this but we do not have anyone else properly educated in these questions. I do not trust anyone else in [the] HCI field for TYPO3 because no one showed good HCI skills so far. [The change agent] is the only one who did.
(core team mailing list, core member) + +You might also have watched the podcast issue [2] where [the change agent] demonstrates some great ideas about usability improvements in TYPO3 or have seen the PDF [3]. (core team mailing list, core member) + +In the subsequent period the activity levels in the HCI team increased significantly. However, there seemed to be no obvious relationship between the content of the change initiatives and the skills of the high-status members supporting the initiatives. This finding implies a potential spillover effect between reputations rooted in technical contributions and reputations rooted in organisational contributions. + +There were also instances when high-status members (e.g., project and team leaders, core team members, and other respected members) met the change agents halfway. Our data show the leaders in TYPO3 work with the community’s initiatives through a process of mutual adjustments. The leaders notice promising initiatives, assess them, and try to provide them with the necessary resources: + +I tried to motivate him to build a team around that. I just noticed him. In this way, I try to enable people to work. It’s a bit intuitive also. I [have been] working already for ten years on this system, so the foundation for something like this was probably already laid a couple of years back. (interview, community manager) + +This type of leadership emphasises intuition and alertness. The main task consists of providing support for change initiatives in the form of knowledge and resources without making decisions on behalf of the community members. Rather, the leaders establish the infrastructure and framework that will hopefully assist the community change agents in paving the way for the intended improvements and changes. +High-status members lend their lateral authority and reputation to a change agent by providing any type of visible support, even if it is only verbal in nature. 
One reason this method works is that high-status members’ support provides the change agent with credibility, which is crucial if the initiative is to stand a chance of being implemented (Markus & Benjamin, 1996). This finding further suggests that community leadership is shared via reputation lending, which also facilitates organisational changes in communities. Therefore, based on the above and on similar behaviours observed in the other three initiatives (Table 3), we make the following prediction:

**Proposition 2:** Reputation lending (from high-status to lower-status members) is positively related to a successful implementation of deliberate organisational change initiatives in communities.

**Change-oriented communication**

We found that communication about change initiatives was essential to their successful implementation. Through meetings and presentations to small and large target audiences at various community events, change agents in TYPO3 communicated the rationales and arguments behind the initiatives. Still, it took the change agent behind the HCI initiative a long time to realise that communicating the idea about usability was vital to its success. The change agent attracted support for the usability initiative by communicating (in a change-oriented fashion) the basic ideas behind the concept in several rounds of presentations to the developer community: “This is why [the project founder] and I decided that maybe we just need to find out how we can change that point of view to guide developers in a different direction—so a typical marketing and communication thing” (interview, change agent).

From 2007 to 2008, the change agent tried to motivate the community by communicating the relevance of usability to TYPO3 through presentations at the community’s main yearly events.

The first presentation was just about usability flaws, ten major usability flaws […] at the Developer Days in 2007.
Then, in 2008, at T3Con, I held a presentation about what can be done in a positive way with usability, solutions and future interfaces like, for example, the interfaces in “Minority Report” […]. If I look back, that was the second phase to motivate [people], saying, “Look, that’s possible if we work together”, and “Wouldn’t it be fun to have some amazing interfaces in there?” (interview, change agent)

In all observed projects, the presentations helped change agents to gain the community’s trust in them and their capabilities.

After I showed them [through presentations] that it could really get done, they kind of trusted in the words I said. Because usually it’s a very inner circle, only developers with developers, so they could trust each other. They have the same language. But now, there comes this strange design guy and he says, “You are doing everything wrong; you have to change everything, and you don’t even have the knowledge to understand what you are doing wrong.” That doesn’t really end in trust. (interview, change agent)

In addition to establishing the trustworthiness of the change agent (Gurtman, 1992), the change-oriented communication process in TYPO3 also helped stimulate the community members to participate because the process also aimed to educate the target audience about the attempted changes. The community developers were the target: “Then through the Usability Week, we started, in some way, to educate [people]” (interview, change agent).

This facilitation of community participation resembles a particular dimension of shared leadership, called voice, which is known to increase a person’s social influence among the members of a community (Carson, Tesluk, & Marrone, 2007). During the change initiatives that had a successful outcome, the change agents excelled at initiating and facilitating constructive, change-oriented dialogue and debates around how the community should achieve the needed changes.
Thus, voice boosted the change agents’ level of social influence by increasing immersion and participation through various means, such as opening the core team’s mailing list (under a set of rules) to the rest of the community, implementing rotating release managers, presenting ideas at community events, and establishing Usability Week. Voice in the form of change-oriented communication may be associated with successful change implementations because voice is based on interpersonal events that promote communication and feedback, which, according to Ryan and Deci (1985), catalyse feelings of competence and thereby stimulate intrinsic motivation. Based on the above and on similar behaviours exhibited in the other three initiatives (Table 3), we make the following prediction:

**Proposition 3:** Change-oriented communication is positively related to a successful implementation of deliberate organisational change initiatives in communities.

**Motivation through challenging tasks**

Because of the self-assignment principle (Crowston et al., 2007), one of the major challenges in open-source communities is motivating developers to work on tasks that are uninteresting but necessary to complete (Lakhani & von Hippel, 2003). We can see that this problem extends to organisational change initiatives. This was also recognised by the change agent of the HCI project: “[…] usability topics are not really challenging for developers usually. It’s about removing stuff, making stuff simple, and that’s usually not the challenge for developers. It’s a challenge for me as designer” (interview, change agent). The resulting challenge was put more generally by one member of the core team: “We were uncertain how to get people to do some of the more boring and time-consuming, but essential, tasks” (interview, core team member).

Working with usability demanded that three fundamental tasks be accomplished. First, the developers needed to become motivated to work on usability issues.
Second, the TYPO3 community had to attract skilled software designers who possessed the necessary knowledge regarding usability. Third, the change agent had to find a way to stimulate the developers to follow the designers’ recommendations. + +To motivate developers to work on usability issues, the change agent came up with the idea to create “fake challenges […] to motivate them to finish the goals” (interview, change agent). His approach was based on the idea that developers would be more willing to work on their tasks if they perceived them to be challenging. + +After a while I came up with the idea to have a ‘Usability Week’. The concept was pretty simple. I rented a castle for one week, and I locked 30 developers in that castle, and they had a certain task they needed to solve within that one week. So, the challenge was there in some way because they needed to solve the problem in one week, which is kind of tough because the problems I took [on] were too huge to solve in one week. So, there was a challenge even if the task was simple because they had time pressure. (interview, change agent) + +During Usability Week, five mixed teams were created. Each team consisted of three developers, one core developer, one manager, and one designer. Each day of the event three meetings took place. The meetings were designed to streamline the tasks and motivate the teams. + +To attract designers to the TYPO3 community and the usability project, the change agent used a different set of tools. He created an entrance barrier that the designers needed to overcome before they could join the community. +My major wish through that Usability Week wasn’t to solve those tasks but to find more designers who [were] able and motivated to join the TYPO3 community. My idea to make it more interesting to them was, again, to make it a little bit more complicated because they had to apply to the Usability Week. So, we had about 60 or 70 applications and only 30 places. 
In the end, only five designers out of 50 could join, and they were somehow charmed because they could attend and others couldn’t. It really worked out and they really stuck to the project and until today [are] doing some design work. (interview, change agent) +

Finally, to motivate the developers, the change agent needed to make the tasks related to usability issues more challenging. He achieved this by incorporating into otherwise simple problems (i) novel task structure and content and (ii) the freedom to execute the tasks in a different way than usual. By doing so, the change agent successfully motivated the developers to solve those problems. +

For example, to structure a website we have something called a ‘page tree’, which looks like the tree in Explorer on your Windows machine, and that’s kind of very old style, how it is done […]. However, there is a framework called XJS, written in JavaScript, and that is interesting for developers because it’s a new technology in some way and a new framework, and it’s hard to implement, and they need to change a lot. So, I decided that they should use XJS for that page tree, even if we don’t need it, but then I would be sure that in the end I would have the page tree I wished to have and they would have a challenging task to actually do it instead of writing some lines by themselves to change [the page tree]. (interview, change agent) +

We really had the freedom to totally change the core… Actually, the way […] we worked… we [were] taking the beta version of 3.9 back in time, and we just coded anything we liked inside the core. Usually, someone who creates an extension is [told] “never touch any core file”, [but here] we could really go deeply inside and delete files, replace files totally, and we did not have to focus on keeping [it] compatible with the old code and being compatible with the old […] extensions.
(interview, developer) +

In the case of the HCI project, Usability Week turned out to be quite successful: +

They were challenged by whether they could reach the goals. This really moved the project hugely forward in one week […] In the end, I have to say, we didn’t reach any of our goals […] But they got pretty far, and it really gave the whole [usability] project a new motivation. (interview, change agent) +

The self-assignment of tasks, which is the prime mechanism for work division and task allocation in OSS communities, is obviously an issue if the tasks do not attract enough interest and, consequently, remain undone. Task challenge here refers to a continuum ranging from low- to high-stimulation tasks (e.g., highly routinized tasks versus non-standardized, original tasks). The case of TYPO3 shows that increases in task challenge due to, for example, entrance barriers, competition, level of within-task stimulation, task novelty, or freedom to execute a task in a new way, can compensate for an initial lack of personal desire, which would normally drive the self-assignment of tasks. Our analysis shows that in the case of tasks related to the implementation of organisational change initiatives, the change agent needs to increase the perceived task challenge in accordance with the skills and interests of the targeted members. Thus, task challenge should be seen as a dynamic factor dependent on the person-task interaction (Campbell, 1988). Task challenge is associated with increased participation because it appeals to intrinsic motivation, the primary motivational factor in open-source communities (Lakhani & Wolf, 2005). In turn, increased participation improves performance (Hackman & Oldham, 1976; Herzberg, 1959). Furthermore, creating entrance barriers to team membership proved effective at activating a sense of achievement and recognition as stimuli (Herzberg, 1959).
Hence, based on the above and the other three observed change initiatives (Table 3), we make the following prediction: +

**Proposition 4:** Increased task challenge is positively related to a successful implementation of deliberate organisational change initiatives in communities. +

**Discussion** +

This study offers the first comprehensive investigation of deliberate change in OSS communities. It presents clear indications that OSS communities are indeed capable of changing deliberately and, therefore, not doomed to fail in the long run. A change is deliberate because it is desired by a community member—the change agent—and then supported by a sufficient coalition within the community; in the observed HCI project, the change initiative was carried out with the clear goal of improving the usability of TYPO3. +

Our study also shows that in OSS communities deliberate change is highly dependent on change agents, who play an essential role in managing the key tasks of change processes: (i) change agents recognise the need for change and translate that into organisational goals; (ii) they create a sense of urgency and convince community members to make decisions in this matter; and (iii) they push the change process and ensure things are getting done, often by doing things on their own. This is a clear contrast to hierarchical business organisations, where change is mostly driven by leaders with positional power and/or special functions and change agents only play a secondary role. Against this background, this study of deliberate change in OSS communities focuses on the investigation of change agents and the success drivers of their initiatives. The insights of this study can be summarised in a simple model: +

--- Figure 2 --- +

These findings are first of all relevant for the research on non-hierarchical organizational settings such as OSS communities. They provide insights into an area that has so far been vastly under-researched.
In addition, knowledge of change is as important for collaborative communities as it is for traditional business organisations because (i) it allows designing change processes more purposefully and (ii) it provides insights into the long-term behaviour of collaborative communities in relation to their (competitive) environment. As long as they are based on a similar governance structure, there is good reason to assume these findings also apply to other types of communities of practice not related to software development (Bridwell-Mitchell, 2015). This gives a broader relevance to our findings since the importance of communities is increasing in an information- and knowledge-based economy (O’Mahony & Ferraro, 2007). +

However, this study also offers some quite interesting and relevant findings that go beyond communities and also concern change processes in traditional business organisations. In this way, our paper can also contribute to the broader change literature. The elements of the above change model are not all completely new. We already know about change agents, informal power and leadership from investigations of other contexts. What is new and important, however, is that the complete absence of formal power does not prevent the execution of deliberate change, and that change agents play the critical role in driving the process. OSS project leaders and core team members do not have formal command authority to enforce decisions (von Hippel & von Krogh, 2003). This is illustrated especially clearly by the third change initiative, “New team structure” (Table 3), in which the project leader and founder was the change agent. Although he kept the team contracts on the agenda for two years, he was unable to implement this initiative. Had he had any kind of formal fiat in the community, this initiative would probably have led to a different outcome.
But OSS communities “do not rely on employment contracts and so are unable to be governed by formal authority, as is the case in a hierarchy” (Demil & Lecocq, 2006, p. 1454). This allows for some quite interesting perspectives and insights. +

The first important finding is the apparent irrelevance of decision making in a hierarchical sense, as expressed by community members. This point needs some clarification. It does not mean there is no deliberate planning or decision making taking place in OSS communities. Instead, these statements relate to their power structure. In his article, Finkelstein (1992) distinguished various forms of management power. As outlined above, OSS communities are characterised by the inherent absence of formal power (‘structural power’ in the terminology of Finkelstein, 1992, p. 509, i.e., the “legislative right to exert influence” over others). Other forms of informal power, like ‘expert power’ and ‘prestige power’, not only exist in OSS communities, but they play an important role in the informal leadership that provides the foundation for the significance of the community’s core team (Fleming & Waguespack, 2007; O’Mahony & Ferraro, 2007). Individual initiative (proposition 1) as a mechanism of change resembles some change factors observed in ‘traditional’ organizations with formal leadership (i.e. hierarchies, Demil & Lecocq, 2006). Like community change agents, agents in hierarchies make use of exemplary change or leading by example (Kotter, 2012). Individual initiative also bears resemblance to the tasks performed by change champions (Ulrich, 1997) and product champions (Day, 1994), such as providing impetus for and strongly promoting the change initiative. However, the apparent irrelevance of decision making in community change points to a structural power deficit of change agents with regard to change initiatives.
Change agents are able to convince relevant community members, decisions are made, and tasks are distributed, but this often does not result in action. In these situations, decisions are only relevant to legitimise the activities of change agents, not to trigger action. Often, change agents have to keep pushing to get things done; in other cases, they have to complete the tasks themselves. Against this background, individual initiative is a strategy to exert influence without formal power. Yet, it has to be noted that this strategy only works locally, and informal power is still needed by change agents at other points. Individual initiative might even result in the acquisition of expert and prestige power because it makes change agents and their abilities visible. To date, the meaning of individual initiative and the structure of low-power contexts are not very well understood. It might be expected that individual initiative also plays a role in high-power contexts as a strategy to exert influence without power. However, more research is needed in this regard. +

Another interesting point is the observation of what we have named ‘reputation lending’ (proposition 2). There is already some research on reputation and advancement in communities and other organisations without vertical lines of authority (Fleming & Waguespack, 2007). Much is known about (i) what authority means for flat hierarchies and (ii) how authority is acquired there (Dahlander & O’Mahony, 2011). In the context of hierarchies, reputation lending parallels coalition formation, support building and gaining sponsorship from individuals with organizational clout, formal authority, and access to resources (Connor, 1998; Day, 1994; Kanter, 1994; Kotter, 2012). Such actions help legitimize the change initiative and the change agent as well as create acceptance of change by those affected (Buchanan & Boddy, 1992).
Conceptually, reputation lending is also somewhat close to leader support in hierarchies (Amabile, Schatzel, Moneta, & Kramer, 2004). Leader support means using the formal power of managers to support activities by less-powerful organisational members, often in relation to innovation and change activities. This support can include resources and time, autonomy, and support in organisational decision making (Mumford, Scott, Gaddis, & Strange, 2002). In contrast, reputation lending implies using the informal power of community leaders to support change agents in their activities, mostly by giving them recognition, letting them participate in board meetings and decision-making procedures, and making them and their initiatives more visible in the community. This informal form of support has not been described so far in the literature. Still, this is interesting because the elements of visibility and acceptance play only a minor role in leader support. This finding indirectly confirms the research showing the importance of informal networks and policy systems for change agent success (Battilana & Casciaro, 2012). +

We also discovered interesting findings with regard to the motivation of community members to carry out change-related tasks. As discussed in the conceptual section above, motivation has already been the focus of previous research. Lakhani and von Hippel (2003) found that participation in OSS communities is quite rewarding since “98% of the effort expended by information providers in fact returns direct learning benefits to those providers” (p. 923). However, we observed there are change-related tasks that are not rewarding and that it is rather challenging to motivate community members to work on them. In this regard, we observed the strategy of so-called ‘fake challenges’ (proposition 4). The underlying approach is to combine unattractive tasks with motivating elements like competitions or social gatherings.
There is an interesting early description of the principle: the fence episode in the novel *The Adventures of Tom Sawyer* by Mark Twain (1876). Most readers perhaps remember: Tom had to paint Aunt Polly’s fence as a punishment after he dirtied his clothes in a fight. He hated this work; however, when one of his friends came to the spot, Tom was able to create the impression that it was a privilege and a pleasure to paint the fence. After a while, he was even able to sell painting permissions to his fellows. In this sense, the change agent succeeded in creating a sense of exclusivity by restricting the number of places at the challenge, transforming boring work into a socially attractive event. To our knowledge, this strategy has not been described by research on OSS communities so far. Ultimately, the strategy of creating challenging tasks is expected to improve the community members’ understanding and sense of ownership of the change initiative, and eventually enhance their motivation to participate in executing change. In that sense, this approach has the same objective as, for instance, empowerment of organizational members, which is an important element in the change leadership literature within the context of hierarchies (Caldwell, 2003; Gill, 2003; Goffee & Scase, 1992). While both strategies thus seek to remove obstacles to change, they are in fact each other’s opposites. One strategy uses task design to deal with the downsides of an innate characteristic of OSS communities, i.e. member autonomy. The other, however, seeks to increase member autonomy in a hierarchical setting, where strong administrative controls provide formal powers to supervise and regulate the behaviour of organizational members (Demil & Lecocq, 2006).
+

Although change processes have been theorized about and practiced in a variety of ways, the one finding that deliberate change in OSS communities has most in common with change in hierarchies relates to change-oriented communication (proposition 3). Through frequent communication, change agents create opportunities for organizational members to understand and give input to the change process (Kotter, 2012). Practicing openness and widespread communication (Buchanan & Boddy, 1992) during a change process increases the chance of successful implementation because organizational communication plays a central role in eroding existing path dependencies (Cohen & Levinthal, 1990), thus paving the way for organizational change. +

Yet, the most important finding of this study is perhaps the very observation that OSS communities succeed in handling deliberate change processes without any formal or pre-assigned power. Certainly, informal power, persuasion, and group pressure are relevant to manage deliberate change in OSS communities to a certain extent. Situations can arise in which organisational members are faced with the decision to accept change or leave the community. Still, no community member can be ordered to accept change as in traditional business organisations. Nobody can be laid off, and sanctioning possibilities are generally very limited. If community members comply with change, they do so because they believe in it or at least accept the majority decision. If a change project is not supported by a critical mass of the community, it will not be successful. We call this type of deliberate change ‘change by conviction’. Why is that relevant? If people comply with change voluntarily, there is a good chance that negative side effects resulting from enforcement will be reduced (even though not completely eliminated, because group members might submit to change unwillingly or leave the community).
Indeed, we found some indications for that in our data, even though we were not directly looking for it. We are convinced these findings may also be applicable to hierarchical business organisations and that the latter can learn a lot from OSS communities about reducing the level of enforcement in change processes, thereby decreasing the levels of demotivation, insecurity, and resistance. Consequently, the relevance of our findings is much broader: it does not only concern non-hierarchical settings such as OSS communities but helps shed additional light on deliberate organisational change in general. More research is, however, needed to substantiate these findings, clarify the impact of different elements of change on negative side effects, and explore possibilities for traditional business organisations. +

**Managerial implications** +

The most obvious managerial implication is that communities need to be aware of the central role of change agents in deliberate change in order to organise change processes accordingly. This study emphasizes the role and importance of individuals taking initiatives and responsibilities by outlining some critical success factors for realizing deliberate change in non-hierarchical settings such as OSS communities. +

Another implication is that hierarchical organizations also need to reconsider their use and appreciation of change agents, including self-appointed ones. Change agents are already being used in hierarchical business organisations but often in an unsystematic way. However, the results of this study suggest it would be useful to base all major change projects on change agents here as well. After decisions have been made, change agents can simply be assigned and endowed with the necessary power or supported by top managers. Contrary to the non-hierarchical case analysed in this study, no specific individual initiative is needed at this point in hierarchical organisations.
Still, it might be important for change agents to care more than usual about the second driver in our model and build a reputation among all organisational members involved for being the right person to organise the change process. The two last drivers point to communication and education, as well as to motivation. We are convinced a lot can be done to smooth change projects in hierarchical business organisations, and it might even be possible to establish a regime of change by conviction there. +

**Limitations and future research** +

The first limitation of this study is of a theoretical nature. When investigating deliberate change in OSS communities, we touch on a variety of different themes, including leadership, reputation building, informal power, motivation, innovation, and others. Each of these themes can be further developed, and many of them might potentially offer new insights. For the sake of rigour, we decided to focus on change, the meaning of change agents, and the drivers of change agent success. We have targeted this study primarily toward the research conversations on communities and on change. This decision was made to keep the study focused and detailed. +

Second, in this study we did not look at organisational context factors that mediate the effect of the success drivers of change agent activities, such as the cultural context, size and age of the community, or degree of formalisation. We also did not look at the antecedents of change agent activities. This means our study is far from offering a complete model of change agent activity in communities. Still, we think our propositions can be useful stepping stones towards a more holistic model.
+

Analysing classic concepts and/or phenomena such as deliberate change under entirely different and new(er) organizational regimes is important because it not only helps to clarify how such organizational settings work but also sheds new light on the phenomenon under investigation. In our study, the phenomenon manifested itself in the form of the self-appointment of change agents. While this was necessary for the phenomenon to exist in a completely different and non-hierarchical organizational setting, it also holds potential for being applied in hierarchical settings. +

**Conclusion** +

This study provides evidence that it is indeed possible to change complex organisations deliberately without formal power and hierarchical influence. All change initiatives we observed were grounded in the individual commitment of change agents. However, we also found the success of change agents’ initiatives depended on their ability to get sufficient support within the organisation. Key drivers of this are individual initiative, reputation and reputation lending, change-oriented communication and education, and motivation through challenging tasks. There is reason to assume these insights also hold for a broader range of organisations, including hierarchical business organisations. This is relevant because there are indications that change by conviction reduces the negative side effects of deliberate change. +

**References** +

Amabile, T. M., Schatzel, E. A., Moneta, G. B., & Kramer, S. J. (2004). Leader behaviors and the work environment for creativity: Perceived leader support. *Leadership Quarterly, 15*(1), 5-32. +

Battilana, J., & Casciaro, T. (2012). Change Agents, Networks, and Institutions: A Contingency Theory of Organizational Change. *Academy of Management Journal, 55*(2), 381-398. +

Bonaccorsi, A., & Rossi, C. (2003). Why Open Source Software Can Succeed. *Research Policy, 32*, 1243-1258. +

Bridwell-Mitchell, E. N. (2015).
Collaborative Institutional Agency: How Peer Learning in Communities of Practice Enables and Inhibits Micro-Institutional Change. *Organization Studies*. + +Brown, A. D., Colville, I., & Pye, A. (2015). Making sense of sensemaking in organization studies. *Organization Studies, 36*(2), 265-277. + +Buchanan, D., & Boddy, D. (1992). *The expertise of the change agent*. London: Prentice Hall. + +Bullock, R. J., & Batten, D. (1985). It's Just a Phase We're Going Through: A Review and Synthesis of OD Phase Analysis. *Group & Organization Studies, 10*(4), 383-412. + +Burnes, B. (1996). No such thing as ... a "one best way" to manage organizational change. *Management Decision, 34*(10), 11. +Burnes, B. (2009). *Managing change: a strategic approach to organisational dynamics* (5th ed.). Harlow, England; New York: Prentice Hall/Financial Times. + +By, R. T. (2005). Organisational change management: A critical review. *Journal of Change Management, 5*(4), 369-380. + +Caldwell, R. (2003). Models of change agency: A fourfold classification. *British Journal of Management, 14*, 131-142. + +Campbell, D. J. (1988). Task complexity: A review and analysis. *The Academy of Management Review, 13*(1), 40-52. + +Carson, J. B., Tesluk, P. E., & Marrone, J. A. (2007). Shared leadership in teams: An investigation of antecedent conditions and performance. *Academy of Management Journal, 50*(5), 1217-1234. + +Chowdhury, S. (2005). The role of affect- and cognition-based trust in complex knowledge sharing. *Journal of Managerial Issues, 17*(3), 310-327. + +Cohen, W. M., & Levinthal, D. A. (1990). Absorptive capacity: a new perspective on learning and innovation. *Administrative Science Quarterly, 35*(1), 128-152. + +Connor, D. R. (1998). *Managing at the speed of change*. Chichester, UK: John Wiley & Sons. + +Cromie, J. G., & Ewing, M. T. (2009). The rejection of brand hegemony. *Journal of Business Research, 62*, 218-230. + +Crowston, K., Li, Q., Wei, K., Eseryel, U. Y., & Howison, J. 
(2007). Self-organization of teams for free/libre open source software development. *Information and Software Technology, 49*, 564–575. +

Dahlander, L., & O'Mahony, S. (2011). Progressing to the Center: Coordinating Project Work. *Organization Science, 22*(4), 961-979. doi: 10.1287/orsc.1100.0571 +

Day, D. (1994). Raising radicals: Different processes for championing innovative corporate ventures. *Organization Science, 5*, 148-173. +

De Souza, G., & Klein, H. J. (1995). Emergent leadership in the group goal-setting process. *Small Group Research, 26*(4), 475-496. +

del Val, M. P. (2003). Resistance to change: a literature review and empirical study. *Management Decision, 41*(2), 148. +

Demil, B., & Lecocq, X. (2006). Neither market nor hierarchy nor network: The emergence of bazaar governance. *Organization Studies, 27*(10), 1447-1466. +

Dunphy, D., & Stace, D. (1993). The strategic management of corporate change. *Human Relations, 46*(8), 905-920. +

Eisenhardt, K. M. (1989). Building theories from case study research. *Academy of Management Review, 14*(4), 532. +

Etzioni, A. (1964). *Modern organization*. Englewood Cliffs, NJ: Prentice-Hall, Inc. +

Finkelstein, S. (1992). Power in top management teams: dimensions, measurement, and validation. *Academy of Management Journal, 35*(3), 505-538. +

Fiss, P. C., & Zajac, E. J. (2006). The symbolic management of strategic change: sensegiving via framing and decoupling. *Academy of Management Journal, 49*(6), 1173-1193. +

Fjeldstad, Ø. D., Snow, C. C., Miles, R. E., & Lettl, C. (2012). The architecture of collaboration. *Strategic Management Journal, 33*, 734-750. +

Fleming, L., & Waguespack, D. M. (2007). Brokerage, Boundary Spanning, and Leadership in Open Innovation Communities. *Organization Science, 18*(2), 165-180. +

Gill, R. (2003). Change management — or change leadership? *Journal of Change Management, 3*(4), 307-318. +

Ginsberg, A., & Abrahamson, E. (1991).
Champions of change and strategic shifts: The role of internal and external change advocates. *Journal of Management Studies, 28*(2), 173-190. +

Goffee, R., & Scase, R. (1992). Organizational change and the corporate career: the restructuring of managers’ job aspirations. *Human Relations, 45*(4), 363-384. +

Gurtman, M. B. (1992). Trust, distrust, and interpersonal problems: a circumplex analysis. *Journal of Personality & Social Psychology, 62*, 989-1002. +

Hackman, J. R., & Oldham, G. R. (1976). Motivation through the design of work - test of a theory. *Organizational Behavior and Human Performance, 16*(2), 250-279. +

Hars, A., & Ou, S. (2002). Working for free? Motivations for participating in open-source projects. *International Journal of Electronic Commerce, 6*(3), 25–39. +

Hayes, J. (2010). *The theory and practice of change management* (3rd ed.). New York: Palgrave Macmillan. +

Herzberg, F. (1959). *The motivation to work*. New York: John Wiley and Sons. +

Higgs, M., & Rowland, D. (2011). What Does It Take to Implement Change Successfully? A Study of the Behaviors of Successful Change Leaders. *Journal of Applied Behavioral Science, 47*(3), 309-335. +

Hon, A. H. Y., Bloom, M., & Crant, J. M. (2011). Overcoming Resistance to Change and Enhancing Creative Performance. *Journal of Management, 40*(3), 919-941. +

Hongseok, O., Labianca, G., & Myung-Ho, C. (2006). A multilevel model of group social capital. *Academy of Management Review, 31*(3), 569-582. +

Howell, J. M., & Avolio, B. J. (1993). Transformational leadership, transactional leadership, locus of control, and support for innovation: key predictors of consolidated-business-unit performance. *Journal of Applied Psychology, 78*(6), 891-902. +

Huy, Q. N., Corley, K. G., & Kraatz, M. S. (2014). From Support to Mutiny: Shifting Legitimacy Judgments and Emotional Reactions Impacting the Implementation of Radical Change. *Academy of Management Journal, 57*(6), 1650-1680. +

Kanter, R. M.
(1994). *The change masters*. London: Allen & Unwin. +

Kanter, R. M., Stein, B., & Jick, T. (1992). *The challenge of organizational change: How companies experience it and leaders guide it*. New York: Free Press. +

Kesting, P., & Ulhøi, J. P. (2010). Employee-driven innovation: extending the license to foster innovation. *Management Decision, 48*(1), 65-84. +

Kirzner, I. M. (1997). Entrepreneurial Discovery and the Competitive Market Process: An Austrian Approach. *Journal of Economic Literature, 35*(1), 60-85. +

Kotter, J. P. (2007). Leading Change: Why Transformation Efforts Fail. *Harvard Business Review, 85*(1), 96-103. +

Kotter, J. P. (2012). *Leading change*. Boston, Mass.: Harvard Business Review Press. +

Lakhani, K. R., & von Hippel, E. (2003). How open source software works: "free" user-to-user assistance. *Research Policy, 32*, 923-943. +

Lakhani, K. R., & Wolf, R. G. (2005). Why hackers do what they do: Understanding motivation efforts in free/open source projects. In S. A. Hissam, B. Fitzgerald, J. Feller & K. R. Lakhani (Eds.), *Perspectives in free and open source software* (pp. 3-21). Cambridge, MA: MIT Press. +

Lawrence, P. R., & Lorsch, J. W. (1967). *Organization and environment: Managing differentiation and integration*. Cambridge, MA: Harvard University Press. +

Lee, G. K., & Cole, R. E. (2003). From a firm-based to a community-based model of knowledge creation: The case of the Linux Kernel Development. *Organization Science, 14*(6), 633-649. +

Lerner, J., & Tirole, J. (2002). Some Simple Economics of Open Source. *The Journal of Industrial Economics, 50*(2), 197-234. +

Lewin, K. (1951). *Field theory in social science; selected theoretical papers* (1st ed.). New York: Harper. +

Liebhart, M., & Garcia-Lorenzo, L. (2010). Between planned and emergent change: decision maker’s perceptions of managing change in organisations. *International Journal of Knowledge, Culture and Change Management, 10*(5), 214-225. +

Locke, K. (2001).
*Grounded theory in management research*. London: Sage Publications.

Manz, C. C. (1986). Self-leadership: toward an expanded theory of self-influence processes in organizations. *Academy of Management Review, 11*, 585-600.

Markus, M. L. (2007). The governance of free/open source software projects: Monolithic, multidimensional, or configurational? *Journal of Management and Governance, 11*(2), 151-163.

Markus, M. L., & Benjamin, R. I. (1996). Change agentry - The next IS frontier. *MIS Quarterly, 20*(4), 385-407.

Martinez-Torres, M. R., & Diaz-Fernandez, M. C. (2014). Current issues and research trends on open-source software communities. *Technology Analysis & Strategic Management, 26*(1), 55-68.

McAllister, D. J. (1995). Affect- and cognition-based trust as foundations for interpersonal cooperation in organizations. *Academy of Management Journal, 38*(1), 24-59.

Mehra, A., Smith, B., Dixon, A., & Robertson, B. (2006). Distributed leadership in teams: The network of leadership perceptions and team performance. *Leadership Quarterly, 17*, 232-245.

Miles, M. B., & Huberman, A. M. (1984). *Qualitative data analysis: A sourcebook of new methods*. Beverly Hills, CA: Sage Publications.

Mintzberg, H. (1994). *The rise and fall of strategic planning*. New York, NY: The Free Press.

Mintzberg, H., & Waters, J. A. (1985). Of strategies, deliberate and emergent. *Strategic Management Journal, 6*(3), 257-273.

Mockus, A., Fielding, R. T., & Herbsleb, J. (2002). Two case studies of open source software development: Apache and Mozilla. *ACM Transactions on Software Engineering and Methodology, 11*(3), 309-346.

Mooney, J. D., & Reiley, A. C. (1939). *The principles of organization*. New York: Harper and Brothers.

Moran, J. W., & Brightman, B. K. (2001). Leading organizational change. *Career Development International, 6*(2), 111-118.

Mumford, M. D., Scott, G. M., Gaddis, B., & Strange, J. M. (2002). 
Leading creative people: Orchestrating expertise and relationships. *Leadership Quarterly, 13*(6), 705.

Nelson, R. R., & Winter, S. G. (1982). *An evolutionary theory of economic change*. Cambridge, MA: Belknap Press of Harvard University Press.

O’Mahony, S., & Ferraro, F. (2007). The emergence of governance in an open source community. *Academy of Management Journal, 50*(5), 1079-1106.

Patton, M. Q. (2002). *Qualitative research and evaluation methods* (3rd ed.). Thousand Oaks, CA: Sage Publications.

Petkova, A. P., Rindova, V. P., & Gupta, A. K. (2013). No news is bad news: sensegiving activities, media attention, and venture capital funding of new technology organizations. *Organization Science, 24*(3), 865-888.

Powell, W. W. (1990). Neither market nor hierarchy: network forms of organization. *Research in Organizational Behavior, 12*, 295-336.

Ryan, R. M., & Deci, E. L. (2000). Intrinsic and extrinsic motivations: Classic definitions and new directions. *Contemporary Educational Psychology, 25*, 54-67.

Scacchi, W. (2002). Understanding the requirements for developing open source software systems. *IEE Proceedings - Software, 149*(1), 24-39.

Schumpeter, J. A. (1934). *The theory of economic development: An inquiry into profits, capital, credit, interest, and the business cycle*. Cambridge, MA: Harvard University Press.

Scott, W. R. (1981). *Organizations: rational, natural and open systems*. Englewood Cliffs, NJ: Prentice Hall.

Sharma, S., Sugumaran, V., & Rajagopalan, B. (2002). A framework for creating hybrid-open source software communities. *Information Systems Journal, 12*, 7-25.

Somech, A. (2006). The Effects of Leadership Style and Team Process on Performance and Innovation in Functionally Heterogeneous Teams. *Journal of Management, 32*(1), 132-157.

Strauss, A., & Corbin, J. (1998). *Basics of qualitative research: Techniques and procedures for developing grounded theory* (2nd ed.). 
London: SAGE Publications.

Teece, D. J., Pisano, G., & Shuen, A. (1997). Dynamic capabilities and strategic management. *Strategic Management Journal, 18*(7), 509-533.

Tushman, M. L., & Anderson, P. (1986). Technological Discontinuities and Organizational Environments. *Administrative Science Quarterly, 31*(3), 439-466.

Twain, M. (1876). *The adventures of Tom Sawyer*. Toronto: Belford Bros.

Ulrich, D. (1997). *Human resource champions*. Cambridge, MA: Harvard University Press.

Van De Ven, A. H., & Poole, M. S. (1995). Explaining development and change in organizations. *Academy of Management Review, 20*(3), 510-540.

Volberda, H. W., Van Den Bosch, F. A. J., & Mihalache, O. R. (2014). Advancing Management Innovation: Synthesizing Processes, Levels of Analysis, and Change Agents. *Organization Studies, 35*(9), 1245-1264.

von Hippel, E., & von Krogh, G. (2003). Open Source Software and the 'Private-Collective' Innovation Model: Issues for Organization Science. *Organization Science, 14*(2), 209-223.

Vujovic, S., & Ulhøi, J. P. (2008). Online innovation: the case of open source software development. *European Journal of Innovation Management, 11*(1), 142-156.

Waddell, D., & Sohal, A. S. (1998). Resistance: a constructive tool for change management. *Management Decision, 36*(7/8), 543.

Wylie, N., Sturdy, A., & Wright, C. (2014). Change agency in occupational context: lessons for HRM. *Human Resource Management Journal, 24*(1), 95-110.

Yates, M. (2000). Developing leaders in a global landscape. In D. J. Giber, L. Carter & M. Goldsmith (Eds.), *Linkage Inc.'s best practices in leadership development handbook: Case studies, instruments, training* (1st ed.). San Francisco, CA: Jossey-Bass/Pfeiffer.

Yin, R. K. (1994). *Case study research: design and methods* (2nd ed.). Thousand Oaks, CA: Sage Publications.

Biographies:

Sladjana Nørskov is an External Lecturer at the Department of Management, Aarhus University. She received her Ph.D. 
from Aarhus School of Business. Her research interests include organizational development, user-centered innovation processes, community governance, and new organizational forms.

Peter Kesting is an Associate Professor of Management at Aarhus University, Denmark. His research interests primarily concern innovation management, the cognitive and conceptual foundations of routine and decision-making, negotiations, and the life and work of Joseph A. Schumpeter.

John Parm Ulhøi is a Professor of Organization and Management Theory at Aarhus University. His research interests include organisational development, new forms of organising, human and social capital, and innovation and entrepreneurship. Over the years, he has served as a TIM Division Board Member of the Academy of Management and as an Editorial Board member of various journals. He has also served on several international expert boards, including Directorate-General Research, The European Commission; the Israel Science Foundation; the European Science Foundation; The Belgian Office for Scientific, Technical and Cultural Affairs; and The Research Council of Norway.

\begin{figure}
\centering
\includegraphics[width=\textwidth]{typo3_growth.png}
\caption{The growth of TYPO3 depicted as the number of registered developers, references, and extensions (2003-2005).\textsuperscript{1} Source: http://typo3.com/}
\end{figure}

\textsuperscript{1} The graph shows the number of registered developers from 2003 to 2005. Unfortunately, reliable statistics for the ensuing years could not be obtained.

\begin{figure}
\centering
\includegraphics[width=\textwidth]{change_initiatives.png}
\caption{Model of the moderators of change initiatives in OSS communities}
\end{figure}

Table 1. 
Topics discussed in the R&D Committee’s mailing list

| | Governance-related postings | Technical postings | Other | Sum |
|---|---|---|---|---|
| Number, # | 201 | 21 | 13 | 235 |
| Percent, % | 85.5 | 9.0 | 5.5 | 100 |

Table 2. Data sources

| Data source | Description | Purpose | Time |
|---|---|---|---|
| Mailing list I | 235 postings from the R&D Committee mailing list | Insight into the contributions and role of each Committee member; an in-depth understanding of the organizational tasks and issues and how they were addressed. | 2006 |
| Mailing list II | 1,088 postings from the HCI Team mailing list | Understanding organizational developments within the HCI (Usability) Team. Related to a particular change initiative. | 2006-2009 |
| Mailing list III | 1,191 postings (selected for their relevance from a total of 13,587 postings) from the Core Team mailing list | Understanding the interactions between the core and the periphery and how they developed over time. Actions and reactions related to the identified change processes. | 2006-2008 |
| Interviews | 11 interviews: 1 with the project founder; 1 with the community manager; 9 with Core Team members, of whom 7 were also members of the R&D Committee | Understanding of the community, its history and development, and change in TYPO3. Managing change in TYPO3; follow-up on specific developments and change initiatives. | 2006-2010 |
| Observation | 18 hours (a two-day R&D Committee face-to-face meeting) | Insight into issues regularly addressed by the R&D Committee. The observations revealed a range of organizational issues. | 2006 |
| Archival documentation | Project description, bylaws, videos of conferences and meetings, summaries of meetings, and news | Learning about the formal regulations and structures of the community. Crosschecking some of the facts uncovered during the observation activities and interviews. | 2006-2010 |

Table 3. The four change initiatives

| Change initiative | Components of the change initiative | Rationale behind changes | Change agent | Outcome |
|---|---|---|---|---|
| Reorganization of product development | New work processes; feedback; gatekeeping; closer interactions; release management | Motivate contributors via feedback, gatekeeping, and closer interactions, which were expected to act as rewards and retention mechanisms. Release management improved after setting up strict development phases. | Core member | Successfully implemented |
| Founding of a non-profit organization called the TYPO3 Association | Create a committee structure (similar to a functional structure) | Support core development on a steadier basis. Improve the efficiency of the project by "providing a central hub from which to support active developers as well as to concentrate its members into a pool of regular contributors." | Project founder | Successfully implemented |
| New team structure | Establishing 'Team Contracts' for each team; implementing a more transparent structure with clear responsibilities, increased team autonomy, and a more elaborate structure. | Ensure responsibility and accountability for each task and role. | Project founder | Unsuccessful |
| Installing usability as a mindset | Usability as a mindset; changing the mindset of developers; bringing software developers and designers together. | Create a team that would work to increase the usability of the TYPO3 system. Developers usually lack the user perspective. Designers are needed to create more user-friendly software. | Periphery member | Successfully implemented |

| Individual initiative | Illustrative quotes |
|---|---|
| Persistence | You need to be extremely enthusiastic and not afraid of setbacks because you will experience many, and it will take a long time to make changes happen. (Interview, core member) |
| Leading by example (creating credibility and merit in the community to gain followers for the change initiative) | But what didn’t work out is that I couldn’t motivate persons just to follow the guidance of my changes. So I created about, I would say, 200 mock-ups. And about 10 percent have been realized in TYPO3 until today. (Interview, change agent) So you need to prove to them that you have the skills and that you are able to assess their solutions. (Interview, change agent) |

| Reputation and reputation lending | Illustrative quotes |
|---|---|
| Endorsement by high-status members to the change agents | I also realized that [change agent’s name] – who is one of the most active participants in here – has been continuously working on a lot of TYPO3 HCI topics: […] New Installer 2.0; backend interface improvements for TYPO3 4.2; TemplaVoilá 2 (together with [name]); starting to work on Extension Manager 2 (with [name]); and finally, [change agent’s name] is also an active member of the TYPO3.org redesign group. (Core Team mailing list, core member) |
| Redirecting attention and work efforts towards the initiative | > Could you tell us a bit more about this? Maybe in the [developer list]? Answer: Or HCI, that is. Please continue the discussion there. (…) can you re-send your mail in the HCI list please, once you feel like you want to continue the discussion. (Core Team mailing list) |
| Proactive recognition and support of initiatives by high-status members | It’s more of keeping this big overview and picking the cherries. It is a dynamic system. I never have an idea all of a sudden. (…) It’s mostly about things that are already under way. (Interview, core member) You work mostly with the things that are going on and try to find little suggestions or ask someone else: “What do you think about this idea, about this project? Do you have anything to add to that?” (…) It’s mostly that there are already ongoing projects. As a community manager I see, okay, this guy is working on it and this guy is working on it, and I try to connect them. (Interview, community manager) |

| Change-oriented communication | Illustrative quotes |
|---|---|
| Inform and educate the community about the rationale and arguments behind the initiatives | The breakthrough was the presentation for 5.0 with a guy called [name]. After that presentation, the spirit in the community changed because they saw that it is really possible to do this. […] (Interview, change agent) I just watched the HCI podcast and was really impressed. Once we get there, we can all be very proud of not only a flexible product but a user-friendly product as well! As an ‘outsider’ to the HCI team, it produced two random thoughts I would like to share with you. […] After viewing the presentation I was overwhelmed when thinking about what it would mean to achieve all this. To really get a consistent look and feel, it would require rewriting a lot of code and adapting tons of extensions. Some things like the installer might be easier since it is better modularized. But to achieve major changes, I strongly feel that it would be best to focus on the 5.0 development. (HCI mailing list, developer) |

| Motivation through challenging tasks | Illustrative quotes |
|---|---|
| Novel task structure and content | It was exciting for the developers to use a framework that is so powerful, so new, that has so many functions already inside. By just using the framework, we could use a lot of things out of the box that we could never just pluck into the old system. (Interview, core member) |
| Freedom to work in new ways | So removing everything and replacing them with totally new components for the whole frame and for the page tree, this was really [going] to bring something totally new in there. Our coding was driven by the huge set of features that were there. Every one of us was coding in the past and was in a position of coding extensions for a customer […] and to create new menu items was never possible in the past […] So we really at some point had the freedom to drop compatibility and this was quite helpful to go fast forward to say, ok, let’s delete everything and create new. (Interview, core developer) |