
added revision summary for RAD revisited

This commit is contained in:
Benjamin Mako Hill 2020-11-24 10:59:10 -08:00
parent 211ba396c6
commit de63698442
4 changed files with 354 additions and 0 deletions


@@ -0,0 +1,7 @@
Rebuttal material for:
TeBlunthuis, Nathan, Aaron Shaw, and Benjamin Mako
Hill. 2018. “Revisiting The Rise and Decline in a Population of Peer
Production Projects.” In Proceedings of the 2018 CHI Conference on
Human Factors in Computing Systems (CHI '18), 355:1–355:7. New York,
New York: ACM. https://doi.org/10.1145/3173574.3173929.


@@ -0,0 +1,65 @@
Rebuttal
-------------------------------------------------
Thanks to the reviewers and ACs for your careful attention to this submission. We appreciate the reviewers' positive feedback and constructive suggestions for improvement. Below, we respond and describe several minor adjustments that we believe will address the reviewers' concerns. We agree that these changes will improve the manuscript and look forward to implementing them before the camera ready deadline.
1.
R1 and R2 raised questions about generalizability of our findings to other peer production communities and to other communities of practice. Both R4 and R2 ask us to discuss possible underlying causes of the relationships we observe.
We plan to amend our discussion to explain the following: Because our dataset is limited to wikis, we cannot address these questions empirically. However, we believe several general mechanisms may drive our findings and that these likely apply to other communities without pre-defined hierarchies or formal structures. In particular, we will cite earlier works on oligarchy (Michels, 1915) as well as "the tyranny of structurelessness" (Freeman, 1972)—both theorized as features of democratic organizations more broadly—that suggest that a tendency toward "calcification" in open organizations is likely neither unique to wikis nor peer production.
2.
R3 and R4 suggest additional description of Halfaker et al.'s methods. We had cut a longer summary of RAD's methodology to ensure that our paper's size was clearly commensurate to its contribution. On the recommendation of R3 and R4, we will reintroduce some of this text. Following R2, who said "I particularly appreciate how they've nearly entirely skipped re-iterating a methods section," we will try to limit ourselves to 3 additional paragraphs. We agree that this new text can increase clarity and transparency without attempting to simply repeat RAD. We are flexible on this point and would welcome further guidance from the reviewers in their responses.
3.
R3 proposes that we describe "shortcomings of the original RAD paper." We appreciate this idea and will add a couple of sentences to the methods section to this effect. We think the biggest issues relate to unique aspects of English Wikipedia potentially driving RAD's findings. These include questions about whether something happened around 2007 (like the rise of Facebook) that drove Wikipedia's editor decline. Our results suggest that this was likely not the case.
4.
R3 calls for more discussion and scrutiny of our use of Namespace 4 instead of policy pages to operationalize norm entrenchment. We agree that this difference is important and was glossed over in our submitted manuscript. At the same time, we maintain that Namespace 4 provides the best available opportunity to study norm entrenchment in Wikia where many wikis do not have policy pages that precisely parallel Wikipedia's. We will highlight and flag this issue as an important threat to validity in the methods paragraph for Study 3.
5.
R1 suggests adding a visual indication of uncertainty to Figure 1. We will add error bars to each point indicating bootstrapped 95% confidence intervals. We have made this change and the error bars are, as expected, larger for later periods where data is thinner but do not alter the takeaways from the figure.
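The bootstrapped intervals described in point 5 can be produced with a simple percentile bootstrap over each period's observations. A minimal sketch (hypothetical data and function names, not our actual analysis code):

```python
import random

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap 95% CI for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and record each resample's mean.
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Illustrative survival indicators (1 = newcomer survived) for one period.
period_values = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
low, high = bootstrap_ci(period_values)
```

The interval (low, high) would then be drawn as the error bar for that period's point in Figure 1; later, thinner periods have fewer observations and therefore wider intervals.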
6.
R2 pointed out an important typo that we will fix (the reported estimate for β is correct). We will also have the manuscript proofread professionally before submission of a camera ready copy to address any other stylistic issues.
7.
In addition to the issues raised in the reviews, we also propose adding one new robustness check for Studies 2 and 3 that we identified after submission. In these studies, our units of analysis are newcomers and namespace 4 edits. Because the wikis in our sample have different numbers of both, the average effects we report could disproportionately reflect the experience of users in the communities that contribute the most observations to our sample. The average user experience across all the wikis in our sample remains the most reasonable estimate to report (as our models already do), but we wanted to know if our findings described only the experience of users from the bigger wikis.
To address this, we fit another set of regression models in which each wiki is given equal weight. Our conclusions are robust to this change and the re-weighted models suggest that the RAD dynamics of entrenchment and newcomer rejection may even be stronger in smaller or less active communities. We plan to add the results of the robustness check to the supplementary material and to add a few new sentences to the discussion that summarize the threat, our new robustness check, and the substantively unchanged findings. We propose this change here because we believe the addition reflects a minor but important improvement. We apologize for not identifying this issue before submission and we hope that the reviewers are amenable to this small addition.
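The re-weighting described above amounts to giving each observation a weight inversely proportional to the number of observations its wiki contributes, so every wiki carries equal total weight in the regression. A minimal sketch of the weighting step (illustrative names, not our analysis code):

```python
from collections import Counter

def equal_wiki_weights(wiki_ids):
    """Weight each observation by 1/n_wiki so each wiki's weights sum to 1."""
    counts = Counter(wiki_ids)
    return [1.0 / counts[w] for w in wiki_ids]

# One observation per newcomer (or namespace 4 edit), tagged by wiki.
wiki_ids = ["a", "a", "a", "b", "c", "c"]
weights = equal_wiki_weights(wiki_ids)
```

These weights would then be passed to a weighted regression fit, so a wiki with three observations and a wiki with one each contribute a total weight of 1.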
Bullet-point summary of reviews:
-------------------------------------------------
Metareview (AC)
- The analysis is not as in-depth as in Halfaker
- Generalizability to non wiki communities?
- Possible underlying causes/mechanisms of these effects?
- More details on Halfaker et al.'s methods
Review 1 (2AC)
+ more discussion of how impact of the replication is limited to wikis.
+ Why or why not will this generalize to non-wiki peer production communities.
+ Error bars on Figure 1
Review 2
+ really likes that we don't re-iterate a methods section.
+ wants discussion of organizational mechanisms / "reasons these patterns happen"
+ Wants to know if Wikipedia and Wikias is the first time we noticed this dynamic. Seems to want a connection to broader theory about newcomers and groups.
+ They are OK if this is left to future work.
+ Noticed a typo in the interpretation of the coefficient of bot-revert
Review 3
+ "the difference between policy pages on wikipedia and project namespaces on the Wikia platform is significant enough to warrant greater discussion and scrutiny."
+ "articulate not just the findings, but perhaps some of the shortcomings of the original RAD paper."
Bike Rack
-------------
While this result is intriguing, we feel that understanding how community size and activity level interact with newcomer retention requires further analysis that is outside the scope of replicating RAD and is best left to future work. We plan to add the robustness check to the supplementary material and to include a few sentences of new text in the discussion that summarize the threat and the robustness check and that point to the opportunity for future work. We apologize for not including this material originally. The issue emerged in conversations with colleagues after the submission deadline.
We will add text to the discussion to describe this limitation and to assert our belief that, given more broadly applicable theoretical mechanisms, our results represent informative, but unconvincing, evidence favoring the notion that RAD's findings generalize beyond wikis.


@@ -0,0 +1,243 @@
Original reviews (full text)
CHI 2018 Papers
Reviews of submission #3186: "Revisiting The Rise and Decline in a
Population of Peer Production Projects"
------------------------ Submission 3186, Review 4 ------------------------
Reviewer: AC
Expertise
3 (Knowledgeable)
Recommendation
. . . Between possibly accept and strong accept; 4.5
Award Nomination
If accepted, this paper would be among the top 20% of papers presented at CHI (Best Paper: Honorable Mention nomination)
1AC: The Meta-Review
This short paper replicates and extends Halfaker's (2013) important paper
titled, "The Rise and Decline of an Open Collaboration System" (RAD)",
which examined open collaboration communities in Wikipedia. The authors
use similar methods as the earlier paper on a set of 700+ Wikia
communities. The results provide evidence that Halfaker's findings
extend to a wider set of wiki communities. The reviewers were uniformly
very positive about this paper and I concur. Their reviews raised only a
small set of non-critical issues that the authors might want to consider:
R1 notes that the analysis is not as in-depth as in Halfaker
R1 notes that the focus is on more wiki communities and asks the authors'
thoughts on generalizability to non-wiki communities. Similarly, R2 would
like to see more discussion of the possible underlying causes of these
effects.
R3 notes, and I agree, that the paper would benefit from more details on
Halfaker's methods that are applied here. Not everyone reading the paper
will be aware of those methods or have the earlier paper available while
reading this one.
All authors may submit an optional 5000 character rebuttal to address any
misunderstanding or factual errors in the reviews. There is not much to
rebut for this paper, though perhaps the authors will want to briefly
address the minor points above.
Rebuttal response
(blank)
------------------------ Submission 3186, Review 1 ------------------------
Reviewer: 2AC
Expertise
3 (Knowledgeable)
Recommendation
. . . Between possibly accept and strong accept; 4.5
Award Nomination
If accepted, this paper would not be among the top 20% of papers presented at CHI
Review
This paper replicates prior work by Halfaker et al. (2013) tracking the
influx of newcomers to Wikipedia. They examine this same behavior for 740
Wikia wikis. I strongly recommend publication in CHI. Replicating
important contributions like Halfaker et al. (2013) is important and
should be done more often. Understanding how participation changes over
the lifecycle of a peer production community is both theoretically
interesting from the perspective of norm development and socialization in
an online organization and is practically relevant to the survival of
this type of community.
My only reservations with this paper are 1) authors were not able to
examine behavior in as much depth as Halfaker et al (e.g. could not track
edits to deleted pages, did not distinguish good faith from bad faith
edits) and 2) they examine Wikia which shares many similarities (and
editors) with Wikipedia limiting the impact of the replication on
generalizing about the lifecycle of non-wiki peer-production communities.
However, both limitations are understandable given the constraints of a
single research study. The authors do a good job of being transparent
about the limitations of 1). I would be interested to see more discussion
of 2). For example, do the authors believe these results will replicate
in non-wiki peer production communities, why or why not?
Minor points - it would be nice if there were error bars (CI, SE) on the
points in figure 1.
Rebuttal response
(blank)
------------------------ Submission 3186, Review 2 ------------------------
Expertise
4 (Expert)
Recommendation
. . . Between possibly accept and strong accept; 4.5
Award Nomination
If accepted, this paper would be among the top 5% of papers presented at CHI (Best Paper nomination)
Review
In this paper, the authors replicate and extend the analyses performed by
Halfaker et al. 2013 "The Rise and Decline" (RAD). The authors argue
that replication of this work is essential to turn the conclusions of one
study into generalizable knowledge. They apply measurements similar to
those used in Wikipedia across a broad set of Wikia wikis (and discuss
diversity of content, time period, etc. among these wikis). They
conclude that the patterns seen in RAD *are* in fact common to this broad
set of open production environments.
This is a great, short paper. The authors make efficient use of their
prose to describe the past results of RAD, argue the importance of their
study, and to discuss methodological differences between this research
and RAD. I particularly appreciate how they've nearly entirely skipped
re-iterating a methods section and instead only discuss the differences
between their work and RAD's methods. Reviewing RAD and then reading
this paper was straightforward.
My only regret is that the authors did not provide more discussion of
organizational reasons why these patterns happen. Is the study of
Wikipedia and Wikias the first time we noticed this dynamic? I doubt it!
It seems like these patterns should be common in any community of
practice. Regardless, given that this paper is a short, I think it's
totally forgive-able that this type of discussion is left for future
work.
I just have one nit-pick:
Pg. 4:
* "Our parameter estimate for tool reverted (β = 0.22, SE = 0.28)
suggests that newcomers who are rejected by a bot might be more likely to
survive." -- Is this wrong? It seems like a negative coef suggests that
newcomers who were rejected by a bot are *less* likely to survive.
Rebuttal response
(blank)
------------------------ Submission 3186, Review 3 ------------------------
Expertise
4 (Expert)
Recommendation
. . . Between possibly accept and strong accept; 4.5
Award Nomination
If accepted, this paper would be among the top 20% of papers presented at CHI (Best Paper: Honorable Mention nomination)
Review
This short paper presents an attempt at replicating the work of an
influential prior research paper - Halfaker et al's "The Rise and Decline
of an Open Collaboration System" (RAD). I applaud these authors for their
effort in replicating the findings from RAD for a number of reasons:
- Essentially, this is only the second instance of replicating prior work
that I have encountered in the CHI community. And I agree with the
authors that in a field that rewards novelty, replication is often rarely
done, and also leads to other issues such as the generalizability of
findings. Too often in CHI we have one-off studies of systems with no way
to assess some of the claims being made, and more importantly, form
theories within the field that we can call our own.
- Open collaboration has been the subject of HCI research for some time now,
however, much of the claims of open collaboration studies have been made
through investigations of Wikipedia. This is a real issue that the
authors also explicitly set out to address. However, there are also good
reasons for why so many studies are based on Wikipedia as well - for
instance, it provides a sizable and (somewhat) easily accessible dataset
for researchers to investigate. Additionally, in "field" of open
collaboration is littered with one-off systems that are essentially
experiments - this makes it hard to do any form of comparative or
generalizable work with Wikipedia. Hence, I commend the authors'
resourcefulness in replicating the RAD study on a Wikia dataset - that is
not only somewhat similar to Wikipedia (in functionality) but also in
terms of size through the aggregation of 740 publicly hosted wikis in
their dataset.
- The authors are also rigorous in their replication study highlighting
not only the ways in which their analyses diverged from the original RAD
study. This was commendable for me as Wikia is a significantly different
platform from Wikipedia. By highlighting how they have to use different
statistical methods appropriate to nested community structure of their
data, and how they have accounted for potential repeated membership of
newcomers across the various Wikia wikis, the authors have made me more
confident in trusting their analyses.
- Most impressively, they were able to reproduce some of the main findings
found in the RAD paper - most notably that of the decline of
contributions to open collaboration systems, the survival and retention
of newcomers and the "calcification" of norms over time in wikis.
Overall, I found the findings relatively persuasive, with the exception
of the findings from Study 3 - examining the entrenchment of norms. It
seems to me that
project namespaces on the Wikia platform is significant enough to warrant
greater discussion and scrutiny.
If there is one criticism of this paper, it is the heavy reliance on the
assumption that the reader would be familiar with the original RAD paper.
This may not necessarily be the case and thus I feel that the authors
would do well to better articulate not just the findings, but perhaps
some of the shortcomings of the original RAD paper.
Overall I am pretty impressed with this submission and the succinct
clarity with which the authors have not only managed to report the
findings of reproducing the original RAD paper, but also summarizing
overall findings and making a case for replication of prior research. I
would recommend the acceptance of this paper for the CHI conference
wholeheartedly.
Rebuttal response
(blank)


@@ -0,0 +1,39 @@
We made the following substantive changes to our manuscript. Each of these changes was described in our rebuttal and the points in this summary correspond to the points in our rebuttal document.
1. We added a paragraph beginning "Despite our efforts at generalization" to the end of the discussion section. This paragraph refers to prior work that provides general mechanisms for norm entrenchment, increasing newcomer rejection, and newcomer retention and argues for the generalizability of our findings.
2. We made the following changes to our methods section:
i. To provide readers with a better mental model of the structure of the RAD study, we explain that RAD present three interdependent analyses.
ii. We amended ¶2 to clarify that RAD's plots of newcomer survival and rejection showed good-faith newcomers but that our replications of these plots show all newcomers.
iii. To better help readers grasp RAD's argument, we modified ¶2-6 to specify that RAD's time series plots are provided in support of their explanations of Wikipedia's decline while their regression models provide evidence for mechanisms.
iv. We have inserted a sentence to ¶3 explaining that RAD fit two models, one for all newcomers, and one for good-faith newcomers. We also list all the variables in RAD's regression model and amend the final sentence to explain that we replicate the model on all newcomers.
v. We inserted a new paragraph (¶4) that summarizes RAD's analysis of algorithmic tools.
vi. We amended the first sentence of ¶5 (formerly ¶3) to mention guidelines.
vii. We inserted a sentence to ¶5 describing RAD's plot showing changes in edits to norm pages over time.
viii. We broke ¶3 into two paragraphs (¶5-6). ¶7 now more fully describes RAD's second logistic regression and mentions the result that essay pages were less calcified.
ix. We split paragraph 8 into two paragraphs. The first describes RAD's use of a sample of "good faith" newcomers in greater depth. The second explains that we do not attempt to replicate this part of RAD's analysis.
x. We removed the former ¶5 because its content is now covered in the paragraphs above.
3. We added a paragraph to the end of the methods section describing two limitations of RAD and noting that our analysis partially addresses one.
4. We added two sentences to the methods section of Study 3 to explain that using all edits to Namespace 4 is a threat to the validity of our replication but represents the best available opportunity to study norm entrenchment on Wikia.
5. We added error bars to Figure 1.
6. We corrected a typo in the interpretation of the parameter estimate for tool reverted in the results for Study 2.
7. We added a new paragraph to the discussion section (¶4) describing the threat of varying levels of activity and numbers of newcomers across wikis in our sample. We explain how we address the threat and refer to supplementary material. We also add material to our supplementary material to describe and interpret the robustness check.
8. We carefully edited our paper for style and clarity. We had our paper professionally proofread.
9. We have unblinded our paper, added our copyright blurb, and added an "Acknowledgement" section. We also added a new section called "Access to Data" that includes a hyperlink to an archival copy of our code and dataset that we have published in the Harvard Dataverse.