added material for TWA 2017
parent
3e6d27447e
commit
4fc082a8ac
7
cscw_changelogs/2017-the_wikipedia_adventure/README.txt
Normal file
@ -0,0 +1,7 @@
Material for paper:

Narayan, Sneha, Jake Orlowitz, Jonathan Morgan, Benjamin Mako Hill, and Aaron
Shaw. 2017. “The Wikipedia Adventure: Field Evaluation of an Interactive
Tutorial for New Users.” In Proceedings of the 20th ACM Conference on
Computer-Supported Cooperative Work & Social Computing (CSCW ’17). New York,
New York: ACM. https://doi.org/10.1145/2998181.2998307
1088
cscw_changelogs/2017-the_wikipedia_adventure/refs-processed.bib
Normal file
File diff suppressed because it is too large
@ -0,0 +1,718 @@
From: <papers2017@cscw.acm.org>
Date: Tue, Jul 12, 2016 at 11:15 PM
Subject: CSCW 2017 notification - #516
To: snehanarayan@gmail.com
Cc: papers2017@cscw.acm.org


Dear Sneha Narayan -

Congratulations!

Your paper:

516 - The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial
for New Users

is one of the 52% of CSCW 2017 submissions invited to revise and resubmit.
There were 530 total submissions to CSCW 2017, a similar number to last
year. The reviewers for this submission believe that it has the potential
to be revised within four weeks -- revisions are due August 9, 2016 -- to
become a contribution to what will be an exceptional conference.

The program committee expects all authors to take advantage of this four
week revision period to improve their submissions by addressing reviewers'
comments (below). Some submissions need only minor revisions, while others
will require considerable work over the next four weeks to result in an
acceptable submission, and will not succeed without significant effort.
Your reviews, especially the summary report from the Coordinator, should
make clear what you should do. You can gauge your prospects from your
reviews and the summary report: overall scores of 4s and 5s indicate the
reviewers are very confident your paper will be acceptable within four
weeks with small edits. Overall scores of 3 and 4 indicate you have some
work to do. Scores of 3 and below indicate that some reviewers have serious
reservations, though other reviewers see promise.

The same reviewers will read and evaluate your revised submission (though
additional reviewers may be added for papers where the reviewers are
divided). You need not satisfy every reviewer or make every suggested
change, but your revision will need to convince most of the reviewers that
it is now ready for publication. For some papers the reviewers have
requested a lot of work, you might feel that it is too much to achieve in a
four week period. If you have the time to reach that goal: great! If not,
that is okay, you are free to withdraw your submission. Please decide
whether or not the key points made by reviewers can be adequately addressed
in the time provided, given other demands on your time. If you choose to
withdraw your paper, please notify us explicitly at papers2017@cscw.acm.org.
Papers that are revised and re-submitted in the next round will receive
revised reviews.

Your revision must be accompanied by a separate "Summary of Changes"
document (in PDF format) that lists the reviewers' comments and your
responses, even for comments that did not lead to changes in the manuscript
(in which case you might explain why you chose not to make certain
suggested changes). This could be a set of bullet points, a table, or
numbered points by which reviewers' comments are summarized along with your
changes. This is not a rebuttal, but rather a description of changes made,
or of reasons you could not or chose not to take the reviewers' advice. To
become acceptable, your submission must be revised, and your document
describing the changes will greatly help reviewers see what you have or
have not changed, along with your reasons for doing so.

Just to be clear, you must submit a revised paper and summary of changes by
the deadline. Any paper where a revision and summary are not submitted
will be considered to be withdrawn.

Example summaries from past years' papers can be found at
http://bit.ly/16U8BGM.

Please submit your revision and the response document at your "Submissions
in Progress" page at https://precisionconference.com/~cscw17a/ by 11:59 PM
PDT, August 9, 2016.

CSCW 2017 will be a great conference, and we sincerely hope you are part of
it! If you have any issues or questions, please let us know. And thanks
again for submitting.

Sincerely,
Louise Barkhuus, Marcos Borges, Wendy A. Kellogg
CSCW 2017 Co-chairs


------------------------ Submission 516, Review 4 ------------------------

Title: The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial
for New Users

Reviewer: AC

Expertise

2 (Passing Knowledge)

First Round Overall Recommendation

3 (Maybe acceptable (with significant modifications))

Contribution and Criteria for Evaluation

The paper presents the design and evaluation of a gamified tool for
socializing and retaining new Wikipedia editors. Contribution criteria
include (1) a description and rationale for the system; (2) system
novelty and rationale for how it leads to learning; and (3) a
methodologically sound evaluation.

First Round Review (if needed)


Coordinator's First-Round Report to Authors

The paper presents the design and evaluation of a gamified tool for
socializing and retaining new Wikipedia editors. The study found that
users liked -- but did not learn from -- the system.

The focus on improving the experience of newcomers in Wikipedia is
relevant and important. Reviewers describe the study as well motivated
and exceptionally well-written. Read R3’s comments on the writing
quality and congratulate yourself!

The reviewers, however, have many concerns about the paper -- each
focusing on a different aspect of the work. The concerns the reviewers
note /may/ be addressable during the revise and resubmit period, but it
will be an exceptionally herculean effort. Also, please keep in mind that
there is no guarantee of acceptance even after making changes. So, it is
at the authors’ discretion about whether or not to proceed with
revisions or withdraw the paper.

There is a split amongst the reviewers as to whether the failure of the
tool is interesting or not. R1 raises concerns that the failure of the
tool could be predicted from existing literature, suggesting little
rationale for doing the work in the first place. R2 asks whether there
is something fundamentally different about people who continue to
contribute to Wikipedia, and as such whether the system holds value in
practice. R3, on the other hand, sees much value in the systems
contribution of the work as well as the real-world evaluation. R3's
review has some suggestions of alternative framings that may make the
contribution more valuable.

In treatment of related work, many improvements are needed. R1 notes that
the discussion of the well-known concept of legitimacy/authenticity in
learning environments is missing. R2 also points to missing literature
about Wikipedian experience.

R2 and R3 raise a number of methodological questions about the paper. R2
suggests the distribution of participants across the timeline may bias
the results. R3, on the other hand, sees opportunity here, suggesting
additional statistical analysis related to longevity and power users.
Both R2 and R3 question the methodological choice and contribution of
measuring perceptions of learning rather than actual learning. Overall,
this points to a need for at the very least justifying the methodological
choices and at the most carrying out additional statistical analyses.

In summary, there is quite a bit of work to be done. I wish the authors
the best of luck, should they choose to continue in the review process.


Requested Revisions

REQUIRED:
- Provide justification for why the study was worth carrying out, in
response to R1 and R2’s concerns. R3’s review may have some insight
into alternative framings.
- State the research questions more explicitly, as per R2’s
recommendation
- Address R1 and R2’s concerns about missing literature
- Ensure that the narrative around Wikipedia is clear to readers who do
not have an in-depth background in production/editing details.
- Improve the clarity of the results by using percentages or another
baseline that allows comparison between numbers, as per R2’s review.
- Provide justification for measuring perceptions of learning versus
actual learning.
- Provide a robust discussion of why the results are meaningful for
researchers and/or practitioners.

OPTIONAL, RECOMMENDED
- Consider carrying out additional statistical analyses as recommended by
R3.
- Provide a short justification for use of English language Wikipedia, as
per R2’s review.

Formatting and Reference Issues


------------------------ Submission 516, Review 1 ------------------------

Title: The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial
for New Users


Expertise

4 (Expert)

First Round Overall Recommendation

3 (Maybe acceptable (with significant modifications))

Contribution and Criteria for Evaluation

In this paper, the authors present the design and two-pronged evaluation
of a tutorial for new Wikipedia editors that uses elements of
gamification like missions and badges to help coach new editors and help
them learn best practices and social norms of Wikipedia. The outcome is
that users like the system but, based on behavioral measures, they don't
actually learn from it. Learning interventions are a classic kind of
research problem, and the paper should include robust measures of
learning, as well as a good description of the designed intervention
itself, why the design is expected to lead to learning, and a clear
description of the study.

Assessment of the Paper

This is a reasonably well motivated study with connections to appropriate
literature and the writing is engaging and understandable. The problem of
enculturating newcomers into projects like Wikipedia is well documented
and this paper investigates a potential intervention with an admirably
well-planned study. Designing learning interventions is really difficult
and I commend the authors on a well-executed effort.

Still, I am ambivalent about the paper because I would have predicted
these outcomes based on the literature alone. In the discussion, the
authors note that one mismatch between Wikipedia and the tutorial as
designed involves the “gradual peripheral participation” of newcomers
as they take on the identity of “Wikipedian.” They suggest that maybe
speeding up this process is unnatural. I would argue that the most
important concept from the literature on learning is missing from this
discussion, and that’s “legitimacy” (also sometimes referred to in
education and learning literature as “authenticity”.) The authors
explain that by doing tasks in a pretend version of Wikipedia, they make
it a safe space for newcomers to practice, yet performing “canned”
tasks in a pretend system is the opposite of offering a legitimate form
of participation. I immediately wonder, why not use what we know from the
literature to create low-risk missions that newcomers can complete while
legitimately contributing to the encyclopedia? Risk taking is a
fundamental characteristic of games that makes them engaging; it
certainly seems like it would play a role in people’s motivation in a
scenario like this. Rather than eliminating risk, the literature on
legitimate peripheral participation would suggest that finding the right
degree of risk is required to facilitate progressive entree into a set of
shared practices.

I am disappointed by the missed opportunity here, the outcome mainly
seems to verify that what we know shouldn’t work based on the
literature in fact doesn’t work. Yet still the paper isn’t bad and
the study is carefully crafted and reported.

With some extension and reflection, I think the discussion could help
point future research in a more fruitful direction. There are millions of
pages written on the challenges of designing learning interventions that
change people’s behavior, this paper ends on a painfully obvious note.
It’s true that usability isn’t all it takes, but what can we learn
from TWA about the design of systems to facilitate
enculturation into a community of practice? What can we take away from
this that might inform more successful tutorial systems in the future?

Formatting and Reference Issues


------------------------ Submission 516, Review 2 ------------------------

Title: The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial
for New Users


Expertise

4 (Expert)

First Round Overall Recommendation

2 (Probably NOT acceptable)

Contribution and Criteria for Evaluation

This paper's contribution is the design and evaluation of a structured
introduction to a peer production community (English Wikipedia) called
"The Wikipedia Adventure". TWA's design is rooted in theories of
gamification, and its utility is evaluated through a user survey and an
invitation-based field experiment. The paper reports on the survey
respondents' satisfaction with TWA, and how their experiment results
reveal some of the challenges of affecting lasting changes to contributor
patterns in peer production communities. These findings are then
discussed in relation to cultural factors in Wikipedia, issues of
self-selection and voluntary participation, and the limitations of
gamification.

When evaluating a paper that describes the design of a system, the two
main criteria are that the system and/or its development setting is/are
novel, and that the way the system is evaluated is methodologically
sound.

Assessment of the Paper

As mentioned in the contribution section, this paper's contribution is
the design and evaluation of a structured introduction to a peer
production community based on gamification, called "The Wikipedia
Adventure". This is a great idea and sounds like a useful addition to
Wikipedia. The paper is written in a way that makes it easy to read, and
provides the reader with a good introduction to how TWA's design is
rooted in theories of gamification, thus applying these principles in
what appears to be a novel setting. The paper also does a good job of
discussing the findings, organizing them in a way that is easy to follow
and touching on important points (e.g. cultural factors, and the
limitations of self-selection and gamification).

The overall ideas and approach taken in this paper are sound, they are in
line with the criteria described previously. Unfortunately, there are
two major issues and several minor ones that need to be resolved before
this paper is ready for publication. The first major issue is that the
methodology used to evaluate performance in the invitation-based
experiment measures contribution in a skewed manner and does not
establish why that is appropriate. Secondly, the paper fails to consider
arguments put forth by Panciera et al's "Wikipedians are Born, Not Made"
paper. This review will expand on both of these major issues below.
Further below will be notes and comments with suggestions for improvement
for specific sections of the paper, some of which are rather substantial
as well.

1: Evaluating TWA effectiveness by number of contributions
----------------------------------------------------------

A major part of the paper is the evaluation of TWA's effect on subsequent
contributions. To evaluate this an invitation-based field experiment is
used, and the paper does a great job of justifying why that is
appropriate in this setting. The experiment runs from February 2014 and
three months forward. Exact dates are not given, so let us assume that
it ran until the end of April 2014. User contributions are then measured
until the end of May 2014.

There are two problems with this approach that the paper fails to address
properly. One is the issue of right-truncation found in the data.
Contributors who joined in early February 2014 would have about four
months to make edits, whereas those who joined in late April would only
have about a month. The model does contain a control variable for number
of days in the experiment, but why is that appropriate in this context?
If we examine other work in the same domain, they tend to either use a
much longer time period (e.g. the Teahouse paper, citation 23, which uses
6-9 months) or ensure that the time period is fixed (e.g. Kittur et al.
"Herding the Cats: The Influence of Groups in Coordinating Peer
Production", WikiSym 2009; or Zhu et al. "Effectiveness of Shared
Leadership in Online Communities", CSCW 2012).

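The fixed-time-period alternative mentioned above can be illustrated with a minimal Python sketch. It is not the authors' analysis code and not the measure used in the paper: the function name `edits_in_fixed_window`, the 30-day default window, and the sample dates are all illustrative assumptions. The idea is only that every account is observed for the same number of days, and accounts whose window would run past the end of data collection are flagged rather than compared unfairly.

```python
from datetime import datetime, timedelta


def edits_in_fixed_window(inclusion, edit_times, study_end, window_days=30):
    """Count edits made in the first `window_days` days after inclusion.

    Returns None when the window would extend past `study_end`, meaning
    the account is right-truncated and has no complete window to measure.
    All names and the 30-day default are hypothetical, for illustration.
    """
    window_end = inclusion + timedelta(days=window_days)
    if window_end > study_end:
        return None  # observation window is cut off by the end of the study
    # Count only edits that fall inside [inclusion, window_end).
    return sum(1 for t in edit_times if inclusion <= t < window_end)


if __name__ == "__main__":
    study_end = datetime(2014, 5, 31)
    early_account = datetime(2014, 2, 3)   # hypothetical inclusion date
    late_account = datetime(2014, 5, 20)   # hypothetical inclusion date
    edits = [datetime(2014, 2, 4), datetime(2014, 2, 20), datetime(2014, 4, 1)]
    print(edits_in_fixed_window(early_account, edits, study_end))
    print(edits_in_fixed_window(late_account, [], study_end))
```

With a measure like this, early and late joiners are compared over equal exposure, at the cost of dropping accounts that joined too close to the end of data collection.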
Related to the right-truncation problem is the fact that the paper also
fails to discuss and justify what a reasonable timespan for measuring the
effect of TWA is, and that it will have an effect on the number of
contributions made. It might for instance be that TWA instead has an
effect on how long it takes before a user drops out of the system. If we
assume that TWA has an effect on contributions, what timespan is needed
to measure that effect? The paper assumes that a month is adequate to
discover it, whereas one might suspect that it is only measurable over a
longer period of time. If it is the case that a short period of time is
appropriate (for instance because these users are likely to drop out
after a certain amount of time) the paper needs to properly establish
that, either by measuring it or referring to previous work.

2: Wikipedians Are Born, Not Made
---------------------------------

In their GROUP 2009 paper "Wikipedians Are Born, Not Made: A Study of
Power Editors on Wikipedia", Panciera et al. show data that argues that
those contributors who are going to stick around behave in a way that is
different from the very beginning. In followup work published in 2010
they find similar differences in another peer production community.
(Panciera et al. "Lurking? cyclopaths?: a quantitative lifecycle analysis
of user behavior in a geowiki." CHI 2010)

These two papers and the argument they put forth are relevant because
they question who TWA is designed for. In the related work a reference to
Bryant et al's "Becoming Wikipedian" is made, thereby suggesting that TWA
is designed to teach someone how to be a Wikipedian. As Panciera et al's
paper argues along the lines of these contributors already being
Wikipedians, should TWA be designed to instead help these contributors
stay productive?

If Wikipedians are born, not made, then one could also question whether
these contributors are at all going to use TWA. Maybe they ignore TWA
because they are already productive and do not need it? Since the paper
never makes any references to these papers or discusses issues related
to this (e.g. "is the Teahouse more effective since it allows them to get
answers when they need help?"), this whole topic area is left hanging.

---
Below follow comments/notes for each section of the paper.

Introduction:
* An overall issue here is that there are few citations to sources. For
instance a claim is made that "newly created accounts are the primary
source of spam and vandalism on Wikipedia". Consider a "[citation
needed]" added after that.
* When citing multiple papers it is preferable that they are in order,
e.g "[14, 23, 17]" should be "[14, 17, 23]" (page 1). This minor issue
also occurs elsewhere in the paper.
* "Unlike prior systems, TWA creates a structured experience that guides
newcomers through critical pieces of Wikipedia knowledge..." Do we know
that there are no other prior systems that offer a similar experience? It
might be that there are none within the Wikipedia domain, but what about
outside it? That sentence is making a rather bold claim.
* After reading the introduction, what is the reader expected to remember
as the main findings in this paper? At the end of the introduction the
following sentence is found: "The study underscores the importance of
conducting multiple types of evaluations of social systems." Is that the
main contribution? What about the implications for gamified structured
introductions to peer production?

Background:
* "...women reported that they found that contributing to Wikipedia
involved a high level of conflict and that they lacked confidence in
their expertise [8]. This suggests that more effective onboarding tools
could help incorporate newcomers." This is an important side of
Wikipedia, but how does TWA's design help mitigate this issue? Are there
design elements in TWA that aim to boost confidence in one's expertise?
* At the end of the introduction we find the following two questions:
"Would a gamified tutorial produce a positive, enjoyable educational
experience for new Wikipedians? Would playing the tutorial impact
newcomer participation patterns?" These are the paper's _research
questions_! It would be very helpful to the reader if they were displayed
more clearly, e.g. as separate items. They should not be hidden.

System Design:
* "...it does not depend on the availability, helpfulness, or
intervention of existing Wikipedia editors..." The underlying argument
here is that scalability is preferable to personal interaction when
socializing newcomers (in peer production communities). Why is that the
better solution? As discussed previously, TWA might be designed for
contributors who are not going to stick around, why are those the right
audience for it? Is the goal to provide _everyone_ with a scalable
impersonal introduction, or is it better to provide _some_ (typically
based on self-selection) with a personal introduction (e.g. the
Teahouse)?

Game-like elements (subsection of System Design):
* In "Missions" a distinction is made between "basic" and "advanced"
editing techniques. It appears to be somewhat arbitrary, why is adding
sources advanced editing, but watchlists are not?
* Your readers might not know what watchlists are, take care to write for
a general audience, not everyone knows a lot about how Wikipedia works
behind the scenes.

Study 1: User Survey:
* This paper doesn't discuss any other language editions of Wikipedia
besides the English one, and makes the assumption that "Wikipedia" equals
the English edition. Adding a mention that Wikipedia exists in multiple
languages and explaining why English was chosen as the language where
TWA was launched would be very helpful.
* The paper aims to measure "educational effectiveness". Why is a survey
the appropriate way to measure that? Based on the description of the
survey, it seems that it never asks specific questions to test whether
TWA's users learned specific things, in other words whether the education
was successful. Later when describing the results the phrase "learning to
edit Wikipedia" is used, isn't that the _key_ learning goal of TWA? Yet
the survey asks Likert-scale questions. In other words, you're measuring
whether TWA users are under the impression that they learned something,
not whether they actually did.
* Figure 4 uses counts. While it shows that none of the questions had
responses from all participants, it makes comparisons between questions
with different response rates very difficult. Using percentages would
allow for direct comparisons, and makes the references to the figure in
the text easier to follow along with. The text refers to four questions
with a certain percentage of responses, but leaves the math to the
reader.
* The survey leaves many questions unanswered, some of which the paper
might want to address. Were any negative questions asked? Were there any
control questions, such as a similar question worded slightly differently
to allow for comparison between responses? As it is, this survey comes
across as a set of positive statements about TWA that respondents agreed
to. Given that respondents self-select and no attempts to contact users
who didn't go through TWA appear to have been made, it is likely there
is a bias in the responses, and that bias should be discussed.

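The point about Figure 4 (counts versus percentages) can be made concrete with a small Python sketch. It assumes each survey question is stored as a mapping from response label to count. The function name `to_percentages` and the sample numbers are hypothetical and are not taken from the paper's data.

```python
def to_percentages(counts):
    """Convert per-question response counts to percentages of respondents.

    `counts` maps a response label to the number of respondents who chose
    it. Percentages are computed against the number of people who answered
    this particular question, so questions with different response rates
    become directly comparable. Values are rounded to one decimal place.
    """
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    # Two hypothetical questions answered by different numbers of people.
    question_a = {"agree": 30, "neutral": 10, "disagree": 10}
    question_b = {"agree": 24, "neutral": 8, "disagree": 8}
    print(to_percentages(question_a))
    print(to_percentages(question_b))
```

In this made-up example the raw counts differ between the two questions, but the percentage breakdown is identical, which is the comparison that counts alone obscure.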
Study 2: Field Experiment:
* The description of how accounts were selected to be included is rather
confusing. First it describes 1,967 accounts that met the same criteria
as for the user survey, however 10,000 individuals ("accounts"?) were
invited to the beta. Why is one an order of magnitude larger than the
other? Then in the second paragraph of "Methods" it describes the
selection criteria, that at least one contribution would have to be made
after getting invited. This would perhaps be much less confusing if the
criteria were first explained, particularly how the experiment and
control groups were set up, and then how many accounts were identified.
* "This is a larger proportion of users than took up the invitation in
Study 1, which may be due to changes in the invitation text." Earlier in
the paper study 1 refers to a "beta", whereas this appears to be not. If
this is the case, this is an important difference between the two that
should be made clear to the reader.
* "we measure the overall contributions as the total number of edits made
by each account from the time of inclusion in the study until May 31,
2014." When exactly is "time of inclusion", is that when they got the
invite? What about when they completed one (or all) TWA mission(s)? The
concern here is that all contributions are measured, whereas the
experiment sets up a pre/post-scenario. Later on the paper refers to
"subsequent contributions", indicating that contributions after a certain
point in time were measured. This quickly becomes rather confusing,
spelling out clearly what points in a user's account history are used
(e.g. "we measure contributions at four points in time: when the user
registered their account, the time of invitation, when they first started
using TWA, and the end of the experiment") would be very helpful.
* Why is a six-edit radius chosen when measuring word persistence?
Halfaker et al. make no claim about what the radius should be in the
referenced work, and Ekstrand et al suggest a 15 edit radius in a related
paper (Ekstrand and Riedl "rv you're dumb: identifying discarded work in
Wiki article history." WikiSym 2009) The six-edit radius also comes with
an issue that is unaddressed: how long does it take for an edit made by a
contributor in the study to reach that six-edit radius? If it hasn't been
reached at the end of the study period, that edit has to be discarded as
its quality is unknown. In a related paper, Farzan and Kraut instead
chose to use percentage of words that survived as a measure of quality
(Farzan and Kraut "Wikipedia classroom experiment: bidirectional benefits
of students' engagement in online production communities" CHI 2013)
* Tables 1, 2, 3, and 4, as well as figure 6 should be brought closer
together so it's easier to follow along. Table 1 occurs before the text
that refers to it, and table 4 is two pages further along. Putting all
tables and figure 6 on the same page might be a good solution.
* Table 3 refers to users who "reached" a mission. It is confusing how 181
users reached the final mission but did not complete it, yet in the text
it seems these 181 users actually did.
* The post-hoc power analysis is very useful!

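The six-edit-radius concern raised above (an edit whose radius has not been reached by the end of the study has unknown quality) can be sketched in Python. This does not compute word persistence itself. It only checks whether enough later revisions exist to score an edit at all. The function name `persistence_observable`, the list-of-timestamps representation, and the sample dates are assumptions made for illustration, not the paper's or Halfaker et al.'s implementation.

```python
from datetime import datetime


def persistence_observable(page_revision_times, edit_index, study_end, radius=6):
    """Return True if the edit at `edit_index` has at least `radius`
    subsequent revisions to the same page on or before `study_end`.

    `page_revision_times` is the chronologically ordered list of revision
    timestamps for one page (a hypothetical representation). Edits that
    fail this check would have to be discarded or treated as censored.
    """
    later = [t for t in page_revision_times[edit_index + 1:] if t <= study_end]
    return len(later) >= radius


if __name__ == "__main__":
    # Ten hypothetical revisions to one page, one per day in March 2014.
    revisions = [datetime(2014, 3, day) for day in range(1, 11)]
    study_end = datetime(2014, 5, 31)
    print(persistence_observable(revisions, 2, study_end))  # 7 later revisions
    print(persistence_observable(revisions, 5, study_end))  # 4 later revisions
```

Reporting how many edits fail such a check, for each candidate radius, would show how much data a six-edit versus a fifteen-edit radius discards.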
Discussion:
* "The new editors in our study may have had unpleasant experiences
during their initial time on Wikipedia..." It appears that the survey
asked no questions about this, yet is it not a very important issue
related to TWA's success?
* In "Limitations of gamification" the following sentence is found:
"...our study is among the first that compares levels of participation in
a task among individuals who were introduced to gamified learning first
to those that were not." This is an _important_ finding, it shouldn't be
hidden back here but instead be up front in the introduction!

Formatting and Reference Issues


------------------------ Submission 516, Review 3 ------------------------
|
||||
|
||||
Title: The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial
|
||||
for New Users
|
||||
|
||||
Reviewer: AC-Reviewer
|
||||
|
||||
Expertise
|
||||
|
||||
4 (Expert)
|
||||
|
||||
First Round Overall Recommendation
|
||||
|
||||
3 (Maybe acceptable (with significant modifications))
|
||||
|
||||
Contribution and Criteria for Evaluation
|
||||
|
||||
This paper presents the results of a deployment of a gamification-based
|
||||
system designed to retain new editors in Wikipedia. It is a negative
|
||||
results paper: the authors claim that they have conclusive evidence that
|
||||
the system did not work (although I have suggested a few additional lines
|
||||
of inquiry below that might problematize this assertion).
|
||||
|
||||
The committee will have to have a discussion about how to evaluate this
|
||||
paper, and likely negative results papers more generally.
|
||||
|
||||
Assessment of the Paper
|
||||
|
||||
This paper presents the results of a deployment of a gamification-based
|
||||
system designed to retain new editors in Wikipedia. It is a negative
|
||||
results paper: the authors claim that they have conclusive evidence that
|
||||
the system did not work (although I have suggested a few additional lines
|
||||
of inquiry below that might problematize this assertion).
|
||||
|
||||
The paper is very well-written and has some large positives. It also is a
|
||||
negative results paper, and the committee will have to decide how to
|
||||
handle this. In general, I’m strongly sympathetic to arguments to
|
||||
include more negative results papers in our proceedings, but I’m quite
|
||||
unclear on the details of how to do so (e.g. what defines a top-quality
|
||||
negative results paper?). I’m hopeful that this paper can instigate a
|
||||
broader discussion on this topic at the PC meeting.
|
||||
|
||||
All of that said, this paper also has a number of idiosyncratic
|
||||
limitations that make it perhaps not the best trial balloon for negative
|
||||
results papers. Below, I outline what I believe to be the paper’s
|
||||
positives and then describe these limitations in more detail, phrased as
|
||||
both critiques and questions.
|
||||
|
||||
Overall, my recommendation is to invite the authors to revise and
|
||||
resubmit. If this occurs, I’ll want to see the below critiques
|
||||
addressed and the below questions answered (both through direct answers
|
||||
in the response to reviewers and through clarifications and changes to
|
||||
the paper). I’m hopeful that, through the R&R process, this paper
|
||||
can become an ideal negative results trial balloon.
|
||||
|
||||
|
||||
Important positives:
|
||||
|
||||
* The authors built a system to solve a real-life problem and did a
|
||||
real-life, relatively large-scale deployment. Awesome!
|
||||
* The paper is easily in the top 5% in terms of writing quality. This is
|
||||
true both at the sentence level and at the narrative level. As a person
|
||||
who has to review lots of papers, this was a breath of fresh air.
|
||||
* The design of the game is quite well-thought-out, save a few relatively
|
||||
arbitrary decisions. I was particularly compelled by the use of
|
||||
gamification techniques that are also present in “real Wikipedia”
|
||||
(e.g. barnstar-like rewards).
|
||||
|
||||
Critiques:
|
||||
|
||||
CRITIQUE #1 – Excessive import placed on trivial self-report data: It
|
||||
is well-known that self-report data from participants is inferior to
|
||||
observations of actual behavior, and that self-report data can be quite
|
||||
unreliable more generally. As such, in my view, it is not a contribution
|
||||
to show that self-report data didn’t end up panning out in the
|
||||
behavioral results.
|
||||
|
||||
In the next draft of this paper, I would like to see the authors address
|
||||
this issue. This might mean framing this paper as a full-on negative
|
||||
results paper, but lighter weight adaptations might be possible.
|
||||
|
||||
|
||||
Open questions:
|
||||
|
||||
QUESTION #1: As noted above, this paper is a negative results paper at
|
||||
its core, and we’ll have to have a broad discussion about this at the
|
||||
PC meeting, assuming the paper makes it this far. In the event that this
|
||||
occurs, can the authors provide a more robust argument as to why these
|
||||
negative results are important for other researchers and practitioners?
|
||||
|
||||
The paper attempts to argue that one contribution that comes out of its
|
||||
negative results is to distrust self-report data, but this is well-known
|
||||
(see above). The other negative results argument in the paper is that
|
||||
these results add to growing evidence of long-term gamification
|
||||
failures. I find this argument much more compelling. In other words, by
|
||||
expanding on this argument, the authors may be able to address this
|
||||
question.
|
||||
|
||||
That said, regardless of how this question is addressed in the second
|
||||
draft, I’d like to see it done both through changes to the paper and
|
||||
through discussion in the response to reviewers.
|
||||
|
||||
QUESTION #2 – Is there a possibility that the statistical framework
|
||||
employed is not appropriate for this particular study?
|
||||
|
||||
The authors utilize a two-level statistical approach that I haven’t
|
||||
seen before in the CSCW/CHI literature. I enjoyed thinking about this
|
||||
approach, and the authors did a relatively good job explaining it. That
|
||||
said, I’m currently not convinced that it was the appropriate framework
|
||||
for this study. Here’s my reasoning:
|
||||
|
||||
(1) The goal here is to introduce a treatment that ultimately will
|
||||
produce strong new members of the Wikipedia community at a higher rate
|
||||
than the control.
|
||||
(2) Let’s say the game produces 3 such members out of 100 new editors
|
||||
and the control produces 1, which looks like it might be the case.
|
||||
Let’s also say that this pattern additionally persists over a large n.
|
||||
(3) If this is true, why do we care about the potentially moderating
|
||||
effect of the invitations?
|
||||
|
||||
The authors argue that new editors that responded to the invitation to
|
||||
play the game might just be new editors who are engaged and, critically,
|
||||
would have been power editors whether or not the game existed. However,
|
||||
barring a random fluke, shouldn’t these future power editors also have
|
||||
been in the control group? If I’m right here, I’m thinking the
|
||||
invitation doesn’t matter and a more traditional statistical analysis
|
||||
(or at least one targeted at identifying rare events) is appropriate.
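
To make the scenario in points (1) to (3) concrete, here is a minimal sketch of a one-sided Fisher's exact test, one standard option for comparing rare-event rates between two groups. The counts (3 of 100 versus 1 of 100) are the illustrative numbers from the paragraph above, not results from the paper, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (treatment, control); columns are outcomes
    (became a power editor, did not). Returns the probability of seeing
    `a` or more successes in the first row if group and outcome were
    independent (hypergeometric null).
    """
    row1 = a + b          # size of the treatment group
    col1 = a + c          # total number of power editors
    n = a + b + c + d     # total number of new editors
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)


# Hypothetical counts taken from the illustrative scenario above.
p_value = fisher_exact_one_sided(3, 97, 1, 99)
print(f"one-sided p = {p_value:.3f}")
```

With these hypothetical counts the p-value is roughly 0.31, so a 3-versus-1 split among 100 editors per group cannot be told apart from chance; the same ratio would need to persist over a much larger n, as point (2) assumes.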
|
||||
|
||||
I could be wrong, but I want the authors to respond to this question,
|
||||
both through feedback to reviewers and clarifications in the paper.
|
||||
|
||||
As an important side note, if we agree that this framework is the right
|
||||
way to go in the end, the authors should puff their chests more about
|
||||
this by claiming it as a contribution (assuming it hasn’t been used at
|
||||
CSCW before).
|
||||
|
||||
QUESTION #3 – Are the outcome variables considered here the best
|
||||
outcome variables? Are some critical variables missing?
|
||||
|
||||
The authors seem focused on the average effects across the entire control
|
||||
and treatment groups (the two treatment groups, to be specific). However,
|
||||
would it not also be reasonable to consider the metric I describe above:
|
||||
the % of new editors that go on to be power editors? Since power editors
|
||||
end up contributing most of the edits anyway *over the long term*, to me
|
||||
this seems like the way to go (i.e. if this group of editors were
|
||||
followed for years, statistically significant differences would begin to
|
||||
emerge). If the authors agree, the authors need to reanalyze their data
|
||||
with this metric in mind.
|
||||
|
||||
Another related outcome variable that might be useful to analyze is how
|
||||
long the new editors in each group remained active editors in the
|
||||
community (i.e. survival analysis). Because the data is quite old, this
|
||||
should be an easy new analysis to run, and longevity has been a variable
|
||||
of interest in a number of peer production studies.
|
||||
|
||||
In their second draft and the feedback to reviewers, I would like to see
|
||||
the authors discuss either new analyses related to power users or why they
|
||||
did not consider this outcome variable. I would also like to see the same
|
||||
for survival analysis.
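
As an illustration of the survival analysis suggested above, here is a minimal sketch of a Kaplan-Meier estimator, a common starting point for this kind of longevity comparison. The durations and censoring flags are made-up values, and the function name is hypothetical; nothing here is taken from the paper's data.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    durations: time each editor was followed (e.g. weeks of activity).
    observed:  True if the editor was seen to stop editing at that time,
               False if the editor was still active when observation ended
               (right-censored).
    Returns a list of (time, survival_probability) pairs, one for each
    time at which at least one editor stopped.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        removed = 0
        # Group all editors sharing the same time point.
        while i < len(pairs) and pairs[i][0] == t:
            events += 1 if pairs[i][1] else 0
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both dropouts and censored editors leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data for five editors in one group.
curve = kaplan_meier([1, 2, 2, 5, 8], [True, True, False, True, False])
for time, prob in curve:
    print(time, round(prob, 3))
```

Running the estimator separately on the control and treatment groups gives two curves that can then be compared, for example with a log-rank test.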
|
||||
|
||||
QUESTION #4: Is there a path towards positive results?
|
||||
|
||||
As noted above, I believe some discussion around this paper and negative
|
||||
results papers more generally will have to happen at the PC meeting.
|
||||
However, I think there are some missed opportunities here for positive
|
||||
results and that the authors were too quick to settle for negative
|
||||
results. This is likely an important factor to consider when deciding
|
||||
whether to accept a negative results paper.
|
||||
|
||||
Most notably, there are several well-motivated, unexplored avenues that
|
||||
could lead to positive results that would have a much larger impact than
|
||||
the negative results presented here:
|
||||
|
||||
* As noted above, examining additional outcome variables is important,
|
||||
most notably # of power editors and longevity.
|
||||
* Does the game work if folks are forced to play it prior to editing
|
||||
Wikipedia, as would be the case in most other institutionalized
|
||||
socialization contexts? This is not just a hypothetical: this game could
|
||||
be used in all Wikipedia Education Project classes and related endeavors.
|
||||
|
||||
Formatting and Reference Issues
|
File diff suppressed because it is too large
Load Diff
Binary file not shown.
File diff suppressed because it is too large
Load Diff