
added wide walls paper (CHI 2018)

This commit is contained in:
Benjamin Mako Hill 2020-11-24 10:33:58 -08:00
parent 636ce10a4d
commit 211ba396c6
4 changed files with 304 additions and 0 deletions

View File

@@ -0,0 +1,33 @@
We thank the reviewers for their careful attention and feedback. Below, we describe adjustments that we believe address the reviewers' central concerns. Although relatively minor, we feel that these changes will improve the manuscript enormously. We have room in the paper to make the changes described.
1. R2, R3, & R4 criticized the way we present our findings in terms of learning. We agree that we overstated our findings in this regard and will make a series of minor changes to address this issue:
- In our background, we will clarify that “wide walls” support increased engagement which can lead to learning, but that our study only directly measures engagement.
- We will explain that our results provide no direct evidence of learning but that we believe our findings for H2 provide “evidence in support of the theory that wider walls can also support learning.”
- We will articulate the reasoning behind the latter point: a) previous quantitative studies of Scratch measure learning as the presence of certain blocks; b) Moreno-Leon et al. (CHI 2017) validated these approaches by comparing them to expert assessments; c) because all users in our sample had access to variables before the treatment, an increase in non-SCV variable use is difficult to explain except through increased familiarity with data structures in general; d) Dasgupta et al. (CSCW 2016) and others described an increase in block use associated with exposure as evidence in support of learning.
- We will remove the phrase “strong evidence of learning” [R2]. We intended to convey the large effect sizes for our 2SLS models.
- We will remove the word “learning” from our title [R2].
2. R4 & R1 suggested that our discussion of our methodology was too dense and obtuse. We will address this in several ways:
- We will revise our analytic strategy section for clarity. We will have colleagues without econometric training read our revision to ensure that it is accessible and understandable.
- We will add a citation, with short description, to a methodologically similar econometric study from education research.
- We will standardize on terminology (e.g., “quasi-experiment” over “natural”; we used them interchangeably).
- Per R1, we will edit our threats section to clearly explain that our method produces a local treatment effect on affected users—i.e., a subset of Scratch users who differ systematically from all Scratchers in observable and likely unobservable ways (e.g., they are more experienced [R4, R1]).
3. R1, R2 & R4 were concerned that our findings might be driven by novelty. We will add a new paragraph to our threats section to describe this limitation. We will mention that SCV introduces minimal structural novelty (no new blocks, just one new checkbox) but that the functional novelty introduced by SCV is significant and a possible alternative explanation for our findings, especially H1. To some degree, this limitation extends to any causal inference technique (lab and quasi-experimental) that relies on measuring relatively short-term effects—an old criticism of experimental evaluation and user testing in HCI. We will explain that our analysis for H2—where no structure or functionality captured in the dependent variable is new—seems to suggest that our findings are not only a function of novelty.
4. To more precisely express what we mean by wide walls [R3], we will quote text describing the design rationale from the SCV systems paper. Specifically, we will describe how SCV sought to support a broader range of projects that connected to existing Scratch community practices and needs (per Resnick's definition). We believe this will make clear that SCV was designed as more than just a new feature in a toolkit. In the discussion, we will expand the section on the tension inherent in widening walls in terms of learnability. We will cite Resnick and Silverman, who frame this as a tension between wide walls and low floors.
5. R3 raised concerns about generalizability. We will edit our methods and threats sections to explain that 2SLS achieves strong internal validity (i.e., unbiased estimation of a local causal effect) at the potential expense of external validity. We will explain that this is an important trade-off in quasi-experimental field studies, which are best understood as complementary to lab and qualitative studies. We will remind readers that no single study proves a theory, that questions of generalizability are common to every study, and that one becomes more confident about the validity of a theory from multiple studies in different settings. We will edit our manuscript to carefully convey that this is /a/ test and reflects only a first piece of contingent evidence in support of the widely-cited theory.
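For readers less familiar with the estimator, the following is a minimal, generic sketch of the 2SLS logic on simulated data. The variable names, the instrument, and the data-generating process are placeholders for illustration only and do not reflect our actual specification or data.

    import numpy as np

    # Generic two-stage least squares (2SLS) on simulated data. Names (z, d, y)
    # and the data-generating process are illustrative placeholders only.
    rng = np.random.default_rng(0)
    n = 20_000

    z = rng.integers(0, 2, n)                    # instrument (e.g., an exogenous policy change)
    u = rng.normal(size=n)                       # unobserved confounder
    d = (1.5 * z + 0.4 * u + rng.normal(size=n) > 0.75).astype(float)  # endogenous treatment
    y = 1.0 * d + 0.8 * u + rng.normal(size=n)   # outcome; true treatment effect is 1.0

    def ols(X, y):
        # ordinary least squares coefficients via a least-squares solve
        return np.linalg.lstsq(X, y, rcond=None)[0]

    # Stage 1: regress the endogenous treatment on the instrument.
    X1 = np.column_stack([np.ones(n), z])
    d_hat = X1 @ ols(X1, d)

    # Stage 2: regress the outcome on the stage-1 *predicted* treatment.
    X2 = np.column_stack([np.ones(n), d_hat])
    beta_2sls = ols(X2, y)

    # Naive OLS of y on d is biased upward by the confounder u; 2SLS uses only
    # the variation in treatment driven by the instrument.
    beta_ols = ols(np.column_stack([np.ones(n), d]), y)
    print(f"naive OLS estimate: {beta_ols[1]:.2f}")   # noticeably above 1.0
    print(f"2SLS estimate:      {beta_2sls[1]:.2f}")  # close to 1.0

Because only instrument-driven variation in treatment is used, the resulting estimate is local to units whose treatment status is actually shifted by the instrument, which is the sense in which we describe a local causal effect above.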
6. We will address R3's comment on constructivism and transmission of knowledge.
7. We will fix the stylistic errors [R2], remove unnecessary quotes [R1], and fix the minor issues raised by R1. We will have our work professionally proofread.

View File

@@ -0,0 +1,9 @@
rebuttal information for:
Dasgupta, Sayamindu, and Benjamin Mako Hill. 2018. “How Wide Walls Can
Increase Engagement: Evidence From a Natural Experiment in Scratch.” In
Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
(CHI '18), 361:1–361:11. New York, New York: ACM.
https://doi.org/10.1145/3173574.3173935.

View File

@@ -0,0 +1,221 @@
Bike Rack
- Per R1, we will mention that our study only focuses on the effect on users present before/after the intervention and not on the effect that widening walls can have in eliciting participation from previously uninterested users.
Meta:
The average score is just below 3. Two big concerns:
1AC is leaning towards reject (says that in score and in review). 2AC also gave a 2.5 score.
They are talking about “novelty effect”. Is that a thing?
Claims on learning
1AC:
There are a few places that need improvement, with the central issue on
the claims of learning. Both R2 and R3 raise concern that what the paper
reports cannot be regarded as a measure of learning. R2 suggests the word
“learning” be replaced with a more suitable term, such as engagement,
exploration, or simply (and accurately) data structure use. R3 finds
claims on learning to be too strong for what is supported by data and
results.
R2:
Some issue must be taken with the position of this work's relationship
with learning, as no measure of learning was performed.
“Finally, we recognize that all quantitative measures of learning are
limited. For one, measures such as ours can only observe outcomes of
learning. In this specific analysis, what we detect is only the presence
of data structures in projects. We do not know if those data structures
get used in a meaningful way in the projects that we observe. We do not
even know if they get used. Furthermore, we do not know the purpose of
their use. “
This is such a significant limitation that it should not be considered a
measure of learning. If learning is not being measured in any way, the
variable should be called “data structure use”. I suggest the word
“learning” be removed from the title of this paper, as
“engagement” should adequately describe user engagement with new
concepts/data structures without insinuating a provable measure of
learning.
R3:
At a higher level, however, I'm left a bit uneasy with the authors'
operational definition for learning and engagement here. The specific
hypotheses made are about "frequency of use" which is fine. The
inferential jump between frequency of use and implying greater learning is
unfounded given that there is no evidence that (1) the uses of these
features did anything meaningful, or (2) that a learner understood what
it was doing. Thus claims like "this reflects strong evidence of
learning" are simply too strong in my view (indeed the authors state "all
quantitative measures of learning are limited"). Perhaps restating these
findings using a term other than learning could be a reasonable fix.
Validity concerns
1AC:
Reviewers also have raised issues with the validity of results. R2 and R3
both wonder if the observed differences after the updated Scratcher
policy is applied can be attributed to the wide walls phenomenon, or they
are due to a novelty effect. R1 questions the impact of the
within-subjects setting the natural experiment afforded, with potential
confounding factors of time and experience.
R1:
If you included only users who received Scratcher status due to the
change, then your research design is within subjects. Authors need to
discuss this in threats to validity. I think you have all of the
information to determine whether this is a threat to validity, but it's
unclear how sudden access to a new tool affects use of that tool (might
be related to the bandwidth issue).
R2, R3:
<see concerns about wider walls below>
Wider walls and/vs novelty
R2:
How is what was seen in this paper different from a novelty effect? The
authors should explicitly compare/contrast what is similar and what would
be different if this were a novelty effect.
I wonder whether each occurrence of a feature added to a system can be
considered evidence of "wider walls", and this concerns me. Is it
possible to add a creative feature to a system, and not consider it
widening the walls? What makes widening the walls different from
increased functionality and self-expression in a system - or is that what
widening walls is? I'm afraid the definition of this construct, according
to the paper, leaves quite a large range of possibilities for what is
wider walls.
R3:
Additional nuance in the authors' operational definition for "wider
walls" would help communicate the significance of the work. For example,
Bruckman and others tell us that design of constructionist learning
environments must be carefully done to ensure that the tools of thought
available make it likely that the learner will engage with that which is
to be learned (while creating personally meaningful artifacts). Adding
another toy in the curated sandbox for the sake of adding it does not
necessarily expand the scope of "that which is to be learned", though it
*might* increase the scope of the design space of learner artifacts. Does
widening walls mean both of these things or is just one sufficient? In
fact, I can also easily imagine a situation where adding a feature to the
learning environment might negatively impact learning---e.g., increasing
cognitive load. Overall, the paper could be improved with a more
deliberate definition of what is (and also is not) widening walls. This
would also help us understand the extent to which they believe their
results have implications for the design of learning environments beyond
Scratch and why.
Generalizability:
1AC:
Reviewers are concerned that what is reported in the paper might be a
one-off observation made on the specific platform and setting. We see one
case study here, but it's not clear if the knowledge is generalizable
in determining if adding a feature is going to widen the wall or not. R2
points out that the definition is only vaguely presented in the paper. R3
thinks practical implications are severely limited and also calls for a
clearer definition of widening walls. R1 asks for a deeper discussion of
whether widening the wall might potentially raise the floor.
R3:
Theoretically speaking the authors have made a contribution
with this work. Practically speaking, I remain somewhat unconvinced
about the significance of this work--the authors have not convinced me
that this is actionable in any context in any way other than the
carefully constrained one in which they conducted the study.
[...]
The authors have clearly presented a small but detailed quantitative
examination of adding one feature to Scratch. The study's scope of
contribution is significantly hindered by an overly constrained setting,
coupled with the fact that very few learners actually explored the new
SCV feature. The analysis may be statistically meaningful, but it's not
clear there are many practical take-aways here.
Analytical strategy wat?
1AC:
In terms of writing, reviewers find the analytical strategy section to be
overly obtuse, dense, and difficult to follow. Assuming that most of the CHI
audience is not familiar with econometrics, the paper needs to do a
better job of describing the methodology in a more accessible manner.
R2:
While the methodology is novel and appears to introduce rigor that would
be informative for the CHI community that encounters intention-to-treat
studies with large quantities of data, the description of the methodology
is very difficult to understand, from a non-econometrist's point of
view. This makes the results somewhat difficult to follow as well. The
discussion section could be expanded to give more of an overview of the
results, broken down by hypothesis. I do not have the expertise to
further evaluate the methodology and claims made in this paper.
R3:
It does get a bit dense in the mid
section (Analytical Strategy), and it would be an improvement to simplify
the explanation to be a bit more direct.
Misc: (also has a bunch of editing things, which we'll say will be proof-read)
R1:
Minor issues related to clarity:
It's unclear what the following sentence means. If an inclusion
criterion was receiving Scratcher status as of the change, then
wouldn't all projects in the dataset be from Scratchers after the
change and New Scratcher before the change? “We also included a binary
variable that indicates whether the learner has received Scratcher status
or not (Is Scratcher?), and a binary variable indicating whether the
learner has used SCV variables in a previous shared de novo project (Used
Cloud Data).”
For the Is Scratcher? variable, what is coded as 0 and what is coded as 1?
For the statement, “Given a positive causal relationship between use of
SCV and Uses Data Structures?, it is still difficult to disentangle the
degree to which SCV…” make it more obvious that this is a
hypothetical statement.
2SLS is defined a column and a half after it is first used.
Threats to validity typically go in the discussion section.
R2:
The authors should disambiguate their use of constructivist and
constructionist. In learning science circles, constructivism is
often linked with Piaget, and constructionism with Papert. One is
a cognitive theory; the other, an educational method.
[...]
Of 33 references there appears to be 1 CHI reference, 2 CSCW, and ~3 IDC.
It may be a good idea to link this work closer to other CHI work.
R3:
The authors also state that
their quasi-experimental, econometrics-inspired analysis is a
methodological contribution to the field of HCI. While I have not seen
this specific type of analysis in prior CHI literature, I'm not sure the
fundamental goals here are new.
[...]
A minor semantic quibble: The first paragraph of the background section
implies that Piaget's general theory of constructivist learning is
distinct from "transmission of knowledge from the teacher to the
student". While I understand the point you're trying to make,
technically speaking, constructivism is at play even when a teacher is
lecturing to a student (it's just not a particularly effective strategy
for engaging a student in actively constructing a robust mental model of
the content being taught).

View File

@@ -0,0 +1,41 @@
We made the following substantive changes to our manuscript.
Each of these changes was described in our rebuttal and the points in this summary correspond to the points in our rebuttal text.
1. We made a series of changes to the paper to de-emphasize our claims regarding learning:
i. In our background, we clarified that “wide walls” support increased engagement which can lead to learning, but that our study only directly measures engagement.
ii. At several points, we explained that our results provide no direct evidence of learning but that we believe our findings for H2 provide evidence in support of the theory that wider walls can also support learning.
iii. In both our background and our measures section, we added text to articulate the reasoning behind the latter point: a) previous quantitative studies of Scratch measure learning as the presence of certain blocks; b) Moreno-Leon et al. (CHI 2017) validated these approaches by comparing them to expert assessments; c) because all users in our sample had access to variables before the treatment, an increase in non-SCV variable use is difficult to explain except through increased familiarity with data structures in general; d) Dasgupta et al. (CSCW 2016) and others described an increase in block use associated with exposure as evidence in support of learning.
iv. We removed the phrase “strong evidence of learning” and similar text in several other places.
v. We removed the word “learning” from our title.
2. We made several changes designed to make our methodology clearer:
i. We revised our analytic strategy section for clarity.
ii. We added a citation, with a short description, to a methodologically similar econometric study from education policy research.
iii. We standardized on the term “natural experiment” instead of “quasi-experiment” which is also reflected in our title.
iv. We edited our threats section to clearly explain that our method produces a local treatment effect on affected users—i.e., a subset of Scratch users who differ systematically from all Scratchers in observable and likely unobservable ways.
3. We added a new paragraph to our threats section to describe the challenge of separating the effect of wide walls from novelty.
4. To more precisely express what we mean by wide walls, we quoted text describing the design rationale from the SCV systems paper and have added a citation to Resnick and Silverman.
5. We have edited our methods and threats sections to explain that 2SLS achieves strong internal validity (i.e. unbiased estimation of a local causal effect) at the potential expense of external validity. We have edited our manuscript to carefully convey that this is /a/ test and reflects only a first piece of contingent evidence in support of the widely-cited theory.
6. We have edited our manuscript to address R3's comment on constructivism and transmission of knowledge.
7. We carefully edited our paper for style and clarity. We had our paper professionally proofread and made a large number of stylistic improvements.
We made several other changes:
1. While carefully reviewing our code a final time, we found a minor bug (a fencepost error) in a function that generated our measure of whether projects subsequent to the policy change had used SCV (see the illustrative sketch after this list). Although many of our specific estimates changed, the differences are all very minor and the signs and approximate magnitudes of the estimates are unchanged. Because we wrote our paper using KnitR (a literate programming tool for R), all the numbers in the paper are generated automatically from the data and we are confident that the current draft has been updated completely.
2. We have unblinded our paper, added our copyright blurb, and added an "Acknowledgements" section.
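Purely to illustrate the kind of bug described in point 1 above: a fencepost error in this sort of measure typically comes down to an off-by-one boundary comparison. The function names, field names, and cutoff date below are hypothetical and are not the study's actual code or data.

    from datetime import date

    POLICY_CHANGE = date(2013, 5, 9)  # hypothetical cutoff date, not the real one

    def used_scv_after_change_buggy(project):
        # Bug: '>' silently drops projects shared exactly on the cutoff date.
        return project["shared_on"] > POLICY_CHANGE and project["uses_scv"]

    def used_scv_after_change_fixed(project):
        # Fix: '>=' includes the boundary date, so nothing falls through the gap.
        return project["shared_on"] >= POLICY_CHANGE and project["uses_scv"]

    projects = [
        {"shared_on": date(2013, 5, 8), "uses_scv": True},   # before the change
        {"shared_on": date(2013, 5, 9), "uses_scv": True},   # on the boundary
        {"shared_on": date(2013, 6, 1), "uses_scv": False},  # after, but no SCV
    ]

    print([used_scv_after_change_buggy(p) for p in projects])  # [False, False, False]
    print([used_scv_after_change_fixed(p) for p in projects])  # [False, True, False]

The two print lines show the only behavioral difference: how projects on the boundary date are counted.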