
added wide walls paper (CHI 2018)

This commit is contained in:
Benjamin Mako Hill 2020-11-24 10:33:58 -08:00
parent 636ce10a4d
commit 211ba396c6
4 changed files with 304 additions and 0 deletions

View File

@@ -0,0 +1,33 @@
We thank the reviewers for their careful attention and feedback. Below, we describe adjustments that we believe address the reviewers' central concerns. Although relatively minor, we feel that these changes will improve the manuscript enormously. We have room in the paper to make the changes described.
1. R2, R3, & R4 criticized the way we present our findings in terms of learning. We agree that we overstated our findings in this regard and will make a series of minor changes to address this issue:
- In our background, we will clarify that “wide walls” support increased engagement which can lead to learning, but that our study only directly measures engagement.
- We will explain that our results provide no direct evidence of learning but that we believe our findings for H2 provide “evidence in support of the theory that wider walls can also support learning.”
- We will articulate the reasoning behind the latter point: a) previous quantitative studies of Scratch measure learning as the presence of certain blocks; b) Moreno-Leon et al. (CHI 2017) validated these approaches by comparing them to expert assessments; c) because all users in our sample had access to variables before the treatment, an increase in non-SCV variable use is difficult to explain except through increased familiarity with data structures in general; d) Dasgupta et al. (CSCW 2016) and others described an increase in block use associated with exposure as evidence in support of learning.
- We will remove the phrase “strong evidence of learning” [R2]. We intended to convey the large effect sizes for our 2SLS models.
- We will remove the word “learning” from our title [R2].
2. R4 & R1 suggested that our discussion of our methodology was too dense and obtuse. We will address this in several ways:
- We will revise our analytic strategy section for clarity. We will have colleagues without econometric training read our revision to ensure that it is accessible and understandable.
- We will add a citation, with short description, to a methodologically similar econometric study from education research.
- We will standardize on terminology (e.g., “quasi-experiment” over “natural”; we used them interchangeably).
- Per R1, we will edit our threats section to clearly explain that our method produces a local treatment effect on affected users—i.e., a subset of Scratch users who differ systematically from all Scratchers in observable and likely unobservable ways (e.g., they are more experienced [R4, R1]).
3. R1, R2 & R4 were concerned that our findings might be driven by novelty. We will add a new paragraph to our threats section to describe this limitation. We will mention that SCV introduces minimal structural novelty (no new blocks, just one new checkbox) but that the functional novelty introduced by SCV is significant and a possible alternative explanation for our findings, especially H1. To some degree, this limitation extends to any causal inference technique (lab and quasi-experimental) that relies on measuring relatively short-term effects—an old criticism of experimental evaluation and user testing in HCI. We will explain that our analysis for H2—where no structure or functionality captured in the dependent variable is new—seems to suggest that our findings are not only a function of novelty.
4. To more precisely express what we mean by wide walls [R3], we will quote text describing the design rationale from the SCV systems paper. Specifically, we will describe how SCV sought to support a broader range of projects that connected to existing Scratch community practices and needs (per Resnick's definition). We believe this will make clear that SCV was designed as more than just a new feature in a toolkit. In the discussion, we will expand the section on the tension inherent in widening walls in terms of learnability. We will cite Resnick and Silverman, who frame this as a tension between wide walls and low floors.
5. R3 raised concerns about generalizability. We will edit our methods and threats sections to explain that 2SLS achieves strong internal validity (i.e., unbiased estimation of a local causal effect) at the potential expense of external validity. We will explain that this is an important trade-off in quasi-experimental field studies, which are best understood as complementary to lab and qualitative studies. We will remind readers that no single study proves a theory, that questions of generalizability are common to every study, and that one becomes more confident about the validity of a theory from multiple studies in different settings. We will edit our manuscript to carefully convey that this is /a/ test and reflects only a first piece of contingent evidence in support of the widely-cited theory.
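For readers less familiar with the estimator, the following is a minimal, generic sketch of the 2SLS logic on simulated data. The variable names, the instrument, and the data-generating process are placeholders for illustration only and do not reflect our actual specification or data.

    import numpy as np

    # Generic two-stage least squares (2SLS) on simulated data. Names (z, d, y)
    # and the data-generating process are illustrative placeholders only.
    rng = np.random.default_rng(0)
    n = 20_000

    z = rng.integers(0, 2, n)                    # instrument (e.g., an exogenous policy change)
    u = rng.normal(size=n)                       # unobserved confounder
    d = (1.5 * z + 0.4 * u + rng.normal(size=n) > 0.75).astype(float)  # endogenous treatment
    y = 1.0 * d + 0.8 * u + rng.normal(size=n)   # outcome; true treatment effect is 1.0

    def ols(X, y):
        # ordinary least squares coefficients via a least-squares solve
        return np.linalg.lstsq(X, y, rcond=None)[0]

    # Stage 1: regress the endogenous treatment on the instrument.
    X1 = np.column_stack([np.ones(n), z])
    d_hat = X1 @ ols(X1, d)

    # Stage 2: regress the outcome on the stage-1 *predicted* treatment.
    X2 = np.column_stack([np.ones(n), d_hat])
    beta_2sls = ols(X2, y)

    # Naive OLS of y on d is biased upward by the confounder u; 2SLS uses only
    # the variation in treatment driven by the instrument.
    beta_ols = ols(np.column_stack([np.ones(n), d]), y)
    print(f"naive OLS estimate: {beta_ols[1]:.2f}")   # noticeably above 1.0
    print(f"2SLS estimate:      {beta_2sls[1]:.2f}")  # close to 1.0

Because only instrument-driven variation in treatment is used, the resulting estimate is local to units whose treatment status is actually shifted by the instrument, which is the sense in which we describe a local causal effect above.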
6. We will address R3's comment on constructivism and transmission of knowledge.
7. We will fix the stylistic errors [R2], remove unnecessary quotes [R1], and fix the minor issues raised by R1. We will have our work professionally proofread.

View File

@@ -0,0 +1,9 @@
rebuttal information for:
Dasgupta, Sayamindu, and Benjamin Mako Hill. 2018. “How Wide Walls Can
Increase Engagement: Evidence From a Natural Experiment in Scratch.” In
Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
(CHI '18), 361:1–361:11. New York, New York: ACM.
https://doi.org/10.1145/3173574.3173935.

View File

@@ -0,0 +1,221 @@
Bike Rack
- Per R1, we will mention that our study only focuses on the effect on users present before/after the intervention and not on the effect that widening walls can have in eliciting participation from previously uninterested users.
Meta:
The average score is just below 3. Two big concerns:
1AC is leaning towards reject (says that in score and in review). 2AC also gave a 2.5 score.
They are talking about “novelty effect”. Is that a thing?
Claims on learning
1AC:
There are a few places that need improvement, with the central issue on
the claims of learning. Both R2 and R3 raise concern that what the paper
reports cannot be regarded as a measure of learning. R2 suggests the word
“learning” be replaced with a more suitable term, such as engagement,
exploration, or simply (and accurately) data structure use. R3 finds
claims on learning to be too strong for what is supported by data and
results.
R2:
Some issue must be taken with the position of this work's relationship
with learning, as no measure of learning was performed.
“Finally, we recognize that all quantitative measures of learning are
limited. For one, measures such as ours can only observe outcomes of
learning. In this specific analysis, what we detect is only the presence
of data structures in projects. We do not know if those data structures
get used in a meaningful way in the projects that we observe. We do not
even know if they get used. Furthermore, we do not know the purpose of
their use. “
This is such a significant limitation that it should not be considered a
measure of learning. If learning is not being measured in any way, the
variable should be called “data structure use”. I suggest the word
“learning” be removed from the title of this paper, as
“engagement” should adequately describe user engagement with new
concepts/data structures without insinuating a provable measure of
learning.
R3:
At a higher level, however, I'm left a bit uneasy with the authors'
operational definition for learning and engagement here. The specific
hypotheses made are about "frequency of use" which is fine. The
inferential jump between frequency of use and implying greater learning is
unfounded given that there is no evidence that (1) the uses of these
features did anything meaningful, or (2) that a learner understood what
it was doing. Thus claims like "this reflects strong evidence of
learning" are simply too strong in my view (indeed the authors state "all
quantitative measures of learning are limited"). Perhaps restating these
findings using a term other than learning could be a reasonable fix.
Validity concerns
1AC:
Reviewers also have raised issues with the validity of results. R2 and R3
both wonder if the observed differences after the updated Scratcher
policy is applied can be attributed to the wide walls phenomenon, or they
are due to a novelty effect. R1 questions the impact of the
within-subjects setting the natural experiment afforded, with potential
confounding factors of time and experience.
R1:
If you included only users who received Scratcher status due to the
change, then your research design is within subjects. Authors need to
discuss this in threats to validity. I think you have all of the
information to determine whether this is a threat to validity, but it's
unclear how sudden access to a new tool affects use of that tool (might
be related to the bandwidth issue).
R2, R3:
<see concerns about wider walls below>
Wider walls and/vs novelty
R2:
How is what was seen in this paper different from a novelty effect? The
authors should explicitly compare/contrast what is similar and what would
be different if this were a novelty effect.
I wonder whether each occurrence of a feature added to a system can be
considered evidence of "wider walls", and this concerns me. Is it
possible to add a creative feature to a system, and not consider it
widening the walls? What makes widening the walls different from
increased functionality and self-expression in a system - or is that what
widening walls is? I'm afraid the definition of this construct, according
to the paper, leaves quite a large range of possibilities for what is
wider walls.
R3:
Additional nuance in the authors' operational definition for "wider
walls" would help communicate the significance of the work. For example,
Bruckman and others tell us that design of constructionist learning
environments must be carefully done to ensure that the tools of thought
available make it likely that the learner will engage with that which is
to be learned (while creating personally meaningful artifacts). Adding
another toy in the curated sandbox for the sake of adding it does not
necessarily expand the scope of "that which is to be learned", though it
*might* increase the scope of the design space of learner artifacts. Does
widening walls mean both of these things or is just one sufficient? In
fact, I can also easily imagine a situation where adding a feature to the
learning environment might negatively impact learning---e.g., increasing
cognitive load. Overall, the paper could be improved with a more
deliberate definition of what is (and also is not) widening walls. This
would also help us understand the extent to which they believe their
results have implications for the design of learning environments beyond
Scratch and why.
Generalizability:
1AC:
Reviewers are concerned that what is reported in the paper might be a
one-off observation made on the specific platform and setting. We see one
case study here, but it's not clear if the knowledge is generalizable
in determining if adding a feature is going to widen the wall or not. R2
points out that the definition is only vaguely presented in the paper. R3
thinks practical implications are severely limited and also calls for a
clearer definition of widening walls. R1 asks for a deeper discussion of
whether widening the wall might potentially raise the floor.
R3:
Theoretically speaking the authors have made a contribution
with this work. Practically speaking, I remain somewhat unconvinced
about the significance of this work--the authors have not convinced me
that this is actionable in any context in any way other than the
carefully constrained one in which they conducted the study.
[...]
The authors have clearly presented a small but detailed quantitative
examination of adding one feature to Scratch. The study's scope of
contribution is significantly hindered by an overly constrained setting,
coupled with the fact that very few learners actually explored the new
SCV feature. The analysis may be statistically meaningful, but it's not
clear there are many practical take-aways here.
Analytical strategy wat?
1AC:
In terms of writing, reviewers find the analytical strategy section to be
overly obtuse, dense, and difficult to follow. Assuming that most of the CHI
audience is not familiar with econometrics, the paper needs to do a
better job of describing the methodology in a more accessible manner.
R2:
While the methodology is novel and appears to introduce rigor that would
be informative for the CHI community that encounters intention-to-treat
studies with large quantities of data, the description of the methodology
is very difficult to understand, from a non-econometrist's point of
view. This makes the results somewhat difficult to follow as well. The
discussion section could be expanded to give more of an overview of the
results, broken down by hypothesis. I do not have the expertise to
further evaluate the methodology and claims made in this paper.
R3:
It does get a bit dense in the mid
section (Analytical Strategy), and it would be an improvement to simplify
the explanation to be a bit more direct.
Misc: (also has a bunch of editing things, which we'll say will be proof-read)
R1:
Minor issues related to clarity:
It's unclear what the following sentence means. If an inclusion
criterion was receiving Scratcher status as of the change, then
wouldn't all projects in the dataset be from Scratchers after the
change and New Scratcher before the change? “We also included a binary
variable that indicates whether the learner has received Scratcher status
or not (Is Scratcher?), and a binary variable indicating whether the
learner has used SCV variables in a previous shared de novo project (Used
Cloud Data).”
For the Is Scratcher? variable, what is coded as 0 and what is coded as 1?
For the statement, “Given a positive causal relationship between use of
SCV and Uses Data Structures?, it is still difficult to disentangle the
degree to which SCV…” make it more obvious that this is a
hypothetical statement.
2SLS is defined a column and a half after it is first used.
Threats to validity typically go in the discussion section.
R2:
The authors should disambiguate their use of constructivist and
constructionist. In learning science circles, constructivism is
often linked with Piaget, and constructionism with Papert. One is
a cognitive theory; the other, an educational method.
[...]
Of 33 references there appears to be 1 CHI reference, 2 CSCW, and ~3 IDC.
It may be a good idea to link this work closer to other CHI work.
R3:
The authors also state that
their quasi-experimental, econometrics-inspired analysis is a
methodological contribution to the field of HCI. While I have not seen
this specific type of analysis in prior CHI literature, I'm not sure the
fundamental goals here are new.
[...]
A minor semantic quibble: The first paragraph of the background section
implies that Piaget's general theory of constructivist learning is
distinct from "transmission of knowledge from the teacher to the
student". While I understand the point you're trying to make,
technically speaking, constructivism is at play even when a teacher is
lecturing to a student (it's just not a particularly effective strategy
for engaging a student in actively constructing a robust mental model of
the content being taught).

View File

@@ -0,0 +1,41 @@
We made the following substantive changes to our manuscript.
Each of these changes was described in our rebuttal and the points in this summary correspond to the points in our rebuttal text.
1. We made a series of changes to the paper to de-emphasize our claims regarding learning:
i. In our background, we clarified that “wide walls” support increased engagement which can lead to learning, but that our study only directly measures engagement.
ii. At several points, we explained that our results provide no direct evidence of learning but that we believe our findings for H2 provide evidence in support of the theory that wider walls can also support learning.
iii. In both our background and our measures section, we added text to articulate the reasoning behind the latter point: a) previous quantitative studies of Scratch measure learning as the presence of certain blocks; b) Moreno-Leon et al. (CHI 2017) validated these approaches by comparing them to expert assessments; c) because all users in our sample had access to variables before the treatment, an increase in non-SCV variable use is difficult to explain except through increased familiarity with data structures in general; d) Dasgupta et al. (CSCW 2016) and others described an increase in block use associated with exposure as evidence in support of learning.
iv. We removed the phrase “strong evidence of learning” and similar text in several other places.
v. We removed the word “learning” from our title.
2. We made several changes designed to make our methodology clearer:
i. We revised our analytic strategy section for clarity.
ii. We added a citation, with a short description, to a methodologically similar econometric study from education policy research.
iii. We standardized on the term “natural experiment” instead of “quasi-experiment” which is also reflected in our title.
iv. We edited our threats section to clearly explain that our method produces a local treatment effect on affected users—i.e., a subset of Scratch users who differ systematically from all Scratchers in observable and likely unobservable ways.
3. We added a new paragraph to our threats section to describe the challenge of separating the effect of wide walls from novelty.
4. To more precisely express what we mean by wide walls, we quoted text describing the design rationale from the SCV systems paper and have added a citation to Resnick and Silverman.
5. We have edited our methods and threats sections to explain that 2SLS achieves strong internal validity (i.e. unbiased estimation of a local causal effect) at the potential expense of external validity. We have edited our manuscript to carefully convey that this is /a/ test and reflects only a first piece of contingent evidence in support of the widely-cited theory.
6. We have edited our manuscript to address R3's comment on constructivism and transmission of knowledge.
7. We carefully edited our paper for style and clarity. We had our paper professionally proofread and made a large number of stylistic improvements.
We made several other changes:
1. While carefully reviewing our code a final time, we found a minor bug (a fencepost error) in a function that generated our measure of whether projects subsequent to the policy change had used SCV (see the illustrative sketch after this list). Although many of our specific estimates changed, the differences are all very minor and the signs and approximate magnitudes of the estimates are unchanged. Because we wrote our paper using KnitR (a literate programming tool for R), all the numbers in the paper are generated automatically from the data and we are confident that the current draft has been updated completely.
2. We have unblinded our paper, added our copyright blurb, and added an "Acknowledgements" section.
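Purely to illustrate the kind of bug described in point 1 above: a fencepost error in this sort of measure typically comes down to an off-by-one boundary comparison. The function names, field names, and cutoff date below are hypothetical and are not the study's actual code or data.

    from datetime import date

    POLICY_CHANGE = date(2013, 5, 9)  # hypothetical cutoff date, not the real one

    def used_scv_after_change_buggy(project):
        # Bug: '>' silently drops projects shared exactly on the cutoff date.
        return project["shared_on"] > POLICY_CHANGE and project["uses_scv"]

    def used_scv_after_change_fixed(project):
        # Fix: '>=' includes the boundary date, so nothing falls through the gap.
        return project["shared_on"] >= POLICY_CHANGE and project["uses_scv"]

    projects = [
        {"shared_on": date(2013, 5, 8), "uses_scv": True},   # before the change
        {"shared_on": date(2013, 5, 9), "uses_scv": True},   # on the boundary
        {"shared_on": date(2013, 6, 1), "uses_scv": False},  # after, but no SCV
    ]

    print([used_scv_after_change_buggy(p) for p in projects])  # [False, False, False]
    print([used_scv_after_change_fixed(p) for p in projects])  # [False, True, False]

The two print lines show the only behavioral difference: how projects on the boundary date are counted.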