%
%% This is file `sample-authordraft.tex',
%% generated with the docstrip utility.
%%
%% The original source files were:
%%
%% samples.dtx (with options: `authordraft')
%%
%% IMPORTANT NOTICE:
%%
%% For the copyright see the source file.
%%
%% Any modified versions of this file must be renamed
%% with new filenames distinct from sample-authordraft.tex.
%%
%% For distribution of the original source see the terms
%% for copying and modification in the file samples.dtx.
%%
%% This generated file may be distributed as long as the
%% original source files, as listed above, are part of the
%% same distribution. (The sources need not necessarily be
%% in the same archive or directory.)
%%
%% The first command in your LaTeX source must be the \documentclass command.
% \documentclass[sigconf,authordraft]{acmart}
%%%% As of March 2017, [siggraph] is no longer used. Please use sigconf (above) for SIGGRAPH conferences.
%%%% As of May 2020, [sigchi] and [sigchi-a] are no longer used. Please use sigconf (above) for SIGCHI conferences.
%%%% Proceedings format for SIGPLAN conferences
% \documentclass[sigplan, anonymous, authordraft]{acmart}
%%%% Proceedings format for conferences using one-column small layout
%\documentclass[acmsmall,authordraft]{acmart}
% NOTE that a single column version is required for submission and peer review. This can be done by changing the \doucmentclass[...]{acmart} in this template to
% \documentclass[sigconf,review=True]{acmart}
\chapterprecishere{
% Most explanations of changes in online group size focus on internal factors like social structures or design decisions.
% do not make the , and render critical questions like “which other groups are a given group's strongest competitors or mutualists?” unanswerable.
% TODO: Polish abstract
% Online groups interact with each other as people, content and ideas flow among them.
We introduce a method for inferring competitive and mutualistic interactions between online groups from time series participation data based on the theoretical framework of community ecology. Platforms often host multiple online groups with highly overlapping topics and members. How can researchers and designers understand how interactions between related groups affect measures of group health? Inspired by population ecology, prior social computing research has studied competition and mutualism among related groups by correlating group size with degrees of overlap in content and membership. The resulting body of evidence is puzzling as overlaps seem sometimes to help and other times to hurt. We suggest that this confusion results from aggregating intergroup relationships into an overall environmental effect instead of focusing on networks of competition and mutualism among groups as our approach does. We compare population and community ecology analyses of online community growth by analyzing clusters of subreddits with high user overlap but varying degrees of competition and mutualism.
}
%%
%% The code below is generated by the tool at http://dl.acm.org/ccs.cfm.
%% Please copy and paste the code instead of the example below.
%%
% \begin{CCSXML}
% <ccs2012>
% <concept>
% <concept_id>10010520.10010553.10010562</concept_id>
% <concept_desc>Computer systems organization~Embedded systems</concept_desc>
% <concept_significance>500</concept_significance>
% </concept>
% <concept>
% <concept_id>10010520.10010575.10010755</concept_id>
% <concept_desc>Computer systems organization~Redundancy</concept_desc>
% <concept_significance>300</concept_significance>
% </concept>
% <concept>
% <concept_id>10010520.10010553.10010554</concept_id>
% <concept_desc>Computer systems organization~Robotics</concept_desc>
% <concept_significance>100</concept_significance>
% </concept>
% <concept>
% <concept_id>10003033.10003083.10003095</concept_id>
% <concept_desc>Networks~Network reliability</concept_desc>
% <concept_significance>100</concept_significance>
% </concept>
% </ccs2012>
% \end{CCSXML}
% \ccsdesc[500]{Computer systems organization~Embedded systems}
% \ccsdesc[300]{Computer systems organization~Redundancy}
% \ccsdesc{Computer systems organization~Robotics}
% \ccsdesc[100]{Networks~Network reliability}
%%
%% Keywords. The author(s) should pick words that accurately describe
%% the work being presented. Separate the keywords with commas.
% \keywords{datasets, neural networks, gaze detection, text tagging}
%% A "teaser" image appears between the author and affiliation
%% information and the body of the document, and typically spans the
%% page.
% \begin{teaserfigure}
% \includegraphics[width=\textwidth]{sampleteaser}
% \caption{Seattle Mariners at Spring Training, 2010.}
% \Description{Enjoying the baseball game from the third-base
% seats. Ichiro Suzuki preparing to bat.}
% \label{fig:teaser}
% \end{teaserfigure}
%%
%% This command processes the author and affiliation and title
%% information and builds the first part of the formatted document.
% \fontsize{12pt}{24pt}
% \selectfont
%% We're going for a "known puzzle" + "clarifying confusion" framing
%% Rememver to frame aronud the depvar
%% TODO: rewrite with a new outline
%% Introduction, Related Work, Materials & Methods, Results, Discussion, Conclusions
%% Put research question in the introduction.
%% Put hypotheses in Related Work.
%% Consider Hypothesizing that mutualism will be more common than competition because subreddits in these clusters are specialized.
%% Cut unneeded ecological terms
%% Define needed ecological terms
\section{Introduction}
\label{sec:intro}
% Why we need an ecological approach
%Online groups are important places where people collaborate to produce information sources, engage in discussions and participate in culture.
Although the fact is frequently ignored in social computing scholarship, online groups do not exist in isolation.\footnote{We use the term ``online group'' instead of ``online community'' to help avoid confusion with our term ``community ecology,'' which plays an important conceptual and analytic role in our paper.} Indeed, although studying interdependence between online groups is difficult and complex \citep{hill_studying_2019}, research in social computing has sought to quantify how online groups share users or topics \citep{datta_identifying_2017, del_tredici_semantic_2018, tan_all_2015, hessel_science_2016}, and how such interactions relate to outcomes like the emergence of new groups \citep{tan_tracing_2018}, contributions to peer-produced knowledge \citep{vincent_examining_2018}, and the spread of hate speech \citep{chandrasekharan_you_2017}. Although this work has demonstrated that intergroup interactions matter, very little intergroup research has tackled questions of group success---i.e., why some online groups succeed in maintaining active and long-lived participation while most do not.
%\citep{kraut_role_2014, resnick_starting_2012}. % commented out since there was no response
Can intergroup relationships
% competition or mutualism between online groups
explain whether online groups will grow or decline?
% NOTE: I guess you've added the footnote above to address the reviewer concern. It's important but (a) I think it's too early in the manuscript to bring this in and (b) it should be in a footnote. -mako
% I moved it below by the RQ.
%a growing body of social computing research shows that online groups, such as wikis, discussion forums and mailing lists spawn new groups and wage conflicts against, compete with and help each other citep{datta_identifying_2017, tan_tracing_2018, wang_impact_2012, zhu_impact_2014}.
% individual chances of success while mutualistic dynamics increase them.
% How do relationships between groups shape their chances of success?
% What's wrong with previous ecological approaches
% Should we introduce ecological theory in the introduction at all?
Studies in social computing have drawn from organizational ecology to answer this question \citep{wang_impact_2012, zhu_impact_2014, resnick_starting_2012, zhu_selecting_2014}. Inspired by the ecological study of biological systems, organizational ecology is an influential body of theory in sociology that studies competition and mutualism among human organizations.
% , ranging from commercial industries to social movements \citep{hannan_population_1977, baum_ecological_2006}.
% NOTE: There's a jump between this sentence and the last one. I think we might need to signal, somehow, that orgecol is not puzzling or the results in soccomp are puzzling in regards to them. I've changed puzzling below to inconsistent but we should make it clear what it's inconsistent with. -mako
Although ecological studies of firms and social movements have developed a clear and established body of theory with strong empirical support \citep{baum_ecological_2006}, similar studies of online groups have yielded inconsistent results that differ both from one context to another and from theoretical predictions. For example, wikis whose memberships overlap with other wikis survived longer \citep{zhu_impact_2014}, but Usenet groups with overlapping memberships failed more quickly \citep{wang_impact_2012}.
% NOTE: I'm not sure conflation is the right term here. I've reworked this paragraph below -mako
% I think you nailed it. -- nate
We argue that these confusing results stem from a conflation of concepts and measures from two distinct strands of theory in organizational ecology: \emph{population ecology} and \emph{community ecology}. Both define competition as a form of interdependence that \emph{decreases} growth and mutualism as one that \emph{increases} growth. However, population ecology focuses on modeling how overlapping resources among groups affect their subsequent growth, decline, or survival \citep{astley_two_1985, baum_ecological_2006, dobrev_dynamics_2001}. It does not attempt to directly study competitive and mutualistic interactions. On the other hand, community ecology recognizes that groups often exist within ``ecological communities,'' or clusters of highly related entities, and provides an approach for inferring competitive and mutualistic interactions among these. Although the stated goal of ecological research in social computing has been to understand how groups influence each other's ability to sustain participation, this research has relied exclusively on concepts and measures from population ecology. This paper seeks to explain the puzzling set of findings in ecological social computing research by introducing community ecology.
%These strands have different concepts of ecological dynamics, different levels of analysis and make distinct theoretical predictions \citep{astley_two_1985}.
% despite the fact that doing so is vital to
% Our contributions to CSCW are theoretical, methodological, and empirical.
% Our theoretical contribution, articulated in §\ref{sec:community_ecology},
% We then demonstrate both approaches by investigating our research question:
% \textit{(\textbf{RQ}) How does community ecology's view of competition and mutualism in online groups compare to that of population ecology?}
% Our overarching goal is to introduce community ecology as a theoretical and methodological framework for understanding how the relationships between specific online groups shape their growth or decline.
We do so in a three-part empirical study of 641 clusters of online groups with overlapping participants, using a dataset drawn from the 10,000 communities on Reddit with the most contributors.
In Study A, we conduct the most important type of population ecology analysis, a test of what is called density dependence theory, and find support for the theory.
%This suggests that competition is strongest when user overlap is high and mutualism is weakest when overlap is low.
This analysis suggests that high degrees of user overlap are associated with competition.
%VAR models are widely used in biological ecology to make inferences about competitive or mutualistic interactions between species.
In Study B, we introduce our method for community ecology analysis that infers networks of competitive and mutualistic interactions by using clustering analysis and vector autoregression (VAR) models of group size over time \citep{sims_macroeconomics_1980, canova_var_2007, ives_estimating_2003}. We illustrate the method in four case studies and present a large-scale computational analysis showing that mutualistic interactions are far more common than competitive ones.
Finally, in Study C, we bring Study A and Study B together to compare population ecology and community ecology by extending the density dependence model from Study A with a variable accounting for competition and mutualism. While we find that adding this variable does not help predict growth, including ecological interactions in our VAR models improves time series forecasting.
% importance of accounting for mutualistic and competitive interactions in predicting the growth of online groups. We
% While models including , .
We discuss how these findings illuminate the differences between population ecology and community ecology and show how the two perspectives are complementary.
While Study A suggests that competition is strongest when user overlap is high, Study B finds widespread mutualism among groups with overlapping membership.
Although these findings might seem contradictory, they reflect how population ecology studies overlapping resources related to favorable or unfavorable environmental conditions, while community ecology studies competitive and mutualistic interactions playing out in local networks of specific groups. By demonstrating that mutualistic and competitive interactions within clusters of highly related groups are important---and by describing how to measure them---this paper lays the groundwork for future research to investigate and design for interdependence between online groups that supports their growth and success.
%we demonstrate that interactions are important and how to inferred and are useful for time series forecasts of
% and inform design
% by understanding
%lays the groundwork for future research toward design
% understanding how different forms of
% To answer this question, We validate our approach by showing in §\ref{sec:res.forecasting} that
% % NOTE: Is it (1) the top 1000? It would be nice to summarize the comprehensiveness here. (2) I'm ambivalent about the word "network" here. -mako
% We make four specific empirical contributions: Reddit in §\ref{sec:res.characterizing} and .
% and provide an explanation for why previous ecological research in social computing has led to confusing and inconsistent results.
% NOTE: Is the sentence below correct? I guess so (at least indirectly) but I haven't read the new discussion. -mako New discussion isn't written yet, but right now that explanation is in the background section. :) -N
% NOTE: cut this last sentence? -mako - I think this last sentence will be a more accurate reflection of the discussion. -N
% We
% We
% We make a theoretical contribution by introducing the community ecology perspective that We also make a methodological contribution by providing a method for inferring these relationships from time-series data on group sizes
% Where prior approaches aggregate individual relationships between groups, our approach makes it possible to answer critical questions like ``which are a given online group's mutualists or competitors?''
% In the process, our theoretical work brings clarity to a confusing set of empirical results in prior research.
%Discussing this seemingly contrasting finding motivates future investigations into how competitive or mutualistic ecological communities form and why some environments for online groups are competitive or mutualistic.
% This method builds on a popular approach in biology that provides robust inferences about networks of ecological relationships. , analysis of stability, forecasts of future participation, and can scale to analyze systems of dozens of related communities. We apply this approach to four datasets.
% We validate our method using simulated data to show that it can identify a full range of ecological relationships and conduct a series of three case studies of groups hosted on the platform Reddit in \textsection \ref{sec:case.studies}. Although limited, these case studies make a third contribution in the form of empirical findings that suggest that specific patterns of relationships vary substantially across networks of groups and that mutualism appears to be much more common than competition.
\section{Related Work}
\label{sec:related.work}
% One sentence on "timeliness." Find citations (Chowdry, Benkler,
Online groups are important sites for social support \citep{de_choudhury_mental_2014}, entertainment \citep{ducheneaut_alone_2006}, information sharing \citep{benkler_wealth_2006}, and political mobilization of disinformation campaigns and protest movements \citep{choudhury_social_2016, benkler_social_2013, krafft_disinformation_2020}.
% knowledge of the ecosystem of online groups is important for advancing social science and informing future designs to support and manage online groups.
Although an online group's ability to achieve its goals depends on attracting and retaining contributors, few develop a sizable group of participants \citep{benkler_wealth_2006, dimaggio_social_2001, johnson_emergence_2014, koh_encouraging_2007, kraut_role_2014}. Many attempts to explain the success and growth of online groups look to properties of individual groups like characteristics of founders \citep{kraut_role_2014}, language use \citep{danescu-niculescu-mizil_no_2013}, turnover \citep{dabbish_fresh_2012}, and designs for regulating behavior \citep{halfaker_rise_2013, teblunthuis_revisiting_2018}.
Recent research suggests that interdependence among online groups is also important to explain success and failure \citep{cunha_are_2019, kairam_life_2012, tan_all_2015, tan_tracing_2018}.
For example, banning hate subreddits reduced hate speech in related subreddits \citep{chandrasekharan_you_2017}. In a very different context, there is evidence that Reddit and Stack Overflow receive substantial benefits from activity on Wikipedia \citep{vincent_examining_2018}.
% ; and editors make valuable and qualitatively different contributions across different languages of Wikipedia \cite{hale_cross-language_2015}. In addition, growth trajectories of online groups initially about similar topics can diverge \cite{zhang_understanding_2021}.
Our work contributes to this literature by providing a new conceptual lens and statistical method for studying competition and mutualism between online groups.
% , which theorizes how online groups depend on distinct types of resources.
% As we discuss in §\ref{sec:rdp}, the nature of these resources makes possible conditions for mutualism or competition. In §\ref{sec:ecology_background}, we explain how prior ecological studies of online groups extended RDT to consider how overlapping resources between communities can drive competition and mutualism and propose our first hypothesis which replicates part of these studies in Reddit, our empirical context. Finally, in §\ref{sec:community_ecology}, we draw anew from biology and organizational ecology to present our community ecology approach and propose hypotheses to validate its usefulness for predicting the growth of online groups.
\subsection{Online Groups Depend on Resources}
\label{sec:rdp}
Like prior ecological research in social computing and information systems, we build on resource dependence theory (RDT) \citep{butler_membership_2001, wang_impact_2012}.
\citet{butler_membership_2001} introduces
RDT to argue that growth in online groups is driven by positive feedback as participants contribute resources such as content, information, attention, or social interactions, which motivate further contributions by subsequent participants. That said, online groups do not grow forever and RDT explains that growth is self-limiting because costs of participation increase in larger groups \citep{butler_membership_2001, butler_attraction-selection-attrition_2014}.
% While growth far from the only criteria of success for an online group, much social computing research follows RDT by seeking to support groups' growth and survival through the attraction or retention of members \cite{koh_encouraging_2007, kraut_role_2014, cunha_are_2019}.
% For example, explanations of Wikipedia's transition from growth to decline structures for quality assurance in a growing project that constituted barriers to newcomer participation \cite{halfaker_rise_2013, teblunthuis_revisiting_2018} spawned significant interest in designs for increasing newcomer retention that have met with limited success \citep[e.g.][]{halfaker_snuggle:_2014, morgan_tea_2013, narayan_wikipedia_2017}. Social structures like leadership, organizational practices, network structure, and design decisions can lower costs and increase benefits of participation \cite{butler_membership_2001, kraut_role_2014, tsugawa_impact_2019}.
%TODO: incorporate the below citations to "demonstrate that this is of importance to the social computing audience"" Also cite Charlie's paper about cross-platform interdependence
%We review this foundational work in §\ref{sec:resource_dep} and then narrow our focus to prior ecological studies and other empirical work about interdependence between online groups in §\ref{sec:ecology_background}. Then, in §\ref{sec:community_ecology} we review sociological research developing community ecology theory and apply it to online groups.
% It also builds closely on two bodies of ecological theory: first, explanations from population ecology that describe entities as sharing resources in environments and second, explanations from community ecology that theorize networks of specific community relationships.
% In our background we introduce the first two bodies of related work in sections \ref{sec:resource_dep} and \ref{sec:ecology_background}.
% Frame around the dependent variable:
% Explaining participation is important because
% 1. It's a longstanding concern of the field
% 2. Online Groups are important to society
% models
% ranging from entertainment, information exchange, social interaction, to the collaborative production of knowledge and organization of collective action
% This positive feedback between the value of prior contributions and the motivation for future contributions drives community growth.
% Think about the implications of our findings for the rival vs nonrival resources that could be in play.
% Maybe try to deepen the discussion of resource competition, or maybe its better to avoid getting dragged into this.
Ecological approaches recognize that interrelated online groups may share resources with one another in ways that constrain their growth and survival. \textit{Rival} resources like participants' time, attention, and efforts raise the possibility of competition because they become unavailable to others when used by one group \citep{benkler_wealth_2006, kubiszewski_production_2010, ostrom_public_1977,romer_endogenous_1990}. RDT suggests that declines in online participation can be explained in terms of competition over important rival resources \citep{wang_impact_2012}.
% Online participation in general has opportunity costs and may compete with alternatives like sleep, entertainment, or work \cite{becker_theory_1965, butler_attraction-selection-attrition_2014}.
% So online groups that provide similar benefits may be the most likely competitors because once someone has obtained satisfying benefits from one group they may go offline or switch to another activity instead of seeking similar benefits from competitor groups.\footnote{Economists refer to these as ``substitutes.' }
% providing the same benefits at lesser costs might be a compelling alternative.
% If different online groups can substitute for participation in one another and participation is rival this will lead to competition between the communities and decrease participation in both.
% Public goods are nonrival because their usefulness is not diminished when others use them.
On the other hand, online groups also rely on \textit{nonrival} resources. They can even produce connective and communal public goods like opportunities to communicate or collections of information \citep{fulk_connective_1996} which can be ``antirival'' when their usefulness increases as a result of others using them \citep{kubiszewski_production_2010, weber_political_2000}. For example, the usefulness of a communication network increases as more people join it \citep{fulk_connective_1996, katz_network_1985}. Similarly, the usefulness of an information good can increase as more people come to know, refer to, and depend upon it \citep{kubiszewski_production_2010, weber_political_2000}.
% as when
%Awareness that an online group provides an audience can motivate participation \cite{zhang_group_2011}.
If multiple online groups help build the same connective or communal public goods, they may form mutualistic interactions where contributions to one group may ``spill over'' and motivate participation in mutualist groups \citep{zhu_impact_2014}.
Ecological approaches seek to understand how different types of resources will limit or promote growth.
% as was demonstrated when Chinese government blocked the Chinese language edition of Wikipedia, unblocked contributors decreased their participation
%
%As a result, researchers, designers, and managers of online communities often set aside thorny questions of interdependence between online communities.
%While extensions of the resource dependence framework recognize the importance of exit from online communities \cite{butler_attraction-selection-attrition_2014}, they do not say where people go when they leave. % Before turning to our theory of community ecology, we note differences between ecological theory and analysis in organization and biological science from other uses of the term ecology in HCI and social computing.
% The term ``ecology'' often connotes interconnectedness, complexity, growth, and nature, and also crises of resource sustainability, loss, and extinction \cite{worster_natures_1994, blevis_ecological_2015}. Most references technologists make to ``ecology''
% For example Nardi and O'Day invoke the ecological metaphor in describing their vision for individuals to cultivate intentional and localized relationships with technology \cite{nardi_information_2000, bowker_bonnie_2001}.
% This continues a long-running intellectual exchange between social and biological sciences. Economic thought was strongly influenced by Darwinian evolution and ecologists in biology were influenced by economic models to understand and solve problems in forestry and conservation \cite{kropotkin_mutual_2012, worster_natures_1994}. Once modern ecological science was developed it was not long before it was applied to understand human societies \cite[e.g.][]{park_human_1936, hawley_human_1986}. Because theories of organizational ecology were crafted to address particular concerns in organization science and are laden with assumptions appropriate to traditional firms with fixed and durable boundaries, our ecological approach also draws from biology.
% TODO This section needs a number of new concrete examples. Revisit the ecological literature as well. Also perhaps add some examples from the interview paper (which we'll cite and anonymize).
\subsection{Population Ecology, Density Dependence and Overlapping Resources}
\label{sec:ecology_background}
% Our theoretical approach draws from ecology.
While this paper focuses on the ecological study of online groups, other social computing and HCI scholars have used the term ``ecology'' (and related concepts like ``ecosystem'' and ``environment'') to denote an assemblage of sites, devices, or platforms \citep{nardi_information_1999,wang_coming_2015}. We use the term more narrowly to refer to conceptual and mathematical models of ecological dynamics.
In particular, our work builds on a tradition rooted in \textit{organizational ecology}. First developed in the late 1970s by sociologists studying interactions between firms, organizational ecology was inspired by, and has drawn closely from, ecological studies in biology \citep{hannan_population_1977}.
Because online groups bear similarities to traditional organizations, organizational ecology provides a compelling theoretical framework for understanding interdependence among online groups. It has inspired at least three high-quality empirical studies of how resources shared by online groups shape their growth, decline, or survival \citep{wang_impact_2012, zhu_impact_2014, zhu_selecting_2014}.
These studies draw from the \textit{population ecology} strand of organizational ecology
%, while we introduce \textit{community ecology} as an alternative.
that studies ecological dynamics within a population of groups. In organizational ecology, populations have been defined as sets of organizations sharing an organizational industry or business model \citep{hannan_organizational_1989}. In social computing, populations have been defined as online groups sharing a given social media platform \citep{wang_impact_2012, zhu_impact_2014, zhu_selecting_2014}.
While population ecology involves several distinct theoretical propositions, \textit{density dependence theory} (DDT) is perhaps the most prominent and is the subject of all three prior ecological studies of online groups \citep{wang_impact_2012, zhu_impact_2014, zhu_selecting_2014}. DDT models competitive or mutualistic forces in a population of groups as a function of \textit{density} which, in the earliest and most influential studies of DDT, is simply the size of the population. In this way, DDT assumes that every group in the population is facing the same competitive and mutualistic pressures \citep{aldrich_organizations_2006}.
However, online groups sharing a platform have diverse topics \citep{kairam_life_2012}, norms \citep{chandrasekharan_internets_2018, fiesler_reddit_2018}, and user bases \citep{tan_all_2015}. Because groups sharing few resources are unlikely to be strongly interdependent, ecological studies of online groups have modeled density dependence based on the concept of \emph{overlap density} \citep{baum_ecological_2006, dobrev_dynamics_2001, wang_impact_2012, zhu_impact_2014, zhu_selecting_2014}. Rather than the number of groups that exist in a population, overlap density measures the extent to which one group's members or topics overlap with those of all other groups. Overlap density thus characterizes a group's \emph{niche} or local \emph{resource environment} defined by its distinctive topic and membership.
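
As a minimal sketch of this idea, and not necessarily the exact measure used in the studies cited above, suppose $o_{ij} \geq 0$ is some pairwise measure of the overlap in members or topics between groups $i$ and $j$. The symbols $o_{ij}$, $d_i$, and $N$ and the choice of overlap measure are assumptions introduced here for illustration. The overlap density of group $i$ in a population of $N$ groups can then be written as
\[
d_i = \sum_{\substack{j = 1 \\ j \neq i}}^{N} o_{ij},
\]
so that $d_i$ is large when group $i$ shares many members or topics with many other groups and is zero when it shares none.
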
%Unlike \citet{datta_identifying_2017}, we do not divide user frequency by the number of subreddits where the user appears because we do not wish to assume that users who comment in many subreddits are less ecologically important.
%Overlap density is thus not a property of a population of groups, but a property of the resource environment a particular group faces.
% While foundational studies of density dependence in organizational research measu
% red density and growth at the population level, ecological studies of online groups .\footnote{Although it is less common in organizational research, overlap density has also been used by some organizational ecologists \cite[e.g.][]{dobrev_dynamics_2001}.}
% Are this paragraph and the next one necessary or just confusing?
DDT proposes a model for the growth of organizational populations that has a similar structure to the RDT model of \citet{butler_membership_2001} for the growth of online groups.
In DDT, mutualism is the engine of positive feedback driving population growth. Organizational ecologists show how successful organizations in an emerging industry develop nonrival resources like the legitimacy of a business model or industrial know-how that attract new organizations to enter the market \citep{carroll_density_1989,hannan_organizational_1989}. Similarly, a population of online groups, such as those sharing a platform, may grow in size as their platform gains in popularity, as established groups spin off new ones, and as useful knowledge develops that can be shared between groups \citep{tan_tracing_2018, zhu_impact_2014}.
% TODO add a footnote to show the analytical equivalence between the models and connection to Malthus.
In RDT, growth of online groups is self-limiting because of the challenges in managing large groups \citep{butler_membership_2001}. In DDT, competition among population members over rival resources limits growth \citep{hannan_organizational_1989}. DDT thus proposes a trade-off in which low density reflects limited opportunities for mutualistic contributions of nonrival resources like legitimacy, connectivity, and knowledge, but high density reflects competition over rival resources.
Therefore, DDT predicts that the relationship between density and positive outcomes like growth or survival is $\cap$-shaped (inverse-U-shaped) \citep{baum_ecological_2006, carroll_density_1989}.
% Save the potential conflict between RDT and DDT for the discussion
% An individual online group's growth may be limited by the ability of their social structures to scale to include more members \citep{butler_membership_2001} or due to competition with other groups over members \citep{hannan_organizational_1989}.
%In a homogenous population or in cases where litt
%Population ecologists have used a number of definitions of population, but they often refer to sets of organizations having the same organizational form or business model.
%This is because many environments present a trade-off between mutualism and competition: mutualistic forces are stronger when density is low and competitive forces are stronger when density is higher. The intuition is that low-density environments reflect poor environmental conditions for success---if conditions were good then they would attract more growing communities hence be more dense. On the other hand, high-density environments are thought to become crowded and competitive \citepp{hannan_organizational_1989}.
Tests of DDT in populations of online groups yield inconsistent results. In \citet{wang_impact_2012}, user overlap in Usenet newsgroups is associated with decreasing numbers of participants. Similarly, \citet{teblunthuis_population_2020} find that topical overlaps between online petitions are negatively associated with participation. By contrast, \citet{zhu_impact_2014} find that membership overlap is positively associated with increasing survival of new Wikia wikis. Only \citet{zhu_selecting_2014} find support for the $\cap$-shaped relationship predicted by DDT in an enterprise social media platform.
In Study A, we provide a test of DDT using data from Reddit. The classical logic of DDT appears reasonable in the context of Reddit because low overlap density is likely to reflect an impoverished environment lacking in nonrival resources like the skills and knowledge of experienced users, while a group with high overlap density is likely to face competition over its members \citep{zhu_selecting_2014, zhu_impact_2014}:
\textit{(\textbf{H1}) The relationship between overlap density and the growth of online groups is $\cap$-shaped (inverse-U-shaped).}
% such as the
%DDT sees competition and mutualism as environmental properties of an online group's niche.
DDT proposes that very high levels of density will decrease growth because of increasing forces of competition within a niche. However, to conclude that groups with the greatest membership overlap are likely competitors would be to commit a well-known statistical fallacy
% (the term ecological fallacy does not refer to theories of population or community ecology, but rather to ``ecological correlations,'' meaning correlations involving aggregates)
\citep{piantadosi_ecological_1988, robinson_ecological_1950}.
The density of a group's environment suggests that it faces competition or mutualism, but it does not tell us which overlapping communities are competitors and which are mutualists.
% DDT therefore relates resource overlaps to the growth of online groups, yet stops short of inferring competitive or mutualistic interactions among them. It does not provide a way of learning when and why groups are mutualists or competitors and this limits its ability to inform designs that take these interactions into account.
Community ecology overcomes this limitation of DDT.
\subsection{Introducing Community Ecology \label{sec:community_ecology}}
Perhaps the most natural way to understand the distinction between population ecology and community ecology lies in where each locates ecological dynamics like competition and mutualism \citep{astley_two_1985}. While population ecology locates competition and mutualism within an environmental niche, community ecology locates them in networks of interdependent groups called \emph{ecological communities} \citep{aldrich_organizations_2006}. In organizational ecology, this can mean studying interactions between different organizational populations \citep[e.g.][]{sorensen_recruitment-based_2004, mcpherson_ecology_1983}, or networks of interactions between organizations \citep[e.g.][]{powell_network_2005, margolin_normative_2012}.
%Doing so makes visible the distinctive roles that particular groups play.
While varying conceptions of community ecology are found in the organizational ecology literature \citep{freeman_community_2006}, the approach we describe is identical in structure to that taken by \citet{aldrich_organizations_2006} and \citet{hawley_human_1986}.
Community ecology focuses on \emph{ecological interactions} \citep{aldrich_organizations_2006}.
%In organizational ecology, these interactions are referred to as ``commensal relationships.'' However, biologists use the term ``commensal'' quite differently to mean an unreciprocated mutualistic interaction in which one species provides benefits to another while being unaffected by it. While for the most part, we draw our conceptions and terminology from organizational ecology rather than biology, the use of the term ``commensalism'' in organizational ecology can be confusing. We therefore adopt the term ``ecological interaction.''
Ecological interactions can be mutualistic when one group has a positive influence on the second such that growth in the first group leads to growth in the second. They can also be competitive if one group has a negative effect on the second such that growth in the first group leads to decline in the second. Ecological interactions can be reciprocated if mutualism (or competition) from one group to another group is returned in kind. An ecological interaction can also be mutualistic in one direction and competitive in the other. The competitive or mutualistic interactions in an ecological community are quantified by the \emph{community matrix}, a central analytical object in community ecology in both biology and organization science \citep{verhoef_community_2010, novak_characterizing_2016, aldrich_organizations_2006}.
In Study B, we demonstrate community ecology by inferring networks of ecological interactions in ecological communities on Reddit. Because our understanding of community ecology theory does not suggest hypotheses about what we will find, we conduct an exploratory data analysis to determine whether mutualism or competition among subreddits is more common on Reddit and present case studies illustrating the types of ecological communities we identify.
%So a commensal relationship exists between each pair of groups in an ecological community.
% There are six possible ecological interactions as described in Table \ref{tab:interaction.types}. Note that they can be reciprocal (as in full mutualism and competition) or not (as in partial mutualism and competition). In our framework ``predation'' is an interaction that is positive in one direction but negative in the other. It is also possible that growth or decline in the first group has no effect on the second group, and visa-versa, a situation termed ``neutrality.''
% \begin{table}
% \caption{The five possible ecological interactions between two online groups. Values in the column ``i $\rightarrow$ j'' represent the sign of $\phi_{i,j}$ group i's effect on group j. Based on table 11.1 from \citet{aldrich_organizations_2006}.}
% \centering
% \begin{tabular}{c|c|c}
% i $\rightarrow$ j ($\phi_{i,j}$)& i $\rightarrow$ j ($\phi_{i,j}$) & Interaction type \\ \hline
% $+$ & $+$ & Full mutualism \\
% $+$ & $\cdot$ & Partial mutualism \\
% $+$ & $-$ & Predation \\
% $-$ & $\cdot$ & Partial competition \\
% $-$ & $-$ & Full competition \\
% $\cdot$ & $\cdot$ & Neutrality
% \end{tabular}
% \label{tab:interaction.types}
% \end{table}
% by conceiving of community ecology as the study of relationships between different groups.
% Relationships studied in community ecology are defined by how they , but they are also important because networks of relationships
%and give rise to higher-order properties like stability.
%Our community ecology approach instead focus on relationships between communities from overlap density approaches to focuses on relationships between communities as a step toward solving the puzzle.
%Consider the example of how \citet{zhu_impact_2014} find membership overlap is associated with increasing survival of new Wikia wikis, but in \citepos{wang_impact_2012} study of Usenet groups user overlaps are associated with decreasing group sizes.
% Consider cutting this since we don't look at any other factors
%study period, and they found a stronger relationship when overlapping members were from more established groups. Perhaps the growth Wikia wikis was limited by knowledge of how to build a Wiki which was provided by more experienced users and user overlaps were correlated with access to such knowledge. While
% What's the point of these three paragraphs?
\subsection{Predicting Growth}
In Study C, we build upon our analyses from Study A and Study B by testing whether community ecology can explain the growth and decline of online groups in ways that population ecology cannot. We do this by testing in two different ways whether accounting for ecological interactions helps predict future group sizes.
% We expect it to do so because resource overlaps as modeled by DDT may be a poor proxy for the degree to which a group's environment is competitive or mutualistic.
In general, competition for overlapping resources will have no effect on group growth if something besides the overlapping resource limits growth \citep{verhoef_community_2010}. For example, two wikis might share a large number of contributors (they have high user overlap), but their growth might be limited by a lack of core contributors who perform important administrative tasks like policy making and software administration \citep{zhu_impact_2014}. Community ecology relaxes the assumption that competition and mutualism are caused by user overlap density and instead seeks to infer these relationships from data. We assess the importance of this conceptual shift for predicting growth by testing two hypotheses. The first uses a model comparison approach to test whether adding a measure of ecological interactions to the density dependence model in Study A improves prediction of growth: \textit{(\textbf{H2}) A model with ecological interactions and density dependence predicts growth in online groups better than density dependence alone.}
Support for H2 may be a relatively low bar for assessing whether ecological interactions are important factors shaping the growth of online groups because of confounding moderator or mediator variables related to the occurrence of ecological interactions.
% For example, suppose mutualistic interactions were correlated with declining ecological communities.
Therefore, we also use a time series forecasting approach to test whether modeling ecological interactions is useful for making time series forecasts of participation in online groups:
%We seek to demonstrate in whether including commensal relationships in time series forecasting models improves forecasting performance.
\textit{(\textbf{H3}) The addition of ecological interactions to a baseline time series model improves forecasting performance.}
While this does not directly compare population ecology and community ecology, it validates that ecological interactions are important.
%With commensalism, we can seek to explain the puzzling results of resource overlap studies by exploring our second research question:\noindent \textbf{RQ2: How are degrees of user overlap and types of commensal relationships related?}
% This paragraph isn't helping very much
% Ecological dynamics play out through the network of such relationships over time as represented by the \emph{community matrix}, $\Phi$.
% Analysis of the community matrix can reveal indirect relationships between groups and properties of an ecological community like stability \cite{ives_estimating_2003}.
%Seeing interdependence between online groups through a community ecology-based network of dynamical relationships can make visible special roles that particular groups play in an ecological community through their many mutualistic or competitive relationships.
% Next we take a first methodological step toward answering questions like these by adapting vector autoregression models from biology and macroeconomics as an approach to inferring community matrices. We then apply our approach in three case studies of related groups hosted on Reddit to reveal three qualitatively different ecological communities.
%% SOME BIKERACK RAISING MORE ISSUES WITH THE NICHE OVERLAP APPROACH
% study online groups additionally shifts from an analogy of online communities as individual members of a biological species to online communities as species themselves and seeking to understand functional relationships between different online groups.
% Yet a closer examination of the analogy to density-dependence in organizational or biological populations reveals conceptual awkwardness. At issue is the referent of the term ``niche.'' Should we use ``niche'' to refer to a set of resources that an online community can utilize? This is what ``niche'' means in both overlap density and in our version of community ecology.
% Social exposure is also important, but we don't deal with that in this . The idea here is that the cost-benefit structure depends on alternatives which can lower costs or .
%VAR analysis can quantify the stability of the system and affords exploration of counterfactual forecasts to simulate hypothetical interventions \citep{ives_estimating_2003}.
\section{Materials \& Methods}
\label{sec:methods}
% The presentation of our materials and methods is organized as follows: First we introduce the methods and measures for Study A, beginning with
% \emph{user overlap} %(§\ref{sec:mes.overlap})
% which is aggregated into \emph{overlap density} %(§\ref{sec:mes.density})
% to predict subreddit \textit{growth} %(§\ref{sec:mes.growth})
% in a loglinear regression model. Then, for Study B, we present
% our clustering procedure for identifying ecological communities % (§\ref{sec:clustering})
% on which we fit VAR models % (§\ref{sec:var})
% predicting \emph{group size}. % (§\ref{sec:mes.group.size}).
% To explore the types of ecological communities found on Reddit, we derive two measures from these models for each cluster: \emph{average ecological interaction}
%(§\ref{sec:mes.avg.mut})
% which quantifies the degree of competition and mutualism in the ecological community and \emph{ecological interaction strength} %(§\ref{sec:mes.abs.int}) % which quantifies its overall intensity of ecological interactions. Next, we draw competition-mutualism networks in example ecological communities based on interpreting the VAR models using impulse response functions (IRFs) %(§\ref{sec:mes.irf}).
% Then, in Study C, we test H2 to compare community ecology and density dependence theory by adding \emph{subreddit average mutualism} %(§\ref{sec:mes.sub.mut})
% to the model from Study A. Finally, we test H3 by evaluating whether including ecological interactions in the VAR models improves time series forecasting. % (§\ref{sec:mes.forecasting}).
\subsection{Data}
Our data are drawn from the publicly available Pushshift archive of Reddit submissions and comments \citep{baumgartner_pushshift_2020}, from which we obtained data covering December 5\textsuperscript{th} 2005 to April 13\textsuperscript{th} 2020.
Within this dataset, we limit our analysis to submissions and comments from the 10,000 subreddits with the highest number of comments. Of the subreddits larger than the smallest one included in our dataset, 702 have a majority of submissions marked ``NSFW,'' which typically indicates pornographic material. As others have done in large-scale studies of Reddit \citep[e.g.,][]{datta_identifying_2017}, we exclude these subreddits to avoid asking members of our research team to inspect clusters including pornography. The top 10,000 subreddits provide a sufficiently large number of ecological communities for our statistical analysis.
\subsection{Study A: Density Dependence Theory} % and Community Ecology}
\label{methods:density}
\subsubsection{User overlap \nopunct} \label{sec:mes.overlap}
$o_{i,j}$ quantifies the degree to which two subreddits ($i$ and $j$) share users.
%From it we construct clusters of related groups in §\ref{sec:clustering} and quantify overlap density in §\ref{sec:mes.density}.
\citet{zhu_impact_2014} and \citet{wang_impact_2012} both measure user overlap between two groups by counting the number of users contributing to both groups at least once and exclude users who appear in more than 10 groups. In our preliminary analysis, we found that this measure led to similarity measures and clusters with poor face validity. These issues may have arisen because Reddit users often participate peripherally in many groups while participating heavily in only a few \citep{tan_all_2015, hamilton_loyalty_2017, zhang_community_2017}. Therefore, our measure of user overlap follows \citet{datta_identifying_2017} by using the number of comments each user makes in each pair of groups.
To measure user overlap between subreddits, we first build user frequency vectors by counting the number of times each user comments in each subreddit. We prevent giving undue weight to subreddits with higher overall activity levels by normalizing the comment counts for each subreddit by the maximum number of comments by a single author in the subreddit:
\begin{equation}
f_{u,j} = \frac{n_{u,j}}{\max_{v} n_{v,j}} \label{eq:user.frequency}
\end{equation}
\noindent where $n_{u,j}$ is the number of times that user $u$ authors a comment in subreddit $j$, the maximum is taken over all users $v$ who comment in subreddit $j$, and $f_{u,j}$ is the resulting normalized user frequency.
This results in a user frequency vector $F_j$ for each subreddit that is sparse and high-dimensional, having one element for each user account that comments in any subreddit in our dataset.
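As a purely illustrative example with hypothetical counts, suppose three users $u_1$, $u_2$, and $u_3$ author 50, 10, and 1 comments in subreddit $j$, respectively. The most active user sets the denominator in Equation \ref{eq:user.frequency}, so
\begin{equation*}
f_{u_1,j} = \frac{50}{50} = 1, \qquad f_{u_2,j} = \frac{10}{50} = 0.2, \qquad f_{u_3,j} = \frac{1}{50} = 0.02 .
\end{equation*}
\noindent The largest element of every subreddit's user frequency vector is therefore 1, regardless of the subreddit's overall activity level.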
% In the course of developing our clustering analysis described in §\ref{sec:clustering}, we found that following an approach analogous to latent semantic analysis (LSA) improved the quality of our clusters.
Next, we use LSA to reduce the dimensionality of the user frequency vectors.
LSA is based on the singular value decomposition and is common in natural language processing and information retrieval. LSA preserves subreddit similarities while removing noise and dealing with sparsity \citep{dumais_latent_2004}:
\begin{align}
\mathbf{F} &= \mathbf{U \Sigma V}^T \nonumber \\
\widetilde{F_{j}} &= \mathbf{U_k}^TF_j \label{eq:user.frequency.svd}
\end{align}
\noindent $\mathbf{F}$ is the matrix whose columns are the user frequency vectors $F_j$ and $\mathbf{U \Sigma V}^T$ is its singular value decomposition. Truncating the singular value decomposition to use only the first $k$ left-singular vectors gives $\mathbf{U_k}$. Left-multiplying a subreddit's user frequency vector by $\mathbf{U_k}^T$ transforms the high-dimensional user frequencies into $\widetilde{F_j}$, their approximation in the $k$-dimensional space.
% We choose $k=600$ in the course of our grid search for a good clustering described below in §\ref{sec:clustering}.
%clustering with a high silhouette coefficient.
We then obtain our measure of \textit{user overlap} by taking the cosine similarities between the resulting vectors for a pair of subreddits:
\begin{equation}
o_{i,j} = \frac{\widetilde{F_{j}} \cdot \widetilde{F_{i}}} {\norm{\widetilde{F_i}} \norm{\widetilde{F_j}}} \label{eq:user.overlap}
\end{equation}
\noindent where $\norm{\widetilde{F_i}} = \sqrt{\sum_{x=1}^k \widetilde{f_{x,i}}^2}$ is the Euclidean norm of the transformed user frequencies for subreddit $i$.
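For intuition, consider a toy case with hypothetical transformed vectors in $k=2$ dimensions, $\widetilde{F_i} = (3, 4)$ and $\widetilde{F_j} = (4, 3)$. Equation \ref{eq:user.overlap} then gives
\begin{equation*}
o_{i,j} = \frac{3 \cdot 4 + 4 \cdot 3}{\sqrt{3^2 + 4^2} \, \sqrt{4^2 + 3^2}} = \frac{24}{5 \cdot 5} = 0.96 .
\end{equation*}
\noindent Because cosine similarity depends only on the angle between the two vectors, rescaling either vector (for instance, doubling every element of $\widetilde{F_i}$) leaves $o_{i,j}$ unchanged.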
%We use the following methods and measures in our tests of our hypothesis that the relationship between user overlap density the growth of online groups is $\cap$-shaped (H1) and our hypothesis that accounting for ecological interactions will help explain growth beyond overlap density (H2):
% We measure \emph{overlap density} and \emph{growth} to and . To test \textit{\textbf{H2}}, we add the overall influence of ecological interactions on a subreddit
\subsubsection{Growth\nopunct}\label{sec:mes.growth} is the dependent variable in our density dependence model testing H1 and is also used in our test of H2 as part of Study C. Growth is measured as the change in the (log-transformed) size of a subreddit over the final 24 weeks of our data, from November 4\textsuperscript{th} 2019 to April 13\textsuperscript{th} 2020.
\subsubsection{Overlap density\nopunct} \label{sec:mes.density} $d_i$ is the normalized average user overlap for a given subreddit. It is the independent variable in our density dependence model testing H1:
\begin{align}\label{eq:user.overlap.density}
d^*_{i} &= \frac{1}{\left|S\right|-1} \sum_{j\in S;j\ne i} o_{i,j} \nonumber \\
d_{i} &= \frac{d_i^*}{\max_j d_j^*}
\end{align}
\noindent where $S$ is the set of groups in our dataset.
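As a small worked illustration with hypothetical values, take $S = \{1, 2, 3\}$ with user overlaps $o_{1,2} = 0.6$, $o_{1,3} = 0.2$, and $o_{2,3} = 0.4$. Equation \ref{eq:user.overlap.density} then gives
\begin{align*}
d^*_1 &= \tfrac{1}{2}(0.6 + 0.2) = 0.4, & d^*_2 &= \tfrac{1}{2}(0.6 + 0.4) = 0.5, & d^*_3 &= \tfrac{1}{2}(0.2 + 0.4) = 0.3, \\
d_1 &= 0.4 / 0.5 = 0.8, & d_2 &= 0.5 / 0.5 = 1, & d_3 &= 0.3 / 0.5 = 0.6 .
\end{align*}
\noindent The subreddit with the greatest average overlap always has $d_i = 1$, so overlap density is interpreted relative to the densest niche in the dataset.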
\subsubsection{Regression model for H1} \label{sec:reg.H1}
To test H1, we fit Model 1 % in Equation \ref{eq:M1}
which has first- and second-order terms for overlap density to allow for a curvilinear relationship between \emph{overlap density} and \emph{growth}.
\begin{align}
\mathrm{Model~1} & & Y_i = B_0 + B_1 d_{i} + B_2 d^2_{i} \label{eq:M1}
\end{align}
\noindent where $Y_i$ is the growth of subreddit $i$ and $d_i$ is its overlap density.
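To connect H1 to the coefficients of Model 1, note that, as a matter of algebra, the quadratic in Equation \ref{eq:M1} is $\cap$-shaped when $B_2 < 0$, and its turning point $d^{\dagger}$ (notation introduced only for this illustration) solves
\begin{equation*}
\frac{\partial Y_i}{\partial d_i} = B_1 + 2 B_2 d^{\dagger} = 0 \quad \Longrightarrow \quad d^{\dagger} = -\frac{B_1}{2 B_2} .
\end{equation*}
\noindent For example, the hypothetical values $B_1 = 2$ and $B_2 = -4$ would place the peak at $d^{\dagger} = 0.25$. A fitted curve is consistent with the $\cap$ shape only if this turning point falls within the observed range of overlap density, so that growth rises with density below $d^{\dagger}$ and falls above it.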
\subsection{Study B: Introducing Community Ecology}
%Here we review the prior work on which we build our methodological approach to inferring competitive and mutualistic relationships between online groups. %\textsection \ref{sec:inferring} describes our own methodological contributions.
\subsubsection{Clustering to identify ecological communities}
\label{sec:clustering}
Analyzing networks of ecological interactions is the key difference between community ecology and population ecology.
% In Study A we set out to survey the types of ecological communities found on Reddit to provide a comparison with a large-scale population ecology analysis.
% in \ref{sec:clustering}
%Here, we use a heuristic approach based on clustering algorithms to find ecological communities of online groups that all have high user overlap.
To identify ecological communities of related subreddits, we use a clustering procedure based on the user overlap measure described above in §\ref{sec:mes.overlap}.
We selected a clustering model using grid search to obtain a high silhouette coefficient \citep{rousseeuw_silhouettes_1987}. The silhouette coefficient captures the degree to which a clustering creates groups of subreddits with high within-cluster similarity relative to their similarity with subreddits in other clusters.
% relative to similarity with subreddits in other clusters.
We chose the number of LSA dimensions $k$ used in our measure of user overlap (§\ref{sec:mes.overlap}) as part of this grid search.
We ran the affinity propagation \citep{frey_clustering_2007}, HDBSCAN \citep{mcinnes_hdbscan_2017}, and \textit{k}-means clustering algorithms and selected the algorithm, hyperparameters, and LSA dimensions $k$ that resulted in a clustering with a high silhouette coefficient, fewer than 5,000 isolated subreddits, and at least 50 clusters. We limit the number of isolated subreddits because some choices of hyperparameters for the HDBSCAN algorithm could improve the silhouette coefficient, but at the cost of greatly increasing the number of isolated subreddits. Choosing a relatively high limit on the number of isolates helps ensure that our clusters contain highly related communities. We chose an HDBSCAN clustering with 731 clusters, 4,964 isolated subreddits, $k=600$ LSA dimensions, and a silhouette coefficient of 0.48.
We exclude the isolated subreddits from our analysis. More details about our clustering selection process are found in the online supplement.
%In order to test H2 and answer RQ1, we estimate the community matrix of commensal relationships between selected communities of online groups.
We evaluate the external validity of the chosen clustering using the purity evaluation criterion \citep{manning_introduction_2018}.
% :
% \begin{equation}45
% \mathrm{Purity}=\frac{1}{N}\sum_{m\in M}\max_{d\in D}{|m \cap d|}
% \end{equation}
% \noindent Where $N$ is the number of clusters $M$, $D$ are ``true'' classes to which subreddits might belong and $max_{d\in D}|m \cap d|$ is the greatest number of subreddits in cluster $m$ that belong to the same class $d$.
To do so, an undergraduate research assistant examined a random sample of 100 clusters including 744 subreddits. By visiting the subreddits and using her own judgment, the assistant flagged subreddits that did not seem like a good fit for their assigned cluster. Using these labels and excluding 25 subreddits that have been deleted, made private, or banned, we calculated the purity of our clustering as 0.92. This means that we believe that 92\% of subreddits belong to their assigned cluster.
% Note that although we clustered subreddits based on user overlap, we obtain a high purity score based on a subjective evaluation of the subreddits' contents.
%\subsection{Inferring Mutualistic and Competitive Interactions}
% We find f(N.clusters) clusters and f(N.isolates) isolated subreddits. The median cluster has median.cluster.size subreddits and the largest cluster has
\subsubsection{Group size\nopunct} \label{sec:mes.group.size} is the dependent variable of the models we use to infer ecological interactions. Measured as the number of distinct commenting users in a subreddit each week, group size quantifies the number of people who participate in a subreddit over time. Typical of social media participation data, group size is highly skewed. Therefore, we transform it by adding 1 and taking the natural logarithm.
% The following three paragraphs probably belong in the methods section, but I'm trying to satisfy the reviewers.
\subsubsection{Inferring ecological interactions using Vector Auto Regression}
\label{sec:var}
The community matrix $\mathbf{\Phi}$ of ecological interactions can be inferred from time series data using vector autoregression models (VAR models). VAR models are a workhorse in biological ecology because VAR(1) models (i.e., VAR models with a single autoregressive term) have a close relationship with the Gompertz model of population growth, which is widely used in ecology \citep{ives_estimating_2003}. Even in the presence of unmodeled nonlinearities, VAR(1) models can reliably identify competition or mutualism in empirically realistic scenarios \citep{certain_how_2018}. VAR models have also been widely adopted in the social sciences, particularly in political science and in macroeconomics \citep{box-steffensmeier_time_2014}.
% \citet{sims_macroeconomics_1980} advocated VAR modeling in macroeconomics to address a problem in the field as an alternative to structural equation modeling (SEM), which required detailed specification of a large number of theoretical assumptions to identify.
%similar to structural equation models but require fewer theoretical assumptions but are
%VAR models are flexible enough to model a wide range of systems so long as sufficiently long time-series data are available \citep{sims_macroeconomics_1980}.
VAR(1) models can be intuitively understood as a generalization of autoregressive AR(1) models in time series analysis. But while AR(1) models predict the state of a single time series as a function of its previous value, VAR(1) models simultaneously predict multiple time series as a function of the previous values of every variable in the system \citep{canova_var_2007, ives_estimating_2003}:
\begin{equation}\label{eq:var1}
Y_t = B_0 + B_1t + \sum_{k \in K}A_k x_{k,t} + \sum_{j \in M}\Phi_{j} y_{j,t-1} + \epsilon_t
\end{equation}
\noindent where $Y_t$ is a vector containing the sizes of a set of online groups ($M$) at time $t$. $B_0$ is the vector of intercept terms and $B_1$ is the vector of linear time trends ($b_{1,j}$) for each group ($j$). $\Phi_{j}$ represents the influence of $y_{j,t-1}$, the size of the $j^{\mathrm{th}}$ online group at time $t-1$, on $Y_t$. $\Phi_{j}$ is a column of $\mathbf{\Phi}$, a matrix of coefficients in which the diagonal elements correspond to intrinsic growth rates (marginal to the trend) for each online group and the off-diagonal elements are intergroup influences, and $\epsilon_t$ is the vector of error terms.
Additional time-dependent predictors ($x_{k,t}$) can be included in the vectors $X_{k}$ with coefficient vectors $A_k$. Because subreddits are created at different times, growth trends must begin only after the subreddit is created. We use $X_{k}$ to introduce a counter-trend during the period prior to the creation of subreddits so that each group's growth trend begins in the period the group is created. For each group $j$ created at time $t^0_j$ we fill $X_{j}$ with the sequence $[1,2,3,\ldots\ ,t^0_j-1,0,0,0,\ldots\ ]$. In other words, $X_{j}$ adds a counter-trend only during the period prior to the first comment in subreddit $j$. We fix the elements $a_{j,i}$ of $A_j$ equal to 0 unless $i=j$, so the counter-trend only influences subreddit $j$. This effectively sets $a_{j,j}$ approximately equal to $-b_{1,j}$.
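To make the structure of Equation \ref{eq:var1} concrete, consider a sketch for a hypothetical ecological community of two groups, $M = \{1, 2\}$, omitting the counter-trend terms for brevity. Writing $b_{0,j}$ for the elements of $B_0$ and assuming the indexing convention in which $\phi_{i,j}$ denotes the influence of group $i$ on group $j$, the model expands to
\begin{align*}
y_{1,t} &= b_{0,1} + b_{1,1} t + \phi_{1,1} y_{1,t-1} + \phi_{2,1} y_{2,t-1} + \epsilon_{1,t} \\
y_{2,t} &= b_{0,2} + b_{1,2} t + \phi_{1,2} y_{1,t-1} + \phi_{2,2} y_{2,t-1} + \epsilon_{2,t}
\end{align*}
\noindent In this sketch, $\phi_{2,1} > 0$ would indicate that growth in group 2 predicts subsequent growth in group 1 (a mutualistic interaction), while $\phi_{2,1} < 0$ would indicate that growth in group 2 predicts subsequent decline in group 1 (a competitive interaction).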
We fit VAR(1) models using ordinary least squares as implemented in the \texttt{vars} \texttt{R} package \citep{pfaff_var_2008} to predict weekly group size over the history of each subreddit prior to November 4\textsuperscript{th} 2019. We hold out the final 24 weeks of data for forecast evaluation and fit our models on the remainder. To ensure that sufficient data are available for fitting the models, we exclude 946 subreddits and 89 clusters having fewer than 156 weeks of activity.
% where the cluster data lacks the necessary degrees of freedom to fit the model because the length of the training time series is less than the size of cluster plus 2.
% We hold out the weeks from fit.date to to.date for evalution. % Some of the clusters were too large or had too low levels of activity We include only We include a vector of intercept terms (to account for different equilibrium community sizes) and a vector of trends (to account for long-run endogenous growth) because we found that including these terms greatly improved the fit of our models to the data. Our VAR(1) models have this form in vector notation:
%$$ Y_t = \Mu + \Phi_1 Y_{t-1} + \ldots + \Phi_p Y_{t-p} + \epsilon_t $$
% TODO: avoid mixing matrix and vector notation.
\subsubsection{Characterizing ecological communities}
\label{sec:characterizing.ecological.communities}
In Study B, we interpret the community matrix $\mathbf{\Phi}$ as a directed network of ecological interactions, a \emph{competition-mutualism network} \citep{ives_estimating_2003}. Although the elements of $\mathbf{\Phi}$ correspond to direct associations between group sizes \citep{novak_characterizing_2016}, ecological interactions can also be indirect. Consider 3 one-directional interactions between three groups ($a$, $b$, $c$) such that growth in $a$ predicts decreased growth in $b$ ($\phi_{a,b} < 0$), growth in $b$ predicts decreased growth in $c$ ($\phi_{b,c} < 0$), but $a$ and $c$ do not directly interact ($\phi_{a,c} \approx 0$).
This does not necessarily mean that groups $a$ and $c$ are independent. Rather, an exogenous increase in $a$ predicts a decrease in $b$ and thereby an eventual increase in $c$. Such indirect relationships are analyzed by using impulse response functions (IRFs) to interpret a VAR model \citep{box-steffensmeier_time_2014}. In large VAR models containing many groups, the great number of parameters can mean that few specific elements of $\mathbf{\Phi}$ will be statistically significant, even as many weak direct relationships can combine into statistically significant IRFs \citep{canova_var_2007}.
\subsubsection{Average ecological interaction\nopunct} \label{sec:mes.avg.mut} $\overline{m}$ measures the extent to which an overall ecological community is mutualistic or competitive by taking the mean point estimate of the off-diagonal coefficients of $\mathbf{\Phi}$:
\begin{equation}\label{eq:average.interaction}
\overline{m} = \frac{1}{\left|M\right|\left(\left|M\right| - 1\right)} \sum_{i\in M} \sum_{j\in M;j\ne i} \phi_{i,j}
\end{equation}
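\noindent where $M$ is the set of groups in the ecological community and $\left|M\right|$ is the number of groups in $M$. If $\overline{m} > 0$, then mutualistic interactions within the ecological community are stronger than competitive ones, and if $\overline{m} < 0$, then competitive interactions are stronger than mutualistic ones.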
\subsubsection{Ecological interaction strength\nopunct} \label{sec:mes.abs.int} $\kappa$ quantifies the overall strength of ecological interactions in an ecological community as the mean absolute value of the point estimates of the off-diagonal coefficients of $\mathbf{\Phi}$:
\begin{equation}\label{eq:average.absolute.interaction}
\kappa = \frac{1}{\left|M\right|\left(\left|M\right| - 1\right)} \sum_{i\in M} \sum_{j\in M;j\ne i} \left| \phi_{i,j} \right|
\end{equation}
\noindent where $\left| \phi_{i,j} \right|$ is the absolute value of the coefficient $\phi_{i,j}$.
Ecological communities of subreddits with overlapping users vary in both the overall strength of ecological interactions and in the overall degree of mutualism and competition between member groups. If an ecological community's average ecological interaction is positive, we say the ecological community is mutualistic. If it is negative, we say the ecological community is competitive. The average ecological interaction can be close to 0 in two ways. First, the ecological interaction strength can simply be low. Alternatively, the ecological community can have a mixture of competitive and mutualistic interactions that cancel one another out when averaged. % Such an ecological community can have high ecological interaction strength.
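To illustrate the second case with hypothetical values, consider an ecological community of two groups with $\phi_{1,2} = 0.4$ and $\phi_{2,1} = -0.4$. The off-diagonal coefficients sum to zero while their absolute values do not:
\begin{equation*}
\phi_{1,2} + \phi_{2,1} = 0, \qquad \left|\phi_{1,2}\right| + \left|\phi_{2,1}\right| = 0.8 ,
\end{equation*}
\noindent so $\overline{m} = 0$ even though $\kappa > 0$. By contrast, if $\phi_{1,2} = 0.01$ and $\phi_{2,1} = -0.01$, both $\overline{m}$ and $\kappa$ are close to 0, which corresponds to the first case.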
\subsubsection{Impulse response functions\nopunct}\label{sec:mes.irf} (IRFs) of our VAR(1) models correspond to our visualizations of example competition-mutualism networks in §\ref{sec:case.studies}. An IRF predicts how much each group's size would change in response to a sudden increase in the size of each other group \citep{verhoef_community_2010}:
\begin{equation}
\mathbf{\Theta_t} = \mathbf{\Theta_{t-1}}\mathbf{\Phi}, \quad t = 1,2,\ldots \label{eq:irf}
\end{equation}
\noindent where $\mathbf{\Theta_t}$ is the impulse response function at time $t$. $\mathbf{\Theta_0}$ is a $\left|M\right|$-by-$\left|M\right|$ identity matrix, so our impulses represent a log-unit increase of 1 to each group. $\mathbf{\Theta_t}$ is a matrix with elements $\theta^t_{i,j}$ corresponding to the response of group $j$ to the impulse of group $i$. We draw an edge $i \rightarrow j$ in the competition-mutualism network if the 95\% CI of $\theta^t_{i,j}$ excludes zero at any time $0 < t \le 10$. If $\theta^t_{i,j} > 0$, the edge indicates mutualism, and if $\theta^t_{i,j} < 0$, the edge indicates competition.\footnote{In higher-order VAR($p$) models that use $p>1$ past observations as predictors, $\theta^t_{i,j}$ can be less than 0 for some $t_a$ and greater than 0 for some $t_b$. However, this is not possible in the VAR(1) models we use.} We compute the IRFs with bootstrapped confidence intervals (CIs) based on 1,000 samples using the \texttt{vars} \texttt{R} package.
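As a sketch of how Equation \ref{eq:irf} captures indirect interactions, consider the three-group example from §\ref{sec:characterizing.ecological.communities} with hypothetical coefficients chosen only for illustration: $\phi_{a,b} = \phi_{b,c} = -0.5$, $\phi_{a,c} = 0$, diagonal elements of $0.5$, and all other elements $0$. Ordering rows and columns as $(a, b, c)$ and starting from $\mathbf{\Theta_0} = \mathbf{I}$:
\begin{equation*}
\mathbf{\Theta_1} = \mathbf{\Phi} = \begin{pmatrix} 0.5 & -0.5 & 0 \\ 0 & 0.5 & -0.5 \\ 0 & 0 & 0.5 \end{pmatrix}, \qquad
\mathbf{\Theta_2} = \mathbf{\Phi}^2 = \begin{pmatrix} 0.25 & -0.5 & 0.25 \\ 0 & 0.25 & -0.5 \\ 0 & 0 & 0.25 \end{pmatrix}
\end{equation*}
\noindent Although $\theta^1_{a,c} = 0$, the two-step response is $\theta^2_{a,c} = \phi_{a,b}\,\phi_{b,c} = 0.25 > 0$: an impulse to $a$ predicts a decrease in $b$ and, one period later, an increase in $c$.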
% The community matrix $\Phi$ is interpretable as a network of commensal relationships \citep{ives_estimating_2003}. While the coefficients of $\mathbf{\Phi}$ correspond to direct associations between group sizes \cite{novak_characterizing_2016}, commensal relationships can also be indirect. Consider relationships between three groups (A, B, C) such that A partially competes with B and B partially competes with C but A and C have no direct relationship. A VAR(1) model inferring these relationships will have negative coefficients for $\phi_{AB}$ and $\phi_{BC}$ but $\phi_{AC}$ will be nearly zero.
% TODO plot the examples on figure 1.
%The central prediction of density dependence theory is that there will be a curviliear, inverse-U-shaped ($\cap$-shaped) relationship between overlap density and growth.
\subsection{Study C: Predicting growth}
\subsubsection{Average subreddit mutualism\nopunct}\label{sec:mes.sub.mut} $m_j$ is the independent variable for our test of H2 and measures the average influence of other subreddits in the ecological community on a given subreddit $j$, which we calculate by taking the mean of off-diagonal elements of row $j$ of the community matrix:
\begin{equation}\label{eq:average.subreddit.mutualism}
m_j = \frac{1}{\left|M\right|-1}\sum_{i\in M;i\ne j} \phi_{i,j}
\end{equation}
\noindent where $M$ is the set of subreddits in the ecological community and $\left|M\right|$ is the number of subreddits in $M$. We use the mean instead of the sum because different ecological communities have different numbers of subreddits.
\subsubsection{Regression models for H2} We test H2 by using likelihood ratio tests to compare Model 1 % (above in \ref{sec:reg.H1})
and Model 2 % in Equation \ref{eq:M2}
which adds \emph{average subreddit mutualism} ($m_i$) as a predictor. We also fit Model 3 % in Equation \ref{eq:M3}
which we compare to Model 2 to test if overlap density explains variation that average subreddit mutualism does not.
\begin{align}
\mathrm{Model~2} & & Y_i &= B_0 + B_1 d_{i} + B_2 d^2_{i} + B_3 m_i \label{eq:M2} \\
\mathrm{Model~3} & & Y_i &= B_0 + B_3 m_i \label{eq:M3}
\end{align}
\noindent where $Y_i$ is the growth of subreddit $i$, $d_i$ is its overlap density, $m_i$ is its average subreddit mutualism, and $B_0$, $B_1$, $B_2$, and $B_3$ are regression coefficients.
\subsubsection{Forecasting growth using ecological interactions}
\label{sec:mes.forecasting}
To test H3, we evaluate whether modeling ecological interactions improves time series forecasting of future participation in online groups by comparing the model in Equation \ref{eq:var1} to a baseline model with off-diagonal elements of $\mathbf{\Phi}$ fixed to 0. This baseline model is equivalent to our VAR model, but excludes ecological interactions.
We use two forecasting metrics with differing assumptions: root-mean-square-error (RMSE) and the continuous ranked probability score (CRPS). RMSE is commonly used, non-parametric, and intuitive, but does not take differing scales of the predicted variable or forecast uncertainty into account. Thus, in our setting it may place excessive weight on the forecasts of larger subreddits, where errors may have greater magnitude simply because the absolute magnitude of the variance is greater. The CRPS rewards forecasts in which the true value has high probability under the predictive distribution. It therefore accounts for variance in the data and rewards forecasts for both accuracy and precision. It is also a ``proper scoring rule'' for evaluating probabilistic forecasts \citep{gneiting_strictly_2007}. Our CRPS calculations assume that the predictive forecast distribution for each community is normal with standard deviations given by the 68.2\% forecast confidence interval. We calculate CRPS using the \texttt{scoringRules} \texttt{R} package \citep{jordan_evaluating_2019}.
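For reference, the two metrics can be written as follows for a single forecast series. The notation is introduced only for this illustration and does not appear elsewhere in this chapter: $y_h$ is the observed value at forecast horizon $h$, $\hat{y}_h$ and $\sigma_h$ are the mean and standard deviation of the predictive distribution, $H$ is the number of forecast periods, and $F$ and $f$ denote the standard normal cumulative distribution and density functions. One plausible reading of the procedure above, stated here as an assumption, is that $\sigma_h$ is half the width of the 68.2\% forecast interval, since a normal distribution places about 68.2\% of its mass within one standard deviation of the mean.
\begin{align*}
\mathrm{RMSE} &= \sqrt{\frac{1}{H}\sum_{h=1}^{H}\left(y_h - \hat{y}_h\right)^2} \\
\mathrm{CRPS}\left(\mathcal{N}(\hat{y}_h,\sigma_h^2),\, y_h\right) &= \sigma_h\left[ z_h\left(2F(z_h) - 1\right) + 2f(z_h) - \frac{1}{\sqrt{\pi}} \right], \qquad z_h = \frac{y_h - \hat{y}_h}{\sigma_h}
\end{align*}
\noindent For a forecast whose mean equals the observed value ($z_h = 0$), the score reduces to $\sigma_h(\sqrt{2}-1)/\sqrt{\pi} \approx 0.234\,\sigma_h$, so among equally accurate forecasts a narrower predictive distribution earns a lower (better) score.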
\section{Results}
\label{sec:results}
% The organization of our results follows that of our methods. We begin with Study A % (§\ref{sec:res:studyA})
% in which we find, as predicted by H1, that the relationship between overlap density and growth is $\cap$-shaped relationship. Then, in Study B,% (§\ref{sec:res.characterizing})
% we explore a typology of ecological communities along two dimensions: (1) the degree to which a community is mutualistic or competitive, and (2) the overall strength of ecological interactions between the communities member groups. In the N.clusters ecological communities analyzed in our VAR(1) analysis, we find that mutualistic relationships are much more common than competitive ones. Our case studies % (§\ref{sec:case.studies})
% illustrate the typology using 4 example ecological communities. Finally, in Study C, we do not find support for H2 %in §\ref{sec:res.likelihood.ratio.test}
% as adding average subreddit mutualism to the density dependence model does not improve growth prediction. But we do find, in support of H3, that ecological interactions improve forecasting performance in our time series models.
\begin{figure*}
\centering
\includegraphics[width=\linewidth]{figures/knitr-fig_densityxgrowth-1}
\caption{Relationship between density and growth. A 2D histogram of subreddits with overlap density (log-transformed) on the X-axis and the change in the logarithm of the number of distinct commenting users on the Y-axis. The black line shows the marginal effect of overlap density on growth as predicted by Model 2. The gray region shows the 95\% confidence interval of the marginal effect. \label{fig:density}}
\end{figure*}
% In §\ref{sec:ecology_background} we presented H1 before RQ1 but we report results for H1 in the same section as H2 since they refer to the same regression model.
%We first present high-level findings that demonstrate advantages of our community ecology approach upon the overlap density approach. We find that accounting for commensal relationships in time-series models increases forecasting accuracy; that including subreddit average commensalism explains additional variation in subreddit over overlap density; and we compare the conclusions drawn density dependence analysis based on the correlation of overlap density and growth can lead about the ecological environment than our analysis modeling commensal relationships between groups. Finally, we examine the distribution of \emph{average commensalism} and \emph{average absolute commensalism} to illuminate a typology of ecological communities which we illustrate through
\subsection{Study A: Density Dependence Theory}
\label{sec:res:studyA}
%As discussed in §\ref{sec:ecology_background}, population ecology approaches in social computing propose that the relationship between overlap-density and growth/survival outcomes reflect an environment that may be competitive, mutualistic, or a mixture of both \citep{wang_impact_2012,zhu_impact_2014}.
We test the classical prediction of density dependence theory as formulated in H1 using Model 1 % (Equation \ref{eq:M1} in §\ref{methods:density})
which has first- and second-order terms for the effect of overlap density on growth. As described in §\ref{sec:ecology_background}, H1 hypothesizes that overlap density will have a curvilinear $\cap$-shaped (inverse-U-shaped) relationship with growth indicated by a positive first-order regression coefficient and a negative second-order coefficient.
\begin{table}
\centering
% Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: Thu, Jul 29, 2021 - 05:22:21 PM
\begin{tabular}{@{\extracolsep{5pt}}lccc}
\\[-1.8ex]\hline
\hline \\[-1.8ex]
& Model 1 & Model 2 & Model 3 \\
Overlap density & 1.50$^{*}$ (0.26) & 1.50$^{*}$ (0.26) & \\
Overlap density$^2$ & $-$2.08$^{*}$ (0.41) & $-$2.09$^{*}$ (0.41) & \\
Average subreddit mutualism & & 0.12 (0.26) & 0.11 (0.26) \\
Constant & $-$0.23$^{*}$ (0.03) & $-$0.23$^{*}$ (0.04) & $-$0.04$^{*}$ (0.01) \\
\hline \\[-1.8ex]
Log Likelihood & $-$4970 & $-$4970 & $-$4986 \\
Observations & 4,090 & 4,090 & 4,090 \\
\hline
\hline \\[-1.8ex]
\textit{Note:} & \multicolumn{3}{r}{$^*$p$<0.01$} \\
\end{tabular}
\caption{Loglinear regression predicting subreddit growth as a function of overlap density. The model supports the prediction of density dependence theory of a $\cap$-shaped relationship between overlap density and growth. \label{tab:density}}
\end{table}
As predicted, we observe a $\cap$-shaped relationship between overlap density and growth. Figure \ref{fig:density} plots the marginal effects of overlap density on growth for the median subreddit laid over the data on which the model is fit. Table \ref{tab:density} shows regression coefficients for Models 1--3. For about half of subreddits, increasing overlap density is associated with higher growth rates. The point where increasing density ceases to predict increasing growth and begins to predict decreasing growth is at the 49\textsuperscript{th} percentile.
Prototypical subreddits at this overlap density grew slightly (95\% CI: [0.001,0.06]). Yet subreddits at the lower and upper extremes of overlap density slightly declined on average. Typical groups at the 20\textsuperscript{th} percentile of overlap density declined by 1.1 members (95\% CI: [-1.15,-1.1]) and typical groups at the 80\textsuperscript{th} percentile declined by 1.2 members (95\% CI: [-1.28,-1.1]).
While we find support for the classical theoretical prediction of a curvilinear ($\cap$-shaped) relationship between overlap density and growth, this does not imply that relationships between highly overlapping communities are more competitive.
% Instead our results below % in §\ref{sec:res.characterizing}
% show that relationships in ecological communities of subreddits with high user overlaps are typically mutualistic.
\subsection{Study B: Introducing Community Ecology}
\label{sec:res.characterizing}
% describe the figure and the main takeaway
% As described in §\ref{sec:characterizing.ecological.communities}, an ecological community can have positive or negative average ecological interaction §\ref{sec:mes.avg.mut} indicating if it is competitive or mutualistic and ecological interaction strength §\ref{sec:mes.abs.int} provides a way to distinguish ecological communities with a mixture of competitive and mutualistic interactions from those where ecological interactions are weak.
Figure \ref{fig:commense.x.abs.commense} visualizes the distribution of average ecological interaction and ecological interaction strength over the 641 ecological communities we identify.
We observe ecological communities characterized by strong forms of both mutualism and competition, others having mixtures of the two, and some with few significant ecological interactions. Mutualism is more common than competition, with the mean community having an average ecological interaction of 0.03 ($t=14.5$, $p<0.001$). We find that 524 clusters (81.7\%) are mutualistic. Not only are most ecological communities mutualistic, but more mutualistic ecological communities have greater ecological interaction strength (Spearman's $\rho=0.58$, $p<0.001$).
% Note that due to our clustering procedure, our analysis examines ecological interactions among subreddits with relatively high degrees of user overlap.
Therefore, our community ecology analysis suggests that among groups with similar users, mutualistic ecological interactions are more common than competitive ones.
\begin{figure}
\includegraphics[width=\linewidth]{figures/knitr-plot_commense_x_abs_commense-1}
\caption{Two-dimensional histogram showing ecological communities on Reddit in our typology. The X-axis shows the overall degree of mutualism or competition in clusters of subreddits with high user overlap based on the average ecological interaction. The Y-axis shows the ecological interaction strength representing the overall magnitude of competition or mutualism.}
\label{fig:commense.x.abs.commense}
\end{figure}
\subsubsection{Example ecological communities}
\label{sec:case.studies}
We present four case studies to illustrate our typology of ecological communities of online groups. Figure \ref{fig:commense.x.abs.commense} shows that we find clusters of subreddits characterized by mutualism, competition, a mixture of mutualism and competition, and few ecological relationships at all. We select one case from each of these four types using our measures of average ecological interaction (§\ref{sec:mes.avg.mut}) and ecological interaction strength (§\ref{sec:mes.abs.int}). To allow for more interesting network structures, we draw our cases from the 367 large clusters having at least five subreddits.
\input{resources/network-figures.tex}
Figure \ref{fig:networks} presents visualizations of competition-mutualism networks representing statistically significant impulse response functions as described in §\ref{sec:mes.irf}. During our analysis, we also examined the terms of the vector autoregression parameter $\mathbf{\Phi}$, the impulse response functions, and model fits and forecasts, all of which are available in our online supplement. We also visited each subreddit in the clusters and read their sidebars and top posts to support our brief qualitative descriptions.
\subsubsection{Mutualism among mental health subreddits}
% TODO, cite somebody on mental health.
To find a case characterized by mutualism, we selected the top 37 large clusters with the greatest average ecological interaction. From these, we arbitrarily chose one interesting ecological community, the \textit{mental health} cluster, which includes 11 subreddits for supporting people in struggles with mental health, addiction, and surviving abuse.
Constituent subreddits include those focused on specific mental health diagnoses like \texttt{r\Slash bpd} (borderline personality disorder) and \texttt{r\Slash cptsd} (complex post-traumatic stress disorder) while others like \texttt{r\Slash survivorsofabuse} and \texttt{r\Slash adultsurvivors}
are support groups.
The interactions among these subreddits are dense and primarily mutualistic as shown in Figure \ref{fig:mut.network}. There are a handful of competitive interactions like the reciprocal competition detected between \texttt{r\Slash codedependence} and \texttt{r\Slash bpd}. We also observe some interactions that are mutualistic in one direction and competitive in the other. For example, growth in \texttt{r\Slash addiction} predicts increasing growth in \texttt{r\Slash cptsd} even as growth in \texttt{r\Slash cptsd} predicts decreasing growth in \texttt{r\Slash addiction}. This suggests a pattern in which \texttt{r\Slash cptsd} siphons members from \texttt{r\Slash addiction}. That said, the density of mutualistic interactions shown in Figure \ref{fig:mut.network} suggests that different subreddits have complementary roles in this ecological community as people turn to different types of groups for help with interrelated problems. While attempting to explain why different online groups form mutualistic or competitive interactions is left to future research, the example of mental health subreddits shows how groups with related topics and overlapping participants can have mutualistic interactions where growth in one predicts growth in many of the rest.
\subsubsection{Competition among real estate and finance subreddits}
To find competitive clusters, we selected from the 36 large clusters with the lowest average ecological interaction an ecological community that we label \textit{finance}. Among the 6 subreddits in this cluster, \texttt{r\Slash realestateinvesting}, \texttt{r\Slash realestate} and \texttt{r\Slash commercialrealestate} all deal in different aspects of the real estate industry, while \texttt{r\Slash financialindependence} and \texttt{r\Slash fatfire} (the acronym ``fire'' means ``financial independence/retire early'') are focused on building wealth and becoming financially independent and \texttt{r\Slash financialplanning} is a general purpose subreddit for financial advice.
In contrast to the mental health ecological community, the finance cluster has mostly competitive ties as visualized in Figure \ref{fig:comp.network}. The fact that even this cluster, among the most competitive in our data, contains a number of mutualistic ties reflects just how prevalent mutualism is among subreddits with high degrees of user overlap. That said, we detect three reciprocal competitive interactions among the three subreddits that focus on real estate. The edges from \texttt{r\Slash fatfire} to \texttt{r\Slash commercialrealestate} and \texttt{r\Slash financialindependence} are competitive as well.
Interestingly, all interactions between the general finance subreddits (\texttt{r\Slash financialplanning} and \texttt{r\Slash financialindependence}) and \texttt{r\Slash realestate} are mutualistic.
%Interestingly, are mutualistic.
\subsubsection{Mixed interactions among timepiece subreddits}
Next, we turn to an example of an ecological community with low average ecological interaction but high ecological interaction strength.
We first select the 36 %(10\%)
large clusters with the average ecological interaction closest to 0. To find an ecological community with a mixture of mutualism and competition, we select from the 15 clusters with the greatest ecological interaction strength from within this group and choose the \textit{timepiece} cluster containing 7 subreddits about watches.
As shown in Figure \ref{fig:mixed.network}, the ecological community of timepiece subreddits is dense with ecological interactions (although not as dense as the mental health subreddits). We observe both reciprocated mutualistic interactions, like that between \texttt{r\Slash rolex} and \texttt{r\Slash gshock}, and competitive interactions like that between \texttt{r\Slash gshock} and \texttt{r\Slash seiko}. We also observe numerous unreciprocated competitive and mutualistic relationships like the mutualism between \texttt{r\Slash watchexchange} and \texttt{r\Slash watchcirclejerk}\footnote{The suffix is widely understood on Reddit to signify a jokey, meme, or satirical subreddit.}
and the competition between \texttt{r\Slash japanesewatches} and \texttt{r\Slash seiko}.
Though the average ecological interaction among these subreddits is near 0, our analysis reveals a complex ecological community with a mixture of competition and mutualism.
\subsubsection{Sparse interactions among Call of Duty subreddits}
To find a case where ecological interactions are weak, we return to the group of the 36 %(10\%)
large clusters with the average ecological interaction closest to 0 but select from the 15 clusters within this group with the lowest ecological interaction strength. From these, we choose the \textit{Call of Duty} cluster containing five groups about the popular military first-person shooter series of video games.
% % more quotations
The Call of Duty ecological community is sparse, having only two significant ecological interactions among its five member groups. This ecological community includes subreddits about different editions of the series such as \texttt{r\Slash blackops3}, \texttt{r\Slash infinitewarfare}, and \texttt{r\Slash wwii} as well as one about a popular spin-off zombie game \texttt{r\Slash codzombies} and the more general \texttt{r\Slash callofduty} subreddit. We find that growth in \texttt{r\Slash blackops3} or \texttt{r\Slash codzombies} predicts growth in \texttt{r\Slash infinitewarfare} and no other ecological interactions.
The timepiece and Call of Duty ecological communities illustrate how subreddits with overlapping users can have relatively strong or weak forms of ecological interdependence. Although both clusters are characterized by high degrees of user overlap and low average ecological interaction, the timepiece cluster has a dense competition-mutualism network while the Call of Duty network is sparse.
\subsection{Study C: Predicting Growth}
\label{sec:res.studyC}
We now compare the environmental approach of population ecology with the relational approach of community ecology.
In Study B, we presented examples of diverse ecological communities among subreddits with overlapping members. However, the presence of this diversity does not mean that ecological interactions are related to the growth of online groups, the key outcome of previous ecological studies. We therefore hypothesized in H2 that ecological interactions will improve the predictive performance of a density dependence model.
\subsubsection{Ecological interactions do not improve growth prediction}
\label{sec:res.likelihood.ratio.test}
To test H2, we compare Model 1, our density dependence model having first- and second-order terms for overlap density, with Model 2, which also includes average subreddit mutualism (§\ref{sec:mes.sub.mut}) as a predictor. We also examine Model 3, in which the only predictor is average subreddit mutualism. Table \ref{tab:density} shows regression coefficients for our models.
We do not observe a statistically significant association between average subreddit mutualism and growth ($B_3=0.12, SE=0.26$).
% We observe that average subreddit mutualism is positively associated with growth , which makes sense as subreddits with greater average subreddit mutualism benefit more from mutualism or are hurt less from competition.
Moreover, a likelihood ratio test comparing Model 1 and Model 2 does not support H2 as Model 2 does not predict subreddit growth better than Model 1 ($\chi^2 = 0.23$, $p>0.05$).
% Therefore, average subreddit mutualism does not help predict growth compared to the density dependence model alone.
Comparing Model 2 to Model 3 shows that overlap density explains variation that average subreddit mutualism does not ($\chi^2 = 33$, $p<0.001$).
%This suggests that the density of a subreddit's niche helps explain subreddit growth in important ways not captured by ecological interactions.
Overlap density helps explain a group's future growth, but the overall degree of mutualism or competition a group faces in its ecological community does not.
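As a rough check using the rounded log likelihoods reported in Table \ref{tab:density}, the likelihood ratio statistic for nested models is twice the difference in log likelihood between the larger and the smaller model. Writing $\ell_2$ and $\ell_3$ for the log likelihoods of Models 2 and 3 (shorthand introduced only for this illustration):
\begin{equation*}
\chi^2 = 2\left(\ell_{2} - \ell_{3}\right) \approx 2\left(-4970 - (-4986)\right) = 32
\end{equation*}
\noindent This is consistent with the reported $\chi^2 = 33$ up to the rounding of the log likelihoods, with 2 degrees of freedom for the two overlap density terms that Model 3 omits.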
% In §\ref{sec:discussion}, we discuss how overlap density may only capture the hospitality of a group's environment and may be independent of mutualism and competition within its ecological community.
\subsubsection{Forecasting accuracy}
\label{sec:res.forecasting}
The likelihood ratio tests in §\ref{sec:res.likelihood.ratio.test} are limited because improvements in predictive performance (or lack thereof) may be due to unobserved factors predictive of growth that are correlated with average subreddit mutualism. We hypothesized in H3 that the intergroup dependencies in our VAR models can better forecast the size of subreddits compared to baseline time series models that do not account for ecological interactions. As described in §\ref{sec:mes.forecasting}, we test H3 by comparing two forecasting metrics: the root-mean-square-error (RMSE) and the continuous ranked probability score (CRPS).
VAR models including ecological interactions have forecasting performance superior to the baseline model in terms of both RMSE and CRPS. We evaluate the 24-week forecast performance for all subreddits which were assigned to clusters. The RMSE under the baseline model (0.84) is greater than the RMSE of the VAR models (0.75) and the CRPS of the baseline model (72,853) is also greater than the CRPS of the VAR models (72,669). This reflects a substantive improvement in forecast accuracy robust to the choice of the forecasting metric.
Our baseline model contains a constant term and a trend term for each group and therefore accounts for all time-invariant within-group variation. Because overlap density is a subreddit-level variable that does not vary over time,
we know that the improvement in forecasting performance comes from modeling ecological interactions in ways not captured by overlap density.
\section{Threats to Validity}
\label{sec:limitations}
Our work is subject to several important threats to validity that we cannot fully address. First, we study ecological communities on only one platform hosting online groups and our results may not generalize to other platforms or time periods.
Additionally, while our community ecology approach assumes that ecological interactions drive dynamics in the size of groups over time and cause groups to grow or decline, drawing causal inference using our method would depend on several untestable assumptions. For example, our ability to infer causal relationships might be limited if groups we do not consider---including groups on other platforms---play a role in an ecological community. Regression estimates in Models 1-3 may be confounded by omitted variables and cannot support causal interpretation.
Therefore, we refrain from claiming that the relationships we infer are causal.
The method we propose for identifying ecological interactions between online groups has limitations common to all time series analysis of observational data.
Potential omitted variables might also include additional time lags of group size. Although we chose to use VAR(1) models with only 1 time lag, we hope future work can improve upon our approach and model more complex dynamics with additional lags.
% Our results are offered as limited temporal associations consistent with inferred ecological interactions.
Like most other time series analyses, vector autoregression assumes that the error terms are stationary. This is difficult to evaluate empirically and may not be realistic \citep{canova_var_2007}. Future work might relax these assumptions using more complex models with time-varying parameters, state space models \citep{box-steffensmeier_time_2014}, nonlinear time series models \citep{cenci_regularized_2019, kantz_nonlinear_2003}, or stationarity-enforcing priors \citep{heaps_enforcing_2020}. Such approaches may require additional contextual knowledge and be difficult to scale to an analysis of hundreds of different ecological communities, but may prove fruitful in future work focusing on ecological communities of interest. Such models may also be useful in future work investigating how ecological interactions change over time.
Additional threats to validity stem from our use of algorithmic clustering to identify ecological communities.
Organizational ecologists have rarely attempted to estimate the full community matrix for an entire population containing a large number of groups because of data and statistical limitations \citep[e.g.][]{ruef_emergence_2000, sorensen_recruitment-based_2004}. For instance, roughly 100 million possible ecological interactions exist within a set of 10,000 groups. Attempting to infer them all raises considerable computational and statistical challenges.
% This makes it necessary to narrow the scope to the ecological communities of interest in ways appropriate to the research question.
We chose to use a clustering analysis to explore the typical ecological communities on a platform.
% Yet, a
While we choose clusters based on high degrees of user overlap and validate our clustering in terms of the silhouette coefficient and purity criteria, we might have obtained different results if we had clustered in a different way. Additionally, our efforts to obtain clusters with a high silhouette coefficient led us to remove a large number of subreddits from our analysis. Thus, our results are not representative of Reddit overall, but only of those subreddits that were included in our analysis. Furthermore, clustering algorithms like the one we use may not have unique solutions, and different initial conditions and hyperparameters might lead to different results. While algorithmic clustering allows us to scale up our analysis, future work should use principled definitions of an ecological community based on qualitative contextual knowledge in focused studies of particular ecological communities.
% future investigations should also consider qualitative approaches to constructing ecological communities.
% Finally, our three cases studies are limited in that they can offer only a proof-of-concept analysis and an enticing hint at more comprehensive future analyses with more rigorously defined populations of online groups.
% Although we found varying results in the three ecological communities we selected, these case studies can provide little explanation for when one should expect to find different forms of commensalism in online groups. Our hope is that these initial results can point in new directions for research.
% % We looked at three different sets of related online groups and found three qualitatively different ecological communities.
% As is true in all case study research, there is little reason to expect findings from any one of our case studies to generalize to any specific other set of contexts.
\section{Discussion}
\label{sec:discussion}
To introduce community ecology and compare it to population ecology, we presented three studies. In Study A, we found support for H1: as predicted by density dependence theory, overlap density has an $\cap$-shaped association with subreddit growth.
Subreddits with moderate overlap density in our data declined less than subreddits with either very low or very high overlap density.
According to population ecology theory, this suggests that high-density environments are competitive and less conducive to growth than medium-density environments.
%prevalence of mutualism among highly overlapping subreddits contrast with our results for
Surprisingly, this contrasts with our results in Study B, where we studied the diversity of ecological communities using vector autoregression models of group size over time to infer networks of ecological interactions.
%surveyed clusters of highly overlapping groups on Reddit to.
We find ecological communities that are mutualistic or competitive, that mix the two, or that have few significant ecological interactions at all. Overall, however, ecological communities of subreddits are typically mutualistic and mutualistic interactions are stronger on average than competitive ones. Although we find evidence of density dependence, density-dependent competition does not necessarily reflect typical relationships in ecological communities of highly overlapping subreddits.
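To make the link between such models and these interaction types concrete, a generic first-order vector autoregression can be sketched as follows. The notation is assumed here for illustration and may differ from the specification estimated in Study B:
\[
\mathbf{y}_t = \mathbf{c} + \Phi\, \mathbf{y}_{t-1} + \mathbf{e}_t ,
\]
where $\mathbf{y}_t$ stacks the sizes of the groups in an ecological community at time $t$, $\mathbf{c}$ is a vector of intercepts, and $\mathbf{e}_t$ is an error term. In this sketch, an off-diagonal coefficient $\phi_{ij} > 0$ of $\Phi$ would be read as group $j$ supporting the later size of group $i$ (mutualism), and $\phi_{ij} < 0$ as group $j$ suppressing it (competition).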
%As discussed more below, our results are due to the fact that support for H1 does not necessarily mean that most relationships between subreddits with the greatest degrees of user overlap are competitive.
Our results in Study C show that accounting for the sizes of the other members of an ecological community improves time series forecasts of participation in online groups. However, average subreddit mutualism did not help predict growth.
This suggests that population ecology and community ecology offer complementary environmental and relational perspectives.
Population ecology's focus on environmental factors such as niche and overlap density is useful for predicting growth, but does not provide a way to study networks of mutualism and competition.
Community ecology unpacks density and provides insights about the specific relationships between groups. While modeling these interactions helps forecast participation levels in groups, the existence of these interactions may be independent of future growth. For example, if mutualistic relationships are common in declining ecological communities, that would explain our result for H2.
% these interactions helps time series forecasting, but whether the interactions
% While we advance community ecology as an alternative framework to population ecology, our results show that population ecology and community ecology are complementary perspectives.
% We tested H2 to find out whether including subreddit average mutualism improves the ability of a density dependence model to predict the size of a subreddit n.test weeks in the future and found that it did not. Therefore,
% Yet in support of H3, including ecological interactions in the vector autoregression (VAR) models substantially improves their forecasting performance.
% Our findings in Study A and Study B may appear contradictory, their coincidence in our data points to ways in which population ecology and community ecology conceive of different kinds of ecological dynamics.
The complementary nature of the two ecologies is seen in the coincidence of our findings in Study A and Study B.
Indeed, these results can help explain the puzzling set of empirical results about the relationship between overlap density and outcomes like growth, decline and survival \citep{wang_impact_2012, zhu_impact_2014, zhu_selecting_2014}.
Studies of density dependence theory in social computing measure the density of an online group's niche in terms of its overlap in participants or topics.
%Resource overlaps seem to reflect competitive forces in some circumstances but mutualistic ones in others.
Our analysis shows that resource overlap between two groups may have little to do with whether they are mutualists or competitors. Instead, overlaps may simply reflect the hospitality of the environment to groups with overlapping topics or user bases.
As a result, the differing environmental conditions of wikis and Usenet groups might explain why user overlap was associated with the survival of wikis \citep{zhu_impact_2014} but with the decline of Usenet groups \citep{wang_impact_2012}. Wikia was a young and growing platform during \citepos{zhu_impact_2014} data collection period, when the growth of groups may have been limited by knowledge of how to build a wiki, and this knowledge was provided by overlapping experienced users.
Usenet was in decline during \citepos{wang_impact_2012} study period and this may have produced competitive environmental conditions as users became more scarce.
%Users of groups with high overlap density may have greater commitment to the platform than to any particular group and competition over such users may become fierce when a platform goes into decline.
% as users with comm
% because
% and \citeauthor{tan_all_2015} \cite{tan_all_2015} observe that accounts posting in fewer different groups are more likely to leave a platform.
% As \citeauthor{kraut_building_2012} \cite{kraut_building_2012} argue, commitment to subgroups can enhance commitment to a broader group. This suggests that On the other hand, members of a group with high overlap density may have little commitment to it in particular.
% This suggests that commitment to a
% We suggest that when commitment to the platform declines this may amplify competition as
% may present environmental conditions for strong competition over those members
% This suggests that
% Such groups may face greater challenges in sustaining participation when the platform goes into decline.
The widespread mutualism found in Study B resonates with long-held understandings of ecological interactions in evolutionary theory \citep{kropotkin_mutual_2012}. Competition is unlikely to persist because it decreases survival. Because mutualism increases survival, it will be favored by natural selection \citep{armstrong_competitive_1980, axelrod_evolution_1981}. Similarly, competition can be avoided if groups adopt specialized roles in their ecological community, a dynamic known as resource partitioning in organizational ecology \citep{carroll_concentration_1985,menge_competition_1972,schoener_resource_1974}. Resource partitioning theory suggests that the competition among real estate subreddits observed in Figure \ref{fig:comp.network} may be due to a lack of specialization. If specialization does not emerge over time, such groups of competing subreddits may have decreased survival. By contrast, mental health support groups like those observed in Figure \ref{fig:comp.network} appear to have distinctive purposes or roles. Future work to test such mechanisms in ecological communities of online groups may reveal ways that online groups complement or cooperate with each other.
%Our results demonstrate population ecology's approach to competition and mutualism in a test of density dependence theory and provide an evaluation of community ecology's ability to predict subreddit growth.
%Future work should directly test this hypothesis about the relationships between platform-based and subgroup-based commitment.
% In general, competition over overlapping resources will have no effect on group growth if something besides the overlapping resource limits growth \cite{verhoef_community_2010}. For example, two wikis might share a large number of contributors (have high user overlap), but their growth might be limited by a lack of core contributors who perform important administrative tasks like policy making and software administration \cite{zhu_impact_2014}. Community ecology relaxes the assumption that competition and mutualism are caused by user overlap density and instead seeks to infer them from data.
% To illustrate our approach, we presented 4 example ecological communities found on Reddit §\ref{sec:case.studies}.
Within large platforms for online groups, the great number of ecological communities that can be studied should make it possible for future work to apply methods from network science to construct and test generalizable theories about the roles of different types of resources, design features of platforms, and governance institutions in these ecological interactions. Future work should also incorporate community ecology analysis in case studies of important topics such as ecological communities engaged in peer production, political mobilization, misinformation, or mental health support.
Although we focused on online groups within a single platform, groups may use multiple platforms with distinctive affordances for different purposes \citep{fiesler_moving_2020, kiene_technological_2019}. Since the VAR method relies only on time series data to infer ecological interactions, it can be applied to study ecological communities spanning social media platforms. Community ecology can thus provide a bridge between quantitative studies of participation in online groups and theories of interconnected information ecologies \citep{nardi_information_1999}. While we focus on relationships between groups sharing a platform, one can apply our concepts and methods to understand how interdependent systems of technologies and users give rise to higher levels of social organization on social media platforms \citep{astley_two_1985, aldrich_organizations_2006}.
\subsection{Implications for Design}
% While Resnick et al.~\citep{resnick_starting_2012}
In the final chapter of their book on \textit{Building Successful Online Communities}, \citet{kraut_building_2012} advise managers of online groups to select an effective niche and beware of competition. However, these recommendations are based on little direct evidence from studies of online groups and offer almost no concrete steps that a designer or group should take based on either piece of advice. Although further research into ecological interactions is needed before design principles can be derived, we provide a framework for online group managers to think about ecological constraints on group size.
While intuition suggests that online group managers might seek out mutualistic relationships and avoid competitive ones, it is often not obvious whether another group with overlapping users is a competitor or mutualist.
Our method gives group managers a way to tell the two apart.
Competitors have a negative impact on growth, but ecological theory suggests that specialization is an adaptive strategy in response to competition \citep{aldrich_organizations_2006, carroll_concentration_1985, kraut_building_2012, powell_network_2005}.
%For example, the growth of Wikipedia caused other online encyclopedia projects to shift their focus \cite{hill_almost_2013}.
Using our method, group managers might identify competitors limiting the growth of their groups. With the knowledge of this analysis in hand, they might be able to escape a competitive dynamic by specializing.
While competitive relationships are defined by how they decrease the size of groups, competition can also be important to the health of the broader ecological community. Exit to an alternative group can be an avenue for political change in response to grievances and poor governance \citep{hirschman_exit_1970, frey_emergence_2019}. The threat of competition with other groups may make expressions of voice more persuasive to moderators or platforms \citep{hirschman_exit_1970}.
Groups looking to increase activity should seek out mutualistic relationships, and we believe that designers of online platforms can help them do so. Features such as meta-groups, group search, and recommendation engines, along with practices like linking related groups, may lower barriers between groups and support mutualism. However, it is not obvious to what extent particular features will support competition, mutualism, or both. Using our method, managers and designers can test features intended to support mutualism.
\section{Conclusion}
% Rewrite conclusion
While explanations for the rise or decline of online groups often look to internal mechanisms, understanding the role of interdependence between online groups is increasingly important.
Although prior research has investigated competition and mutualism among online groups with overlapping users and topics using the population ecology framework \citep{wang_impact_2012, zhu_impact_2014, zhu_selecting_2014}, this approach does not provide a way to infer competitive or mutualistic interactions among related groups.
We introduce the community ecology framework as a complementary perspective to population ecology.
% The two ecologies both seek to explain why online groups grow or survive, but they focus on different levels of analysis \cite{astley_two_1985}.
By inferring competition-mutualism networks directly from time-series data, our community ecology approach helps resolve the empirical tensions raised by prior ecological work in social computing and reveals that most interactions within clusters of subreddits with highly overlapping users are mutualistic. Our methods provide a foundation for future work investigating related online groups.
% \printbibliography[title={References},heading=secbib]