diff --git a/.gitignore b/.gitignore
index b07ec85..645319d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,6 +8,8 @@ index_files/
*.ttt
*.fff
*.revealjs.md
+site_libs/
+.quarto/
# R stuff
.Rproj.user
diff --git a/_article.qmd b/_article.qmd
new file mode 100644
index 0000000..fc89df2
--- /dev/null
+++ b/_article.qmd
@@ -0,0 +1,399 @@
+```{r}
+#| label: setup
+
+profile <- Sys.getenv("QUARTO_PROFILE", unset="acm")
+if (profile == "acm") {
+ class_wide <- ".column-body"
+} else {
+ class_wide <- ".column-page"
+}
+
+envs <- Sys.getenv()
+```
+
+# Introduction
+
+Following Twitter's 2022 acquisition, Mastodon---an open-source, decentralized social network and microblogging community---saw an increase in activity and attention as a potential Twitter alternative [@heFlockingMastodonTracking2023; @cavaDriversSocialInfluence2023]. While millions of people set up new accounts and significantly increased the size of the network, many of these newcomers and potential newcomers found the process confusing and many accounts did not remain active. Unlike centralized social media platforms, Mastodon is a network of independent servers with their own rules and norms [@nicholsonMastodonRulesCharacterizing2023]. Servers can communicate with one another using the shared ActivityPub protocol, and accounts can move between Mastodon servers, but the local experience can vary widely from server to server.
+
+Although attracting and retaining newcomers is a key challenge for online communities [@krautBuildingSuccessfulOnline2011 p. 182], Mastodon's onboarding process has not always been straightforward. Variation among servers can also present a challenge for newcomers who may not even be aware of the specific rules, norms, or general topics of interest on the server they are joining [@diazUsingMastodonWay2022]. Further, many Mastodon servers have specific norms which people coming from Twitter may find confusing, such as local norms around content warnings [@nicholsonMastodonRulesCharacterizing2023]. Various guides and resources for people trying to join Mastodon offer mixed advice on choosing a server. Some suggest that the most important thing is to simply join any server and work from there [@krasnoffMastodon101How2022; @silberlingBeginnerGuideMastodon2023], while others have created tools and guides to help people find potential servers of interest by size and location [@thekinrarMastodonInstances; @kingMastodonMe2024].
+
+Mastodon's decentralized design has long been in tension with the disproportionate popularity of a small set of large, general-topic servers within the system [@ramanChallengesDecentralisedWeb2019a]. Analysing the activity of new accounts that join the network, we find that users who sign up on such servers are less likely to remain active after 91 days. We also find that many users who move accounts tend to gravitate toward smaller, more niche servers over time, suggesting that established users may also find additional utility from such servers.
+
+In response to these findings, we propose a potential way to create server and tag recommendations on Mastodon. This recommendation system could both help newcomers find servers that match their interests and help established accounts discover "neighborhoods" of related servers.
+
+# Background
+
+## Empirical Setting
+
+The Fediverse is a set of decentralized online social networks which interoperate using shared protocols like ActivityPub. Mastodon is a software program used by many Fediverse servers and offers a user experience similar to the Tweetdeck client for Twitter. It was first created in late 2016 and saw a surge in interest in 2022 during and after Elon Musk's Twitter acquisition.
+
+Mastodon features three kinds of timelines. The primary timeline is a "home" timeline which shows all posts from accounts followed by the user. Mastodon also supports a "local" timeline which shows all public posts from the local server and a "federated" timeline which includes all posts from users followed by other users on their server. The local timeline is unique to each server and can be used to discover new accounts and posts from the local community. On larger servers, this timeline can be unwieldy; however, on smaller servers, this presents the opportunity to discover new posts and users of potential interest.
+
+Discovery has been challenging on Mastodon. Text search, for instance, was impossible on most servers until support for this feature was added on an optional, opt-in basis using Elasticsearch in late 2023 [@rochkoMastodon2023]. Recommendation systems are currently a somewhat novel problem in the context of decentralized online social networks. @trienesRecommendingUsersWhom2018 developed a recommendation system for finding new accounts to follow on the Fediverse which used collaborative filtering based on BM25 in an early example of a content discovery system on Mastodon.
+
+Individual Mastodon servers can have an effect on the end experience of users. For example, some servers may choose to federate with some servers but not others, altering the topology of the Fediverse network for their users. At the same time, accounts need not be locked into one specific server. Because of Mastodon's data portability, users can move their accounts freely between servers while retaining their followers, though their post history remains with their original account.
+
+## The Mastodon Migrations
+
+Mastodon saw a surge in interest in 2022 and 2023, particularly after Elon Musk's Twitter acquisition. In particular, four events of interest drove measurable increases in new users to the network: the announcement of the acquisition (April 14, 2022), the closing of the acquisition (October 27, 2022), a day when Twitter suspended a number of prominent journalists (December 15, 2022), and a day when Twitter experienced an outage and started rate limiting accounts (July 1, 2023). Many Twitter accounts announced they were setting up Mastodon accounts and linked their new accounts to their followers, often using tags like #TwitterMigration [@heFlockingMastodonTracking2023] and driving interest in Mastodon in a process @cavaDriversSocialInfluence2023 found consistent with social influence theory.
+
+Some media outlets have framed reports on Mastodon [@hooverMastodonBumpNow2023] through what @zulliRethinkingSocialSocial2020 calls the "Killer Hype Cycle", whereby the media finds a new alternative social media platform, declares it a potential killer of some established platform, and later calls it a failure if it does not displace the existing platform. Such framing fails to take systems like the Fediverse seriously for their own merits: completely replacing existing commercial systems is not the only way to measure success, nor does it account for the real value the Fediverse provides for its millions of active users.
+
+Mastodon's approach to onboarding has also changed over time. In much of 2020 and early 2021, the Mastodon developers closed sign-ups to their flagship server and linked to an alternative server, which saw increased sign-ups during this period. They also linked to a list of servers on the "Join Mastodon" webpage [@mastodonggmbhServers], where all servers are pre-approved and follow the Mastodon Server Covenant which guarantees certain content moderation standards and data protections. Starting in 2023, the Mastodon developers shifted toward making the flagship server the default when people sign up on the official Mastodon Android and iOS apps [@rochkoNewOnboardingExperience2023; @rothItGettingEasier2023].
+
+## Newcomers in Online Communities
+
+Onboarding newcomers is an important part of the life cycle of online communities. Any community can expect a certain amount of turnover, and so it is important for the long-term health and longevity of the community to be able to bring in new members [@krautBuildingSuccessfulOnline2011 p. 182]. However, the process of onboarding newcomers is not always straightforward.
+
+The series of migrations of new users into Mastodon in many ways reflects folk stories of "Eternal Septembers" on previous communication networks, where a large influx of newcomers challenged the existing norms [@driscollWeMisrememberEternal2023; @kieneSurvivingEternalSeptember2016]. Many Mastodon servers do have specific norms which people coming from Twitter may find confusing, such as local norms around content warnings [@nicholsonMastodonRulesCharacterizing2023]. Variation among servers can also present a challenge for newcomers who may not even be aware of the specific rules, norms, or general topics of interest on the server they are joining [@diazUsingMastodonWay2022]. Mastodon servers open to new accounts must thus accommodate newcomers while at the same time ensuring the propagation of their norms and culture, either through social norms or through technical means.
+
+# Data
+
+```{r}
+#| label: fig-account-timeline
+#| fig-cap: "Accounts in the dataset created between January 2022 and March 2023. The top panels show the proportion of accounts still active 45 days after creation, the proportion of accounts that have moved, and the proportion of accounts that have been suspended. The bottom panel shows the count of accounts created each week. The dashed vertical lines in the bottom panel represent the announcement day of the Elon Musk Twitter acquisition, the acquisition closing day, a day when Twitter suspended a number of prominent journalists, and a day when Twitter experienced an outage and started rate limiting accounts."
+#| fig-height: 2.75
+#| fig-width: 6.75
+#| fig-env: figure*
+#| fig-pos: tb!
+
+library(here)
+source(here("code/helpers.R"))
+account_timeline_plot()
+```
+
+Mastodon has an extensive API which allows for the collection of public posts and account information. We collected data from the public timelines of Mastodon servers using the Mastodon API with a crawler which runs once per day. We also collected account information from the opt-in public profile directories on these servers.
+
+```{r}
+#| label: data-counts
+#| cache: true
+
+library(arrow)
+library(tidyverse)
+library(here)
+source(here("code/helpers.R"))
+
+accounts <- load_accounts(filt = FALSE) %>%
+ filter(created_at >= "2020-08-14") %>%
+ filter(created_at < "2024-01-01")
+
+tag_posts <- "data/scratch/all_tag_posts.feather" %>%
+ arrow::read_ipc_file(. , col_select = c("host", "acct", "created_at")) %>%
+ filter(created_at >= as.Date("2023-05-01")) %>%
+ filter(created_at < as.Date("2023-08-01"))
+
+text_format <- function(df) {
+ return (format(nrow(df), big.mark=","))
+}
+
+num_tag_posts <- tag_posts %>% text_format()
+num_tag_accounts <- tag_posts %>% distinct(host, acct) %>% text_format()
+num_tag_servers <- tag_posts %>% distinct(host) %>% text_format()
+
+num_accounts_unfilt <- accounts %>% text_format()
+num_account_bots <- accounts %>% filter(bot) %>% text_format()
+num_account_nostatuses <- accounts %>% filter(is.na(last_status_at)) %>% text_format()
+num_account_suspended <- accounts %>% mutate(suspended = replace_na(suspended, FALSE)) %>% filter(suspended) %>% text_format()
+num_accounts_moved <- accounts %>% filter(has_moved) %>% text_format()
+num_account_limited <- accounts %>% filter(limited) %>% text_format()
+num_account_samedaystatus <- accounts %>% filter(last_status_at <= created_at) %>% text_format()
+num_account_filt <- load_accounts(filt = TRUE) %>% text_format()
+```
+
+**Mastodon Profiles**: We collected accounts using data previously collected from posts on public Mastodon timelines from October 2020 to August 2023. We then queried for up-to-date information on those accounts including their most recent status and if the account had moved as of February 2024. Through this process, we discovered a total of `r num_accounts_unfilt` accounts created between August 14, 2020 and January 1, 2024. We then filtered out accounts which were bots (`r num_account_bots` accounts), had been suspended (`r num_account_suspended` accounts), had been marked as moved to another account (`r num_accounts_moved` accounts), had been limited by their local server (`r num_account_limited` accounts), had no statuses (`r num_account_nostatuses` accounts), or had posted their last status on the same day as their account creation (`r num_account_samedaystatus` accounts). This gave us a total of `r num_account_filt` accounts which met all the filtering criteria. Note that because we got updated information on each account, we include only accounts on servers which still existed at the time of our profile queries and which returned records for the account.
+
+**Tags**: Mastodon supports hashtags, which are user-generated metadata tags that can be added to posts. Clicking the link for a tag shows a stream of posts which also have that tag from the federated timeline, which includes accounts on the same server and posts from accounts followed by the accounts on the local server. We collected `r num_tag_posts` statuses which contained at least one hashtag, posted between May and July 2023 by `r num_tag_accounts` accounts on `r num_tag_servers` unique servers.
+
+# Analysis and Results
+
+## Survival Model
+
+*Are accounts on suggested general servers less likely to remain active than accounts on other servers?*
+
+```{r, cache.extra = tools::md5sum("code/survival.R")}
+#| cache: true
+#| label: fig-survival
+#| fig-env: figure
+#| fig-cap: "Survival probabilities for accounts created during May 2023."
+#| fig-width: 3.375
+#| fig-height: 2.5
+#| fig-pos: h!
+
+library(here)
+source(here("code/survival.R"))
+plot_km
+```
+
+```{r}
+#| label: table-coxme
+library(ehahelper)
+library(broom)
+
+cxme_table <- tidy(cxme) %>%
+ mutate(conf.low = exp(conf.low), conf.high=exp(conf.high)) %>%
+ mutate(term = case_when(
+ term == "factor(group)1" ~ "Join Mastodon",
+ term == "factor(group)2" ~ "General Servers",
+ term == "small_serverTRUE" ~ "Small Server",
+ TRUE ~ term
+ )) %>%
+ mutate(exp.coef = paste("(", round(conf.low, 2), ", ", round(conf.high, 2), ")", sep="")) %>%
+ select(term, estimate, exp.coef , p.value)
+```
+
+Using `r text_format(sel_a)` accounts created from May 1 to June 30, 2023, we create a Kaplan–Meier estimator for the probability that an account will remain active based on whether the account is on one of the largest general instances[^1] featured at the top of the Join Mastodon webpage or otherwise if it is on a server in the Join Mastodon list. Accounts are considered active if they have made at least one post after the censoring period of `r active_period` days after account creation.
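+
+For readers unfamiliar with the estimator, the following is a minimal sketch of the product-limit (Kaplan–Meier) calculation in base R. It is an illustration only and is not the code in `code/survival.R`; the `days` and `status` vectors and the `kaplan_meier` helper are hypothetical.
+
+```r
+# Hypothetical data: days observed, and whether the account went
+# inactive (1) or was still active when observation ended (0, censored).
+days   <- c(5, 8, 8, 12, 20, 20, 30)
+status <- c(1, 1, 0,  1,  0,  1,  0)
+
+kaplan_meier <- function(time, event) {
+  event_times <- sort(unique(time[event == 1]))
+  # Accounts still under observation just before each event time.
+  at_risk <- vapply(event_times, function(t) sum(time >= t), numeric(1))
+  # Accounts that went inactive exactly at each event time.
+  events <- vapply(event_times, function(t) sum(time == t & event == 1), numeric(1))
+  data.frame(time = event_times, survival = cumprod(1 - events / at_risk))
+}
+
+km <- kaplan_meier(days, status)
+```
+
+Each row of `km` gives the estimated probability of remaining active beyond that day; censored accounts leave the risk set without counting as events.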
+
+[^1]: `r paste(general_servers, collapse=", ")`
+
+::: {.content-visible unless-profile="icwsm"}
+
+::: {#tbl-cxme .column-body}
+```{r}
+if (knitr::is_latex_output()) {
+ cxme_table %>% knitr::kable(format="latex", booktabs=TRUE, digits=3)
+} else {
+ cxme_table %>% knitr::kable(digits = 3)
+}
+```
+
+Coefficients for the Cox Proportional Hazard Model with Mixed Effects. The model includes a random effect for the server.
+
+:::
+
+
+We also construct a mixed-effects Cox proportional hazards model:
+
+$$
+h(t_{ij}) = h_0(t) \exp\left(\begin{aligned}
+ &\beta_1 \text{Join Mastodon} \\
+ &+ \beta_2 \text{General Servers} \\
+ &+ \beta_3 \text{Small Server} \\
+ &+ b_{j}
+\end{aligned}\right)
+$$
+
+where $h(t_{ij})$ is the hazard for account $i$ on server $j$ at time $t$, $h_0(t)$ is the baseline hazard, $\beta_1$ is the coefficient for whether the account is on a server featured on Join Mastodon, $\beta_2$ is the coefficient for whether the account is on one of the largest general instances, $\beta_3$ is the coefficient for whether the account is on a small server with fewer than 100 accounts, and $b_{j}$ is the random effect for server $j$.
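+
+As a worked reading of the model above (assuming the modeled event is an account becoming inactive), each coefficient is a log hazard ratio. Holding the other covariates and the server effect $b_j$ fixed, the baseline hazard cancels when comparing an account on one of the largest general instances with an otherwise identical account:
+
+$$
+\frac{h_0(t) \exp(\beta_2 + b_j)}{h_0(t) \exp(b_j)} = \exp(\beta_2)
+$$
+
+Under that assumption, $\exp(\beta_2) > 1$ would indicate a higher hazard of becoming inactive and $\exp(\beta_2) < 1$ a lower one.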
+
+
+
+We again find that accounts on the largest general instances are less likely to remain active than accounts on other servers, while accounts created on smaller servers are more likely to remain active.
+
+:::
+
+## Moved Accounts
+
+*Do accounts tend to move to larger or smaller servers?*
+
+Mastodon users can move their accounts to another server while retaining their connections (but not their posts) to other Mastodon accounts. This feature, built into the Mastodon software, offers data portability and helps avoid lock-in.
+
+```{r}
+#| label: table-ergm-table
+#| echo: false
+#| warning: false
+#| message: false
+#| error: false
+
+library(here)
+library(modelsummary)
+library(kableExtra)
+library(purrr)
+library(stringr)
+load(file = here("data/scratch/ergm-model-early.rda"))
+load(file = here("data/scratch/ergm-model-late.rda"))
+
+if (knitr::is_latex_output()) {
+ format <- "latex_tabular"
+} else {
+ format <- "html"
+}
+
+x <- modelsummary(
+ list("Coef." = model.early, "Std.Error" = model.early, "Coef." = model.late, "Std.Error" = model.late),
+ estimate = c("{estimate}", "{stars}{std.error}", "{estimate}", "{stars}{std.error}"),
+ statistic = NULL,
+ gof_omit = ".*",
+ coef_rename = c(
+ "sum" = "Sum",
+ "nonzero" = "Nonzero",
+ "diff.sum0.h-t.accounts" = "Smaller server",
+ "nodeocov.sum.accounts" = "Server size\n(outgoing)",
+ "nodeifactor.sum.registrations.TRUE" = "Open registrations\n(incoming)",
+ "nodematch.sum.language" = "Languages match"
+ ),
+ align="lrrrr",
+ stars = c('*' = .05, '**' = 0.01, '***' = .001),
+ output = format
+ ) %>% add_header_above(c(" " = 1, "Model A" = 2, "Model B" = 2))
+```
+
+:::: {#tbl-ergm-table `r class_wide`}
+
+```{r}
+x
+```
+
+Exponential family random graph models for account movement between Mastodon servers. Accounts in Model A were created in May 2022 and moved to another account at some later point. Accounts in Model B were created at some earlier point and moved after October 2023.
+
+::::
+
+To corroborate our findings, we also use data from thousands of accounts which moved between Mastodon servers, taking advantage of the data portability of the platform. Conceiving of these moved accounts as edges within a weighted directed network where nodes represent servers, edges represent accounts, and weights represent the number of accounts that moved between servers, we construct an exponential family random graph model (ERGM) with terms for server size, open registrations, and language match between servers. We find that accounts are more likely to move from larger servers to smaller servers.
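+
+The network construction described above can be sketched as follows. This is an illustrative example under assumed inputs, not the preprocessing used for the models: the `moves` data frame and its server names are hypothetical.
+
+```r
+# Hypothetical moved-account records: one row per account move.
+moves <- data.frame(
+  from = c("big.example", "big.example", "big.example", "mid.example"),
+  to   = c("small.example", "small.example", "mid.example", "small.example"),
+  stringsAsFactors = FALSE
+)
+
+# Collapse individual moves into weighted, directed server-to-server edges.
+edges <- aggregate(weight ~ from + to, data = cbind(moves, weight = 1), FUN = sum)
+edges <- edges[order(-edges$weight), ]
+```
+
+The resulting `edges` data frame has one row per ordered pair of servers, with `weight` counting the accounts that moved in that direction.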
+
+
+# Proposed Recommendation System
+
+*How can we build an opt-in, low-resource recommendation system for finding Fediverse servers?*
+
+Based on these findings, we suggest a need for better ways for newcomers to find servers and propose a viable way to create server and tag recommendations on Mastodon. This system could both help newcomers find servers that match their interests and help established accounts discover "neighborhoods" of related servers.
+
+One challenge in building such a system is the decentralized nature of the system. A single, central actor which collects data from servers and then distributes recommendations would be antithetical to the decentralized nature of Mastodon. Instead, we propose a system where servers can report the top hashtags by the number of unique accounts on the server using them during the last three months. Such a system would be opt-in and require few additional server resources since tags already have their own database table.
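+
+As a minimal sketch of the reporting step, assume a hypothetical data frame `tag_uses` with one row per tagged post and columns `acct` and `tag` (these names are illustrative and are not the Mastodon database schema). A server could then compute its top tags by unique accounts as follows; restricting the rows to the last three months is omitted for brevity.
+
+```r
+# Hypothetical post-level data for a single server.
+tag_uses <- data.frame(
+  acct = c("a", "a", "b", "c", "c", "d"),
+  tag  = c("rstats", "rstats", "rstats", "hci", "rstats", "hci"),
+  stringsAsFactors = FALSE
+)
+
+# Count unique accounts per tag and keep the n most widely used tags.
+top_tags <- function(df, n = 256) {
+  pairs <- unique(df[, c("acct", "tag")])   # drop repeat uses by the same account
+  counts <- as.data.frame(table(tag = pairs$tag),
+                          responseName = "accounts",
+                          stringsAsFactors = FALSE)
+  counts <- counts[order(-counts$accounts, counts$tag), ]
+  head(counts, n)
+}
+
+report <- top_tags(tag_uses, n = 2)
+```
+
+Only the aggregated `report` (tag names and account counts) would need to leave the server, which is what keeps the approach opt-in and lightweight.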
+
+## Recommendation System Design
+
+We use Okapi BM25 to construct a term frequency-inverse document frequency (TF-IDF) model to associate the top tags with each server using counts of tag-account pairs from each server for the term frequency and the number of servers that use each tag for the inverse document frequency. We then L2 normalize the vectors for each tag and calculate the cosine similarity between the tag vectors for each server.
+
+$$
+tf = \frac{f_{t,s} \cdot (k_1 + 1)}{f_{t,s} + k_1 \cdot (1 - b + b \cdot \frac{|s|}{avgstl})}
+$$
+
+where $f_{t,s}$ is the number of accounts using the tag $t$ on server $s$, $|s|$ is the total number of account-tag pairs on server $s$, $k_1$ and $b$ are tuning parameters, and $avgstl$ is the average number of account-tag pairs per server. For the inverse document frequency, we use the following formula:
+
+$$
+idf = \log \frac{N - n + 0.5}{n + 0.5}
+$$
+
+where $N$ is the total number of servers and $n$ is the number of servers where the tag appears as one of the top tags. We then apply L2 normalization:
+
+$$
+tfidf = \frac{tf \cdot idf}{\| tf \cdot idf \|_2}
+$$
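+
+The three formulas above can be combined in a few lines of base R. The sketch below is illustrative rather than the implementation used for the paper: the toy `counts` matrix is hypothetical, the values `k1 = 1.2` and `b = 0.75` are common defaults and an assumption here, and normalizing each server's vector (each row) is our reading of the normalization step.
+
+```r
+# Toy matrix of account counts f_{t,s}: rows are servers, columns are tags.
+counts <- matrix(
+  c(10, 0, 0,
+     1, 8, 0,
+     0, 0, 9,
+     0, 3, 0,
+     0, 0, 4),
+  nrow = 5, byrow = TRUE,
+  dimnames = list(paste0("s", 1:5), c("art", "rstats", "hci"))
+)
+
+bm25_tfidf <- function(counts, k1 = 1.2, b = 0.75) {
+  len <- rowSums(counts)                 # |s|: account-tag pairs per server
+  avg_len <- mean(len)                   # avgstl
+  tf <- counts * (k1 + 1) / (counts + k1 * (1 - b + b * len / avg_len))
+  N <- nrow(counts)                      # total number of servers
+  n <- colSums(counts > 0)               # servers on which each tag appears
+  idf <- log((N - n + 0.5) / (n + 0.5))
+  w <- sweep(tf, 2, idf, `*`)            # weight each tag column by its idf
+  norms <- sqrt(rowSums(w^2))
+  norms[norms == 0] <- 1                 # leave all-zero rows unchanged
+  w / norms                              # L2 normalize each server's vector
+}
+
+weights <- bm25_tfidf(counts)
+```
+
+Note that this form of the inverse document frequency becomes negative for tags that appear on more than half of the servers, so very common tags may need separate handling.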
+
+## Applications
+
+```{r}
+#| eval: false
+library(tidyverse)
+library(igraph)
+library(arrow)
+
+sim_servers <- "data/scratch/server_similarity.feather" %>% arrow::read_ipc_file() %>% rename("weight" = "Similarity")
+#sim_net <- as.network(sim_servers)
+g <- graph_from_data_frame(sim_servers, directed = FALSE)
+
+g_strength <- log(sort(strength(g)))
+normalized_strength <- (g_strength - min(g_strength)) / (max(g_strength) - min(g_strength))
+
+server_centrality <- enframe(normalized_strength, name="server", value="strength")
+server_centrality %>% arrow::write_ipc_file("data/scratch/server_centrality.feather")
+```
+
+### Server Similarity Neighborhoods
+
+Mastodon provides two feeds in addition to a user's home timeline populated by accounts they follow: a local timeline with all public posts from their local server and a federated timeline which includes all posts from users followed by other users on their server. We suggest a third kind of timeline, a *neighborhood timeline*, which filters the federated timeline by topic.
+
+We calculate the pairwise similarity between two servers with TF-IDF vectors $A$ and $B$ using cosine similarity:
+
+$$
+\text{similarity}(A, B) = \frac{A \cdot B}{\|A\| \|B\|}
+$$
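+
+A minimal sketch of this calculation for all pairs of servers at once, using hypothetical toy vectors, might look like the following. Rows with all zeros would produce `NaN` and are assumed to have been removed beforehand.
+
+```r
+cosine_similarity <- function(m) {
+  norms <- sqrt(rowSums(m^2))
+  unit <- m / norms            # scale every row to unit length
+  unit %*% t(unit)             # entry [i, j] is the cosine between rows i and j
+}
+
+# Toy TF-IDF vectors for three hypothetical servers.
+vectors <- rbind(
+  s1 = c(1, 0, 1),
+  s2 = c(1, 0, 0),
+  s3 = c(0, 2, 0)
+)
+sim <- cosine_similarity(vectors)
+```
+
+When the vectors are already L2 normalized, the division by the norms has no effect and the similarity reduces to a plain dot product.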
+
+::: {#tbl-sim-servers .content-visible unless-profile="icwsm"}
+
+```{r}
+#| label: table-sim-servers
+library(tidyverse)
+library(arrow)
+
+sim_servers <- "data/scratch/server_similarity.feather" %>% arrow::read_ipc_file()
+server_of_interest <- "hci.social"
+server_table <- sim_servers %>%
+ arrange(desc(Similarity)) %>%
+ filter(Source == server_of_interest | Target == server_of_interest) %>%
+ head(5) %>%
+ pivot_longer(cols=c(Source, Target)) %>%
+ filter(value != server_of_interest) %>%
+ select(value, Similarity) %>%
+ rename("Server" = "value")
+
+if (knitr::is_latex_output()) {
+ server_table %>% knitr::kable(format="latex", booktabs=TRUE, digits=3)
+} else {
+ server_table %>% knitr::kable(digits = 3)
+}
+```
+
+Top five servers most similar to hci.social
+
+:::
+
+### Server Discovery
+
+Given a set of popular tags and a list of servers, we build a recommendation system where users select tags from a list of popular tags and receive server suggestions. The system first creates a subset of vectors based on the TF-IDF matrix which represents the top clusters of topics. After a user selects the top tags of interest to them, it suggests servers which match their preferences.
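+
+One simple way to turn selected tags into suggestions is sketched below. The scoring rule (summing each server's weights over the selected tags) is an assumption made for illustration and is simpler than the cluster-based subset described above; the server names, weights, and `recommend_servers` helper are hypothetical.
+
+```r
+# Hypothetical tag weights: rows are servers, columns are tags.
+weights <- rbind(
+  "art.example"   = c(art = 0.9, rstats = 0.0, hci = 0.1),
+  "stats.example" = c(art = 0.0, rstats = 0.8, hci = 0.2),
+  "hci.example"   = c(art = 0.1, rstats = 0.3, hci = 0.9)
+)
+
+recommend_servers <- function(weights, selected, n = 2) {
+  selected <- intersect(selected, colnames(weights))    # ignore unknown tags
+  scores <- rowSums(weights[, selected, drop = FALSE])  # sum weights of chosen tags
+  head(sort(scores, decreasing = TRUE), n)
+}
+
+suggestions <- recommend_servers(weights, c("rstats", "hci"))
+```
+
+Here a user interested in `rstats` and `hci` would be shown `hci.example` first, followed by `stats.example`.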
+
+### Tag Similarity
+
+We also calculate the similarity between tags using the same method. This can be used to suggest related tags to users based on their interests.
+
+::: {.content-visible unless-profile="icwsm"}
+
+## Robustness to Limited Data
+
+```{r}
+#| label: fig-simulations-rbo
+#| fig-env: figure*
+#| cache: true
+#| fig-width: 6.75
+#| fig-height: 3
+#| fig-pos: tb
+library(tidyverse)
+library(arrow)
+simulations <- arrow::read_ipc_file("data/scratch/simulation_rbo.feather")
+
+simulations %>%
+ group_by(servers, tags, run) %>% summarize(rbo=mean(rbo), .groups="drop") %>%
+ mutate(ltags = as.integer(log2(tags))) %>%
+ ggplot(aes(x = factor(ltags), y = rbo, fill = factor(ltags))) +
+ geom_boxplot() +
+ facet_wrap(~servers, nrow=1) +
+ #scale_y_continuous(limits = c(0, 1)) +
+ labs(x = "Tags (log2)", y = "RBO", title = "Rank Biased Overlap with Baseline Rankings by Number of Servers") +
+ theme_minimal() + theme(legend.position = "none")
+```
+
+A challenge for a federated recommendation system like the one we propose is that it needs buy-in from a sufficient number of servers to provide value. There is also a tradeoff between the number of tags to expose for each server and potential concerns about exposing too much data.
+
+We simulated various scenarios that limit both the number of servers that report data and the number of tags they report. We then used rank biased overlap (RBO) to compare the outputs from these simulations to the baseline with more complete information from all tags on all servers [@webberSimilarityMeasureIndefinite2010]. In particular, we gave a higher weight to suggestions with a higher rank, with weights decaying by a factor of $k^{0.80}$. @fig-simulations-rbo shows how the average agreement with the baseline, which takes the top 256 tags from each server, scales with the number of servers and tags reported.
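+
+For intuition, a truncated form of rank biased overlap can be written in a few lines of base R. This sketch is not the simulation code: the `rbo_truncated` helper is hypothetical, the persistence parameter `p = 0.8` is an assumption, and the extrapolated variant of the measure is not shown.
+
+```r
+# Truncated rank-biased overlap between two rankings, evaluated to depth k.
+# p in (0, 1) controls how quickly the weights decay with rank.
+rbo_truncated <- function(s, t, p = 0.8, k = min(length(s), length(t))) {
+  depths <- seq_len(k)
+  # Agreement at depth d: share of items common to both top-d prefixes.
+  agreement <- vapply(depths, function(d) {
+    length(intersect(s[seq_len(d)], t[seq_len(d)])) / d
+  }, numeric(1))
+  (1 - p) * sum(p^(depths - 1) * agreement)
+}
+
+baseline  <- c("x", "y", "z")
+simulated <- c("y", "x", "z")
+score <- rbo_truncated(baseline, simulated)
+```
+
+Because the sum stops at depth `k`, two identical rankings score `1 - p^k` rather than 1, so truncated scores are best compared at a common depth.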
+
+:::
+
+# Discussion
+
+The analysis could be improved by examining the factors that lead accounts to remain active or drop out, with a particular focus on the actual activity of accounts over time. For instance, do accounts that interact with other users more remain active longer? Are there particular markers of activity that are more predictive of account retention? Future work could use these to provide suggestions for ways to help newcomers during the onboarding process.
+
+The observational nature of the data limits some of the causal claims we can make. It is unclear, for instance, if accounts on general servers are less likely to remain active because of the server itself or because of the type of users who join such servers. For example, it is conceivable that the kind of person who spends more time researching which server to join is more invested in their Mastodon experience than one who simply joins the first server they find.
+
+Future work is necessary to determine how well the recommendation system helps users find servers that match their interests. This may involve user studies and interviews to determine how well the system works in practice.
+
+While the work presented here is based on observed posts on the public timelines, simulations may be helpful in determining the robustness of the system to targeted attacks. Due to the decentralized nature of the system, it is feasible that a bad actor could set up zombie accounts on servers to manipulate the recommendation system. Simulations could help determine how well the system can resist such attacks and ways to mitigate this risk.
+
+# Conclusion
+
+Based on analysis of trace data from millions of new Fediverse accounts, we find evidence that suggests that servers matter and that users tend to move from larger servers to smaller servers. We then propose a recommendation system that can help new Fediverse users find servers with a high probability of being a good match based on their interests. Based on simulations, we demonstrate that such a tool can be effectively deployed in a federated manner, even with limited data on each local server.
+
+# References {#references}
+
+::: {.content-visible unless-profile="icwsm"}
+
+# Glossary {.appendix}
+
+*ActivityPub*: A decentralized social networking protocol based on the ActivityStreams 2.0 data format.
+
+*Fediverse*: A set of decentralized online social networks which interoperate using shared protocols like ActivityPub.
+
+*Mastodon*: An open-source, decentralized social network and microblogging community.
+
+*Hashtag*: A user-generated metadata tag that can be added to posts.
+
+*Federated timeline*: A timeline which includes all posts from users followed by other users on their server.
+
+*Local timeline*: A timeline with all public posts from the local server.
+:::
\ No newline at end of file
diff --git a/_quarto-acm.yml b/_quarto-acm.yml
new file mode 100644
index 0000000..726bd09
--- /dev/null
+++ b/_quarto-acm.yml
@@ -0,0 +1,18 @@
+manuscript:
+ article: acm.qmd
+ code-links:
+ - text: Preprocessing
+ href: code/preprocess.py
+ - text: R code
+ href: code/helpers.R
+ - href: code/survival.R
+ notebooks:
+ - notebook: notebooks/_moved.qmd
+ - notebook: notebooks/arima.qmd
+ environment: renv.lock
+
+execute:
+ freeze: auto
+
+
+
diff --git a/_quarto-icwsm.yml b/_quarto-icwsm.yml
new file mode 100644
index 0000000..6512f00
--- /dev/null
+++ b/_quarto-icwsm.yml
@@ -0,0 +1,14 @@
+project:
+ type: manuscript
+ render:
+ - article.qmd
+manuscript:
+ article: icwsm.qmd
+ environment: renv.lock
+execute:
+ echo: false
+ error: false
+ warning: false
+ message: false
+ freeze: auto
+ cache: true
\ No newline at end of file
diff --git a/_quarto.yml b/_quarto.yml
index 9a2a292..ed9f760 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -1,30 +1,13 @@
project:
type: manuscript
-
-manuscript:
- article: article.qmd
- code-links:
- - text: Preprocessing
- href: code/preprocess.py
- - text: R code
- href: code/helpers.R
- - href: code/survival.R
- notebooks:
- # - notebook: _tags.qmd
- # - notebook: _pull_pull.qmd
- - notebook: notebooks/_moved.qmd
- - notebook: notebooks/arima.qmd
- # - notebook: Presentation.qmd
- environment: renv.lock
format:
- html:
- comments:
- hypothesis: true
- pdf:
- template: template.tex
-
-execute:
- freeze: true
-
-
-
+ acm-html:
+ hypothesis: false
+ acm-pdf:
+ output-file: mastodon-recommendations-acm.pdf
+ include-in-header:
+ - text: |
+ \usepackage{siunitx}
+ docx: default
+profile:
+ default: acm
\ No newline at end of file
diff --git a/acm.qmd b/acm.qmd
new file mode 100644
index 0000000..bff7d97
--- /dev/null
+++ b/acm.qmd
@@ -0,0 +1,100 @@
+---
+title: "Do Servers Matter on Mastodon? Data-driven Design for Decentralized Social Media"
+short-title: Mastodon Recommendations
+authors:
+ - name: Carl Colglazier
+ affiliation:
+ name: Northwestern University
+ city: Evanston
+ state: Illinois
+ country: United States
+ corresponding: true
+bibliography: references.bib
+format:
+ acm-html:
+ comments:
+ hypothesis: false
+ acm-pdf:
+ output-file: mastodon-recommendations-acm.pdf
+ keep-md: true
+ include-in-header:
+ - text: |
+ \usepackage{siunitx}
+acm-metadata:
+ # comment this out to make submission anonymous
+ anonymous: true
+ # comment this out to build a draft version
+ #final: true
+
+ # comment this out to specify detailed document options
+ # acmart-options: sigconf, review
+
+ # acm preamble information
+ copyright-year: 2018
+ acm-year: 2018
+ copyright: acmcopyright
+ doi: XXXXXXX.XXXXXXX
+ conference-acronym: "Conference acronym 'XX"
+ conference-name: |
+ Make sure to enter the correct
+ conference title from your rights confirmation email
+ conference-date: June 03--05, 2018
+ conference-location: Woodstock, NY
+ price: "15.00"
+ isbn: 978-1-4503-XXXX-X/18/06
+
+ # if present, replaces the list of authors in the page header.
+ shortauthors: Colglazier
+
+ # The code below is generated by the tool at http://dl.acm.org/ccs.cfm.
+ # Please copy and paste the code instead of the example below.
+ ccs: |
+ \begin{CCSXML}
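+ <ccs2012>
+ <concept>
+ <concept_id>10003120.10003130.10003233.10010519</concept_id>
+ <concept_desc>Human-centered computing~Social networking sites</concept_desc>
+ <concept_significance>500</concept_significance>
+ </concept>
+ <concept>
+ <concept_id>10002951.10003317.10003338</concept_id>
+ <concept_desc>Information systems~Retrieval models and ranking</concept_desc>
+ <concept_significance>300</concept_significance>
+ </concept>
+ <concept>
+ <concept_id>10010405.10010497.10010498</concept_id>
+ <concept_desc>Applied computing~Document searching</concept_desc>
+ <concept_significance>300</concept_significance>
+ </concept>
+ <concept>
+ <concept_id>10003120.10003130</concept_id>
+ <concept_desc>Human-centered computing~Collaborative and social computing</concept_desc>
+ <concept_significance>300</concept_significance>
+ </concept>
+ </ccs2012>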
+ \end{CCSXML}
+
+ \ccsdesc[500]{Human-centered computing~Social networking sites}
+ \ccsdesc[300]{Information systems~Retrieval models and ranking}
+ \ccsdesc[300]{Applied computing~Document searching}
+ \ccsdesc[300]{Human-centered computing~Collaborative and social computing}
+ keywords:
+ - decentralized online social networks
+abstract: |
+ When trying to join Mastodon, a decentralized collection of interoperable social networking servers, new users face the dilemma of choosing a home server. Using trace data from millions of new Mastodon accounts, we show that new accounts are less likely to remain active on the network's largest general instances compared to others. Additionally, we observe a trend of users migrating from larger to smaller servers. Addressing the challenge of onboarding and server selection, the paper proposes a decentralized recommendation system for servers using hashtags and the Okapi BM25 algorithm. This system leverages servers' top hashtags and their frequency to create a recommendation mechanism that respects Mastodon's decentralized ethos. Simulations demonstrate that such a tool can be effective even with limited data on each local server.
+execute:
+ echo: false
+ error: false
+ warning: false
+ message: false
+ freeze: false
+ cache: true
+fig-width: 6.75
+knitr:
+ opts_knit:
+ verbose: true
+code-block-border-left: false
+code-block-bg: false
+---
+
+{{< include _article.qmd >}}
\ No newline at end of file
diff --git a/article.qmd b/article.qmd
deleted file mode 100644
index 58713f1..0000000
--- a/article.qmd
+++ /dev/null
@@ -1,373 +0,0 @@
----
-title: Recommending Servers on Mastodon
-short-title: Mastodon Recommendations
-authors:
- - name: Carl Colglazier
- affiliation:
- name: Northwestern University
- city: Evanston
- state: Illinois
- country: United States
- corresponding: true
-bibliography: references.bib
-pdf-engine: pdflatex
-format:
- html: default
- pdf+icwsm:
- fig-pos: 'ht!bp'
- cite-method: natbib
- template: template.tex
- keep-md: true
- link-citations: false
- acm-pdf:
- output-file: mastodon-recommendations-acm.pdf
-acm-metadata:
- # comment this out to make submission anonymous
- anonymous: true
- # comment this out to build a draft version
- #final: true
-
- # comment this out to specify detailed document options
- # acmart-options: sigconf, review
-
- # acm preamble information
- copyright-year: 2018
- acm-year: 2018
- copyright: acmcopyright
- doi: XXXXXXX.XXXXXXX
- conference-acronym: "Conference acronym 'XX"
- conference-name: |
- Make sure to enter the correct
- conference title from your rights confirmation emai
- conference-date: June 03--05, 2018
- conference-location: Woodstock, NY
- price: "15.00"
- isbn: 978-1-4503-XXXX-X/18/06
-
- # if present, replaces the list of authors in the page header.
- shortauthors: Colglazier
-
- # The code below is generated by the tool at http://dl.acm.org/ccs.cfm.
- # Please copy and paste the code instead of the example below.
- ccs: |
- \begin{CCSXML}
-
-
- 10010520.10010553.10010562
- Computer systems organization~Embedded systems
- 500
-
-
- 10010520.10010575.10010755
- Computer systems organization~Redundancy
- 300
-
-
- 10010520.10010553.10010554
- Computer systems organization~Robotics
- 100
-
-
- 10003033.10003083.10003095
- Networks~Network reliability
- 100
-
-
- \end{CCSXML}
-
- \ccsdesc[500]{Computer systems organization~Embedded systems}
- \ccsdesc[300]{Computer systems organization~Redundancy}
- \ccsdesc{Computer systems organization~Robotics}
- \ccsdesc[100]{Networks~Network reliability}
-
- keywords:
- - decentralized online social networks
-abstract: |
- When trying to join the Fediverse, a decentralized collection of interoperable social networking websites, new users face the dillema of choosing a home server. Using trace data from millions of new Fediverse accounts, we show that new accounts on the flagship server are less likely to remain active and that accounts that move between servers tend to move from larger servers to smaller server. We then use the insights from our analysis to build a tool that can help new Fediverse users find servers with a high probability of being a good match based on their interests. Based on simulations, we demonstrate that such a tool can be effective even with limited data on each local server.
-execute:
- echo: false
- error: false
- warning: false
- message: false
- freeze: auto
-fig-width: 6.75
-knitr:
- opts_knit:
- verbose: true
-#filters:
-# - parse-latex
----
-
-# Introduction
-
-The Fediverse has emerged as a viable alternative to corporate, centralized social media such as Twitter and Reddit. Over the course of the last two years, millions of people have set up new accounts, significantly increasing the size of the network. In the wake of Elon Musk's Twitter aquisition, Mastodon, a popular Fediverse software which offers a Twitter-like experience, saw in increase in activity and scrutiny.
-
-We show how the onboarding process for Mastodon has changed over time with a particular focus on the largest, flagship Mastodon server. Users who sign up to this server are less likely to remain active. Based on data from over a million Mastodon accounts, we also find that many users who move accounts tend to gravitate toward smaller, more niche servers over time.
-
-We design a potential way to create server and tag recommendations on Mastodon, which could both help newcomers find servers that match their interests and help established accounts discover "neighborhoods" of related servers.
-
-# Background
-
-## Empirical Setting
-
-The Fediverse is a set of decentralized online social networks which interoperate using shared protocols like ActivityPub. Mastodon is a software program used by many Fediverse servers and offers a user experience similar to the Tweetdeck client for Twitter. It was first created in late 2016 and saw a surge in interest in 2022 during and after Elon Musk's Twitter acquisition.
-
-Discovery has been challenging on Masotodon. The developers and user base tend to be skeptical of algorithmic intrusions, instead opting for timelines which only show posts in reverse chronological order. Search is also difficult. Public hashtags are searchable, but most servers have traditionally not supported searching keywords or simple strings. Accounts can only be searched using their full `username@server` form.
-
-Mastodon features a "home" timeline which shows all public posts from accounts that share the same home server. On larger servers, this timeline can be unwieldy; however, on smaller servers, this presents the opportunity to discover new posts and users of potential interest.
-
-Mastodon offers its users high levels of data portability. Users can move their accounts accross instances while retaining their follows (their post data; however, does not move with the new account). The choice of an initial instance consequentially is not irreversible.
-
-## Newcomers in Online Communities
-
-Onboarding newcomers is an important part of the lifecycle of online communities. Any community can expect a certain amount of turnover, and so it is important for the long-term health and longevity of the community to be able to bring in new members [@krautBuildingSuccessfulOnline2011 p. 182]. However, the process of onboarding newcomers is not always straightforward. Newcomers may have difficulty finding the community, understanding the norms and expectations, and finding a place for themselves within the community. This can lead to high rates of attrition among newcomers.
-
-## The Mastodon Migrations
-
-```{r}
-#| label: fig-account-timeline
-#| fig-cap: "Accounts in the dataset created between January 2022 and March 2023. The top panels shows the proportion of accounts still active 45 days after creation, the proportion of accounts that have moved, and the proportion of accounts that have been suspended. The bottom panel shows the count of accounts created each week. The dashed vertical lines in the bottom panel represent the annoucement day of the Elon Musk Twitter acquisition, the acquisition closing day, a day where Twitter suspended a number of prominent journalist, and a day when Twitter experienced an outage and started rate limiting accounts."
-#| fig-height: 2.75
-#| fig-width: 6.75
-#| fig-env: figure*
-#| fig-pos: tb!
-
-library(here)
-source(here("code/helpers.R"))
-account_timeline_plot()
-```
-
-Mastodon saw a surge in interest in 2022 and 2023, particularly after Elon Musk's Twitter acquisition. In particular, four events of interests drove measurable increases in new users to the network: the announcement of the acquisition (April 14, 2022), the closing of the acquisition (October 27, 2022), a day when Twitter suspended a number of prominent journalists (December 15, 2022), and a day when Twitter experienced an outage and started rate limiting accounts (July 1, 2023). Many Twitter accounts announced they were setting up Mastodon accounts and linked their new accounts to their followers, often using tags like #TwitterMigration [@heFlockingMastodonTracking2023] and driving interest in Mastodon in a process @cavaDriversSocialInfluence2023 found consistent with social influence theory.
-
-The series of migrations of new users into Mastodon in many ways reflect folk stories of "Eternal Septembers" on previous communication networks, where a large influx of newcomers challenged the existing norms [@driscollWeMisrememberEternal2023]. Many Mastodon servers do have specific norms which people coming from Twitter may find confusing, such as local norms around content warnings [@nicholsonMastodonRulesCharacterizing2023]. Variation amoung servers can also present a challenge for newcomers who may not even be aware of the specific rules, norms, or general topics of interest on the server they are joining [@diazUsingMastodonWay2022].
-
-Some media outlets have framed reports on Mastodon [@hooverMastodonBumpNow2023] through what @zulliRethinkingSocialSocial2020 calls the "Killer Hype Cycle", whereby the media finds a new alterntive social media platform, declares it a potential killer of some established platform, and laters calls it a failure if it does not displace the existing platform. Such framing fails to take systems like the Fediverse seriously for their own merits: completely replacing existing commercial systems is not the only way to measure success, nor does it account for the real value the Fediverse provides for its millions of active users.
-
-# Data
-
-**Mastodon Profiles**: We collected accounts using data previously collected from posts on public Mastodon timelines from October 2020 to August 2023. We then queried for up-to-date information on those accounts including their most recent status and if the account had moved as of February 2024. This gave us a total of N accounts. Note that because we got updated information on each account, we include only accounts on servers which still exist and which returned records for the account.
-
-**Moved Profiles**: We found a subset of N accounts which had moved from one server to another.
-
-**Tags**: We collect N posts which contained between 2 and 5 hashtags.
-
-# Analysis and Results
-
-## Survival Model
-
-*Are accounts on suggested general servers less likely to remain active than accounts on other servers?*
-
-```{r, cache.extra = tools::md5sum("code/survival.R")}
-#| cache: true
-#| label: fig-survival
-#| fig-env: figure
-#| fig-cap: "Survival probabilities for accounts created during May 2023."
-#| fig-width: 3.375
-#| fig-height: 2.5
-#| fig-pos: h!
-
-library(here)
-source(here("code/survival.R"))
-plot_km
-```
-
-```{r}
-#| label: table-coxme
-library(ehahelper)
-library(broom)
-
-cxme_table <- tidy(cxme) %>%
- mutate(conf.low = exp(conf.low), conf.high=exp(conf.high)) %>%
- mutate(term = case_when(
- term == "factor(group)1" ~ "Join Mastodon",
- term == "factor(group)2" ~ "General Servers",
- term == "small_serverTRUE" ~ "Small Server",
- TRUE ~ term
- )) %>%
- mutate(exp.coef = paste("(", round(conf.low, 2), ", ", round(conf.high, 2), ")", sep="")) %>%
- select(term, estimate, exp.coef , p.value)
-```
-
-::: {#tbl-cxme .column-body}
-```{r}
-if (knitr::is_latex_output()) {
- cxme_table %>% knitr::kable(format="latex", booktabs=TRUE, digits=3)
-} else {
- cxme_table %>% knitr::kable(digits = 3)
-}
-```
-
-Coefficients for the Cox Proportional Hazard Model with Mixed Effects. The model includes a random effect for the server.
-:::
-
-Using accounts created during from May 1 to June 30, 2023, we create a Kaplan–Meier estimator for the probability that an account will remain active based on whether the account is on one of the largest general instances (`r paste(general_servers, collapse=", ")`) featured at the top of the Join Mastodon webpage or otherwise if it is on a server in the Join Mastodon list. Accounts are considered active if they have made at least one post after the censorship period `r active_period` days after account creation.
-
-We also contruct a Mixed Effects Cox Proportional Hazard Model with coefficients for whether the account is on a small server (less than a hundred accounts), and whether the account in featured on JoinMastodon or is featured as one of the largest general instances. We again find that accounts on the largest general instances are less likely to remain active than accounts on other servers, while accounts created on smaller servers are more likely to remain active.
-
-## Moved Accounts
-
-*Do accounts tend to move to larger or smaller servers?*
-
-Mastodon users can move their accounts to another server while retaining their connections (but not their posts) to other Mastodon accounts. This feature, built into the Mastodon software, offers data portability and helps avoid lock-in.
-
-```{r}
-#| label: ergm-table
-#| echo: false
-#| warning: false
-#| message: false
-#| error: false
-
-library(here)
-library(modelsummary)
-library(kableExtra)
-library(purrr)
-library(stringr)
-load(file = here("data/scratch/ergm-model-early.rda"))
-load(file = here("data/scratch/ergm-model-late.rda"))
-
-if (knitr::is_latex_output()) {
- format <- "latex"
-} else {
- format <- "html"
-}
-
-x <- modelsummary(
- list("Coef." = model.early, "Std.Error" = model.early, "Coef." = model.late, "Std.Error" = model.late),
- estimate = c("{estimate}", "{stars}{std.error}", "{estimate}", "{stars}{std.error}"),
- statistic = NULL,
- gof_omit = ".*",
- coef_rename = c(
- "sum" = "(Sum)",
- "diff.sum0.h-t.accounts" = "Smaller server",
- "nodeocov.sum.accounts" = "Server size\n(outgoing)",
- "nodeifactor.sum.registrations.TRUE" = "Open registrations\n(incoming)",
- "nodematch.sum.language" = "Languages match"
- ),
- align="lrrrr",
- stars = c('*' = .05, '**' = 0.01, '***' = .001),
- output = format
- #output = "markdown",
- #table.envir='table*',
- #table.env="table*"
- ) %>% add_header_above(c(" " = 1, "Model A" = 2, "Model B" = 2))
-
-if (knitr::is_latex_output()) {
- x %>% reduce(str_c, capture.output(.), sep="\n") %>% gsub("table", "table*", .) %>% knitr::raw_latex()
-} else {
- x
-}
-```
-
-# Proposed Recommendation System
-
-*How can we build an opt-in, low-resource recommendation system for finding Fediverse servers?*
-
-Tailored servers focused on a particular topic and community have advantages for onboarding newcomers; however, it may be difficult for new and existing Mastodon users to discover these communities. To address this gap, we propose a recommendation system for finding new servers. This system would be opt-in and low-resource, requiring only a small amount of data from each server.
-
-First, we construct the ideal system based on observed data. That is, we use the data from all posts we collected from all servers to construct an ideal recommender. We then simulate various scenarios that limit both servers that report data and the number of tags they report. We use rank biased overlap (RBO) to then compare the outputs from these simulations to the baseline with more complete information from all tags on all servers.
-
-## Recommendation System Design
-
-We use Okapi BM25 to construct a term frequency-inverse document frequency (tf-idf) model to associate the top tags with each server using counts of tag-account pairs from each server for the term frequency and the number of servers that use each tag for the inverse document frequency. We then L2 normalize the vectors for each tag and calculate the cosine similarity between the tag vectors for each server.
-
-$$
-tf = \frac{f_{t,d} \cdot (k_1 + 1)}{f_{t,d} + k_1 \cdot (1 - b + b \cdot \frac{|d|}{avgdl})}
-$$
-
-where $f_{t,d}$ is the frequency of term $t$ in document $d$, $k_1$ and $b$ are tuning parameters, and $avgdl$ is the average document length.
-
-$$
-idf = \log \frac{N - n + 0.5}{n + 0.5}
-$$
-
-where $N$ is the total number of documents and $n$ is the number of documents containing the term.
-
-$$
-\text{similarity}(A, B) = \frac{A \cdot B}{\|A\| \|B\|}
-$$
-
-## Applications
-
-```{r}
-#| eval: false
-library(tidyverse)
-library(igraph)
-library(arrow)
-
-sim_servers <- "data/scratch/server_similarity.feather" %>% arrow::read_ipc_file() %>% rename("weight" = "Similarity")
-#sim_net <- as.network(sim_servers)
-g <- graph_from_data_frame(sim_servers, directed = FALSE)
-
-g_strength <- log(sort(strength(g)))
-normalized_strength <- (g_strength - min(g_strength)) / (max(g_strength) - min(g_strength))
-
-server_centrality <- enframe(normalized_strength, name="server", value="strength")
-server_centrality %>% arrow::write_ipc_file("data/scratch/server_centrality.feather")
-```
-
-::: {#tbl-sim-servers}
-
-```{r}
-#| label: table-sim-servers
-library(tidyverse)
-library(arrow)
-
-sim_servers <- "data/scratch/server_similarity.feather" %>% arrow::read_ipc_file()
-server_of_interest <- "hci.social"
-server_table <- sim_servers %>%
- arrange(desc(Similarity)) %>%
- filter(Source == server_of_interest | Target == server_of_interest) %>%
- head(5) %>%
- pivot_longer(cols=c(Source, Target)) %>%
- filter(value != server_of_interest) %>%
- select(value, Similarity) %>%
- rename("Server" = "value")
-
-if (knitr::is_latex_output()) {
- server_table %>% knitr::kable(format="latex", booktabs=TRUE, digits=3)
-} else {
- server_table %>% knitr::kable(digits = 3)
-}
-```
-
-Top five servers most similar to hci.social
-
-:::
-
-### Server Discovery
-
-This system can empower users to find other servers of potential interest to them. For instance, we can build a system which recommends potential server matches to a new user.
-
-### Server Neighborhoods
-
-Mastodon provides two feeds in addition to a user's home timeline populated by accounts they follow: a local timeline with all public posts from their local server and a federated timeline which includes all posts from users followed by other users on their server. We suggest a third kind of timeline, a *neighborhood timeline*, which filters the federated timeline by topic.
-
-## Rubustness to Limited Data
-
-```{r}
-#| label: fig-simulations-rbo
-#| fig-env: figure*
-#| cache: true
-#| fig-width: 6.75
-#| fig-height: 3
-#| fig-pos: tb
-library(tidyverse)
-library(arrow)
-simulations <- arrow::read_ipc_file("data/scratch/simulation_rbo.feather")
-
-simulations %>%
- group_by(servers, tags, run) %>% summarize(rbo=mean(rbo), .groups="drop") %>%
- mutate(ltags = as.integer(log2(tags))) %>%
- ggplot(aes(x = factor(ltags), y = rbo, fill = factor(ltags))) +
- geom_boxplot() +
- facet_wrap(~servers, nrow=1) +
- #scale_y_continuous(limits = c(0, 1)) +
- labs(x = "Tags (log2)", y = "RBO", title = "Rank Biased Overlap with Baseline Rankings by Number of Servers") +
- theme_minimal() + theme(legend.position = "none")
-```
-
-We simulated various scenarios that limit both servers that report data and the number of tags they report. We used rank biased overlap (RBO) to then compare the outputs from these simulations to the baseline with more complete information from all tags on all servers. @fig-simulations-rbo shows how the average agreement with the baseline scales linearly with the logarithm of the tag count.
-
-# Conclusion
-
-Based on analysis of trace data from millions of new Fediverse accounts, we find evidence that suggests that servers matter and that users tend to move from larger servers to smaller servers. We then propose a recommendation system that can help new Fediverse users find servers with a high probability of being a good match based on their interests. Based on simulations, we demonstrate that such a tool can be effectively deployed in a federated manner, even with limited data on each local server.
\ No newline at end of file
diff --git a/code/scratch/__init__.py b/code/scratch/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/code/scratch/build_suggestion.py b/code/scratch/build_suggestion.py
new file mode 100644
index 0000000..e09da30
--- /dev/null
+++ b/code/scratch/build_suggestion.py
@@ -0,0 +1,62 @@
+from federated_design import *
+
+if __name__ == '__main__':
+ jm = pl.read_json("data/joinmastodon-2023-08-25.json")
+ jm_servers = set(jm["domain"].unique().to_list())
+ jm_td = TagData(jm_servers, 256, min_server_accounts=2)
+ jm_tfidf = jm_td.bm(n_server_accounts=0, n_servers=2, n_accounts=10)
+ mat = built_tfidf_matrix(jm_tfidf, jm_td.tag_to_index, jm_td.host_to_index)
+ m = (mat.T / (scipy.sparse.linalg.norm(mat.T, ord=2, axis=0) + 0.0001))
+ server_similarlity = cosine_similarity(m.tocsr())
+
+l = []
+for i in range(np.shape(server_similarlity)[0] - 1):
+ #s_index = min(i, np.shape(baseline_similarlity)[0] - 1)
+ l.append(
+ pl.DataFrame({
+ "Source": list(jm_td.host_to_index.keys())[i],
+ "Target": list(jm_td.host_to_index.keys())[i+1:],
+ "Similarity": server_similarlity[i][i+1:]
+ })
+ )
+
+similarity_df = pl.concat(l).filter(pl.col("Similarity") > 0.0)
+
+
+jm = pl.read_json("data/joinmastodon-2023-08-25.json")
+server_samples = set(pl.scan_ipc("data/scratch/all_tag_posts.feather").select("host").unique().collect().sample(fraction = 1.0)["host"].to_list())
+jm_servers = set(jm["domain"].unique().to_list())
+jm_td = TagData(server_samples, 256, min_server_accounts=2)
+jm_tfidf = jm_td.bm(n_server_accounts=0, n_servers=2, n_accounts=10)
+mat = built_tfidf_matrix(jm_tfidf, jm_td.tag_to_index, jm_td.host_to_index)
+m = (mat.T / (scipy.sparse.linalg.norm(mat.T, ord=2, axis=0) + 0.0001))
+server_similarlity = cosine_similarity(m.tocsr())
+#has_info = np.array((np.sum(mat, axis=1).T > 0).tolist()[0])
+tag_use_counts = np.sum(mat > 0, axis=1).T
+has_info = (tag_use_counts >= 3).tolist()[0]
+
+tag_names = np.array(list(jm_td.tag_to_index.keys()))[has_info]
+m_selected = m.tocsr()[:, has_info]
+
+tag_sm = cosine_similarity(m_selected.T)
+from sklearn.cluster import AffinityPropagation
+
+ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(tag_sm)
+clusters = pl.DataFrame({"tag": tag_names, "cluster": ap.labels_, "servers": tag_use_counts[[has_info]].tolist()[0]})
+clusters.sort("servers", descending=True).unique("cluster")
+
+
+tag_index_included = (np.sum(tag_sm, axis=0) > 0)
+included_tag_strings = np.array(list(jm_td.tag_to_index.keys()))[tag_index_included]
+tag_sm_matrix = tag_sm[np.ix_(tag_index_included, tag_index_included)]
+# import Affinity Prop
+from sklearn.cluster import AffinityPropagation
+ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(tag_sm_matrix)
+clusters = pl.DataFrame({"tag": included_tag_strings, "cluster": ap.labels_})
+# select a random element from each cluster
+clusters.group_by("cluster").agg([pl.col("tag").shuffle().first().alias("tag")]).sort("cluster")["tag"].to_list()
+
+clusters.group_by("cluster").agg([pl.col("tag").len().alias("count")]).sort("count", descending=True)
+
+
+clusters.filter(pl.col("servers") >= 10)
\ No newline at end of file
diff --git a/code/scratch/federated_design.py b/code/scratch/federated_design.py
index da77d3a..dd5e8ed 100644
--- a/code/scratch/federated_design.py
+++ b/code/scratch/federated_design.py
@@ -72,7 +72,7 @@ class TagData:
server_accounts = self.server_accounts(n_server_accounts)
num_servers = len(self._all_tag_posts_topn.unique("host"))
D = server_accounts.rename({"accounts_sum": "D"}).with_columns((pl.col("D") / pl.col("D").mean()).alias("nd"))
- tf = td._all_tag_posts_topn.join(D, on="host", how="inner").with_columns(
+ tf = self._all_tag_posts_topn.join(D, on="host", how="inner").with_columns(
((pl.col("accounts") * (k + 1))/(pl.col("accounts") + k*(1-b+b*pl.col("nd")))).alias("tf")
)
idf = most_seen_tags.with_columns(
@@ -142,76 +142,3 @@ def run_simulations():
print(np.mean(s))
all_runs = pl.concat(runs)
all_runs.write_ipc("data/scratch/simulation_rbo.feather")
-
-
-jm = pl.read_json("data/joinmastodon-2023-08-25.json")
-jm_servers = set(jm["domain"].unique().to_list())
-jm_td = td = TagData(jm_servers, 32, min_server_accounts=5)
-jm_tfidf = jm_td.tfidf(n_server_accounts=5, n_servers=3, n_accounts=10)
-mat = built_tfidf_matrix(jm_tfidf, jm_td.tag_to_index, jm_td.host_to_index)
-similarlity = cosine_similarity(mat.tocsr().T)
-# Export server similarity
-tag_sm = cosine_similarity(mat.tocsr())
-tag_index_included = (np.sum(tag_sm, axis=0) > 0)
-included_tag_strings = np.array(list(jm_td.tag_to_index.keys()))[tag_index_included]
-tag_sm_matrix = tag_sm[np.ix_(tag_index_included, tag_index_included)]
-# import Affinity Prop
-from sklearn.cluster import AffinityPropagation
-ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(tag_sm_matrix)
-clusters = pl.DataFrame({"tag": included_tag_strings, "cluster": ap.labels_})
-# select a random element from each cluster
-clusters.group_by("cluster").agg([pl.col("tag").shuffle().first().alias("tag")]).sort("cluster")["tag"].to_list()
-
-example_topics = ["tech", "linux", "hacking", "gamedev"]
-example_indices = [s in example_topics for s in included_tag_strings]
-similar_servers = cosine_similarity(np.array(example_indices).reshape(-1,1).T, mat[np.ix_(tag_index_included)].T)
-np.array(list(jm_td.host_to_index.keys()))[np.argsort(-similar_servers[0])][0:10]
-
-#np.array(list(jm_td.host_to_index.keys()))[np.argsort(-similarlity[jm_td.host_to_index["historians.social"]])][0:10]
-#np.array(list(jm_td.host_to_index.keys()))[np.where(np.sum(mat, axis=0) < 0.01)[1]]
-
-server_samples = set(pl.scan_ipc("data/scratch/all_tag_posts.feather").select("host").unique().collect().sample(fraction = 1.0)["host"].to_list())
-
-td = TagData(server_samples, 256, min_server_accounts=2)
-tfidf = td.bm(n_server_accounts=0, n_servers=2, n_accounts=10)#.filter(pl.col("accounts") / pl.col("D") > 0.0001)
-baseline_host_to_index = td.host_to_index
-full_mat = built_tfidf_matrix(tfidf, td.tag_to_index, td.host_to_index).T
-#m = (full_mat.T / scipy.sparse.linalg.norm(full_mat.T, ord=2, axis=0)).T
-
-m = (full_mat / scipy.sparse.linalg.norm(full_mat, ord=2, axis=0)) # good one
-
-baseline_similarlity = cosine_similarity(m)
-l = []
-for i in range(np.shape(baseline_similarlity)[0] - 1):
- #s_index = min(i, np.shape(baseline_similarlity)[0] - 1)
- l.append(
- pl.DataFrame({
- "Source": list(td.host_to_index.keys())[i],
- "Target": list(td.host_to_index.keys())[i+1:],
- "Similarity": baseline_similarlity[i][i+1:]
- })
- )
-
-similarity_df = pl.concat(l).filter(pl.col("Similarity") > 0.0)
-similarity_df.write_ipc("data/scratch/server_similarity.feather")
-
-server = "hci.social"
-similarity_df.filter((pl.col("Source") == server) | (pl.col("Target") == server)).sort("Similarity", descending=True)[0:10]
-tfidf.filter(pl.col("host") == server)[0:10]
-
-tfidf.filter(pl.col("tags") == "aoir2023")
-
-m = (full_mat.T / scipy.sparse.linalg.norm(full_mat.T, ord=2, axis=0)) # good one
-tag_similarity = cosine_similarity(m)
-l = []
-for i in range(np.shape(tag_similarity)[0] - 1):
- #s_index = min(i, np.shape(baseline_similarlity)[0] - 1)
- l.append(
- pl.DataFrame({
- "Source": list(td.tag_to_index.keys())[i],
- "Target": list(td.tag_to_index.keys())[i+1:],
- "Similarity": tag_similarity[i][i+1:]
- })
- )
-
-tag_similarity_df = pl.concat(l).filter(pl.col("Similarity") > 0.0)
\ No newline at end of file
diff --git a/code/scratch/federated_scratch.py b/code/scratch/federated_scratch.py
new file mode 100644
index 0000000..bdd3769
--- /dev/null
+++ b/code/scratch/federated_scratch.py
@@ -0,0 +1,75 @@
+jm = pl.read_json("data/joinmastodon-2023-08-25.json")
+jm_servers = set(jm["domain"].unique().to_list())
+
+jm_td = td = TagData(jm_servers, 256, min_server_accounts=2)
+jm_tfidf = td.bm(n_server_accounts=0, n_servers=2, n_accounts=10)#.filter(pl.col("accounts") / pl.col("D") > 0.0001)
+
+mat = built_tfidf_matrix(jm_tfidf, jm_td.tag_to_index, jm_td.host_to_index)
+m = (mat.T / scipy.sparse.linalg.norm(mat.T, ord=2, axis=0)) # good one
+server_similarlity = cosine_similarity(m.tocsr())
+
+# Export server similarity
+tag_sm = cosine_similarity(mat.tocsr())
+tag_index_included = (np.sum(tag_sm, axis=0) > 0)
+included_tag_strings = np.array(list(jm_td.tag_to_index.keys()))[tag_index_included]
+tag_sm_matrix = tag_sm[np.ix_(tag_index_included, tag_index_included)]
+# import Affinity Prop
+from sklearn.cluster import AffinityPropagation
+ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(tag_sm_matrix)
+clusters = pl.DataFrame({"tag": included_tag_strings, "cluster": ap.labels_})
+# select a random element from each cluster
+clusters.group_by("cluster").agg([pl.col("tag").shuffle().first().alias("tag")]).sort("cluster")["tag"].to_list()
+
+example_topics = ["tech", "linux", "hacking", "gamedev"]
+example_indices = [s in example_topics for s in included_tag_strings]
+similar_servers = cosine_similarity(np.array(example_indices).reshape(-1,1).T, mat[np.ix_(tag_index_included)].T)
+np.array(list(jm_td.host_to_index.keys()))[np.argsort(-similar_servers[0])][0:10]
+
+#np.array(list(jm_td.host_to_index.keys()))[np.argsort(-similarlity[jm_td.host_to_index["historians.social"]])][0:10]
+#np.array(list(jm_td.host_to_index.keys()))[np.where(np.sum(mat, axis=0) < 0.01)[1]]
+
+server_samples = set(pl.scan_ipc("data/scratch/all_tag_posts.feather").select("host").unique().collect().sample(fraction = 1.0)["host"].to_list())
+
+td = TagData(server_samples, 256, min_server_accounts=2)
+tfidf = td.bm(n_server_accounts=0, n_servers=2, n_accounts=10)#.filter(pl.col("accounts") / pl.col("D") > 0.0001)
+baseline_host_to_index = td.host_to_index
+full_mat = built_tfidf_matrix(tfidf, td.tag_to_index, td.host_to_index).T
+#m = (full_mat.T / scipy.sparse.linalg.norm(full_mat.T, ord=2, axis=0)).T
+
+m = (full_mat / scipy.sparse.linalg.norm(full_mat, ord=2, axis=0)) # good one
+
+baseline_similarlity = cosine_similarity(m)
+l = []
+for i in range(np.shape(baseline_similarlity)[0] - 1):
+ #s_index = min(i, np.shape(baseline_similarlity)[0] - 1)
+ l.append(
+ pl.DataFrame({
+ "Source": list(td.host_to_index.keys())[i],
+ "Target": list(td.host_to_index.keys())[i+1:],
+ "Similarity": baseline_similarlity[i][i+1:]
+ })
+ )
+
+similarity_df = pl.concat(l).filter(pl.col("Similarity") > 0.0)
+similarity_df.write_ipc("data/scratch/server_similarity.feather")
+
+server = "hci.social"
+similarity_df.filter((pl.col("Source") == server) | (pl.col("Target") == server)).sort("Similarity", descending=True)[0:10]
+tfidf.filter(pl.col("host") == server)[0:10]
+
+tfidf.filter(pl.col("tags") == "aoir2023")
+
+m = (full_mat.T / scipy.sparse.linalg.norm(full_mat.T, ord=2, axis=0)) # good one
+tag_similarity = cosine_similarity(m)
+l = []
+for i in range(np.shape(tag_similarity)[0] - 1):
+ #s_index = min(i, np.shape(baseline_similarlity)[0] - 1)
+ l.append(
+ pl.DataFrame({
+ "Source": list(td.tag_to_index.keys())[i],
+ "Target": list(td.tag_to_index.keys())[i+1:],
+ "Similarity": tag_similarity[i][i+1:]
+ })
+ )
+
+tag_similarity_df = pl.concat(l).filter(pl.col("Similarity") > 0.0)
\ No newline at end of file
diff --git a/code/survival.R b/code/survival.R
index 6135b9c..aea9242 100644
--- a/code/survival.R
+++ b/code/survival.R
@@ -28,7 +28,7 @@ a <- load_accounts() %>%
a %>% select(status, created_at, last_status_at, active_time_censored)
activity <- arrow::read_feather(
- "data/scratch/activity.feather",
+ here("data/scratch/activity.feather"),
col_select = c("server", "logins")
) %>%
arrange(desc(logins))
diff --git a/icwsm.qmd b/icwsm.qmd
new file mode 100644
index 0000000..551e0b8
--- /dev/null
+++ b/icwsm.qmd
@@ -0,0 +1,37 @@
+---
+title: "Do Servers Matter on Mastodon? Data-driven Design for Decentralized Social Media"
+short-title: Mastodon Recommendations
+authors:
+ - name: Carl Colglazier
+ affiliation:
+ name: Northwestern University
+ city: Evanston
+ state: Illinois
+ country: United States
+ corresponding: true
+bibliography: references.bib
+pdf-engine: pdflatex
+format:
+ pdf:
+ output-file: mastodon-recommendations-icwsm.pdf
+ fig-pos: 'ht!bp'
+ cite-method: natbib
+ template: template.tex
+ keep-md: true
+ link-citations: false
+abstract: |
+ When trying to join Mastodon, a decentralized collection of interoperable social networking servers, new users face the dilemma of choosing a home server. Using trace data from millions of new Mastodon accounts, we show that new accounts are less likely to remain active on the network's largest general instances compared to others. Additionally, we observe a trend of users migrating from larger to smaller servers. Addressing the challenge of onboarding and server selection, the paper proposes a decentralized recommendation system for servers using hashtags and the Okapi BM25 algorithm. This system leverages servers' top hashtags and their frequency to create a recommendation mechanism that respects Mastodon's decentralized ethos. Simulations demonstrate that such a tool can be effective even with limited data on each local server.
+execute:
+ echo: false
+ error: false
+ warning: false
+ message: false
+ freeze: false
+ cache: true
+fig-width: 6.75
+knitr:
+ opts_knit:
+ verbose: true
+---
+
+{{< include _article.qmd >}}
\ No newline at end of file
diff --git a/notebooks/_moved.qmd b/notebooks/_moved.qmd
index 1d2dca6..9c66664 100644
--- a/notebooks/_moved.qmd
+++ b/notebooks/_moved.qmd
@@ -191,13 +191,13 @@ ggnet2(
```{r}
#| label: tbl-ergm
#| tbl-cap: ERGM model output
-# #| cache: true
-model.early <- run_network(edgeNet.early)
-model.late <- run_network(edgeNet.late)
-save(model.early, file = here("data/scratch/ergm-model-early.rda"))
-save(model.late, file = here("data/scratch/ergm-model-late.rda"))
-#load(file = here("data/scratch/ergm-model-early.rda"))
-#load(file = here("data/scratch/ergm-model-late.rda"))
+#| cache: true
+#model.early <- run_network(edgeNet.early)
+#model.late <- run_network(edgeNet.late)
+#save(model.early, file = here("data/scratch/ergm-model-early.rda"))
+#save(model.late, file = here("data/scratch/ergm-model-late.rda"))
+load(file = here("data/scratch/ergm-model-early.rda"))
+load(file = here("data/scratch/ergm-model-late.rda"))
library(kableExtra)
modelsummary(
list("Coef." = model.early, "Std.Error" = model.early, "Coef." = model.late, "Std.Error" = model.late),
diff --git a/notebooks/_push_pull.qmd b/notebooks/_push_pull.qmd
index 163f3cd..3cdbce6 100644
--- a/notebooks/_push_pull.qmd
+++ b/notebooks/_push_pull.qmd
@@ -315,7 +315,7 @@ last.data_plot
#### Early
-::: {#tbl-early .column-page}
+::: {#tbl-early .column-body}
```{r}
early.table
@@ -327,7 +327,7 @@ Caption
#### Email
-::: {#tbl-email .column-page}
+::: {#tbl-email .column-body}
```{r}
email.table
@@ -337,7 +337,7 @@ email.table
#### Last
-::: {#tbl-last .column-page}
+::: {#tbl-last .column-body}
```{r}
last.table
diff --git a/notebooks/arima.qmd b/notebooks/arima.qmd
index 48477fa..ca924e6 100644
--- a/notebooks/arima.qmd
+++ b/notebooks/arima.qmd
@@ -14,7 +14,7 @@ _How does mastodon.social factor into the aggregate Mastodon onboarding process?
Throughout its history, Mastodon's flagship server, mastodon.social, has allowed and disallowed open sign-ups at various times. When the website did not allow sign-ups, it displayed a message redirecting those interested in signing up for an account to mastodon.social or alternatively to a list of potential servers at joinmastodon.com.
-We found three main periods during which mastodon.social did not accept new signups by first noting the times in @fig-account-timeline where the proportion of new accounts on mastodon.social drops to zero. We then used the Internet Archive to verify that signups were disabled during these periods.
+We found three main periods during which mastodon.social did not accept new signups by first noting the times when the proportion of new accounts on mastodon.social drops to zero. We then used the Internet Archive to verify that signups were disabled during these periods.
1. An extended period of through the end of October 2020.
diff --git a/references.bib b/references.bib
index 96f1591..163041f 100644
--- a/references.bib
+++ b/references.bib
@@ -100,6 +100,22 @@
keywords = {communities,content moderation,elon musk,mastodon,platforms,social,social media,twitter}
}
+@inproceedings{kieneSurvivingEternalSeptember2016,
+ title = {Surviving an ``{{Eternal September}}'': {{How}} an Online Community Managed a Surge of Newcomers},
+ shorttitle = {Surviving an "{{Eternal September}}"},
+ booktitle = {Proceedings of the 2016 {{CHI Conference}} on {{Human Factors}} in {{Computing Systems}}},
+ author = {Kiene, Charles and {Monroy-Hern{\'a}ndez}, Andr{\'e}s and Hill, Benjamin Mako},
+ year = {2016},
+ pages = {1152--1156},
+ publisher = {ACM},
+ address = {New York, NY},
+ doi = {10.1145/2858036.2858356},
+ urldate = {2016-07-05},
+ abstract = {We present a qualitative analysis of interviews with participants in the NoSleep community within Reddit where millions of fans and writers of horror fiction congregate. We explore how the community handled a massive, sudden, and sustained increase in new members. Although existing theory and stories like Usenet's infamous "Eternal September" suggest that large influxes of newcomers can hurt online communities, our interviews suggest that NoSleep survived without major incident. We propose that three features of NoSleep allowed it to manage the rapid influx of newcomers gracefully: (1) an active and well-coordinated group of administrators, (2) a shared sense of community which facilitated community moderation, and (3) technological systems that mitigated norm violations. We also point to several important trade-offs and limitations.},
+ isbn = {978-1-4503-3362-7},
+ keywords = {newcomers,norms and governance,online communities,peer production,qualitative methods}
+}
+
@misc{kingMastodonMe2024,
title = {Mastodon {{Near Me}}},
author = {King, Jaz-Michael},
@@ -177,6 +193,34 @@
keywords = {community rules,Mastodon,online communities}
}
+@inproceedings{ramanChallengesDecentralisedWeb2019a,
+ title = {Challenges in the {{Decentralised Web}}: {{The Mastodon Case}}},
+ shorttitle = {Challenges in the {{Decentralised Web}}},
+ booktitle = {Proceedings of the {{Internet Measurement Conference}}},
+ author = {Raman, Aravindh and Joglekar, Sagar and De Cristofaro, Emiliano and Sastry, Nishanth and Tyson, Gareth},
+ year = {2019},
+ month = oct,
+ series = {{{IMC}} '19},
+ pages = {217--229},
+ publisher = {Association for Computing Machinery},
+ address = {New York, NY, USA},
+ doi = {10.1145/3355369.3355572},
+ urldate = {2024-03-06},
+ abstract = {The Decentralised Web (DW) has recently seen a renewed momentum, with a number of DW platforms like Mastodon, PeerTube, and Hubzilla gaining increasing traction. These offer alternatives to traditional social networks like Twitter, YouTube, and Facebook, by enabling the operation of web infrastructure and services without centralised ownership or control. Although their services differ greatly, modern DW platforms mostly rely on two key innovations: first, their open source software allows anybody to setup independent servers ("instances") that people can sign-up to and use within a local community; and second, they build on top of federation protocols so that instances can mesh together, in a peer-to-peer fashion, to offer a globally integrated platform. In this paper, we present a measurement-driven exploration of these two innovations, using a popular DW microblogging platform (Mastodon) as a case study. We focus on identifying key challenges that might disrupt continuing efforts to decentralise the web, and empirically highlight a number of properties that are creating natural pressures towards re-centralisation. Finally, our measurements shed light on the behaviour of both administrators (i.e., people setting up instances) and regular users who sign-up to the platforms, also discussing a few techniques that may address some of the issues observed.},
+ isbn = {978-1-4503-6948-0}
+}
+
+@misc{rochkoMastodon2023,
+ title = {Mastodon 4.2},
+ author = {Rochko, Eugen},
+ year = {2023},
+ month = sep,
+ journal = {Mastodon Blog},
+ urldate = {2024-03-06},
+ abstract = {In this massive update we've added search and removed friction. What's not to love?},
+ howpublished = {https://blog.joinmastodon.org/2023/09/mastodon-4.2/}
+}
+
@misc{rochkoNewOnboardingExperience2023,
title = {A New Onboarding Experience on {{Mastodon}}},
author = {Rochko, Eugen},
@@ -233,6 +277,23 @@
howpublished = {https://instances.social/}
}
+@misc{trienesRecommendingUsersWhom2018,
+ title = {Recommending {{Users}}: {{Whom}} to {{Follow}} on {{Federated Social Networks}}},
+ shorttitle = {Recommending {{Users}}},
+ author = {Trienes, Jan and Cano, Andr{\'e}s Torres and Hiemstra, Djoerd},
+ year = {2018},
+ month = nov,
+ number = {arXiv:1811.09292},
+ eprint = {1811.09292},
+ primaryclass = {cs},
+ publisher = {arXiv},
+ doi = {10.48550/arXiv.1811.09292},
+ urldate = {2024-03-06},
+ abstract = {To foster an active and engaged community, social networks employ recommendation algorithms that filter large amounts of contents and provide a user with personalized views of the network. Popular social networks such as Facebook and Twitter generate follow recommendations by listing profiles a user may be interested to connect with. Federated social networks aim to resolve issues associated with the popular social networks - such as large-scale user-surveillance and the miss-use of user data to manipulate elections - by decentralizing authority and promoting privacy. Due to their recent emergence, recommender systems do not exist for federated social networks, yet. To make these networks more attractive and promote community building, we investigate how recommendation algorithms can be applied to decentralized social networks. We present an offline and online evaluation of two recommendation strategies: a collaborative filtering recommender based on BM25 and a topology-based recommender using personalized PageRank. Our experiments on a large unbiased sample of the federated social network Mastodon shows that collaborative filtering approaches outperform a topology-based approach, whereas both approaches significantly outperform a random recommender. A subsequent live user experiment on Mastodon using balanced interleaving shows that the collaborative filtering recommender performs on par with the topology-based recommender.},
+ archiveprefix = {arxiv},
+ keywords = {Computer Science - Information Retrieval,Computer Science - Social and Information Networks}
+}
+
@article{webberSimilarityMeasureIndefinite2010,
title = {A Similarity Measure for Indefinite Rankings},
author = {Webber, William and Moffat, Alistair and Zobel, Justin},
diff --git a/template.tex b/template.tex
index 9191b03..0d2eedb 100644
--- a/template.tex
+++ b/template.tex
@@ -12,6 +12,11 @@
\frenchspacing % DO NOT CHANGE THIS
\setlength{\pdfpagewidth}{8.5in} % DO NOT CHANGE THIS
\setlength{\pdfpageheight}{11in} % DO NOT CHANGE THIS
+
+\usepackage{amsmath}
+\usepackage{amssymb}
+\usepackage{booktabs}
+\usepackage{siunitx}
%
% These are recommended to typeset algorithms but not required. See the subsubsection on algorithms. Remove them if you don't have algorithms in your paper.
%\usepackage{algorithm}
@@ -78,11 +83,6 @@
% \vskip{- -- No negative value may be used to alter spacing above or below a caption, figure, table, section, subsection, subsubsection, or reference
\setcounter{secnumdepth}{0} %May be changed to 1 or 2 if section numbers are desired.
-
-\usepackage{amsmath}
-\usepackage{booktabs}
-
-\usepackage{siunitx}
\newcolumntype{d}{S[
input-open-uncertainty=,
input-close-uncertainty=,