---
title: "Do Servers Matter on Mastodon?"
subtitle: "Data-driven Design for Decentralized Social Media"
author: "Carl Colglazier"
institute:
  - "Community Data Science Collective"
  - "Northwestern University"
date: "2024-03-08"
title-slide-attributes:
  data-background: "#4c3854"
format:
  revealjs:
    width: 1600
    height: 900
    date-format: long
    margin: 0.2
    center-title-slide: false
    #disable-layout: true
    theme: [default, presentation.scss]
    slide-number: false
    keep-md: true
    pdf-max-pages-per-slide: 1
    template-partials:
      - title-slide.html
  # beamer:
  #   aspectratio: 169
  #   theme: metropolis
  #   colortheme: seahorse
knitr:
  opts_chunk:
    dev: "ragg_png"
    retina: 1
    dpi: 300
execute:
  freeze: auto
  cache: true
  echo: false
# fig-width: 5
# fig-height: 6
prefer-html: true
---

## Empirical Context

::: {.columns}

:::: {.column}

- **The Fediverse**: A set of decentralized online social networks that interoperate using shared protocols such as ActivityPub.
- **Mastodon**: An open-source, decentralized social network and microblogging community.

::::

:::: {.column}

![A screenshot of Mastodon 2.9 (2019), from the Mastodon Blog.](images/Mastodon_Single-column-layout.png)

::::

:::

# The Fediverse is a network of _thousands_ of interconnected servers {background-color="black" data-background-image="images/mastodon_map.png" background-repeat="repeat" background-size="200px" background-opacity="0.5" .center auto-animate=true .fade-out}

## Mastodon grew significantly in 2022 and 2023

```{r}
#| label: fig-account-timeline
#| fig-width: 5
#| fig-height: 2.5
library(here)
source(here("code/helpers.R"))
account_timeline_plot()
```

## Which server should I join?

### Conflicting advice

::: {.columns}

::: {.column}
Just join any server!
:::

::: {.column}
Join the _right_ server!
:::

:::

::: {.fragment}
### Which is right? {.center-xy}
:::

---

![](images/joinmastodon-screenshot.png){.center}

# Does server choice matter? {.center}

## Survival model for new accounts

Which new accounts are more likely to stay active after 91 days?

::: {.columns}

::: {.column}

```{r, cache.extra = tools::md5sum("code/survival.R")}
#| cache: true
#| label: fig-survival
#| fig-env: figure
#| fig-cap: "Survival probabilities for accounts created during May 2023."
#| fig-width: 3.375
#| fig-height: 2.25
#| fig-pos: h!
library(here)
source(here("code/survival.R"))
plot_km
```

:::

::: {.column .small}

```{r}
#| label: tbl-coxme
library(dplyr)
library(ehahelper)  # supplies a broom tidy() method for coxme fits
library(broom)

# Report hazard ratios: exponentiate the point estimate as well as the
# confidence bounds so the two columns are on the same scale
cxme_table <- tidy(cxme) %>%
  mutate(
    estimate = exp(estimate),
    conf.low = exp(conf.low),
    conf.high = exp(conf.high)
  ) %>%
  mutate(term = case_when(
    term == "factor(group)1" ~ "Join Mastodon",
    term == "factor(group)2" ~ "General Servers",
    term == "small_serverTRUE" ~ "Small Server",
    TRUE ~ term
  )) %>%
  mutate(exp.coef = paste0("(", round(conf.low, 2), ", ", round(conf.high, 2), ")")) %>%
  select(term, estimate, exp.coef, p.value)

cxme_table %>%
  knitr::kable(digits = 3, col.names = c("Term", "Hazard ratio", "CI", "p-value"))
```

:::

:::

## Accounts that move

Do accounts that move go to larger servers or to smaller ones?

::: {.small}

```{r}
#| label: tbl-ergm-table
#| echo: false
#| warning: false
#| message: false
#| error: false
library(here)
library(modelsummary)
library(kableExtra)  # add_header_above(); also re-exports %>%
library(purrr)
library(stringr)

load(file = here("data/scratch/ergm-model-early.rda"))
load(file = here("data/scratch/ergm-model-late.rda"))

# Pick the table backend that matches the current output format
if (knitr::is_latex_output()) {
  table_format <- "latex_tabular"
} else {
  table_format <- "html"
}

# Each model is listed twice so its coefficient and its starred
# standard error land in separate columns
x <- modelsummary(
  list("Coef." = model.early, "Std.Error" = model.early,
       "Coef." = model.late, "Std.Error" = model.late),
  estimate = c("{estimate}", "{stars}{std.error}", "{estimate}", "{stars}{std.error}"),
  statistic = NULL,
  gof_omit = ".*",
  coef_rename = c(
    "sum" = "Sum",
    "nonzero" = "Nonzero",
    "diff.sum0.h-t.accounts" = "Smaller server",
    "nodeocov.sum.accounts" = "Server size\n(outgoing)",
    "nodeifactor.sum.registrations.TRUE" = "Open registrations\n(incoming)",
    "nodematch.sum.language" = "Languages match"
  ),
  align = "lrrrr",
  stars = c('*' = .05, '**' = 0.01, '***' = .001),
  output = table_format
) %>%
  add_header_above(c(" " = 1, "Model A" = 2, "Model B" = 2))
x
```

:::

# Our analysis suggests {.center}

- Accounts on large, general servers are less likely to stay active
- Accounts that move tend to go to smaller servers

Can we build a system that helps people find servers?

# Recommendation System Concept

- Report the top **hashtags** used by the most accounts on each server
- Build an $M \times N$ server-tag matrix ($M$ servers, $N$ tags)
- Weight the matrix with Okapi BM25 TF-IDF and apply L2 normalization

::: {.fragment}
Using this matrix, we can

- Calculate similarity between servers using tags
- Calculate similarity between tags using servers
- Recommend servers based on affinity toward certain tags
:::

## Example: Server Similarity

::: {#tbl-sim-servers}

```{r}
#| label: table-sim-servers
library(tidyverse)
library(arrow)
library(here)

sim_servers <- here("data/scratch/server_similarity.feather") %>%
  arrow::read_ipc_file()
server_of_interest <- "hci.social"

# Keep the pairs that involve the server of interest, most similar first,
# then keep only the name of the *other* server in each pair
server_table <- sim_servers %>%
  arrange(desc(Similarity)) %>%
  filter(Source == server_of_interest | Target == server_of_interest) %>%
  head(7) %>%
  pivot_longer(cols = c(Source, Target)) %>%
  filter(value != server_of_interest) %>%
  select(value, Similarity) %>%
  rename("Server" = "value")

if (knitr::is_latex_output()) {
  server_table %>% knitr::kable(format = "latex", booktabs = TRUE, digits = 3)
} else {
  server_table %>% knitr::kable(digits = 3)
}
```

Top five servers most similar to hci.social

:::

# Future Work

- Evaluation of the recommendation system
- Finer-grained analysis of account attributes
- Simulations for robustness
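
## Sketch: BM25-Weighted Server Similarity {.smaller .scrollable}

A minimal base-R sketch of the recommendation concept, not the pipeline behind the similarity table. It assumes each matrix entry counts the accounts on a server that used a tag; the toy counts, server and tag names, BM25 parameters (`k1 = 1.2`, `b = 0.75`), the Lucene-style idf, and the helper functions are all hypothetical.

```r
# Hypothetical sketch: rows are servers, columns are hashtags, and each
# entry is assumed to be the number of accounts on that server using the tag.

# Okapi BM25 weighting; k1 = 1.2 and b = 0.75 are assumed common defaults
bm25_weight <- function(x, k1 = 1.2, b = 0.75) {
  n_servers <- nrow(x)
  server_len <- rowSums(x)                     # total tag counts per server
  df <- colSums(x > 0)                         # number of servers using each tag
  idf <- log((n_servers - df + 0.5) / (df + 0.5) + 1)  # Lucene-style, always > 0
  # Per-server length normalization; a length-nrow vector recycles down
  # the columns, so row i is paired with k[i]
  k <- k1 * (1 - b + b * server_len / mean(server_len))
  tf <- x * (k1 + 1) / (x + k)                 # saturating term frequency
  sweep(tf, 2, idf, `*`)                       # scale column j by idf[j]
}

# Scale each row to unit Euclidean (L2) length
l2_normalize_rows <- function(x) {
  norms <- sqrt(rowSums(x^2))
  norms[norms == 0] <- 1                       # leave all-zero rows unchanged
  x / norms                                    # divides row i by norms[i]
}

# Rank servers by how strongly they weight the tags a user cares about
recommend_servers <- function(w, tags) {
  affinity <- as.numeric(colnames(w) %in% tags)  # 0/1 indicator over tags
  sort(drop(w %*% affinity), decreasing = TRUE)
}

# Hypothetical toy counts (not real data)
counts <- matrix(
  c(5, 0, 2,
    4, 0, 1,
    0, 6, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("server.a", "server.b", "server.c"),
                  c("hci", "art", "linux"))
)

weights <- bm25_weight(counts)
w <- l2_normalize_rows(weights)
server_sim <- tcrossprod(w)                           # server x server cosine similarity
tag_sim <- tcrossprod(l2_normalize_rows(t(weights)))  # tag x tag cosine similarity

recommend_servers(w, "hci")
```

Dot products of L2-normalized rows are cosine similarities, so one normalized matrix serves both the similarity and the recommendation steps. In this toy example, `server.c` never uses #hci and scores zero, while `server.a` and `server.b`, which share tags, are far more similar to each other than either is to `server.c`.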