---
title: "Do Servers Matter on Mastodon?"
subtitle: "Data-driven Design for Decentralized Social Media"
author: "Carl Colglazier"
institute:
  - "Community Data Science Collective"
  - "Northwestern University"
date: "2024-03-14"
bibliography: ../references.bib
title-slide-attributes:
  data-background: "#4c3854"
format:
  revealjs:
    #embed-resources: true
    width: 1600
    height: 900
    date-format: long
    margin: 0.2
    center-title-slide: false
    #disable-layout: true
    theme: [default, presentation.scss]
    slide-number: true
    keep-md: true
    pdf-max-pages-per-slide: 1
    reference-location: document
    template-partials:
      - title-slide.html
knitr:
  opts_chunk:
    dev: "svg" #"ragg_png"
    retina: 1
    dpi: 300
execute:
  freeze: auto
  cache: true
  echo: false
  # fig-width: 5
  # fig-height: 6
prefer-html: true
---

## Goals for Today

::: {.big}

- Contextualize work on decentralized online social networks like Mastodon

- Present a data-driven analysis of server choice on Mastodon

- Introduce a recommendation system for server choice

- Discuss directions for future work

:::

# The Big Picture {.center}

What is decentralized social media and why does it matter?


## Emergence of the Social Web

:::::: {.spread}

Internet technologies are _sociotechnical_ systems.

The social internet as we know it today emerged both from the development of **protocols** and systems [@abbateInventingInternet2000] and thousands of largely non-commercial **social communities** [@driscollModemWorldPrehistory2022].

:::: {.columns}
::: {.column width=33%}
:::: {.fragment fragment-index=1}
#### Era
::::
:::: {.fragment fragment-index=2}
ARPANET
::::
:::: {.fragment fragment-index=3}
Early Internet
::::
:::: {.fragment fragment-index=4}
Commercial Web
::::
:::
::: {.column width=33%}
:::: {.fragment fragment-index=1}
#### Spaces
::::
:::: {.fragment fragment-index=2}
Email, Usenet
::::
:::: {.fragment fragment-index=3}
BBS, IRC
::::
:::: {.fragment fragment-index=4}
Social media
::::
:::
::: {.column width=33%}
:::: {.fragment fragment-index=1}
#### Technologies
::::
:::: {.fragment fragment-index=2}
TCP/IP
::::
:::: {.fragment fragment-index=3}
HTML
::::
:::: {.fragment fragment-index=4}
APIs, AJAX
::::
:::
:::

::::::

## Current Trends

::: {}

+ High **distrust** of social media companies [@AmericansWidelyDistrust2021]

+ Challenges in performing content moderation and maintaining social communities at **scale** [@gillespieContentModerationAI2020]

+ Post-API Era: **closure** of APIs on major platforms to researchers and tinkerers [@freelonComputationalResearchPostAPI2018]

:::

## Protocol-based Social Media

::::: {.spread}

The commercial internet has trended toward centralization, but this may be neither desirable nor sustainable [@masnickProtocolsNotPlatforms].

::: {.columns}
::: {.column}
#### Platforms

We have accounts on the same website
:::
::: {.column}
#### Protocols

We use the same protocol
:::
:::


::: {.columns}
::: {.column}
The (single) website controls:

- My data

- Content moderation

- Monetization
:::
::: {.column}
I can choose who controls:

- My data

- Content moderation (local)

- Monetization (if any)
:::
:::

:::::


## Empirical Context

::: {.columns}

:::: {.column}

- **The Fediverse**: A set of decentralized online social networks which interoperate using shared protocols like ActivityPub.

- **Mastodon**: An open-source, decentralized social network and microblogging community.

::::

:::: {.column}

![](images/mastodon-mascot.png)

::::

:::

# The Fediverse is a network of _thousands_ of interconnected servers {background-color="black" data-background-image="images/mastodon_map.png" background-repeat="repeat" background-size="200px" background-opacity="0.5" .center auto-animate=true .fade-out}

::: {.footer}
Background image: Jaz-Michael King
:::

## A Timeline of Mastodon

```{mermaid}
timeline
    title Mastodon and Fediverse Timeline
    2008: OStatus Protocol
    2016: Mastodon releases v0.1
    2018: ActivityPub standard published
    2019: Mastodon drops OStatus
    2022: Elon Musk Twitter acquisition
        : Truth Social launches using Mastodon code
    2023: Mastodon reaches 2M active users
        : Threads (Meta) begins experimental support for ActivityPub
```


## Avoiding the "Twitter Killer" Hype

@zulliRethinkingSocialSocial2020 describe this pattern:

1. A writer discovers an alternative technology system

2. Media hypes it as a "killer" of a major platform

3. The system does not in fact "kill" the major platform

4. The system is declared a failure

This has happened multiple times already.

## How do we define success for systems like Mastodon?


:::: {.columns}
::: {.column}
We should instead take social communities on their own terms.

**Do people find value in the system?**

In my view, the most interesting thing about Mastodon is the "local timeline", which shows posts from your server.
:::
::: {.column}
> "One of the things the Internet was good for was gathering together people in different places who shared a common interest"

--Michael Lewis, _Moneyball_ (2003)
:::
:::

## Which server should I join?

### Conflicting advice

::: {.columns}
::: {.column}

Just join any server!

:::
::: {.column}

Join the _right_ server!

:::
:::

::: {.fragment}
### Which is right? {.center-xy}
:::

## There are a lot of options {autoslide=2500 .fade-in}

```{r}
#| results: asis
#| cache: true
library(here)
library(tidyverse)
library(jsonlite)

jm <- here("data/joinmastodon.json") %>% jsonlite::fromJSON() %>% as_tibble()

dir_name <- "images/server_images/"
if (!dir.exists(dir_name)) {
  dir.create(dir_name, recursive = TRUE)
}

# Save all the server images locally if they are not already saved.
# Location: "images/server_images/{domain}.png"
save_image <- function(domain, proxied_thumbnail) {
  file_path <- paste0(dir_name, domain, ".png")
  tryCatch({
    if (!file.exists(file_path)) { # download only if not already cached
      download.file(proxied_thumbnail, file_path, mode = "wb")
    }
    return(file_path)
  }, error = function(e) {
    return(NA_character_) # keep the column type consistent on failure
  })
}
server_images <- jm %>%
  filter(!is.na(blurhash)) %>%
  select(domain, proxied_thumbnail) %>%
  rowwise() %>%
  mutate(image = save_image(domain, proxied_thumbnail)) %>%
  ungroup()
```

```{r}
#| results: asis
# Build an <img> tag that fades in at a random fragment step (0 to 4).
web_image <- function(url) {
  random_number <- as.integer(5 * runif(1, 0, 1))
  paste0('<img src="', url, '" data-fragment-index="', random_number, '" class="fragment fade-in" data-autoslide="1000" style="max-width: 100px;"/>')
}

server_images %>%
  filter(!is.na(image)) %>% # skip thumbnails that failed to download
  select(image) %>%
  mutate(thumb = map_chr(image, web_image)) %>%
  head(125) %>%
  pull(thumb) %>%
  paste0(collapse = "\n") %>%
  cat()
```

# But does server choice matter? {.center}

## Mastodon grew significantly in 2022 and 2023

```{r}
#| label: fig-account-timeline
#| fig-width: 6
#| fig-height: 2.5
#| fig-cap: "Number of accounts created on Mastodon each week from late 2020 through 2023. The top of the graph shows the proportion of these accounts which moved or remained active after 91 days."
library(here)
source(here("code/helpers.R"))
account_timeline_plot()
```

# The Mastodon Onboarding Process Has Changed Over Time

![](images/joinmastodon-screenshots.png)

## The Flagship Instance

:::: {.columns}
::: {.column width=60%}
+ **Mastodon.social** was the first Mastodon instance and is the largest.

+ Historically, there have been concerns about its size.

+ At certain times, it has **closed** registrations.

    1. An extended period through the end of October 2020.

    2. A temporary issue when the email host limited the server in mid-2022.

    3. Two periods in late 2022 and early 2023.
:::
::: {.column width=40%}
![](images/mastodon-social-signups-2020-11-01.png)
:::
:::

## The Pull-Pull Effect: Did Closing Mastodon.social Affect Other Servers?

We can use an interrupted time series analysis to test this.

$$
\begin{aligned}
y_t &= \beta_0 + \beta_1 \text{open}_t + \beta_2 \text{day}_t + \beta_3 (\text{open} \times \text{day})_t \\
&\quad + \beta_4 \sin\left(\frac{2\pi t}{7}\right) + \beta_5 \cos\left(\frac{2\pi t}{7}\right) \\
&\quad + \beta_6 \sin\left(\frac{4\pi t}{7}\right) + \beta_7 \cos\left(\frac{4\pi t}{7}\right) \\
&\quad + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \epsilon_t
\end{aligned}
$$

where $y_t$ is the number of new accounts on a server at time $t$, $\text{open}_t$ is a binary variable indicating whether the server is open to new sign-ups, $\text{day}_t$ is an increasing integer representing the date, $\phi_1$ and $\phi_2$ are coefficients on the first two lags of $y_t$, and $\epsilon_t$ is a white noise error term. We use the sine and cosine terms to account for weekly seasonality.
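
To make the terms of this equation concrete, here is a minimal sketch that fits it by ordinary least squares on simulated data. This is not the ARIMA fit reported in the talk: the data, the closure window (days 50 to 90), and every variable name (`is_open`, `y_lag1`, and so on) are assumptions made up for illustration. It only shows how each term maps onto a regressor.

```r
# Hypothetical simulated data: daily sign-ups on one server over 20 weeks.
set.seed(42)
n <- 140
day <- seq_len(n)
# Assumed closure window: registrations are closed on days 50 through 90.
is_open <- as.integer(day < 50 | day > 90)

d <- data.frame(
  day = day,
  is_open = is_open,
  # Fourier terms for weekly seasonality (periods of 7 and 3.5 days)
  sin1 = sin(2 * pi * day / 7),
  cos1 = cos(2 * pi * day / 7),
  sin2 = sin(4 * pi * day / 7),
  cos2 = cos(4 * pi * day / 7)
)

# Simulated outcome: a level shift of +10 while open, a mild trend,
# a weekly cycle, and white noise.
d$y <- 20 + 10 * d$is_open + 0.05 * d$day + 3 * d$sin1 + rnorm(n, sd = 2)

# The two lags of y that appear in the equation
d$y_lag1 <- c(NA, head(d$y, -1))
d$y_lag2 <- c(NA, NA, head(d$y, -2))

# is_open * day expands to is_open + day + is_open:day
fit <- lm(
  y ~ is_open * day + sin1 + cos1 + sin2 + cos2 + y_lag1 + y_lag2,
  data = d
)

# Estimates, standard errors, and p-values for each term
round(summary(fit)$coefficients, 3)
```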

## Mastodon.online used to be more influential

| Period     | Setting         | $p < 0.05$ |
|------------|:----------------|:-----------|
| 2020-2021  | mastodon.online | Yes        |
|            | JoinMastodon    | No         |
|            | Other           | No         |
| Mid 2022   | JoinMastodon    | No         |
|            | Other           | No         |
| Early 2023 | JoinMastodon    | No         |
|            | Other           | No         |

Results from ARIMA models for the number of new accounts on mastodon.online, on servers listed on joinmastodon.org, and on all other servers.

## The current Mastodon onboarding process

:::: {.columns}
::: {.column width=60%}
+ While Mastodon once pushed newcomers _away_ from mastodon.social, it now treats it like the **default server**

+ Secondarily, newcomers are directed to "Join Mastodon"
:::
::: {.column width=40%}
![](images/mastodon-android-onboarding.png)
:::
:::

## Accounts on the largest general servers are less likely to remain active after 91 days

::: {.columns}

::: {.column}

```{r, cache.extra = tools::md5sum("code/survival.R")}
#| cache: true
#| label: fig-survival
#| fig-env: figure
#| fig-cap: "Survival probabilities for accounts created during May 2023."
#| fig-width: 3.375
#| fig-height: 2.25
#| fig-pos: h!

library(here)
source(here("code/survival.R"))
plot_km
```

:::

::: {.column .small}

```{r}
#| label: tbl-coxme
library(ehahelper)
library(broom)

# `cxme` is the mixed-effects Cox model created in code/survival.R.
cxme_table <- tidy(cxme) %>%
  # Exponentiate the interval bounds so they read as hazard ratios
  mutate(conf.low = exp(conf.low), conf.high = exp(conf.high)) %>%
  mutate(term = case_when(
    term == "factor(group)1" ~ "Join Mastodon",
    term == "factor(group)2" ~ "General Servers",
    term == "small_serverTRUE" ~ "Small Server",
    TRUE ~ term
  )) %>%
  # exp.coef holds the exponentiated confidence interval as "(low, high)"
  mutate(exp.coef = paste("(", round(conf.low, 2), ", ", round(conf.high, 2), ")", sep = "")) %>%
  select(term, estimate, exp.coef, p.value)

cxme_table %>% knitr::kable(digits = 3)
```

:::

:::

## Accounts that move between servers are more likely to move to smaller servers

::: {.small}

```{r}
#| label: tbl-ergm-table
#| echo: false
#| warning: false
#| message: false
#| error: false

library(here)
library(modelsummary)
library(kableExtra)
library(purrr)
library(stringr)
load(file = here("data/scratch/ergm-model-early.rda"))
load(file = here("data/scratch/ergm-model-late.rda"))

if (knitr::is_latex_output()) {
  format <- "latex_tabular"
} else {
  format <- "html"
}

x <- modelsummary(
  list("Coef." = model.early, "Std.Error" = model.early, "Coef." = model.late, "Std.Error" = model.late),
  estimate = c("{estimate}", "{stars}{std.error}", "{estimate}", "{stars}{std.error}"),
  statistic = NULL,
  gof_omit = ".*",
  coef_rename = c(
    "sum" = "Sum",
    "nonzero" = "Nonzero",
    "diff.sum0.h-t.accounts" = "Smaller server",
    "nodeocov.sum.accounts" = "Server size\n(outgoing)",
    "nodeifactor.sum.registrations.TRUE" = "Open registrations\n(incoming)",
    "nodematch.sum.language" = "Languages match"
  ),
  align = "lrrrr",
  stars = c('*' = .05, '**' = 0.01, '***' = .001),
  output = format
) %>% add_header_above(c(" " = 1, "Model A" = 2, "Model B" = 2))

x
```

:::

# Our analysis suggests server choice _does_ matter {.center}

Can we build a system that helps people find servers?

# Recommendation System Concept

- Report top **hashtags** used by the most accounts on each server
- Build an $M \times N$ server-tag matrix
- Normalize with Okapi BM25 TF-IDF and L2 normalization


::: {.fragment}
Using this matrix, we can

- Calculate similarity between servers using tags
- Calculate similarity between tags using servers
- Recommend servers based on affinity toward certain tags
:::
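
The steps above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the system's actual implementation: the server names, hashtag counts, the `bm25()`, `l2_normalize()`, and `recommend()` helpers, the particular BM25 variant, and the parameters `k1 = 1.2` and `b = 0.75` are all hypothetical choices made for the example. Because every row is scaled to unit length, a plain matrix product gives cosine similarity.

```r
# Hypothetical counts: rows are servers, columns are hashtags, and each cell
# is the number of accounts on that server that used the tag.
counts <- matrix(
  c(30,  2,  0, 12,
    25,  0,  1, 10,
     0, 40, 22,  3),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("hci.example", "research.example", "art.example"),
    c("hci", "art", "photography", "academia")
  )
)

# One common Okapi BM25 variant; k1 and b are assumed defaults.
bm25 <- function(counts, k1 = 1.2, b = 0.75) {
  n_servers <- nrow(counts)
  doc_len <- rowSums(counts)      # total tag uses per server
  df <- colSums(counts > 0)       # number of servers using each tag
  idf <- log(1 + (n_servers - df + 0.5) / (df + 0.5))
  len_norm <- k1 * (1 - b + b * doc_len / mean(doc_len))
  # len_norm has one value per row, so it recycles down each column
  tf <- counts * (k1 + 1) / (counts + len_norm)
  sweep(tf, 2, idf, `*`)          # scale each tag column by its IDF
}

# Scale every row to unit Euclidean length.
l2_normalize <- function(m) m / sqrt(rowSums(m^2))

weighted <- bm25(counts)
server_vecs <- l2_normalize(weighted)

# Cosine similarity between servers (using tags) and between tags (using servers)
server_sim <- tcrossprod(server_vecs)
tag_sim <- tcrossprod(l2_normalize(t(weighted)))

# Score servers by their affinity toward a set of preferred tags.
recommend <- function(tags, server_vecs) {
  pref <- as.numeric(colnames(server_vecs) %in% tags)
  scores <- drop(server_vecs %*% pref)
  sort(scores, decreasing = TRUE)
}

recommend(c("hci", "academia"), server_vecs)
```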

## Example: Server Similarity

::: {#tbl-sim-servers}

```{r}
#| label: table-sim-servers
library(tidyverse)
library(arrow)
library(here)

sim_servers <- here("data/scratch/server_similarity.feather") %>% arrow::read_ipc_file()
server_of_interest <- "hci.social"
# Keep the highest-similarity pairs that involve the server of interest,
# then report the other server in each pair.
server_table <- sim_servers %>%
  arrange(desc(Similarity)) %>%
  filter(Source == server_of_interest | Target == server_of_interest) %>%
  head(7) %>%
  pivot_longer(cols = c(Source, Target)) %>%
  filter(value != server_of_interest) %>%
  select(value, Similarity) %>%
  rename("Server" = "value")

if (knitr::is_latex_output()) {
  server_table %>% knitr::kable(format = "latex", booktabs = TRUE, digits = 3)
} else {
  server_table %>% knitr::kable(digits = 3)
}
```

The servers most similar to hci.social

:::

## Server Recs

<iframe width="100%" height="100%" src="https://carlcolglazier.com/files/jsdemo/witch.html" title="Server recommendation demo"></iframe>


# Future Work

- Evaluation of the recommendation system
- More specific analysis of account attributes
- Simulations for robustness

# References {#refs .scrollable}