Shukry Zablah

2021-12-04

An animated exploration of Bayesian updating.

Bayesian Updating

We recently saw the posterior approximation of the globe tossing model via grid approximation.

$$W \sim \textrm{Binomial}(N, p)$$ $$p \sim \textrm{Uniform}(0, 1)$$

library(tibble)
library(dplyr)
library(ggplot2)
library(gganimate)
library(stringr)

We generalized a function that could take a sequence of observations, e.g. WWWLWLLLW, and plot the posterior distribution based on the observed data.

plot_posterior_approximation <- function(sequence, grid_length = 25) {
  num_water <- stringr::str_count(sequence, "W")
  df <- tibble::tibble(grid = seq(from = 0, to = 1, length.out = grid_length),
             prior = rep(1, grid_length),
             likelihood = dbinom(num_water,
                                 size=stringr::str_length(sequence),
                                 prob=grid),
             posterior_unstandardized = likelihood * prior,
             posterior = posterior_unstandardized / sum(posterior_unstandardized))
  ggplot2::ggplot(df, ggplot2::aes(x = grid, y = posterior)) +
    ggplot2::geom_point() +
    ggplot2::geom_line() +
    ggplot2::labs(x = "probability of water",
         y = "posterior probability",
         title = "Posterior (grid approximation)")
}

To emphasize the process of Bayesian updating though, now we create animations of the posterior grid approximation being updated as if the sequence had been observed one observation at a time.

It's a simple addition. We require the posteriors of all subsequences in the same tidy dataframe.

compute_posterior_approximation <- function(sequence, grid_length = 25) {
  num_water <- stringr::str_count(sequence, "W")
  df <- tibble::tibble(grid = seq(from = 0, to = 1, length.out = grid_length),
             prior = rep(1, grid_length),
             likelihood = dbinom(num_water,
                                 size=stringr::str_length(sequence),
                                 prob=grid),
             posterior_unstandardized = likelihood * prior,
             posterior = posterior_unstandardized / sum(posterior_unstandardized))
  return(df)
}

compute_bayesian_update <- function(sequence, grid_length = 25) {
  df <- tibble::tibble()
  for (i in seq(from = 1, to = stringr::str_length(sequence) + 1)) {
    subsequence <- substr(sequence, 1, i)
    posterior_approximation <- compute_posterior_approximation(subsequence)
    posterior_approximation <- dplyr::mutate(posterior_approximation, sequence = subsequence)
    df <- bind_rows(df, posterior_approximation)
  }
  return(df)
}

animate_bayesian_update <- function(sequence, grid_length = 25) {
  df <- compute_bayesian_update(sequence, grid_length)
  animation <- ggplot2::ggplot(df, ggplot2::aes(x = grid, y = posterior, group = sequence)) +
    ggplot2::geom_point() +
    ggplot2::geom_line() +
    ggplot2::labs(x = "probability of water",
                  y = "posterior probability",
                  title = "Posterior (grid approximation)",
                  subtitle = "Subsequence {closest_state}") +
    gganimate::transition_states(sequence)
  animate(animation, fps=60)
}

So we can create the gif with one line:

animate_bayesian_update("WLWWWLWLW")

And we can look at the frames of the gif by plotting the facet:

ggplot(compute_bayesian_update("WLWWWLWLW"),
       aes(x = grid, y = posterior)) +
  geom_point() +
  geom_line() +
  labs(x = "probability of water",
       y = "posterior probability",
       title = "Posterior (grid approximation)") +
  facet_wrap(~ sequence)