Exploring Passing Networks

The analysis of passing in elite soccer is commonplace. The media often shows simple pass counts and pass completion rates, but there are much better ways of viewing this type of data. For example, we can create a passing network based on the average position of each player when making a pass. We can also show the number of passes made between each pair of players. This can be a powerful way to show who passes to whom, and from where on the pitch. So let’s take a look at a couple of passing networks below based on data from the FA Women’s Super League.

An example of the code to create a passing network can be found below. There may well be a better method, but I have found this one to work quite effectively. Any comments would be appreciated.

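library(dplyr)    # filter, group_by, summarise, joins, %>%
library(forcats)  # fct_explicit_na
library(scales)   # rescale

# keep the events for one match and one team (data1: event data loaded beforehand)
data2 <- data1 %>% 
  filter(match_id == 19770) %>% 
  filter(team.id == 746)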

  # full game passing stats for labels
  passes1 <- data2 %>% 
    filter(type.name == "Pass") %>% 
    group_by(pass.outcome.name) %>% 
    tally() %>% 
    filter(!pass.outcome.name %in% c("Injury Clearance", "Unknown")) %>% 
    mutate(pass.outcome.name = fct_explicit_na(pass.outcome.name, "Complete"))
  pass_n1 <- sum(passes1$n)
  pass_pc1 <- passes1[passes1$pass.outcome.name == "Complete",]$n / pass_n1 * 100
  # filter events before time of first substitution, if at least one substitution
  min_events1 <- data2 %>% 
    group_by(player.id) %>% 
    dplyr::summarise(period = min(period), timestamp = min(timestamp)) %>% 
    na.omit() %>% 
    arrange(period, timestamp)
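  if(nrow(min_events1) > 11) {
    # the 12th player to appear is the first substitute
    max_event1 <- min_events1[12,]
    # min() guards against several events sharing that timestamp
    idx1 <- min(which(data2$period == max_event1$period & data2$timestamp == max_event1$timestamp)) - 1
    data2 <- data2[1:idx1,]
  }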
  # get nodes and edges for plotting
  # node position and size based on touches
  nodes1 <- data2 %>% 
    filter(type.name %in% c("Pass", "Ball Receipt*", "Ball Recovery", "Shot", "Dispossessed", "Interception", "Clearance", "Dribble", "Goal Keeper", "Miscontrol", "Error")) %>% 
    group_by(player.id, player.name) %>% 
    dplyr::summarise(x = mean(x, na.rm=TRUE), y = mean(y, na.rm=TRUE), events = n()) %>% 
    na.omit() %>% 
    ungroup()
  # edges based only on completed passes
  edgelist1 <- data2 %>% 
    mutate(pass.outcome.name = fct_explicit_na(pass.outcome.name, "Complete")) %>%
    filter(type.name == "Pass" & pass.outcome.name == "Complete") %>% 
    select(from = player.name, to = pass.recipient.name) %>% 
    group_by(from, to) %>% 
    dplyr::summarise(n = n()) %>% 
    ungroup()
  # attach start (x, y) and end (xend, yend) coordinates to each edge
  edges1 <- left_join(edgelist1, 
            nodes1 %>% select(player.id, player.name, x, y),
            by = c("from" = "player.name"))
  edges1 <- left_join(edges1, 
            nodes1 %>% select(player.id, player.name, xend = x, yend = y),
            by = c("to" = "player.name"))
  # combine A -> B and B -> A passes into a single undirected edge
  edges1 <- edges1 %>% 
    group_by(player1 = pmin(from, to), player2 = pmax(from, to)) %>% 
    dplyr::summarise(n = sum(n), x = x[1], y = y[1], xend = xend[1], yend = yend[1]) %>% 
    ungroup()
  # rescale node size
  nodes1 <- nodes1 %>% 
    mutate(events = rescale(events, c(2, 30), c(1, 200)))

  # filter minimum number of passes and rescale line width
  edges1 <- edges1 %>% 
    filter(n >= 3) %>%
    mutate(n = rescale(n, c(1, 30), c(3, 75)))
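
The code above stops once the nodes and edges are built, so here is a minimal sketch of how they could be turned into a plot. It is not the plotting code used for the figures in this post. The players, positions and pass counts are made up for illustration, ggplot2 is assumed as the plotting library, and the `linewidth` aesthetic needs ggplot2 3.4.0 or later.

```r
library(dplyr)
library(ggplot2)

# Toy nodes (made up): average position and number of touches per player
nodes <- data.frame(
  player.name = c("A", "B", "C", "D"),
  x = c(20, 40, 40, 70),
  y = c(40, 20, 60, 40),
  events = c(30, 45, 50, 25)
)

# Toy directed pass counts between players (also made up)
passes <- data.frame(
  from = c("A", "B", "A", "C", "B", "C", "D"),
  to   = c("B", "A", "C", "A", "D", "D", "C"),
  n    = c(6, 4, 5, 3, 7, 2, 1),
  stringsAsFactors = FALSE
)

# Collapse A -> B and B -> A into one undirected edge, then attach coordinates
edges <- passes %>%
  group_by(player1 = pmin(from, to), player2 = pmax(from, to)) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  left_join(nodes %>% select(player1 = player.name, x, y), by = "player1") %>%
  left_join(nodes %>% select(player2 = player.name, xend = x, yend = y), by = "player2")

# Draw the edges first so the nodes sit on top of them
p <- ggplot() +
  geom_segment(data = edges,
               aes(x = x, y = y, xend = xend, yend = yend, linewidth = n),
               colour = "grey60") +
  geom_point(data = nodes, aes(x = x, y = y, size = events), colour = "firebrick") +
  geom_text(data = nodes, aes(x = x, y = y, label = player.name), vjust = -1.5) +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "none")

print(p)
```

With the real data, `nodes1` and `edges1` from the code above would take the place of the toy `nodes` and `edges`. No pitch is drawn in this sketch, so the points only show positions relative to one another.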

From this, we can see two different plots. Arsenal kept their centre backs quite close together, with their full backs getting high up the pitch, possibly to aid in attack, whilst their midfield three stayed quite compact.

Manchester City, on the other hand, had their centre backs spread apart and their full backs close. Their midfield appears to have had two holding players, with possibly one higher up to offer more in attack in this game.

As we can see, this can offer a lot of insight into how a team might play, but this is only the start. We can expand these networks further to provide even greater insight into the data we are viewing. That is something I will expand on in future posts.
