Soccer Analytics for Beginners: An R Tutorial on EURO 2020 Data - Web Scraping & Radar Plots - Sweep Sports Analytics

Reading Time: 12 minutes

* NOTE (August, 2022): There has been an issue with reading the tables using the “htmltab” function in steps 5 and 6. I have added the updated working code here. I would recommend you look at the latest tutorial here: https://sweepsportsanalytics.com/2022/07/31/soccer-analytics-tutorial-scraping-epl-data-using-r-2022-update/

Finally, a day off from EURO 2020 action! A day we could just sit back and relax. I enjoyed seeing the Colombian Luis Diaz’ amazing goal against Brazil and the Seleção’s controversial comeback. I stayed up to watch the Atlanta Hawks win another Game 1, this time against Giannis and the Bucks in the Eastern Conference Finals. Fun times.

We have received a few requests from sports analytics enthusiasts through our Instagram and Facebook pages for guides. Below is a written guide on (a) scraping data from fbref.com, (b) manipulating the data for analysis, (c) creating radar plots.

Disclaimers:

  • I started writing this at around midnight and it took a few hours. I woke up at 6 AM to the sound of my 10-week-old daughter trying to sing what I am assuming was an Iron Maiden song from the late 80s. That means I finished writing this while tired, but fully loaded with caffeine and a good mood. Please mind any mistakes, typos, and the use of GIFs.
  • I am not a professor. I am not a computer scientist. I do not have much experience in teaching. I am a passionate sports fan with a love for data. Yes, I do have expertise in data science, but some code might be messy and some of it can be improved. Being results-oriented, I only care that it works. So, thanks for being considerate!
  • This is my first ever tutorial so please provide some feedback. Feel free to contact us!
  • There are a few affiliate links throughout the post leading to some cool products I like and have bought myself.

For a visual walkthrough, check out our video here. It came out a bit longer than expected. It has some extra detail and explanations.

Let’s dig in!

Step 1: Download R Studio

The debate on which programming language is best for data science has been going on for a while. R and Python are the main choices. Both are awesome and it’s rather a matter of preference, as well as what kind of projects you have in mind.

That being said, having a statistical background, I have opted to use R. So, first step, if you have not done so, download the latest version of R and R Studio from the links below.

https://cran.r-project.org/

https://www.rstudio.com/products/rstudio/download/

Step 2: Install packages

R has A LOT of packages you can use. Let’s start by installing the ones we use.

Open R Studio and run the below commands one by one. When installing the package “colorspace”, type “no” and Enter if prompted.

###### Step 2: Install packages #####
# For the below 2 commands, if prompted, type "no" and Enter
install.packages("colorspace")
install.packages("curl")
install.packages("BasketballAnalyzeR")
install.packages("ggplot2")
install.packages("htmltab")
install.packages("stringr")
install.packages("dplyr")
install.packages("gridExtra")
install.packages("cowplot")
# rvest and xml2 are called as rvest::... and xml2::... in Steps 5 and 6
install.packages("rvest")
install.packages("xml2")

Once you have installed the above packages, you will not need to install them again on your system.
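If you re-run the script often, a small check can skip packages that are already present. The sketch below is only an illustration: `missing_packages` is a hypothetical helper name (not part of the original tutorial), and the install call is guarded so that it only runs in an interactive session.

```r
# Hypothetical helper (not from the original post): return the entries of
# `wanted` that do not appear in `installed`.
missing_packages <- function(wanted,
                             installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

# The packages listed in Step 2.
pkgs <- c("colorspace", "curl", "BasketballAnalyzeR", "ggplot2", "htmltab",
          "stringr", "dplyr", "gridExtra", "cowplot")

to_install <- missing_packages(pkgs)

# Install only what is missing, and only when run interactively,
# so sourcing the script never triggers downloads on its own.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The same "no" prompt for colorspace may still appear when that package is among the missing ones.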

Step 3: Load libraries

Run the below commands to load the libraries we use.

###### Step 3: Load libraries #####
library(curl)
library(BasketballAnalyzeR)
library(ggplot2)
library(htmltab)
library(stringr)
library(dplyr)
library(gridExtra)
library(cowplot)

Step 4: Read fbref.com URLs

All data in this tutorial is from the free resource fbref.com. It’s a great place for statistics and historical data. I really appreciate the work these folks have done. Have a look to see what’s available.

Run the below code.

###### Step 4: Read fbref.com URLs #####
# Group A
url1 <- "https://fbref.com/en/matches/caa84313/Italy-Switzerland-June-16-2021-UEFA-Euro"
url2 <- "https://fbref.com/en/matches/95a9ebd1/Turkey-Italy-June-11-2021-UEFA-Euro"
url3 <- "https://fbref.com/en/matches/f09b64db/Turkey-Wales-June-16-2021-UEFA-Euro"
url4 <- "https://fbref.com/en/matches/d9eaa85c/Wales-Switzerland-June-12-2021-UEFA-Euro"
url5 <- "https://fbref.com/en/matches/b756c626/Italy-Wales-June-20-2021-UEFA-Euro"
url6 <- "https://fbref.com/en/matches/fa85a731/Switzerland-Turkey-June-20-2021-UEFA-Euro"
url_group_A <- rbind(url1, url2, url3, url4, url5, url6)

# Group B
url7 <- "https://fbref.com/en/matches/e594174b/Belgium-Russia-June-12-2021-UEFA-Euro"
url8 <- "https://fbref.com/en/matches/25bb1fa2/Denmark-Belgium-June-17-2021-UEFA-Euro"
url9 <- "https://fbref.com/en/matches/2c48acb2/Finland-Russia-June-16-2021-UEFA-Euro"
url10 <- "https://fbref.com/en/matches/c3c2ffa2/Denmark-Finland-June-12-2021-UEFA-Euro"
url11 <- "https://fbref.com/en/matches/bd35edec/Finland-Belgium-June-21-2021-UEFA-Euro"
url12 <- "https://fbref.com/en/matches/04188c5c/Russia-Denmark-June-21-2021-UEFA-Euro"
url_group_B <- rbind(url7, url8, url9, url10, url11, url12)

# Group C
url13 <- "https://fbref.com/en/matches/f3d39a29/Netherlands-Austria-June-17-2021-UEFA-Euro"
url14 <- "https://fbref.com/en/matches/b47a0ea6/Austria-North-Macedonia-June-13-2021-UEFA-Euro"
url15 <- "https://fbref.com/en/matches/e0eed6e8/Ukraine-North-Macedonia-June-17-2021-UEFA-Euro"
url16 <- "https://fbref.com/en/matches/0e9919a5/Netherlands-Ukraine-June-13-2021-UEFA-Euro"
url17 <- "https://fbref.com/en/matches/841065f5/North-Macedonia-Netherlands-June-21-2021-UEFA-Euro"
url18 <- "https://fbref.com/en/matches/7ed46abd/Ukraine-Austria-June-21-2021-UEFA-Euro"
url_group_C <- rbind(url13, url14, url15, url16, url17, url18)

# Group D
url19 <- "https://fbref.com/en/matches/6599f4ab/Scotland-Czech-Republic-June-14-2021-UEFA-Euro"
url20 <- "https://fbref.com/en/matches/1e930db9/Croatia-Czech-Republic-June-18-2021-UEFA-Euro"
url21 <- "https://fbref.com/en/matches/764c27dc/England-Croatia-June-13-2021-UEFA-Euro"
url22 <- "https://fbref.com/en/matches/027b11df/England-Scotland-June-18-2021-UEFA-Euro"
url23 <- "https://fbref.com/en/matches/20b1972b/Czech-Republic-England-June-22-2021-UEFA-Euro"
url24 <- "https://fbref.com/en/matches/0305e42c/Croatia-Scotland-June-22-2021-UEFA-Euro"
url_group_D <- rbind(url19, url20, url21, url22, url23, url24)

# Group E
url25 <- "https://fbref.com/en/matches/107fd412/Spain-Sweden-June-14-2021-UEFA-Euro"
url26 <- "https://fbref.com/en/matches/d35ad7a8/Poland-Slovakia-June-14-2021-UEFA-Euro"
url27 <- "https://fbref.com/en/matches/c6533f76/Sweden-Slovakia-June-18-2021-UEFA-Euro"
url28 <- "https://fbref.com/en/matches/14874531/Spain-Poland-June-19-2021-UEFA-Euro"
url29 <- "https://fbref.com/en/matches/ee6087f4/Sweden-Poland-June-23-2021-UEFA-Euro"
url30 <- "https://fbref.com/en/matches/7b46b857/Slovakia-Spain-June-23-2021-UEFA-Euro"
url_group_E <- rbind(url25, url26, url27, url28, url29, url30)

# Group F
url31 <- "https://fbref.com/en/matches/95d34c87/France-Germany-June-15-2021-UEFA-Euro"
url32 <- "https://fbref.com/en/matches/ba500d70/Hungary-Portugal-June-15-2021-UEFA-Euro"
url33 <- "https://fbref.com/en/matches/988198ba/Hungary-France-June-19-2021-UEFA-Euro"
url34 <- "https://fbref.com/en/matches/e33c4403/Portugal-Germany-June-19-2021-UEFA-Euro"
url35 <- "https://fbref.com/en/matches/5a7e53d8/Portugal-France-June-23-2021-UEFA-Euro"
url36 <- "https://fbref.com/en/matches/a4888546/Germany-Hungary-June-23-2021-UEFA-Euro"
url_group_F <- rbind(url31, url32, url33, url34, url35, url36)

Step 5: Read a single pair of tables for a single game

I will now read two tables: the summary stats of the Portugal players and the summary stats of the France players for the game between them on June 23rd.

###### Step 5: Read a single pair of tables for a single game #####
# Choose a game from the list of URLs from the previous step
selected_game <- url35

# Some data manipulation to get the date and teams from the URLs
game_data <- substr(selected_game, 39, nchar(selected_game)-10)
date <- substr(game_data, nchar(game_data)-11, nchar(game_data))
teams <- substr(game_data, 1, nchar(game_data)-13)
teams <- str_replace(teams, "Czech-Republic", "Czech Republic")
teams <- str_replace(teams, "North-Macedonia", "North Macedonia")
teamA <- sub("-.*", "", teams)
teamB <- sub(".*-", "", teams)

# Read the first pair of tables
# First we need the URL.
url <- selected_game

# Let's get the html data and assign the 4th table to the variable statA, indicating team A.
statA <- curl::curl(url) %>%
  xml2::read_html() %>%
  rvest::html_nodes('table') %>%
  rvest::html_table() %>%
  .[[4]]

# We see that the column names are messed up because of the way the stats table is set up.
# The header row as well as the first row contain header info, so let's create new column names using both rows.
colnames(statA) <- paste0(colnames(statA), " >> ", statA[1, ])
names(statA)[1:5] <- paste0(statA[1,1:5])

# Then let's delete the first row.
statA <- statA[-c(1),]

# Add the date and team names to the stats.
statA <- cbind(date, Team=teamA, Opponent=teamB, statA)

# Read the html and get the 11th table, which is the same type of stats for the opposing team.
statB <- curl::curl(url) %>%
  xml2::read_html() %>%
  rvest::html_nodes('table') %>%
  rvest::html_table() %>%
  .[[11]]
colnames(statB) <- paste0(colnames(statB), " >> ", statB[1, ])
names(statB)[1:5] <- paste0(statB[1,1:5])
statB <- statB[-c(1),]
statB <- cbind(date, Team=teamB, Opponent=teamA, statB)

# Combine the two table rows and trim whitespace around the player names
stat_both <- rbind(statA, statB)
stat_both$Player <- str_trim(stat_both$Player, side = "both")
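The fixed `substr()` offsets above work for these URLs because every match is in June and every day has two digits. As a hedged alternative, the date and teams can be pulled out with a regular expression instead. This is a base R sketch; `parse_match_url` is a hypothetical helper name, and it assumes the URL slug always ends in `-<Month>-<day>-<year>-UEFA-Euro`, as the URLs in Step 4 do.

```r
# Hypothetical helper (not from the original post): split an fbref match URL
# into date and team names. Assumes the slug ends in
# "-<Month>-<day>-<year>-UEFA-Euro".
parse_match_url <- function(url) {
  slug <- basename(url)  # e.g. "Portugal-France-June-23-2021-UEFA-Euro"
  pattern <- "^(.*)-([A-Za-z]+-[0-9]{1,2}-[0-9]{4})-UEFA-Euro$"
  stopifnot(grepl(pattern, slug))

  teams <- sub(pattern, "\\1", slug)
  date  <- sub(pattern, "\\2", slug)

  # Same workaround as the tutorial: protect two-word team names
  # before splitting on the hyphen.
  teams <- sub("Czech-Republic", "Czech Republic", teams, fixed = TRUE)
  teams <- sub("North-Macedonia", "North Macedonia", teams, fixed = TRUE)
  parts <- strsplit(teams, "-", fixed = TRUE)[[1]]

  list(date = date, teamA = parts[1], teamB = parts[2])
}

info <- parse_match_url(
  "https://fbref.com/en/matches/5a7e53d8/Portugal-France-June-23-2021-UEFA-Euro"
)
info$teamA  # "Portugal"
info$teamB  # "France"
info$date   # "June-23-2021"
```

Any other two-word team name would still need its own replacement line, exactly as in the original code.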

Let’s have a look at our table.

View(stat_both)
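To see what the header fix in Step 5 does without hitting the website, here is a small offline sketch. The toy table and the helper name `flatten_header` are made up for illustration; the values are placeholders, not real fbref data.

```r
# Toy table that mimics a scraped stats table: the real column names are
# group labels, and the first data row holds the actual stat names.
raw <- as.data.frame(rbind(
  c("Player",   "Min", "Sh", "SoT", "xG"),
  c("Player A", "90",  "3",  "2",   "0.4"),
  c("Player B", "75",  "1",  "0",   "0.1")
), stringsAsFactors = FALSE)
colnames(raw) <- c("", "", "Performance", "Performance", "Expected")

# Hypothetical helper: combine the group label and the first row into one
# header ("Performance >> Sh"), keep the plain name for the first
# `n_id_cols` identifier columns, then drop the first row.
flatten_header <- function(tbl, n_id_cols) {
  sub_header <- vapply(tbl, function(col) as.character(col[1]),
                       character(1), USE.NAMES = FALSE)
  new_names <- paste0(colnames(tbl), " >> ", sub_header)
  id_cols <- seq_len(n_id_cols)
  new_names[id_cols] <- sub_header[id_cols]
  colnames(tbl) <- new_names
  tbl <- tbl[-1, , drop = FALSE]
  rownames(tbl) <- NULL
  tbl
}

clean <- flatten_header(raw, n_id_cols = 2)
colnames(clean)
```

In the tutorial the number of identifier columns is 5 for the summary table and 6 for the other tables, which is why Step 6 uses `1:6` inside its inner loop.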

Step 6: Read all tables for all games

Now that we’ve seen how to get data for one game and one type of table, let’s get data for ALL games and ALL tables. Yes, I want it all.

###### Step 6: Read all tables for all games #####
# Combine all game URLs for all groups
selected_urls <- rbind(url_group_A, url_group_B, url_group_C, url_group_D, url_group_E, url_group_F)

# Initialize tables
all_stat <- NULL
full_stat <- NULL

for (g in 1:length(selected_urls)){

  # Use the URL of the current game (without this line every iteration would re-read the Step 5 game)
  url <- selected_urls[g]

  # Get the game info from the URL
  game_data <- substr(selected_urls[g], 39, nchar(selected_urls[g])-10)
  date <- substr(game_data, nchar(game_data)-11, nchar(game_data))
  teams <- substr(game_data, 1, nchar(game_data)-13)
  teams <- str_replace(teams, "Czech-Republic", "Czech Republic")
  teams <- str_replace(teams, "North-Macedonia", "North Macedonia")
  teamA <- sub("-.*", "", teams)
  teamB <- sub(".*-", "", teams)

  # Let's get the html data and assign the 4th table to the variable statA, indicating team A.
  statA <- curl::curl(url) %>%
    xml2::read_html() %>%
    rvest::html_nodes('table') %>%
    rvest::html_table() %>%
    .[[4]]

  # We see that the column names are messed up because of the way the stats table is set up.
  # The header row as well as the first row contain header info, so let's create new column names using both rows.
  colnames(statA) <- paste0(colnames(statA), " >> ", statA[1, ])
  names(statA)[1:5] <- paste0(statA[1,1:5])

  # Then let's delete the first row.
  statA <- statA[-c(1),]

  # Add the date and team names to the stats.
  statA <- cbind(date, Team=teamA, Opponent=teamB, statA)

  # Read the html and get the 11th table, which is the same type of stats for the opposing team.
  statB <- curl::curl(url) %>%
    xml2::read_html() %>%
    rvest::html_nodes('table') %>%
    rvest::html_table() %>%
    .[[11]]
  colnames(statB) <- paste0(colnames(statB), " >> ", statB[1, ])
  names(statB)[1:5] <- paste0(statB[1,1:5])
  statB <- statB[-c(1),]
  statB <- cbind(date, Team=teamB, Opponent=teamA, statB)

  stat_both <- rbind(statA, statB)
  stat_both$Player <- str_trim(stat_both$Player, side = "both")

  # Define the game's data frame
  all_stat <- stat_both
  Sys.sleep(15)

  # Loop for all tables related to the game
  for(i in 5:10){
    statA <- curl::curl(url) %>%
      xml2::read_html() %>%
      rvest::html_nodes('table') %>%
      rvest::html_table() %>%
      .[[i]]
    colnames(statA) <- paste0(colnames(statA), " >> ", statA[1, ])
    names(statA)[1:6] <- paste0(statA[1,1:6])
    statA <- statA[-c(1),]
    statA <- cbind(date, Team=teamA, Opponent=teamB, statA)

    statB <- curl::curl(url) %>%
      xml2::read_html() %>%
      rvest::html_nodes('table') %>%
      rvest::html_table() %>%
      .[[i+7]]
    colnames(statB) <- paste0(colnames(statB), " >> ", statB[1, ])
    names(statB)[1:6] <- paste0(statB[1,1:6])
    statB <- statB[-c(1),]
    statB <- cbind(date, Team=teamB, Opponent=teamA, statB)

    stat_both <- rbind(statA, statB)

    # The last table (i == 10) is merged on an explicit set of key columns
    if (i == 10) {
      all_stat <- merge(all_stat, stat_both, by=c("Player","date", "Team", "Opponent", "Age", "Min"), all=T)
    } else {
      all_stat <- merge(all_stat, stat_both, all=T)
    }
    Sys.sleep(15)
  }

  # Add the game tables to the total data frame
  full_stat <- rbind(full_stat, all_stat)
}

# Remove any duplicates
all_stat_full <- unique(full_stat)

# Convert all stats into numeric variables
all_stat_full <- cbind(all_stat_full[,1:4], mutate_all(all_stat_full[,5:ncol(all_stat_full)], function(x) as.numeric(as.character(x))))

# Export the table to CSV
write.csv(all_stat_full,"all_stat_full.csv")

You can access the file here.
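The heart of the loop above is the repeated `merge()` that glues each stats table onto the game's data frame. Here is a minimal offline sketch of that idea on made-up toy tables (placeholder players and numbers, not fbref data), using `Reduce()` to merge a list of tables on shared key columns.

```r
# Toy stand-ins for three stats tables from one game.
keys <- c("Player", "Team")
summary_tbl <- data.frame(Player = c("Player A", "Player B"),
                          Team = c("X", "X"), Sh = c(3, 1))
passing_tbl <- data.frame(Player = c("Player A", "Player B"),
                          Team = c("X", "X"), Cmp = c(40, 55))
defense_tbl <- data.frame(Player = "Player A", Team = "X", Tkl = 2)

# Merge the tables one after another on the key columns.
# all = TRUE keeps players that are missing from some tables (filled with NA).
merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE),
                 list(summary_tbl, passing_tbl, defense_tbl))
merged

# When two tables share a non-key column name, merge() appends
# ".x" and ".y" to tell them apart.
a <- data.frame(Player = "Player A", Succ = 1)
b <- data.frame(Player = "Player A", Succ = 4)
dup <- merge(a, b, by = "Player")
colnames(dup)  # "Player" "Succ.x" "Succ.y"
```

That suffix behavior is one plausible reason a column such as `Dribbles >> Succ.x` can show up after merging, so it is worth checking `colnames(all_stat_full)` before choosing stats in Step 9.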

Step 7: Create summary data frame

The core of data exploration: the pivot table. In R we do this with the help of the dplyr package. We take the data frame we have, group the data by the player, and we summarise the stats by summing them.

I always like viewing the table after pivoting.

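###### Step 7: Create summary data frame #####
# Sum all stats for each player
# (summarise_each() is deprecated in current dplyr; across() sums the numeric columns only)
all_stat_full <- all_stat_full %>%
  group_by(Player) %>%
  summarise(across(where(is.numeric), sum))
View(all_stat_full)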
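If the group-and-sum idea is new to you, this tiny base R sketch shows the same pivot on a made-up table (placeholder players and numbers). It uses `aggregate()` rather than dplyr purely so that it runs without any extra packages.

```r
# Toy per-game rows: Player A appears in two games, Player B in one.
per_game <- data.frame(
  Player = c("Player A", "Player A", "Player B"),
  Sh     = c(3, 2, 1),
  xG     = c(0.4, 0.3, 0.1)
)

# One row per player, with every stat column summed across games.
totals <- aggregate(per_game[c("Sh", "xG")],
                    by = list(Player = per_game$Player),
                    FUN = sum)
totals
```

Player A ends up with 5 shots (3 + 2) and Player B keeps their single-game values.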

Step 8: Select players

We all have our favorite players as well as the ones that catch our attention, for good or bad reasons. As a New York Knicks fan, I recently developed a dislike for Trae Young and watch his stats closely. As a person who has bet (and lost) that “football’s coming home” for 7 straight major international tournaments, I enjoy looking at England stats.

Below I have selected 8 players who have been in the spotlight so far in these Euros.

# Look at the available player names.
View(unique(all_stat_full$Player))

# Select the players you want to see. Choose 8 players for better visual results.
selected_players <- subset(all_stat_full,
                           Player=="Kylian Mbappé" |
                           Player=="Antoine Griezmann" |
                           Player=="Harry Kane" |
                           Player=="Kai Havertz" |
                           Player=="Cristiano Ronaldo" |
                           Player=="Álvaro Morata" |
                           Player=="Memphis Depay" |
                           Player=="Patrik Schick")
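A long chain of `Player == ... |` conditions works, but a name vector with `%in%` is easier to edit and makes it simple to spot names that did not match (accents are a common culprit). A small sketch on made-up placeholder data:

```r
# Toy summary table with placeholder player names.
stats <- data.frame(Player = c("Player A", "Player B", "Player C"),
                    Sh = c(5, 1, 2))

# Names we would like to plot; "Player Z" is deliberately not in the data.
wanted <- c("Player A", "Player C", "Player Z")

# Keep only the rows whose Player is in the wanted list.
selected <- stats[stats$Player %in% wanted, ]

# Names that were requested but not found, e.g. because of a typo.
not_found <- setdiff(wanted, stats$Player)
not_found  # "Player Z"
```

With the real data, `wanted` would hold the eight player names from the code above and `stats` would be `all_stat_full`.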

Step 9: Create the radar plots

As you may know we’ve been doing a bunch of basketball analytics. I can’t stress how lucky I am to have come across the great book Basketball Data Science with Applications in R. Anyone interested in basketball analytics should definitely get their hands on a copy. The BasketballAnalyzeR R package is simply amazing.

One of the cool things the authors and creators of the BasketballAnalyzeR R package have created is a radar plot format. So I apply a function intended for basketball analytics to soccer. Why not?

###### Step 9: Create the radar plots #####
# Attach the dataset
attach(selected_players)

# Select the statistics we want to see and prepare for the plot
Sel <- data.frame("xG"=`Expected >> xG`,
                  "Dr"=`Dribbles >> Succ`,
                  "Pass"=`Passes >> Cmp`,
                  "Sh"=`Performance >> Sh`,
                  "SoT"=`Performance >> SoT`,
                  "KP"=`KP`)
Sel <- mutate_all(Sel, function(x) as.numeric(as.character(x)))

# Run the radialprofile function with std=T, which standardizes the data so that the scale looks normal
p <- radialprofile(data=Sel, title=selected_players$Player, std=T)
detach(selected_players)
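To get a feel for what standardizing does, here is a base R sketch with `scale()` on made-up numbers. I am assuming that `std=T` amounts to a column-wise z-score (mean 0, standard deviation 1), which is what the plot subtitle in Step 10 states; the exact internals of `radialprofile` are not shown here.

```r
# Toy stats for three players (placeholder values).
Sel_toy <- data.frame(xG = c(0.5, 1.5, 2.5),
                      Sh = c(2, 4, 9))

# z-score each column: subtract the column mean, divide by the column sd.
z <- as.data.frame(scale(Sel_toy))
z

# Each standardized column now has mean 0 and standard deviation 1.
colMeans(z)
sapply(z, sd)
```

After standardizing, a value of 1 means "one standard deviation above the average of the selected players", so the radar shapes compare players to each other, not to the whole tournament.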

Step 10: Make the graph presentable

Let’s reformat the graph, add titles and captions, and save it to our computer.

###### Step 10: Make the graph presentable #####
g <- grid.arrange(grobs=p[1:length(p)], ncol=3)
g2 <- cowplot::ggdraw(g) + theme_grey() +
  labs(title="Selected Players Radar Plots",
       subtitle="Data from fbref.com. Aggregated data from EURO 2020 Group Stage Matches. Stat values are standardized (μ=0, sd=1).",
       caption = "@Sweep_SportsAnalytics")
g2
ggsave("radar-plot.png", width = 7.5, height = 7.5, dpi = 400)

# Create a table with descriptions for the stats we chose
descriptions <- data.frame(
  "Category"=colnames(Sel),
  "Description"=c("Expected Goals", "Successful Dribbles", "Completed Passes", "Shots", "Shots on Target", "Key Passes"))
descr <- tableGrob(print(descriptions, row.names = F))

# Add the description table
g_final <- g2 +
  annotation_custom(descr, xmin = 0.75, xmax = 0.85, ymin = 0.1, ymax = 0.2) +
  coord_cartesian(clip = "off")
g_final
ggsave("radar-key-final.png", width = 7.5, height = 7.5, dpi = 400)

Step 11: Interpret the graph

Data analysis doesn’t mean much if you can’t answer the basic question: “So what?”

Interpreting your findings is the key to any analytics. Keep in mind that sports analytics have been around for over a decade, but you don’t see many data nerds managing a team. The best managers and sports personnel know what to do with the results of the analysis. They have a deep understanding of the game, and that’s most important.

I would love to see your interpretation of any players and stats you analyze! An idea for you: change the URLs to Copa America matches and select some of the stars.

Feel free to share your findings in the comments below or on our Instagram or Facebook pages.

Full Code Below

###### Step 2: Install packages #####
# For the below 2 commands, if prompted, type "no" and Enter
install.packages("colorspace")
install.packages("curl")
install.packages("BasketballAnalyzeR")
install.packages("ggplot2")
install.packages("htmltab")
install.packages("stringr")
install.packages("dplyr")
install.packages("gridExtra")
install.packages("cowplot")
# rvest and xml2 are called as rvest::... and xml2::... in Step 6
install.packages("rvest")
install.packages("xml2")

###### Step 3: Load libraries #####
library(curl)
library(BasketballAnalyzeR)
library(ggplot2)
library(htmltab)
library(stringr)
library(dplyr)
library(gridExtra)
library(cowplot)

###### Step 4: Read fbref.com URLs #####
# Group A
url1 <- "https://fbref.com/en/matches/caa84313/Italy-Switzerland-June-16-2021-UEFA-Euro"
url2 <- "https://fbref.com/en/matches/95a9ebd1/Turkey-Italy-June-11-2021-UEFA-Euro"
url3 <- "https://fbref.com/en/matches/f09b64db/Turkey-Wales-June-16-2021-UEFA-Euro"
url4 <- "https://fbref.com/en/matches/d9eaa85c/Wales-Switzerland-June-12-2021-UEFA-Euro"
url5 <- "https://fbref.com/en/matches/b756c626/Italy-Wales-June-20-2021-UEFA-Euro"
url6 <- "https://fbref.com/en/matches/fa85a731/Switzerland-Turkey-June-20-2021-UEFA-Euro"
url_group_A <- rbind(url1, url2, url3, url4, url5, url6)

# Group B
url7 <- "https://fbref.com/en/matches/e594174b/Belgium-Russia-June-12-2021-UEFA-Euro"
url8 <- "https://fbref.com/en/matches/25bb1fa2/Denmark-Belgium-June-17-2021-UEFA-Euro"
url9 <- "https://fbref.com/en/matches/2c48acb2/Finland-Russia-June-16-2021-UEFA-Euro"
url10 <- "https://fbref.com/en/matches/c3c2ffa2/Denmark-Finland-June-12-2021-UEFA-Euro"
url11 <- "https://fbref.com/en/matches/bd35edec/Finland-Belgium-June-21-2021-UEFA-Euro"
url12 <- "https://fbref.com/en/matches/04188c5c/Russia-Denmark-June-21-2021-UEFA-Euro"
url_group_B <- rbind(url7, url8, url9, url10, url11, url12)

# Group C
url13 <- "https://fbref.com/en/matches/f3d39a29/Netherlands-Austria-June-17-2021-UEFA-Euro"
url14 <- "https://fbref.com/en/matches/b47a0ea6/Austria-North-Macedonia-June-13-2021-UEFA-Euro"
url15 <- "https://fbref.com/en/matches/e0eed6e8/Ukraine-North-Macedonia-June-17-2021-UEFA-Euro"
url16 <- "https://fbref.com/en/matches/0e9919a5/Netherlands-Ukraine-June-13-2021-UEFA-Euro"
url17 <- "https://fbref.com/en/matches/841065f5/North-Macedonia-Netherlands-June-21-2021-UEFA-Euro"
url18 <- "https://fbref.com/en/matches/7ed46abd/Ukraine-Austria-June-21-2021-UEFA-Euro"
url_group_C <- rbind(url13, url14, url15, url16, url17, url18)

# Group D
url19 <- "https://fbref.com/en/matches/6599f4ab/Scotland-Czech-Republic-June-14-2021-UEFA-Euro"
url20 <- "https://fbref.com/en/matches/1e930db9/Croatia-Czech-Republic-June-18-2021-UEFA-Euro"
url21 <- "https://fbref.com/en/matches/764c27dc/England-Croatia-June-13-2021-UEFA-Euro"
url22 <- "https://fbref.com/en/matches/027b11df/England-Scotland-June-18-2021-UEFA-Euro"
url23 <- "https://fbref.com/en/matches/20b1972b/Czech-Republic-England-June-22-2021-UEFA-Euro"
url24 <- "https://fbref.com/en/matches/0305e42c/Croatia-Scotland-June-22-2021-UEFA-Euro"
url_group_D <- rbind(url19, url20, url21, url22, url23, url24)

# Group E
url25 <- "https://fbref.com/en/matches/107fd412/Spain-Sweden-June-14-2021-UEFA-Euro"
url26 <- "https://fbref.com/en/matches/d35ad7a8/Poland-Slovakia-June-14-2021-UEFA-Euro"
url27 <- "https://fbref.com/en/matches/c6533f76/Sweden-Slovakia-June-18-2021-UEFA-Euro"
url28 <- "https://fbref.com/en/matches/14874531/Spain-Poland-June-19-2021-UEFA-Euro"
url29 <- "https://fbref.com/en/matches/ee6087f4/Sweden-Poland-June-23-2021-UEFA-Euro"
url30 <- "https://fbref.com/en/matches/7b46b857/Slovakia-Spain-June-23-2021-UEFA-Euro"
url_group_E <- rbind(url25, url26, url27, url28, url29, url30)

# Group F
url31 <- "https://fbref.com/en/matches/95d34c87/France-Germany-June-15-2021-UEFA-Euro"
url32 <- "https://fbref.com/en/matches/ba500d70/Hungary-Portugal-June-15-2021-UEFA-Euro"
url33 <- "https://fbref.com/en/matches/988198ba/Hungary-France-June-19-2021-UEFA-Euro"
url34 <- "https://fbref.com/en/matches/e33c4403/Portugal-Germany-June-19-2021-UEFA-Euro"
url35 <- "https://fbref.com/en/matches/5a7e53d8/Portugal-France-June-23-2021-UEFA-Euro"
url36 <- "https://fbref.com/en/matches/a4888546/Germany-Hungary-June-23-2021-UEFA-Euro"
url_group_F <- rbind(url31, url32, url33, url34, url35, url36)

###### Step 5: Read a single pair of tables for a single game #####
# Choose a game from the list of URLs from the previous step
selected_game <- url35

# Some data manipulation to get the date and teams from the URLs
game_data <- substr(selected_game, 39, nchar(selected_game)-10)
date <- substr(game_data, nchar(game_data)-11, nchar(game_data))
teams <- substr(game_data, 1, nchar(game_data)-13)
teams <- str_replace(teams, "Czech-Republic", "Czech Republic")
teams <- str_replace(teams, "North-Macedonia", "North Macedonia")
teamA <- sub("-.*", "", teams)
teamB <- sub(".*-", "", teams)

# Define the node
node <- "#stats_b561dd30_defense"

# Add the node to the URL
url <- paste0(selected_game, node)

# Read first table and add the date and teams
statA <- htmltab(doc = url, which = 4, rm_nodata_cols = F)
statA <- cbind(date, Team=teamA, Opponent=teamB, statA)

# Read second table and add the date and teams
statB <- htmltab(doc = url, which = 11, rm_nodata_cols = F)
statB <- cbind(date, Team=teamB, Opponent=teamA, statB)

# Combine the two table rows
stat_both <- rbind(statA, statB)
stat_both$Player <- str_trim(stat_both$Player, side = "both")
View(stat_both)

###### Step 6: Read all tables for all games #####
# Combine all game URLs for all groups
selected_urls <- rbind(url_group_A, url_group_B, url_group_C, url_group_D, url_group_E, url_group_F)

# Initialize tables
all_stat <- NULL
full_stat <- NULL

for (g in 1:length(selected_urls)){

  # Use the URL of the current game (without this line every iteration would re-read the Step 5 game)
  url <- selected_urls[g]

  # Get the game info from the URL
  game_data <- substr(selected_urls[g], 39, nchar(selected_urls[g])-10)
  date <- substr(game_data, nchar(game_data)-11, nchar(game_data))
  teams <- substr(game_data, 1, nchar(game_data)-13)
  teams <- str_replace(teams, "Czech-Republic", "Czech Republic")
  teams <- str_replace(teams, "North-Macedonia", "North Macedonia")
  teamA <- sub("-.*", "", teams)
  teamB <- sub(".*-", "", teams)

  # Let's get the html data and assign the 4th table to the variable statA, indicating team A.
  statA <- curl::curl(url) %>%
    xml2::read_html() %>%
    rvest::html_nodes('table') %>%
    rvest::html_table() %>%
    .[[4]]

  # We see that the column names are messed up because of the way the stats table is set up.
  # The header row as well as the first row contain header info, so let's create new column names using both rows.
  colnames(statA) <- paste0(colnames(statA), " >> ", statA[1, ])
  names(statA)[1:5] <- paste0(statA[1,1:5])

  # Then let's delete the first row.
  statA <- statA[-c(1),]

  # Add the date and team names to the stats.
  statA <- cbind(date, Team=teamA, Opponent=teamB, statA)

  # Read the html and get the 11th table, which is the same type of stats for the opposing team.
  statB <- curl::curl(url) %>%
    xml2::read_html() %>%
    rvest::html_nodes('table') %>%
    rvest::html_table() %>%
    .[[11]]
  colnames(statB) <- paste0(colnames(statB), " >> ", statB[1, ])
  names(statB)[1:5] <- paste0(statB[1,1:5])
  statB <- statB[-c(1),]
  statB <- cbind(date, Team=teamB, Opponent=teamA, statB)

  stat_both <- rbind(statA, statB)
  stat_both$Player <- str_trim(stat_both$Player, side = "both")

  # Define the game's data frame
  all_stat <- stat_both
  Sys.sleep(15)

  # Loop for all tables related to the game
  for(i in 5:10){
    statA <- curl::curl(url) %>%
      xml2::read_html() %>%
      rvest::html_nodes('table') %>%
      rvest::html_table() %>%
      .[[i]]
    colnames(statA) <- paste0(colnames(statA), " >> ", statA[1, ])
    names(statA)[1:6] <- paste0(statA[1,1:6])
    statA <- statA[-c(1),]
    statA <- cbind(date, Team=teamA, Opponent=teamB, statA)

    statB <- curl::curl(url) %>%
      xml2::read_html() %>%
      rvest::html_nodes('table') %>%
      rvest::html_table() %>%
      .[[i+7]]
    colnames(statB) <- paste0(colnames(statB), " >> ", statB[1, ])
    names(statB)[1:6] <- paste0(statB[1,1:6])
    statB <- statB[-c(1),]
    statB <- cbind(date, Team=teamB, Opponent=teamA, statB)

    stat_both <- rbind(statA, statB)

    # The last table (i == 10) is merged on an explicit set of key columns
    if (i == 10) {
      all_stat <- merge(all_stat, stat_both, by=c("Player","date", "Team", "Opponent", "Age", "Min"), all=T)
    } else {
      all_stat <- merge(all_stat, stat_both, all=T)
    }
    Sys.sleep(15)
  }

  # Add the game tables to the total data frame
  full_stat <- rbind(full_stat, all_stat)
}

# Remove any duplicates
all_stat_full <- unique(full_stat)

# Convert all stats into numeric variables
all_stat_full <- cbind(all_stat_full[,1:4], mutate_all(all_stat_full[,5:ncol(all_stat_full)], function(x) as.numeric(as.character(x))))

# Export the table to CSV
write.csv(all_stat_full,"all_stat_full.csv")

###### Step 7: Create summary data frame - pivot table #####
# Sum all stats for each player
# (summarise_each() is deprecated in current dplyr; across() sums the numeric columns only)
all_stat_full <- all_stat_full %>%
  group_by(Player) %>%
  summarise(across(where(is.numeric), sum))
View(all_stat_full)

###### Step 8: Select players #####
# Select the players you want to see. Choose 8 players for better visual results.
selected_players <- subset(all_stat_full,
                           Player=="Kylian Mbappé" |
                           Player=="Antoine Griezmann" |
                           Player=="Harry Kane" |
                           Player=="Kai Havertz" |
                           Player=="Cristiano Ronaldo" |
                           Player=="Álvaro Morata" |
                           Player=="Memphis Depay" |
                           Player=="Patrik Schick")

###### Step 9: Create the radar plots #####
# Attach the dataset
attach(selected_players)

# Select the statistics we want to see and prepare for the plot
Sel <- data.frame("xG"=`Expected >> xG`,
                  "Dr"=`Dribbles >> Succ.x`,
                  "Pass"=`Passes >> Cmp`,
                  "Sh"=`Performance >> Sh`,
                  "SoT"=`Performance >> SoT`,
                  "KP"=`KP`)
Sel <- mutate_all(Sel, function(x) as.numeric(as.character(x)))

# Run the radialprofile function with std=T, which standardizes the data so that the scale looks normal
p <- radialprofile(data=Sel, title=selected_players$Player, std=T)
detach(selected_players)

###### Step 10: Make the graph presentable #####
g <- grid.arrange(grobs=p[1:length(p)], ncol=3)
g2 <- cowplot::ggdraw(g) + theme_grey() +
  labs(title="Selected Players Radar Plots",
       subtitle="Data from fbref.com. Aggregated data from EURO 2020 Group Stage Matches.\nStat values are standardized (μ=0, sd=1).",
       caption = "@Sweep_SportsAnalytics")
g2
ggsave("radar-plot.png", width = 9, height = 9, dpi = 400)

# Create a table with descriptions for the stats we chose
descriptions <- data.frame(
  "Category"=colnames(Sel),
  "Description"=c("Expected Goals", "Successful Dribbles", "Completed Passes", "Shots", "Shots on Target", "Key Passes"))
descr <- tableGrob(print(descriptions, row.names = F))

# Add the description table
g_final <- g2 +
  annotation_custom(descr, xmin = 0.8, xmax = 0.9, ymin = 0.1, ymax = 0.2) +
  coord_cartesian(clip = "off")
g_final
ggsave("radar-key-final.png", width = 9, height = 9, dpi = 400)
