Adding the dots, and formatting our arc plot

We are on the home straight now, and ready to complete our image. As you can tell by now, a project where you have to gather data, format it, build out a visualisation, and then get it ready for print is not always a straightforward exercise. If you work as a data visualisation designer, requests often come in without any data, so you need to be able to source it, clean it and format it. (I once had a potential client show me a whole range of data visualisations he liked, and when we got to the stage of discussing the sorts of data he had, he replied, laughing, “oh, I don’t have any data!”. And when he said none, he had none. Couldn’t get any either. Strangest meeting…)

Let’s recap what we’ve done so far…

  1. We built a data scraper here and here that created a file of monuments for Cook’s voyage in 1770.

  2. We built a prototype of the arc plot here

Next stage is to create the dot plot that sits under the arc plot, combine them, and get them all spruced up so they can be used for publication.

My initial sketch was this:

IMG_8068.JPG

This was the prototype that I originally submitted:

cook 1.jpg

I created this using standard R image exports, brought it into Illustrator and did most of the formatting there, as I was in a bit of a rush. The initial data set was also still being worked on, so this was really just a proof of concept.

A GENTLE REMINDER: This data is not accurate and should only be used in this example. Don’t use it for your research on monuments or Cook or anything. Consider this a dummy data file!

What we will do below though is much more reproducible, so if the data set gets updated for any reason, or we change the historical event that the graphic is focussed on, it will be a breeze to recreate.

We left the last stage with an arc plot that looked like this:

Rplot05.png


It looks OK, but also like a pretty standard R graph. My goal in this visualisation was to explore or communicate two things:

  1. The pace at which monuments have been dedicated since 1770. (Does it slow down over time? Does it pick up? The aim is to get the viewer to look and think about what was going on socially or politically when a monument was erected, or why the pace of dedications increases over time rather than diminishing.)

  2. The number dedicated each year. In addition to the pace of the dedications, are there any specific periods of greater volume? (We knew there would be a peak in 1970, but what we didn’t expect was the consistent, steady dedications after this period.)

We knew the visualisation would raise more questions than provide answers. Why does Australia continue to celebrate Cook and the Endeavour at such a consistent pace post 1970? What are we trying to say about our past and our history if we keep focussing on events like this? Who is erecting these monuments and why? These are bigger questions that the overall project would delve into, but here…we are just going to concentrate on the creation of this graphic.

Creating the Dot Plot

Under the arc plot, we need to create a dot plot stringing from each of the arcs. Each dot will represent a monument.

Open up RStudio, and make sure that you have the following packages in your library (if you don’t have all these packages already, go and grab them from CRAN - there are a few extra since the last post):

library(tidyverse)
library(lubridate)
library(ggraph)
library(tidygraph)
library(scales)
library(grid)
library(wesanderson)
library(egg)

Make sure you also have the data file from the last exercise, called cookExample, or get it now using this line:

cookExample <- readr::read_csv("https://raw.githubusercontent.com/KellyTall/Hellomister_DataBlog/master/cookExample.csv")

We’ll start off with a basic dot plot, and then gradually build the version we need.

Run the code below in R. You should get a message about the binwidth you are using, but ignore this for now.

dot1 <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_dotplot()

We are using Dedication_Year as the x axis and colouring each of the dots by the figure the monument was dedicated to.

This looks a bit cluttered, the 1970 number expands beyond the plot area, and the dots for each figure from the same year aren’t stacked.

You’ll also notice that while the y axis is meant to be a count, the scale runs from 0 to 1. Ignore the scale. The main thing is that each dot is one monument. This is just how geom_dotplot behaves: the ggplot2 documentation notes that when you bin along the x axis and stack along the y axis, the numbers on the y axis are not meaningful. Other geoms (like a histogram) will show the count, but this one always shows the scale in this fashion.
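If you want the real numbers behind the dots, you can count them yourself. This is a minimal sketch on a made-up stand-in for cookExample: the column names match the post, but the rows and the figure names are invented for illustration.

```r
library(dplyr)

# Toy stand-in for cookExample: same column names, invented rows
toy_monuments <- tibble::tibble(
  Dedication_Year = c(1870, 1970, 1970, 1970, 1988),
  Figure = c("Figure A", "Figure A", "Figure B", "Figure A", "Figure B")
)

# Monuments per year and figure: this is what each stack of dots encodes
year_counts <- toy_monuments %>%
  count(Dedication_Year, Figure, name = "monuments")

year_counts
```

Swapping toy_monuments for cookExample should give the same kind of table for the real file, assuming the column names are unchanged.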

Let’s work further to fix these things up and add in some of the detail we are missing.

Run the following line:

dot2 <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_dotplot(binwidth = .1)
Rplot010.png

We have adjusted the binwidth to show us the maximum amount of detail, and it now shows all the activity in 1970, but it’s made it pretty hard to see what’s going on. Run the following so we can make the dots bigger and stack them:

dot3 <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_dotplot(binwidth = .1, stackgroups = TRUE, binpositions = "all", dotsize = 10)

We’ve increased the dot size, kept the detail, and stacked the groups on top of one another so that the dots representing each of the names can now be seen. We’ve also forced it to show all the bin positions.
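To see what the stacking arguments do in isolation, here is a small sketch on invented data (the figure names are placeholders, not the values in cookExample). It builds the same plot with and without stackgroups so you can compare them side by side.

```r
library(ggplot2)

# Invented data: three monuments in 1970 across two figures, one in 1988
toy <- data.frame(
  Dedication_Year = c(1970, 1970, 1970, 1988),
  Figure = c("Figure A", "Figure B", "Figure A", "Figure B")
)

# Default: each Figure group stacks from the baseline, so dots in the same year overlap
overlapping <- ggplot(toy, aes(Dedication_Year, fill = Figure)) +
  geom_dotplot(binwidth = 1)

# stackgroups = TRUE stacks across groups;
# binpositions = "all" works out the bins from all the data so the stacks line up
stacked <- ggplot(toy, aes(Dedication_Year, fill = Figure)) +
  geom_dotplot(binwidth = 1, stackgroups = TRUE, binpositions = "all")
```

Print overlapping and then stacked to see the 1970 dots move from sitting on top of each other to forming a single column.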

My design showed the dots running from top to bottom, so let’s do this now.

Run the following:

dot4 <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_dotplot(binwidth = 1, stackgroups = TRUE, binpositions = "all", stackdir = "down")
Rplot12.png

All the information we need is in the right place. It’s just a matter of formatting it.

I want to get rid of all the extra stuff on the chart I don’t want to see, so I will create a theme I can add to both the dot plot and the arc graph. This theme will remove all the detail in the chart that we don’t need. I won’t explain what each line does. If you are keen to understand it further, head over to this page and see the incredible number of options you have to play with in ggplot.

Run this:

cook_theme <-  theme(axis.line=element_blank(),
                     axis.text.y=element_blank(),
                     axis.ticks=element_blank(),
                     axis.title.x=element_blank(),
                     axis.title.y=element_blank(),
                     panel.background=element_blank(),
                     panel.border=element_blank(),
                     panel.grid.major=element_blank(),
                     panel.grid.minor=element_blank(),
                     plot.background=element_blank(),
                     plot.title = element_text(family="Helvetica Neue Light", size=20, face="plain"),
                     plot.subtitle =  element_text(family="Helvetica Neue Light", size=12.9, face="plain"),
                     legend.text =  element_text(family="Helvetica Neue Light", face="plain"),
                     legend.title = element_text(family="Helvetica Neue Light", face="plain"),
                     axis.text = element_text(family="Helvetica Neue Light", face="plain", size=6),
                     legend.key = element_blank())

Every time I call “element_blank( )” I am telling ggplot to keep that area clean. I have also set Helvetica Neue Light as the font for my text areas - if you don’t have that on your system, substitute whatever you would like to use.

dot5 <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_dotplot(binwidth = 1, stackgroups = TRUE, binpositions = "all", stackdir = "down")+
        cook_theme

By adding the additional line at the end of the same piece of code, you have stripped away all of the elements that you don’t need in your final artwork. Before I really understood how to play with the ggplot theme, I would export to Illustrator and remove everything by hand. And if I had to change the chart, I’d have to export and remove it all again. It’s great to really get into theme manipulation so you can start to automate a lot of the manual or repetitive work that is required in Illustrator.
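The reason this works is that a theme is just an object, so it can be added to any number of plots. A minimal sketch with a cut-down, hypothetical theme and invented data:

```r
library(ggplot2)

# A cut-down, hypothetical version of cook_theme
minimal_theme <- theme(
  axis.ticks = element_blank(),
  panel.background = element_blank(),
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank()
)

# Invented data, just to have something to draw
toy <- data.frame(x = c(1870, 1970, 1988), y = c(1, 3, 2))

# The same theme object is reused on two different chart types
points_plot <- ggplot(toy, aes(x, y)) + geom_point() + minimal_theme
bars_plot <- ggplot(toy, aes(x, y)) + geom_col() + minimal_theme
```

Change one line in minimal_theme, re-run, and both charts pick up the change.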

Try it now by replacing the black and white theme with your new cook theme for your arc plot (If you are not sure what the code below is, you need to go back a post and see how we created the arc plot. I am assuming that you have done the previous exercise, that you already have your cook_tidy file sorted, and that you’ve created this arc plot already.)

cook5 <- ggraph(cook_tidy, layout = 'linear') + 
        geom_edge_arc(aes(width=weight, colour=Figure), alpha=.5) +
        cook_theme
Rplot15.png

Now, let’s see what we can do to improve the x axis. At the moment both plots are running on a default axis: the arc plot starts at 1800 and ends at 2000, and the dot plot starts at 1850 and ends at 2000. We need to make them uniform, and I also want to highlight that the arcs all start at 1770. Add the following to each of our existing code chunks. You can see I’ve added a line called “scale_x_continuous” that specifies where we want the axis to start and finish, as well as the breaks we want labelled. The last line, ggarrange, tells R to combine our two plots into one.

(ggarrange comes from the egg package, so make sure you have installed that and put it in your library. I have played around with a few grid tools, and this is my favourite one so far because of the way it handles the various areas that make up a chart or graph, and the way it aligns them and allows for easy height and width adjustment. If you use any others and have a preference, let me know! If you are interested to know more about egg, have a look at more info on this nice layout package here)

cook6 <- ggraph(cook_tidy, layout = 'linear') + 
        geom_edge_arc(aes(width=weight, colour=Figure), alpha=.5) +
        cook_theme +
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000)) 

dot6 <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_dotplot(binwidth = 1, stackgroups = TRUE, binpositions = "all", stackdir = "down")+
        cook_theme + 
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000)) 
    
ggarrange(cook6, dot6, heights = c(8, 4))  
Rplot16.png
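Because both plots need exactly the same limits and breaks, one option is to define the scale once and reuse it. This is only a sketch: cook_x_scale is a hypothetical helper, not something from the original code, and the data is invented.

```r
library(ggplot2)

# Hypothetical helper: one place to define the axis that both plots share
cook_x_scale <- function(position = "bottom") {
  scale_x_continuous(
    limits = c(1770, 2020),
    breaks = c(1770, 1870, 1970, 2000),
    position = position
  )
}

# Invented data, just to have something to draw
toy <- data.frame(Dedication_Year = c(1870, 1970, 1988), n = c(1, 3, 2))

# Both plots now share limits and breaks; only the label position differs
top_plot <- ggplot(toy, aes(Dedication_Year, n)) + geom_point() + cook_x_scale("top")
bottom_plot <- ggplot(toy, aes(Dedication_Year, n)) + geom_point() + cook_x_scale()
```

If the axis range ever changes, it only needs editing in one place.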

As you can see, we are getting closer and closer to the final image we need. I’d like the legend to be clearer, and not to repeat the colour encoding. I can remove it from the dot plot by adding an extra line to the theme. I would also like to remove the stroke line around the dot shapes.

cook6 <- ggraph(cook_tidy, layout = 'linear') + 
        geom_edge_arc(aes(width=weight, colour=Figure), alpha=.5) +
        cook_theme +
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000)) 

dot6 <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_dotplot(binwidth = 1, stackgroups = TRUE, binpositions = "all", stackdir = "down", stroke =0)+
        cook_theme + 
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000)) +
        theme(legend.position = "none",
              title = element_blank())
    
ggarrange(cook6, dot6, heights = c(8, 4))    
Rplot18.png

Now, I’d like to fix up the axis text. I will move the text of the arc plot to the top by adding a position to the custom scale, and add some vertical reference lines to each plot (geom_vline). You can see that I have put this in the second line. This is because the graph is drawn layer by layer. If I add it at the end it will draw the lines on top of the data encodings, and I want them to sit behind. If you are curious to see what it looks like, move the geom_vline line to the end of the chunk and see.

cook6 <- ggraph(cook_tidy, layout = 'linear') + 
        geom_vline(xintercept = c(1770, 1870, 1970, 2000), colour="lightgray")+
        geom_edge_arc(aes(width=weight, colour=Figure), alpha=.5) +
        cook_theme +
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000), position="top") 

dot6 <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_vline(xintercept = c(1770, 1870, 1970, 2000), colour="lightgray")+
        geom_dotplot(binwidth = 1, stackgroups = TRUE, binpositions = "all", stackdir = "down", stroke =0)+
        cook_theme + 
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000)) +
        theme(legend.position = "none",
              title = element_blank())
    
ggarrange(cook6, dot6, heights = c(8, 4))    
Rplot19.png
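The layer order is easier to see on a tiny example. A rough sketch with invented data, where the only difference between the two plots is where geom_vline sits in the chain:

```r
library(ggplot2)

# Invented data, just to have something to draw
toy <- data.frame(x = c(1870, 1970, 1988), y = c(1, 3, 2))

# Reference line added first: it is drawn first, so the points sit on top of it
lines_behind <- ggplot(toy, aes(x, y)) +
  geom_vline(xintercept = 1970, colour = "lightgray") +
  geom_point(size = 5)

# Reference line added last: it is drawn over the points
lines_in_front <- ggplot(toy, aes(x, y)) +
  geom_point(size = 5) +
  geom_vline(xintercept = 1970, colour = "lightgray")
```

Print each one and look at the point at 1970 to see which layer wins.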

We need a title, and I will add that to the arc plot.

cook6 <- ggraph(cook_tidy, layout = 'linear') + 
        geom_vline(xintercept = c(1770, 1870, 1970, 2000), colour="lightgray")+
        geom_edge_arc(aes(width=weight, colour=Figure), alpha=.5) +
        cook_theme +
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000), position="top") +
        labs(title="Visualising Cook Memorials",
             subtitle="Monuments dedicated to the voyage of the Endeavour from 1770 until today") 


dot6 <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_vline(xintercept = c(1770, 1870, 1970, 2000), colour="lightgray")+
        geom_dotplot(binwidth = 1, stackgroups = TRUE, binpositions = "all", stackdir = "down", stroke =0)+
        cook_theme + 
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000)) +
        theme(legend.position = "none",
              title = element_blank())
    
ggarrange(cook6, dot6, heights = c(8, 4))    
Rplot20.png

The font family and text size are set in our theme, so you can play around with them to suit your needs. I also want to use different colours. I could have worked with the colours that I used for my original prototype, but I decided to use the Wes Anderson colour palette package created by Karthik Ram. Visit his GitHub page for examples and more info. I have chosen to work with Moonrise Kingdom 3, as shown below:

Rplot21.png

I only want to use colours 1, 3 and 5, though. I can do this by saving the palette as an object and then taking a subset of its colours.

mr3 <- wes_palettes$Moonrise3

Printing my new object gives the HEX values of the whole palette. All I have to do is call on the values in the first, third and fifth positions.

[1] "#85D4E3" "#F4B5BD" "#9C964A" "#CDC08C"
[5] "#FAD77B"
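Here is the subsetting on its own, as a small sketch. To keep it runnable without the wesanderson package, the HEX values are typed in from the output above, and the figure names are placeholders rather than the values in cookExample.

```r
# HEX values copied from the palette output shown in this post
mr3 <- c("#85D4E3", "#F4B5BD", "#9C964A", "#CDC08C", "#FAD77B")

# Keep the first, third and fifth colours
cook_colours <- mr3[c(1, 3, 5)]
cook_colours
# "#85D4E3" "#9C964A" "#FAD77B"

# Optional: name the colours so each figure keeps its colour if the data order changes.
# The names here are placeholders, not the real Figure values.
named_colours <- setNames(cook_colours, c("Figure A", "Figure B", "Figure C"))
```

When a named vector is passed as the values of a manual scale, ggplot2 matches the colours to the data by name rather than by position.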


Run the following. Note I have added “scale_edge_colour_manual” to the arc plot, and “scale_fill_manual” to the dot plot. They essentially do the same thing, but as they apply to different types of encodings you always have to make sure you are using the right one for it to work.

cook6 <- ggraph(cook_tidy, layout = 'linear') + 
        geom_vline(xintercept = c(1770, 1870, 1970, 2000), colour="lightgray")+
        geom_edge_arc(aes(width=weight, colour=Figure), alpha=.5) +
        scale_edge_colour_manual(values = mr3[c(1,3,5)])+
        cook_theme +
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000), position="top") +
        labs(title="Visualising Cook Memorials",
             subtitle="Monuments dedicated to the voyage of the Endeavour from 1770 until today") 


dot6 <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_vline(xintercept = c(1770, 1870, 1970, 2000), colour="lightgray")+
        geom_dotplot(binwidth = 1, stackgroups = TRUE, binpositions = "all", stackdir = "down", stroke =0)+
        cook_theme + 
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000)) +
        theme(legend.position = "none",
              title = element_blank()) +
        scale_fill_manual(values = mr3[c(1,3,5)])
    
ggarrange(cook6, dot6, heights = c(8, 4))    

And we are pretty much done. You can also change the text of the x axis plus fix up the formatting of the legend, but I am not going to do this here.

(**UPDATE - I fixed up the formatting of the legend and the axis labels as they were BUGGING me. You can see the extra lines I have added into the theme. I also changed the title of the legend for the arc weight as that was bugging me too…You should be able to spot where I have done this in the complete code section at the bottom of the page ).

You can also mess around with the size of your export and the format you need it in. But the steps we have taken mean that if your data changes we will get the same style of image each time and no need for continual re-work in an external tool like Illustrator. This has saved me heaps of time in the past.

Rplot09.jpeg
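For the export itself, ggsave lets you set the size and format in code. This is a sketch only: it saves a toy plot to a temporary file, and the A4 landscape dimensions are just an example, not the size I used for the final artwork.

```r
library(ggplot2)

# Invented data and plot, just to have something to save
toy <- data.frame(x = c(1870, 1970, 1988), y = c(1, 3, 2))
toy_plot <- ggplot(toy, aes(x, y)) + geom_point()

# Temporary file for the example; swap in a real path such as "cook_memorials.png"
out_file <- tempfile(fileext = ".png")

# width and height are in the given units; dpi matters for raster formats like PNG
ggsave(out_file, plot = toy_plot, width = 297, height = 210, units = "mm", dpi = 300)
```

The file extension decides the format, so changing it to .pdf gives a vector file. I believe the combined object from ggarrange can be passed to plot in the same way, but treat that as an assumption and check it on your own setup.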

The full code chunk is below. Let me know if you get a chance to use this, or if you have any feedback on how to improve it. As usual, this big old chunk o’ code can be found here

##libraries
library(tidyverse)
library(lubridate)
library(ggraph)
library(tidygraph)
library(scales)
library(grid)
library(wesanderson)
library(egg)

##data file import

cookExample <- readr::read_csv("https://raw.githubusercontent.com/KellyTall/Hellomister_DataBlog/master/cookExample.csv")

# write_csv(cookExample, "cookExample.csv")

# View(cookExample)

# prepare edge file
cook_edge <- cookExample %>% 
        mutate(from = 1770) %>% 
        select(from, Dedication_Year, Figure, Monument_Type) %>% 
        rename(to=Dedication_Year) %>% 
        mutate(to=as.numeric(to)) %>% 
        group_by(Figure, Monument_Type, from, to) %>% 
        summarise(weight = n()) %>% 
        na.omit()



## create your graph files
cook_tidy <- tbl_graph(edges=cook_edge, directed = TRUE)


##plot your arc plot



##theme
cook_theme <-  theme(axis.line=element_blank(),
                     axis.text.y=element_blank(),
                     axis.ticks=element_blank(),
                     axis.title.x=element_blank(),
                     axis.title.y=element_blank(),
                     panel.background=element_blank(),
                     panel.border=element_blank(),
                     panel.grid.major=element_blank(),
                     panel.grid.minor=element_blank(),
                     plot.background=element_blank(),
                     plot.title = element_text(family="Helvetica Neue Light", size=20, face="plain"),
                     plot.subtitle =  element_text(family="Helvetica Neue Light", size=12.9, face="plain"),
                     legend.text =  element_text(family="Helvetica Neue Light", face="plain"),
                     legend.title = element_text(family="Helvetica Neue Light", face="plain"),
                     axis.text = element_text(family="Helvetica Neue Light", face="plain", size=6),
                     legend.key = element_blank())

# names(wes_palettes)
##creating object with HEX colours from Wes Anderson Colour Palette - here using Moonrise 3
mr3 <- wes_palettes$Moonrise3
# wes_palette("Moonrise3")

cook_dot <- ggplot(cookExample, aes(Dedication_Year, fill=Figure)) + 
        geom_vline(xintercept = c(1770, 1870, 1970, 2000), colour="lightgray")+
        geom_dotplot(binwidth = .1, stackgroups = TRUE, binpositions = "all", dotsize = 10, stackdir = "down", stroke=0)+
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000)) +
        cook_theme +
        theme(legend.position = "none",
              title = element_blank()) +
        scale_fill_manual(values = mr3[c(1,3,5)]) ##this selects the 1st, 3rd and 5th HEX colours from the palette in object mr3
        
        

cook_arc <- ggraph(cook_tidy, layout = 'linear') + 
        geom_vline(xintercept = c(1770, 1870, 1970, 2000), colour="lightgray")+
        geom_edge_arc(aes(width=weight,edge_colour=Figure), alpha=.5) +
        scale_edge_colour_manual(values = mr3[c(1,3,5)], name = "Figure")+
        scale_edge_width(name="Number of Monuments")+
        scale_x_continuous(limits=c(1770, 2020), breaks = c(1770, 1870, 1970, 2000), position = "top") +
        cook_theme +
        labs(title="Visualising Cook Memorials",
             subtitle="Monuments dedicated to the voyage of the Endeavour from 1770 until today") 

ggarrange(cook_arc, cook_dot, heights = c(10, 4))





    
Kelly Tall