Those civil war monuments in the US

I once saw a video suggesting that the US should choose Canada as their next president. They would solve the race problem. As soon as they figured out why there still was a race problem. It’s one of those things that are pretty hard to understand for Europeans. And apparently for Canadians as well.

Anyway, there is a problem, and lately it’s been manifesting in protests against monuments celebrating confederate war-“heroes”. That’s another thing that is difficult to understand from across the Atlantic. Those monuments must certainly have been standing there for a very long time. After all, the war ended in 1865. They may be celebrating the losers, but seriously, what’s the problem? Well… Actually there is a problem. Most of the monuments were not erected to commemorate fallen soldiers just after the war. They were erected to show those n****** who was still in charge when they started to organize for rights. And again when those rights were granted (to some degree). There’s a lot more information in this report. And suddenly even Europeans begin to understand what all the fuss is about. But still. Seriously? Get over it, you lost, be nice, and treat people like, you know, Jesus said you should.

Okay, that was the introduction. And now for the data. Somewhere in there, there is an interesting data visualization. The data is here:

https://data.world/datadanlarson/confederatemonument/workspace/file?filename=CivilWarMamorials.csv

What do I want to do with it? I want an animated GIF. Each frame in that GIF should represent a year, and show a map of the US with all the monuments erected up to and including that year. Preferably it should be noticeable which monuments were erected that year. A graph showing the development of the number of monuments would be nice as well.

Okay, let’s get coding. The data comes from data.world, and they have their own R package:

  devtools::install_github("datadotworld/data.world-r", build_vignettes = TRUE, force = TRUE)

In order to get to the data, you will need a token. data.world has excellent help on how to get one.

data.world::set_config(data.world::save_config(auth_token = "redacted"))

It is a horrible token. Anyway, let’s plod on. Load their library, and get the data:

library(data.world)
dataset_key <- "https://data.world/datadanlarson/confederatemonument"
tables_qry <- data.world::qry_sql("SELECT * FROM Tables")
tables_df <- data.world::query(tables_qry, dataset = dataset_key)
sample_qry <- data.world::qry_sql(sprintf("SELECT * FROM `%s`", tables_df$tableName[[1]]))
sample_df <- data.world::query(sample_qry, dataset = dataset_key)

This is a simple copy-paste from the examples at data.world. The net result is that we have all the data in the dataframe sample_df.

We are interested in the year a monument was erected. Not all years are known. And this is of course a serious flaw in the following visualization. Patterns may not be what they appear, when approx. half the data is missing.

Anyway, let’s get rid of the rows with missing data:

newdata <- sample_df[complete.cases(sample_df),]

complete.cases returns a logical vector: TRUE if the row is free from NAs, i.e. is complete, and FALSE if there is an NA in it. newdata is now a dataframe with only the complete cases.
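To see what complete.cases does on something small, here is a minimal sketch with a toy dataframe. The column names and values are invented for illustration and are not taken from the real dataset.

```r
# Toy data standing in for sample_df; names and values are made up.
toy <- data.frame(
  city  = c("Richmond", "Atlanta", "Mobile", NA),
  state = c("VA", "GA", "AL", "TX"),
  year  = c(1890, NA, 1905, 1911),
  stringsAsFactors = FALSE
)

# One logical value per row: TRUE only when the row has no NA at all.
keep <- complete.cases(toy)
print(keep)

# Keep only the complete rows.
toy_complete <- toy[keep, ]
print(toy_complete)

# Share of rows that are lost by dropping the incomplete ones.
lost <- mean(!keep)
print(lost)
```

Checking the share of dropped rows like this is a quick way to see how much of the data the later plots are actually built on.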

We need coordinates. Google is our friend. However, I’m not going to run that geocoding again. Google allows 2500 calls to their API every day from a given IP address. It is easy to run out of calls.

library(ggmap)
newerdata <- data.frame(city=character(), state = character(), year = numeric(), status = character(), lat = numeric(), lon = numeric(), stringsAsFactors = FALSE)
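# Note: recent versions of ggmap need a Google API key first (see register_google()).
for (row in seq_len(nrow(newdata))) {
  # Geocode "city, state" and take whatever Google returns first.
  result <- geocode(paste(newdata[row,2]$city, newdata[row,1]$state, sep=", "), output="latlon", source="google")
  # c() coerces everything to character; the numeric columns are converted back later.
  newerdata[nrow(newerdata) + 1,] = c(newdata[row,2]$city, newdata[row,1]$state, as.numeric(newdata[row,5]$year), newdata[row,6]$civilwarstatus, result$lat, result$lon)
}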
save(newerdata, file="newestdata.rda")

What happens? We load ggmap, which provides the function geocode. A new dataframe, newerdata, is defined. For each row in newdata, I call geocode on the city and state, separated with a “,”, and save the result in “result”. And then I add the columns I want as a new row in the newerdata dataframe. There is probably a better way to do that. But it works. There is another thing here that should probably be handled. Sometimes Google is not able to determine exactly what location we give it. There may be more than one place in a US state with the same name. I’m just taking the first result Google gives me. A bit sloppy, I know.
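One possible “better way”, as a rough sketch and not what this post actually ran: geocode each distinct city/state pair only once and join the coordinates back on. That also saves API calls. Here fake_geocode is a made-up stand-in for geocode that returns meaningless coordinates, so the example runs without an API key, and the toy data is invented.

```r
# Toy stand-in for newdata; values are made up for illustration.
places <- data.frame(
  city  = c("Richmond", "Atlanta", "Richmond", "Atlanta", "Mobile"),
  state = c("VA", "GA", "VA", "GA", "AL"),
  year  = c(1890, 1911, 1907, 1958, 1905),
  stringsAsFactors = FALSE
)

# Hypothetical replacement for geocode(): returns fake coordinates and
# counts how often it is called, so no API quota is spent here.
calls <- 0
fake_geocode <- function(query) {
  calls <<- calls + 1
  data.frame(lon = -nchar(query), lat = nchar(query))
}

# Look up each distinct city/state pair only once ...
lookup <- unique(places[, c("city", "state")])
coords <- do.call(rbind, lapply(
  paste(lookup$city, lookup$state, sep = ", "),
  fake_geocode
))
lookup <- cbind(lookup, coords)

# ... and join the coordinates back onto every monument.
geocoded <- merge(places, lookup, by = c("city", "state"))
print(calls)
print(geocoded)
```

With five monuments in three distinct places, the lookup function is called three times instead of five. On the real data the saving depends on how many monuments share a city.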

Finally I save the data. Now we should be ready to make some plots. We are going to need some libraries for that:

library(grid)
library(gganimate)
library(animation)
library(ggplot2)
require(cowplot)

I’ll just make absolutely sure that the data is in the form I need it to be:

load("newestdata.rda")
# load() restores the object under the name it was saved with, newerdata
newestdata <- newerdata[complete.cases(newerdata),]
newestdata <- na.omit(newestdata)
newestdata$year <- as.numeric(newestdata$year)
newestdata$lat <- as.numeric(newestdata$lat)
newestdata$lon <- as.numeric(newestdata$lon)

The coordinates and the year should be numeric. Any rows with missing values should be gone forever.

I get a map to plot on:

us <- get_map("USA", zoom = 4)

And let’s make the first plot:

g <-ggmap(us) +
  geom_point(aes(x=lon, y=lat), data=newestdata, color="red")
g

There’s a lot of red there. I would like to make an animation. The standard way to do it here, or at least the way I usually do it, would be to define a function that plots what I want to plot as a function of the year. Like this:

singleyear <- function(ye){
  g <-ggmap(us) +
  geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year<(ye)),], color="red")
 g
}

Now, when I make this function call, I should get a map with all the monuments erected before 1890:

singleyear(1890)

I’m missing the monuments erected in 1890. That is because I would like those to be a different color. Let’s add to the function, and call it again:

singleyear <- function(ye){
  g <-ggmap(us) +
  geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year<ye),], color="red")+
  geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year==ye),], color="blue")
 g
}
singleyear(1890)

Now, when I plot the map for a given year, all monuments from that year will appear as blue. And for the next year they’ll turn red.

Let’s back up a bit here.

What I’m doing is this: I take the map I retrieved from Google, and plot it with the ggmap function. Then I add points with the geom_point function. I tell the function that the data is “newestdata”, and that the points should be placed at position x,y, where x is the longitude and y the latitude. The color should be red. However, I am also telling it that the data should be the part of newestdata where year is smaller than the ye-value I provide to the function. In the next line, I add the same kind of points, just for the part of the data where year is equal to the ye-value I provide to the function. And those should be blue.
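The two subsets can be looked at without any map. A minimal sketch on invented toy data, which also shows a side effect of wrapping the comparison in which: rows with an unknown year are silently dropped instead of turning into NA rows.

```r
# Toy monuments; years and coordinates are made up, the last year is unknown.
monuments <- data.frame(
  year = c(1885, 1889, 1890, 1890, 1902, NA),
  lon  = c(-77.4, -84.4, -88.0, -90.1, -86.3, -81.0),
  lat  = c(37.5, 33.7, 30.7, 30.0, 32.4, 34.0)
)

ye <- 1890

# Monuments erected before ye: these would be drawn in red.
older <- monuments[which(monuments$year < ye), ]

# Monuments erected in ye: these would be drawn in blue.
current <- monuments[which(monuments$year == ye), ]

# Without which(), the NA comparison produces an extra all-NA row.
older_no_which <- monuments[monuments$year < ye, ]

print(nrow(older))
print(nrow(current))
print(nrow(older_no_which))
```

Monuments erected after ye end up in neither subset, which is why they do not show up on the map for that year.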

What I am going to do later is to call this function for all years from 1860 to 2017, and show those plots one after another. We will get a small movie, where new monuments turn up as blue spots, and then turn red in the next frame.

Everything becomes very red. One way to do something about this would be to make the dots transparent. Let them fade out. When the monument is just erected, let the dot be blue. The year after, let it be a transparent red dot; the year after that, make it more transparent. Areas with a lot of monuments will still be pretty red, but the new monuments will be more visible. We’ll get a sort of heatmap, where the very red areas are places with a lot of monuments, and the not so red areas have fewer.

I’ll need to have a maximum and a minimum for the transparency, defined as a function of the year I am plotting and the year a monument was erected. I’m gonna add it to the dataframe before I plot it. This is the line:

maxp <- 0.8
minp <- 0.3
newestdata$alpha <- (maxp - minp)/(ye-1861)*(as.numeric(newestdata$year)-ye)+maxp

A new column, alpha, is added to the dataframe, and the difference between the year I am plotting, and the year of the row is normalised to the range 0.3 to 0.8. The point begins as blue, turns red with a transparency of 0.8, and will fade to 0.3.
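To check that the formula does what I claim, here it is wrapped in a small helper and evaluated for a few years. The name fade_alpha and its first_year argument are made up for this sketch; the arithmetic is the same as in the line above.

```r
maxp <- 0.8
minp <- 0.3

# Hypothetical helper: linear fade from maxp (erected in the plotted year)
# down to minp (erected in first_year).
fade_alpha <- function(year, ye, first_year = 1861) {
  (maxp - minp) / (ye - first_year) * (year - ye) + maxp
}

# Plotting the year 1939: the oldest monument, one halfway, last year's, this year's.
alphas <- fade_alpha(c(1861, 1900, 1938, 1939), ye = 1939)
print(round(alphas, 3))
```

A monument from 1861 gets 0.3, one from 1900 sits halfway at 0.55, and one erected the year before is just under 0.8. Note that for ye equal to 1861 the formula divides by zero. That does no harm here only because no monuments are older than that.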

I’m not sure the range is perfect. I’m gonna go with it for now.

Let’s add it to the function:

maxp <- 0.8
minp <- 0.3
singleyear <- function(ye){
  newestdata$alpha <- (maxp - minp)/(ye-1861)*(as.numeric(newestdata$year)-ye)+maxp
  g <-ggmap(us) +
    geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year<ye),], alpha=newestdata[which(newestdata$year<ye),]$alpha, color="red")+ 
  geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year==ye),], alpha=1, color="blue")
 g
}
singleyear(1890)

What I also added was this: “alpha=newestdata[which(newestdata$year<ye),]$alpha”. Alpha is the transparency. So each point is plotted with the transparency I calculated.

What next? Let’s adjust the plot a bit:

maxp <- 0.8
minp <- 0.3
singleyear <- function(ye){
  newestdata$alpha <- (maxp - minp)/(ye-1861)*(as.numeric(newestdata$year)-ye)+maxp
  g <-ggmap(us) +
    geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year<ye),], alpha=newestdata[which(newestdata$year<ye),]$alpha, color="red")+ 
    geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year==ye),], alpha=1, color="blue") +
    theme(plot.title=element_text(hjust=0)) +
    theme(axis.ticks = element_blank(),
        axis.text = element_blank(),
        panel.border = element_blank()) +
    labs(title="Confederate memorials", subtitle=ye, x=" ", y="")
 g
}
singleyear(1890)

The title is left-aligned (hjust=0), and tick marks and axis text are removed (all the element_blank parts). And a title, “Confederate memorials”, is added, with a subtitle indicating the year we have reached in the plot.

Nice. What more? The interesting part, at least I think it is interesting, is the fact that the number of these memorials increases at certain times. That can be seen on the map. But a line plot would probably make it more visible. It would also be nice to have it side by side with the map. How to do that?

To begin with, I’ll need a set of data with the cumulative count of monuments:

linedata <- as.data.frame(table(newestdata$year), stringsAsFactors = FALSE)

colnames(linedata) <- c("year", "cumsum")
linedata$year <- as.numeric(linedata$year)
linedata$cumsum <- cumsum(linedata$cumsum)

I make a new dataframe, linedata. The content is table(newestdata$year). The table function summarizes the data in newestdata. It gives me a table with all the years, and the number of times each year occurs. E.g. that there are 7 occurrences of the year 1870. That corresponds to 7 monuments erected in 1870. I save that as a dataframe. Then I change the names of the columns, and make sure that the years are saved as numeric. And then I call cumsum. That is a function that calculates the cumulative sum. I.e. in 1861 2 monuments were erected. None were erected before that. The cumulative sum of all monuments until 1861 was 2. In 1862 3 monuments were erected. The cumulative sum for 1862 is 5: the sum of the 3 monuments erected that year, and all the monuments erected before that.
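The same steps on a handful of invented years, matching the small example above (2 monuments in 1861, 3 in 1862), plus one in 1870 to show a gap. The column name n is my own choice for this sketch.

```r
# Made-up erection years, one entry per monument.
years <- c(1861, 1861, 1862, 1862, 1862, 1870)

# Count how often each year occurs.
counts <- as.data.frame(table(years), stringsAsFactors = FALSE)
colnames(counts) <- c("year", "n")

# table() hands the years back as text, so convert them to numbers.
counts$year <- as.numeric(counts$year)

# Running total of monuments up to and including each year.
counts$cumsum <- cumsum(counts$n)
print(counts)
```

Years without any new monuments, such as 1863 to 1869 here, get no row of their own. In a line plot the line is simply drawn straight across the gap.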

That in itself is an interesting plot:

ye = 2017
h <- ggplot(linedata[which(linedata$year<=ye),]) +
  geom_line(aes(x=year, y=cumsum))
h

Something happens in 1910 and again in 1950. Or somewhere close to those years. At least that is where the graph changes shape.

Let’s adjust it a bit:

ye=2017
h <- ggplot(linedata[which(linedata$year<=ye),]) +
  geom_line(aes(x=year, y=cumsum))+
  xlim(1860,2017) +
  ylim(0,850) +
  ylab("Number of confederate memorials") +
  xlab("")
h

Nothing fancy, just freezing the axes, adding a label for the y-axis, and removing it from the x-axis.

Now I can add it to the original plot. I loaded the library cowplot earlier. That gives us the function plot_grid.

i <- plot_grid(g,h,align='h')
i

I have two plots, g and h, and combine them side by side into a new plot, i (the 'h' in align lines the two panels up horizontally). And then I plot i.

Let’s add that to the function:

singleyear <- function(ye){
  newestdata$alpha <- (maxp - minp)/(ye-1861)*(as.numeric(newestdata$year)-ye)+maxp
  g <-ggmap(us) +
    geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year<ye),], alpha=newestdata[which(newestdata$year<ye),]$alpha, color="red")+ 
    geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year==ye),], alpha=1, color="blue") +
    theme(plot.title=element_text(hjust=0)) +
    theme(axis.ticks = element_blank(),
        axis.text = element_blank(),
        panel.border = element_blank()) +
    labs(title="Confederate memorials", subtitle=ye, x=" ", y="")
  h <- ggplot(linedata[which(linedata$year<=ye),]) +
    geom_line(aes(x=year, y=cumsum))+
    xlim(1860,2017) +
    ylim(0,850) +
    ylab("Number of confederate memorials") +
    xlab("")
  i <- plot_grid(g,h,align='h')
  i
}
singleyear(1890)

Now I’m getting there! Almost ready for the animation. I would like some annotation on the h-plot. A couple of arrows. And I would like them to show up in the animation. As in, from year 1910 there should be text at a certain place. But not before. It’s not that difficult.

If I add this:

if(ye>1908){
  h <- h + annotate("text", x =1980, y = 200, label="NAACP established") +
    geom_segment(aes(x=1950, y = 200, xend=1909, yend = 200), size=1, arrow=arrow(length=unit(0.5, "cm")))
}

to the function, every plot, for a year after 1908, will have an annotation on the h-plot, at position 1980,200, with the text “NAACP established”. The next line, geom_segment, will add an arrow, with a size defined by the size and length parameters, beginning at 1950,200 and ending at 1909,200.
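A stripped-down sketch of the same idea, on invented cumulative counts and with a hypothetical function name, annotated_line. One deliberate difference from my code above: the arrow is drawn with annotate("segment", ...) instead of geom_segment(aes(...)), so it is drawn once rather than once per row of the data. This is an alternative I am assuming is equivalent visually, not what the final function below uses.

```r
library(ggplot2)
library(grid)  # arrow() and unit()

# Made-up cumulative counts, for illustration only.
toy_line <- data.frame(year   = c(1880, 1900, 1920, 1940),
                       cumsum = c(50, 150, 450, 550))

# Hypothetical helper: line plot up to year ye, annotated only after 1908.
annotated_line <- function(ye) {
  p <- ggplot(toy_line[which(toy_line$year <= ye), ]) +
    geom_line(aes(x = year, y = cumsum)) +
    xlim(1860, 2017) +
    ylim(0, 850)
  if (ye > 1908) {
    p <- p +
      annotate("text", x = 1980, y = 200, label = "NAACP established") +
      annotate("segment", x = 1950, y = 200, xend = 1909, yend = 200,
               arrow = arrow(length = unit(0.5, "cm")))
  }
  p
}

before <- annotated_line(1900)  # line only
after  <- annotated_line(1920)  # line, text and arrow
```

Printing before and after shows the two states of the frame: the plot for 1900 has only the line, while the one for 1920 also carries the text and the arrow.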

It took quite a bit of time to fiddle with those positions!

That is not the only annotation I want. So the complete function is as follows:

singleyear <- function(ye){
  newestdata$alpha <- (maxp - minp)/(ye-1861)*(as.numeric(newestdata$year)-ye)+maxp
  g <-ggmap(us) +
    geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year<ye),], alpha=newestdata[which(newestdata$year<ye),]$alpha, color="red")+ 
    geom_point(aes(x=lon, y=lat), data=newestdata[which(newestdata$year==ye),], alpha=1, color="blue") +
    theme(plot.title=element_text(hjust=0)) +
    theme(axis.ticks = element_blank(),
        axis.text = element_blank(),
        panel.border = element_blank()) +
    labs(title="Confederate memorials", subtitle=ye, x=" ", y="")
  h <- ggplot(linedata[which(linedata$year<=ye),]) +
    geom_line(aes(x=year, y=cumsum))+
    xlim(1860,2017) +
    ylim(0,850) +
    ylab("Number of confederate memorials") +
    xlab("")
  
  if(ye>1908){
    h <- h + annotate("text", x =1980, y = 200, label="NAACP established") +
       geom_segment(aes(x=1950, y = 200, xend=1909, yend = 200), size=1, arrow=arrow(length=unit(0.5, "cm")))
  }

  if(ye>1910){
    h <- h + annotate("text", x = 1870, y = 500, label="Coincidence?") +
      geom_segment(aes(x=1870, y=490, xend=1905, yend=240), size=1, arrow=arrow(length=unit(0.5, "cm")))
  }

  if(ye>1955){
    h <- h + annotate("text", x =1960, y = 400, label="Beginning of civil rights movement") +
      geom_segment(aes(x=1950, y = 450, xend=1955, yend = 640), size=1, arrow=arrow(length=unit(0.5, "cm")))
  }

  if(ye>1960){
    h <- h + annotate("text", x= 1866, y=515, label="Another") +
      geom_segment(aes(x=1882, y=500, xend=1955, yend=680), size=1, arrow=arrow(length=unit(0.5, "cm")))
  }

  i <- plot_grid(g,h,align='h')
  print(i)
}
singleyear(2017)
## Warning: Removed 1 rows containing missing values (geom_point).

OK. It looks like hell. But! In just a moment, it will look. Well, not exactly perfect, but at least much better.

The way to animate is this: Define a function that gives you the frames you want, as a function of an iterable. That’s already done. Call this:

saveGIF({
  for (ye in 1860:2017){
   singleyear(ye)
  }
}
, interval=0.3, ani.width=1280, ani.height=720)

Enjoy. Oh, and note that it will crash if you try to animate it in a notebook.

Where did I go?

Well. Good question. It has been very quiet here for a while.

It has been before. It probably will be again. This time it has probably been a reaction to changes at my workplace. Which, if we really look deep into the soul, set off something that could resemble a depression. Nothing serious. Definitely not enough to call for pills. But just a longer period of “Fuck. Was that it?” thoughts. And not particularly systematic reflections on what on earth I should spend my time on.

I am more or less out of that now. Things are starting to fall into place on the home front. Work is, well, work. It does not seem as hopeless as it did. And I am probably getting used to the idea that identity is something to be found elsewhere than at the job. Just like recognition and respect.

So with a bit of luck, more will be happening here. There are nerd projects under way, actually well in progress. There are kitchen techniques to be tried out. There are political hobby horses to be aired. And then it is Friday, and in a little while I am going to the cinema to see the presentation of something that one of my ongoing projects has spun off.