Okay. I would like to make some networks. I would like to animate them. And I’m quietly harvesting all the PDFs in order to OCR them, and see what I can learn from that.
Before animating networks, it might be a good idea to animate something simpler.
We went on a visit to my sister-in-law, and on the rather long trip (by train), I had time to read up on network graphs in R. Some of the last pages were about animation.
So, without further ado, let me introduce this new library:
library(animation)
## Warning: package 'animation' was built under R version 3.2.5
I’ll get back to that. Let’s begin by getting at the data:
data <- readRDS(file="d:\\acta\\consistentdata.rda")
Step one is getting the number of authors. I really should have saved that in the file. Maybe I did, and forgot about it. Anyway, as I now know how to do it, it is quite simple:
for(i in 1:nrow(data)){
data$number[i] <- length(unlist(strsplit(data$authors[i],";")))
}
numbers <- data$number
year <- data$year
numbermat <- data.frame(year, numbers, stringsAsFactors = FALSE)
If you have read the previous instalments, you will recognize this. I have a new dataframe, numbermat, with just two columns: the year, and the number of authors. One row for each paper. That is what I need here.
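As an aside, the counting loop above can probably be done without a loop at all. This is just a minimal sketch on a made-up authors vector (the names are invented, not taken from the real data), assuming the authors are separated by ";" as in the loop:

```r
# Toy stand-in for data$authors; the names are made up for illustration
authors <- c("Hansen, A.;Jensen, B.", "Nielsen, C.", "", "A;B;C")

# strsplit() returns one character vector per paper,
# and lengths() counts the elements of each of them
number <- lengths(strsplit(authors, ";"))
number
```

An empty author string gives a count of 0, which is the same thing the loop does with it.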
What I will try to do is make an animated GIF, or something similar, that shows one histogram of the frequency of the number of authors on papers from each year.
Histograms are simple enough, if you know how. hist(x) gives a simple histogram. Like this:
h <- hist(numbermat$numbers)
h
## $breaks
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
##
## $counts
## [1] 5657 6242 2862 1081 397 155 51 27 8 9 4 0 0 0
## [15] 0 0 0 0 1
##
## $density
## [1] 3.429732e-01 3.784406e-01 1.735176e-01 6.553898e-02 2.406936e-02
## [6] 9.397357e-03 3.092033e-03 1.636959e-03 4.850249e-04 5.456530e-04
## [11] 2.425124e-04 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [16] 0.000000e+00 0.000000e+00 0.000000e+00 6.062811e-05
##
## $mids
## [1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 12.5 13.5
## [15] 14.5 15.5 16.5 17.5 18.5
##
## $xname
## [1] "numbermat$numbers"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
It basically gives me all the useful information. And a plot. That is actually annoying. Later, I’m gonna want to create the histogram object without plotting it. I’ll get back to that. But it is simple to avoid:
h <- hist(numbermat$numbers, plot=FALSE)
There, no plot. Back to the content of that object. There are some breaks. That can be one of several things. But what I need is a vector, giving the breakpoints between the cells of the histogram. The number of authors ranges between 0 and 19 (inclusive).
Counts gives me the counts of each value. There’s something fishy here. I would expect the first count to be 1. There is one paper with 0 authors. I know that. It was published in 1957.
And I can verify it:
numbermat$year[numbermat$numbers==0]
## [1] "1957"
length(numbermat$year[numbermat$numbers==0])
## [1] 1
Strange. Any other problems?
length(numbermat$year[numbermat$numbers==1])
## [1] 5656
Oh yeah, there is a problem. There are 5656 papers with one author. But the hist-function finds 5657 papers.
Let me google that.
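Yep. By default, hist works with right-closed intervals, as in ]0,1], and it also includes the lowest value in the first cell, which turns that one into [0,1]. So the paper with 0 authors is counted together with the 5656 papers with 1 author. If I set right=FALSE, the cells are closed on the left and open on the right, as in [0,1[. Only the last cell, [18,19], keeps both ends, so the paper with 19 authors is still counted. Let’s try: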
h <- hist(numbermat$numbers, plot=FALSE, right=FALSE)
h
## $breaks
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
##
## $counts
## [1] 1 5656 6242 2862 1081 397 155 51 27 8 9 4 0 0
## [15] 0 0 0 0 1
##
## $density
## [1] 6.062811e-05 3.429126e-01 3.784406e-01 1.735176e-01 6.553898e-02
## [6] 2.406936e-02 9.397357e-03 3.092033e-03 1.636959e-03 4.850249e-04
## [11] 5.456530e-04 2.425124e-04 0.000000e+00 0.000000e+00 0.000000e+00
## [16] 0.000000e+00 0.000000e+00 0.000000e+00 6.062811e-05
##
## $mids
## [1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 12.5 13.5
## [15] 14.5 15.5 16.5 17.5 18.5
##
## $xname
## [1] "numbermat$numbers"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
Much better. Not that it makes that much of a difference:
plot(h)
There is one paper with zero authors. It absolutely drowns in comparison with the 5656 papers with one author.
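The interval business is easier to see on a tiny made-up vector than on 16,000 papers. A minimal sketch, with toy values that have nothing to do with the real data:

```r
# Toy data: one 0, two 1s, one 2 and one 3
x <- c(0, 1, 1, 2, 3)

# Default (right = TRUE): cells are [0,1], ]1,2], ]2,3],
# so the 0 is lumped together with the 1s
counts_right <- hist(x, breaks = 0:3, plot = FALSE)$counts

# right = FALSE: cells are [0,1[, [1,2[, [2,3],
# so the 0 gets its own cell, and the last cell holds both 2 and 3
counts_left <- hist(x, breaks = 0:3, right = FALSE, plot = FALSE)$counts

counts_right  # 3 1 1
counts_left   # 1 2 2
```

Note that with right=FALSE it is the last cell that collects two values instead of the first. In my data that does no harm, because no paper has 18 authors.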
Let’s take a quick look at two plots:
par(mfrow=c(1,2))
hist(numbermat$numbers[numbermat$year==1947], right=FALSE)
hist(numbermat$numbers[numbermat$year==1999], right=FALSE)
I don’t think I’ve introduced par(mfrow=c(1,2)) before. It sets graphical parameters in R, and in this case, I’m telling R that the plots following this command should be drawn in 1 row and 2 columns (that’s the c(1,2) part).
I think it might be a bad way to do it, but it works.
The point is, that something happens from 1947 to 1999. And I would like to animate that.
And now we begin to touch on the “animation” library. It is not that complicated. The idea is that I make a plot for each year, and collect them into an animation by showing one plot after the other.
But that of course means that I need to make sure that only the parts that should be animated actually get animated. The axes should be identical to begin with.
Let’s work with that, and let’s work with the years 1957 and 1989. 1957 has a paper with 0 authors, and 1989 has one with 19 authors. That gives me the extremes that I need to consider:
h1 <- hist(numbermat$numbers[numbermat$year==1957], right=FALSE, plot=FALSE)
h2 <- hist(numbermat$numbers[numbermat$year==1989], right=FALSE, plot=FALSE)
plot(h1)
plot(h2)
I had an “xname” in the histogram objects. And I can control the main title of the plots directly in the plot command. And while I’m at it, I’ll parameterize the functions. Or whatever it’s called.
But a histogram is actually not necessarily a good way to plot it. There is another way to plot, barplot.
Barplot takes a vector (height) with the values, and plots it. The easy way to get that is to make a factor with fixed levels, tabulate it, divide the counts by the number of papers, and multiply by 100 to get percentages. That allows me to control the levels easily. That would be difficult with histograms, where I would have to add the missing numbers (e.g. 19 authors) to all the histograms, and correct for the fact that most years would have no data for that.
So:
barplot(height = table(factor(numbermat$numbers[numbermat$year==1957], levels=0:19))/length(numbermat$numbers[numbermat$year==1957])*100,
ylab = "%",
xlab = "number of authors",
main = 1957)
barplot(height = table(factor(numbermat$numbers[numbermat$year==1989], levels=0:19))/length(numbermat$numbers[numbermat$year==1989])*100,
ylab = "%",
xlab = "number of authors",
main = 1989)
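The trick in those two calls is the factor with fixed levels: table() then reports a count for every level, including the ones with no papers. Here is a minimal sketch of just that step, on made-up numbers. The function name percent_by_count is my own invention for the example, not something from a package:

```r
# Hypothetical helper: percentage of papers for each possible number of authors
percent_by_count <- function(numbers, levels = 0:19) {
  # factor() with explicit levels makes table() keep the empty categories
  counts <- table(factor(numbers, levels = levels))
  100 * counts / length(numbers)
}

# Toy data: four papers with 1, 1, 2 and 3 authors
toy <- c(1, 1, 2, 3)
heights <- percent_by_count(toy, levels = 0:4)
heights
```

The levels 0 and 4 show up with 0 percent, so every year would get the same set of bars, in the same order.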
Let’s parameterize (or whatever it’s called), pick out some of the parameters, and make a couple of other adjustments:
ylab <- "%"
xlab <- "Number of authors on papers in Acta Chem. Scand."
ylim <- c(0,100)
color <- rainbow(10)
i <- 1957
barplot(height = table(factor(numbermat$numbers[numbermat$year==i], levels=0:19))/length(numbermat$numbers[numbermat$year==i])*100,
ylab = ylab,
xlab = xlab,
ylim = ylim,
col = color,
main = i)
i <- 1989
barplot(height = table(factor(numbermat$numbers[numbermat$year==i], levels=0:19))/length(numbermat$numbers[numbermat$year==i])*100,
ylab = ylab,
xlab = xlab,
ylim = ylim,
col = color,
main = i)
The y-axis now goes from 0 to 100. I can change the different parameters in one place. And I’ve added some color. I should, more or less, be ready to animate. But first:
par(mfrow=c(1,1))
The idea is to generate all the plots inside the “saveGIF” function, and it’ll handle the rest. saveHTML also works. It just generates image files of all the plots, and dresses them up in a webpage, showing the plots one after the other, giving the illusion of motion. I would like a GIF instead.
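Since the same barplot call gets repeated for every year, it could also be wrapped in a small function first. This is only a sketch: plot_year is a name I made up, the toy data frame stands in for numbermat, and pdf(NULL) is there just so the example draws without leaving a file behind.

```r
# Hypothetical wrapper around the barplot call; extra arguments
# (ylab, xlab, ylim, col, ...) are passed straight on to barplot()
plot_year <- function(numbermat, i, levels = 0:19, ...) {
  n <- numbermat$numbers[numbermat$year == i]
  heights <- table(factor(n, levels = levels)) / length(n) * 100
  barplot(height = heights, main = i, ...)
}

# Toy stand-in for numbermat (made-up rows, not the real data)
toy <- data.frame(year = c("1957", "1957", "1989"),
                  numbers = c(0, 1, 19),
                  stringsAsFactors = FALSE)

pdf(NULL)  # null device: the plot is drawn, but no file is created
mids <- plot_year(toy, 1957, ylab = "%", ylim = c(0, 100))
dev.off()
```

Inside the animation loop, the body would then shrink to a single call to plot_year for each year.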
saveGIF({
for(i in 1947:1999){
barplot(height = table(factor(numbermat$numbers[numbermat$year==i], levels=0:19))/length(numbermat$numbers[numbermat$year==i])*100,
ylab = ylab,
xlab = xlab,
ylim = ylim,
col = color,
main = i)
}
},movie.name="animation1.gif",interval=0.2)
## Executing:
## ""convert" -loop 0 -delay 20 Rplot1.png Rplot2.png Rplot3.png
## Rplot4.png Rplot5.png Rplot6.png Rplot7.png Rplot8.png
## Rplot9.png Rplot10.png Rplot11.png Rplot12.png Rplot13.png
## Rplot14.png Rplot15.png Rplot16.png Rplot17.png Rplot18.png
## Rplot19.png Rplot20.png Rplot21.png Rplot22.png Rplot23.png
## Rplot24.png Rplot25.png Rplot26.png Rplot27.png Rplot28.png
## Rplot29.png Rplot30.png Rplot31.png Rplot32.png Rplot33.png
## Rplot34.png Rplot35.png Rplot36.png Rplot37.png Rplot38.png
## Rplot39.png Rplot40.png Rplot41.png Rplot42.png Rplot43.png
## Rplot44.png Rplot45.png Rplot46.png Rplot47.png Rplot48.png
## Rplot49.png Rplot50.png Rplot51.png Rplot52.png Rplot53.png
## "animation1.gif""
## Output at: animation1.gif
## [1] TRUE
And here it is:
One detail. Actually two. You need to have ImageMagick installed (http://www.imagemagick.org). But not only that, you need to have the legacy functions installed. There is a small tool, convert.exe, in the ImageMagick family, that gives access to (most of) the extremely advanced functions in ImageMagick through the command line. And that is where the magick happens. Convert.exe stitches together all the individual plots into one animated GIF.
Just check the box for legacy tools when installing ImageMagick.
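A quick way to see whether R can find a convert program at all is to ask the operating system. This is a minimal sketch using base R only. One caveat: on Windows there is also a system tool called convert.exe that has nothing to do with ImageMagick, so a hit on the path is not proof that it is the right one.

```r
# Sys.which() returns the full path of the program, or "" if it is not found
convert_path <- Sys.which("convert")

if (nzchar(convert_path)) {
  message("Found a convert program at: ", convert_path)
} else {
  message("No convert program on the path. Is ImageMagick installed with the legacy tools?")
}

# As far as I know, the animation package can also be pointed at a specific
# executable through its options (assumption, check ?ani.options):
# animation::ani.options(convert = "C:/path/to/convert.exe")
```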
Also, it took me some time to figure out where the name of the output file should be placed: it goes in the movie.name argument, after the closing brace of the plotting code.
Oh, and a WordPress top tip: When inserting animated GIFs, insert full size. WordPress makes a couple of different sized images, and that’s annoying.
Taking a closer look, I can see that it can be tweaked a bit. The percentage is never over 60%. So, let’s try again:
ylim <- c(0,60)
saveGIF({
for(i in 1947:1999){
barplot(height = table(factor(numbermat$numbers[numbermat$year==i], levels=0:19))/length(numbermat$numbers[numbermat$year==i])*100,
ylab = ylab,
xlab = xlab,
ylim = ylim,
col = color,
main = i)
}
},movie.name="animation2.gif",interval=0.2)
## Executing:
## ""convert" -loop 0 -delay 20 Rplot1.png Rplot2.png Rplot3.png
## Rplot4.png Rplot5.png Rplot6.png Rplot7.png Rplot8.png
## Rplot9.png Rplot10.png Rplot11.png Rplot12.png Rplot13.png
## Rplot14.png Rplot15.png Rplot16.png Rplot17.png Rplot18.png
## Rplot19.png Rplot20.png Rplot21.png Rplot22.png Rplot23.png
## Rplot24.png Rplot25.png Rplot26.png Rplot27.png Rplot28.png
## Rplot29.png Rplot30.png Rplot31.png Rplot32.png Rplot33.png
## Rplot34.png Rplot35.png Rplot36.png Rplot37.png Rplot38.png
## Rplot39.png Rplot40.png Rplot41.png Rplot42.png Rplot43.png
## Rplot44.png Rplot45.png Rplot46.png Rplot47.png Rplot48.png
## Rplot49.png Rplot50.png Rplot51.png Rplot52.png Rplot53.png
## "animation2.gif""
## Output at: animation2.gif
## [1] TRUE
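Instead of reading the upper limit off the animation, it could probably also be computed from the data. A minimal sketch, again on a made-up stand-in for numbermat, so the numbers below are not the real ones:

```r
# Toy stand-in for numbermat (simulated, not the real data)
toy <- data.frame(
  year    = c("1947", "1947", "1947", "1999", "1999", "1999", "1999"),
  numbers = c(1, 1, 2, 2, 3, 3, 3),
  stringsAsFactors = FALSE
)

# For each year: the share (in percent) of the most common number of authors.
# The largest of those shares is the tallest bar in any frame.
max_percent <- max(sapply(split(toy$numbers, toy$year), function(n) {
  max(table(n)) / length(n) * 100
}))

# Round up to the nearest 10 to get a tidy axis limit
ylim <- c(0, ceiling(max_percent / 10) * 10)
ylim
```

With the toy data the tallest bar is 75%, so the limit ends up at 80.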
Nice! Now I know how to make animated GIFs. On to networks.