https://christianknudsen.info/wp-admin/post.php?post=493&action=edit

Let’s be honest. I am easily distracted. While I was thinking about how to plot networks of coauthorships in Acta Chemica Scandinavica, I tinkered with getting data on my twitter-following.

Thats easy enough, but I thought it would be cool to map them. While googling that (I know. There are automated ways to do that, there are scripts I can just copy. Its not difficult. I just want to do it myself.) I stumbled across some neat heatmaps visualizing the distance to fastfood outlets in the US. That looked like fun.

So now, I’m hammering away at that at the moment….

What to map? I need coordinates. For some reason I thought of 7/11. Probably something with those US fastfood stores. Anyway, http://www.7-eleven.dk/find-butik/ has a map. And coordinates on the page!

Save, download. Delete everything that does not contain coordinates. Keep the parts with “value=[coordinates]”. And save the file. And then, some R. Set the working directory, readlines, use stringfunctions to extract the coordinates:

rm(list=ls())
setwd("~/Desktop/711")
koo <- readLines("711_coordinates.html")
koo_list <- c()
for(i in 1:length(koo)){
  res <- strsplit(koo[i], 'value=\"')
  res <- strsplit(res[[1]][2],'\">')
  res <- res[[1]][1]
  koo_list <- c(koo_list, res)
}

I have a nagging suspision that there are repeated coordinates. Lets get rid of those, and take a look:

koo <- unique(koo_list)
head(koo)
## [1] "55.68096,12.58000"   "55.681899,12.583777" "55.68216,12.57454"  
## [4] "55.67787,12.58015"   "55.68225,12.57044"   "55.68284,12.57098"

I need to get that into a dataframe:

koo_df <- data.frame(lat=character(), lng=character(), stringsAsFactors=FALSE)

for(i in 1:length(koo)){
  koordinat <- strsplit(koo[i],',')
  new_row <- c(as.numeric(koordinat[[1]][1]),as.numeric(koordinat[[1]][2]))
  koo_df <- rbind(koo_df, new_row, stringsAsFactors=FALSE)

}
colnames(koo_df) <- c("lat", "lng")
tail(koo_df)
##          lat      lng
## 181 57.05891  9.92778
## 182 57.15559  9.73841
## 183 57.45580 10.04221
## 184 57.45665  9.98596
## 185 57.44108 10.53995
## 186       NA       NA

Oh, and I should get rid of any missing values:

row.has.na <- apply(koo_df, 1, function(x){any(is.na(x))})
koo_df <- koo_df[!row.has.na,]

I need some libraries:

library("maps")
library("mapdata")
library('sp')
library('maptools')
## Checking rgeos availability: FALSE
##      Note: when rgeos is not available, polygon geometry     computations in maptools depend on gpclib,
##      which has a restricted licence. It is disabled by default;
##      to enable gpclib, type gpclibPermit()
library('spatstat')
## Loading required package: nlme
## Loading required package: rpart
## 
## spatstat 1.46-1       (nickname: 'Spoiler Alert') 
## For an introduction to spatstat, type 'beginner'
## 
## Note: spatstat version 1.46-1 is out of date by more than 11 weeks; a newer version should be available.

Maps helps drawing maps. Mapdata provides maps for the world. sp gives methods for handling spatial data. Lines and polygons eg. I’m going to need that. Maptools is another bunch of tols for handling maps. Spatstat is a set of tools for handling spatial statistical data. I only really need one of them (I think). But it is the most important.

Lets plot something:

kort <- map('worldHires', 'Denmark', col="white", fill=TRUE, bg=NA, 
    xlim=c(8,17), ylim=c(53,58), resolution=0)

Map takes a lot of different arguments. The first two: worldHires tells it that it should look to the worldHires map from mapdata. And that we should look specifically to the part called Denmark. col allows me to choose the color of the map – I’m going with black’n’white here. Fill – should the areas be filled with the color (yes, they should, for reasons I’ll get back to.) bg is the background color, and I can limit the part of the map I want to draw. Maps have coordinates as degrees as their natural units. Longitude 8 to 17, latitude 53 to 58 does nicely. The Faroe Islands are in the map (but for some reason not Greenland), I’m not going to bother with them. And Bornholm is going to be cut out later. Well, Bornholm is going to get cut now:

kort <- map('worldHires', 'Denmark', col="green", fill=TRUE, bg="blue", 
    xlim=c(8,13), ylim=c(54,58), resolution=0)

Resolution – there are two settings, 0 and 1. 0 gives the highest resolution.

There are some problems here. Smaller islands have disappered. The resolutions is not fantastic. But all that can be easily solved later. The hard part is the following.

Lets get back to the original map, and take a look:

kort <- map('worldHires', 'Denmark', col="white", fill=TRUE, bg=NA, 
    xlim=c(8,13), ylim=c(54,58), resolution=0)

str(kort)
## List of 4
##  $ x    : num [1:11164] 8.66 8.68 8.69 8.7 8.72 ...
##  $ y    : num [1:11164] 54.9 54.9 54.9 54.9 54.9 ...
##  $ range: num [1:4] 8.09 12.62 54.56 57.75
##  $ names: chr [1:10] "Denmark" "Denmark:Mors" "Denmark:Mon" "Denmark:Samso" ...
##  - attr(*, "class")= chr "map"

The kort-variable (kort because that is danish for map) is a list of four lists. x and y that defines the shapes. How that is actually done given that Denmark consist of 1? islands, I have no idea. There is a range. And there are 10 named areas in the object. All of them Denmark, but some of them islands like Mors, Møn etc.

Lets also plot the location of 7/11 stores in Denmark:

map(kort)
points(koo_df$lng, koo_df$lat)

We’ll get back to that.

What I want to plot is a distancemap. Different colors based on how far a given position.

The library spatstat has a methods/function that handles that. It’s called distmap(). It takes an object of class “ppp” – that can be made with the function ppp(), also from spatstat.

ppp() takes two vectors for x and y coordinates, a window, defining the, well, window, of the map. And that is of the class “owin”.

Therefore, I should begin by making the point pattern dataset::

A <- ppp(koo_df$lng,koo_df$lat,window=owin(xrange=c(7,13),yrange=c(53,58)))
plot(A)

It looks a bit skewed. But otherwise ok. Distmap() takes the ppp object. So its rather simple to calculate and plot:

z <- distmap(A)

plot(z)

Yeah. Very psychedelic. Looking at it, it appears that the scale is in degrees. The yellow colour should be “2.5”. That more or less matches 2½ degree, if you remember that the y-scale is 5 degrees high.

One can almost see something that looks like the outline of Denmark. It gets easier it you know that it is there.

What would be nice, would be to define a window matching Denmark. Make a cut-out of Denmark in the distancemap.

That can be done. Spatstat has a function as.owin.SpatialPolygons() that can take polygons, defined elsewhere, and convert them to a windows, that can be used in the ppp()-function. It takes an object of the type spatialpolygons. Where do we get that? maptools has a function that can convert a map to spatialpolygons: map2SpatialPolygons(). That function takes a map. And a list of the names saved in that map. We already have those. kort, and kort$names. So, first map2SpatialPolygons:

kort.poly <- map2SpatialPolygons(kort, IDs = kort$names)

Then as.owin.SpatialPolygons:

dk.owin <- as.owin.SpatialPolygons(kort.poly)

And now I have a nice cutout shapes like Denmark! I can even plot it!!

plot(dk.owin)

Neato! Now I just need to use the dk.owin window when defining the ppp object, rather than the rectangular window I used earlier:

A <- ppp(koo_df$lng,koo_df$lat,window=dk.owin)
## Warning in ppp(koo_df$lng, koo_df$lat, window = dk.owin): 20 points were
## rejected as lying outside the specified window

Run the distmap function:

dist <- distmap(A)

and plot it:

plot(dist)

Done! I do get a warning. 20 points are outside the window. Not that big a surprise, the map is not very precise. Whole islands are missing! But looking at the original map, and what I get when I plot the window, something is happening. And maybe I should take a closer look at what projection that is being used. After all, Denmark lies on a sphere. And the scale is annoying. I want it in kilometers, not degrees.

Average number of authors in Acta Chem. Scand.

We saw the rapid decline of german as a scientific language in the last installment. But what else can we learn? Given that I do not have access to the full text papers.

Well, what about the number of authors on a given paper?

I have a hypothesis. In the beginning of time, chemistry was a lonely science, where individual scientists worked and published alone.

As the years went by, chemistry became a more collaborative science, with more scientists working together, and also publishing together.

So – the average number of authors on a paper will rise, as a function of time.

There is only one way to see if I’m right. Continue reading “Average number of authors in Acta Chem. Scand.”

Acta network part 1

Lets get to work on the network. I don’t have any citation lists, so what I’m after here, is coauthorship. I’m going to draw on some unpublished work I did on Zika-virus research.

I can begin by disregarding all papers with only one author. I want to keep the publication year – as I want to animate the network.

As always, we’ll begin by reading the data:

data <- readRDS(file="d:\\acta\\consistentdata.rda")
head(data)

Continue reading “Acta network part 1”

Animated Acta Chem. Scand.

Okay. I would like to make some networks. I would like to animate them. And I’m quietly harvesting all the pdf’s in order to OCR them, and see what I can learn from that.

Before animating networks, it might be a good idea to animate something simpler.

We went on a visit to my sister in law, and on the rather long trip (by train), I had time to read up on network-graphs in R. Some of the last pages were about animation.

so, without further ado, let me introduce this new library:

library(animation)
## Warning: package 'animation' was built under R version 3.2.5

I’ll get back to that, lets begin by getting at the data:

Continue reading “Animated Acta Chem. Scand.”