Openstreetmap data – for Florence

Not that advanced, but I wanted to play around a bit with plotting the raw data from Openstreetmap.

We’re going to Florence this fall. It’s been five years since we last visited the fair city, that has played such an important role in western history.

Openstreetmaps is, as the name implies, open.

I’m going to need some libraries

#library(OpenStreetMap)
library(osmar)
library(ggplot2)
library(broom)
library(geosphere)
library(dplyr)

osmar provides functions to interact with Openstreetmap. ggplot2 is used for the plots, broom for making some objects tidy and dplyr for manipulating data.

Getting the raw data, requires me to define a boundary box, encompassing the part of Florence I would like to work with. Looking at https://www.openstreetmap.org/export#map=13/43.7715/11.2717, I choose these coordinates:

top <- 43.7770
bottom <- 43.7642
left <- 11.2443
right <- 11.2661

After that, I can define the bounding box, tell the osmar functions at what URL we can find the relevant API (this is just the default). And then I can retrieve the data via get_osm(). I immediately save it to disc. This takes some time to download, and there is no reason to do that more than once.

box <- corner_bbox(left, bottom, right, top)
src <- osmsource_api(url = "https://api.openstreetmap.org/api/0.6/")
florence <- get_osm(box, source=src)
saveRDS(florence, "florence.rda")

Lets begin by making a quick plot:

plot(florence, xlim=c(left,right),ylim=c(bottom,top) )

plot of chunk unnamed-chunk-44

Note that what we get a plot of, is, among other things, of all lines that are partly in the box. If a line extends beyond the box, we get it as well.

Looking at the data:

summary(florence$ways)
## osmar$ways object
## 6707 ways, 9689 tags, 59052 refs 
## 
## ..$attrs data.frame: 
##     id, visible, timestamp, version, changeset, user, uid 
## ..$tags data.frame: 
##     id, k, v 
## ..$refs data.frame: 
##     id, ref 
##  
## Key-Value contingency table:
##         Key         Value Freq
## 1  building           yes 4157
## 2    oneway           yes  456
## 3   highway    pedestrian  335
## 4   highway   residential  317
## 5   bicycle           yes  316
## 6       psv           yes  122
## 7   highway  unclassified  108
## 8   highway       footway  101
## 9   barrier          wall   98
## 10  surface paving_stones   87

I would like to plot the roads and buildings. For some reason there are a lot of highways, of a kind I would probably not call highways.

Anyway, lets make a list of tags. tags() finds the elements that have a key in the tag_list, way finds the lines that are represented by these elements, and find, finds the ID of the objects in “florence” matching this.
find_down() finds all the elements related to these id’s. And finally we take the subset of the large florence data-set, which have id’s matching the id’s we have in from before.

tag_list <- c("highway", "bicycle", "oneway", "building")
dat <- find(florence, way(tags(k %in% tag_list)))
dat <- find_down(florence, way(dat))
dat <- subset(florence, ids = dat)

Now, in a couple of lines, I’m gonna tidy the data. That removes the information of the type of line. As I would like to be able to color highways differently from buildings, I need to keep the information.
Saving the key-part of the tags, and the id:

types <- data.frame(dat$ways$tags$k, dat$ways$tags$id)
names(types) <- c("type", "id")

This gives me all the key-parts of all the tags. And I’m only interested in a subset of them:

types <- types %>% 
  filter(type %in% tag_list)

types$id <- as.character(types$id)

Next as_sp() converts the osmar object to a spatial object (just taking the lines):

dat <- as_sp(dat, "lines")

tidy (from the library broom), converts it to a tidy tibble

dat <- tidy(dat)

That tibble is missing the types – those are added.

new_df <- left_join(dat, types, by="id")

And now we can plot:

new_df %>% 
  ggplot(aes(x=long, y=lat, group=group)) +
  geom_path(aes(color=type)) +
  scale_color_brewer() +
    xlim(left,right) +
  ylim(bottom,top) +
  theme_void() +
theme(legend.position="none")

plot of chunk unnamed-chunk-52

Nice.

Whats next? Someting like what is on this page: https://github.com/ropensci/osmplotr

Project Euler 187

Project Euler 187. Semiprimes

I had to look it up. Semiprimes are numbers that are the product of two prime numbers. And only two, although they may be equal.

There are ten of them below 30: 4, 6, 9, 10, 14, 15, 21, 22, 25 and 26.

16 is not. The only primefactor is 2, but it occurs four times.

How many of these semiprimes are there below 108?

That should be pretty straightforward: Generate all primes below 108, make all the multiplications, and count how many uniqe numbers there are below n, where n=108.

One problem:

n <- 10**8
numbers <- primes(n)
length(numbers)
## [1] 5761455

That is a lot of numbers to multiply.

A trick: 2 times all the primes below n/2 will give all the semiprimes that have 2 as one of the primefactors (smaller than n).

3 times all the primes below n/3 will in the same way give all the semiprimes, that have 3 as one of the primefactors.

If I can figure out how many primes there are below n/2, I get the number of semiprimes that has 2 as one of the two primefactors. The same for the number of primes below n/3. If continue that to \(\sqrt(n)\), and add them all together, I should get the total number of semiprimes below n.

One issue though. The first prime below n/2 that I multiply by 2, is 3. And the first prime below n/3 that I multiply by 3 is 2. Both giving 6. I need to figure out how to only count 6 one time.

I just generated all the primes below n. The number of primes below n/2 is:

length(numbers[numbers<n/2])
## [1] 3001134

And the number of primes below n/3 is:

length(numbers[numbers<n/3])
## [1] 2050943

I do want to multiply 3 by 3. But I need to exclude 2.

length(numbers[numbers<n/3 & numbers>=3])
## [1] 2050942

Qap’la! One less prime.

I just need to do it for all of them.

n_semi_primes <- function(x){
  counter <- 0
  for(i in numbers[numbers<=sqrt(x)]){
    counter <- counter + length(numbers[numbers<x/i & numbers>=i])
  }
  return(counter)
}

I’m writing this as a function, taking a limit x. A counter is set to 0. And for all primes i less than \(\sqrt(n)\), I add the number of primes between i and < x/i.

I can test it on the example given:

n_semi_primes(30)
## [1] 10

That was the number of semiprimes below 30. And then it is just a question of running it on 108:

answer <- n_semi_primes(10**8)

Project Euler problem 62

Euler problem 62

The cube, 41063625 (3453), can be permuted to produce two other cubes: 56623104 (3843) and 66430125 (4053). In fact, 41063625 is the smallest cube which has exactly three permutations of its digits which are also cube.

Find the smallest cube for which exactly five permutations of its digits are cube.

Alright. I need to find five cubes, that are permutations of the same digits.

How to check if two numbers are permutations of each other?

We can generate the largest permutation of a given number. If the largest permutation of two numbers are identical, the two numbers are permutations of each other.

So I need a function, that returns the largest permutation of a number. It would be nice, if that function was vectorized.

max_perm <- function(t){
  require(magrittr)
  options(scipen=5)
  t %>% 
    as.character() %>% 
    str_split("") %>% 
    lapply(sort, decreasing=TRUE) %>% 
    lapply(paste0, collapse="") %>% 
    unlist() %>% 
    as.double()
}

Convert the input to character. Split at “”. That returns a list with vectors containing the individual digits of the input. lapply sorts the individual vectors in the list in decreasing order. Then lapply pastes the elements in each vector together with paste0 and “” as the separator. Then it is unlisted, and returned as numeric.

What is worth noting is a thing I was struggling with for far too long. R likes to write numbers in scientific notation. As in “1e+06”. I have not studied the phenomenon in detail. But options(scipen=5) solves the problem. It is the “penalty” used to decide when a number should be written in scientific notation. Unless I change that (trial and error, but it should be larger than whatever is default), as.character(1000000) will return “1e+06”. And the permutations of “1” “e” “+” “0” “6” are not terribly useful in this context.

I’m hazarding a guess that I don’t need to handle cubes of values of more than four digits.

Beginning with a vector of all numbers from 1 to 9999, I convert it to a dataframe. I transmute the first column to a column with the name x.
Then I mutate a second column, cube, into existence, and calculate it as the cube of the x-value. A third column, max_cube, is mutated with the result from my max_perm function above. And tha column is immediately used to group the data, so I get date grouped by identical maximum values of the permutations. I filter on the count of those groups, and only keep the groups that contain 5 elements. Then I ungroup it, and select just the cube column.

I now have a data frame with a single column containing 10 values. They are all cubes, five of them are permutations of each other. The other five are also permutaions of each other. And now I just have to take the smallest of them.

result <- 1:9999 %>% 
  as.double() %>% 
  as.data.frame() %>% 
  transmute(., x = .) %>% 
  mutate(cube = x**3) %>% 
  mutate(max_cube = max_perm(cube)) %>% 
  group_by(max_cube) %>% 
  filter(n()==5) %>% 
  ungroup() %>% 
  select(cube) %>% 
  min()

Before I print the result, so I can enter it into Project Euler, I set options(digits=17).

Done! A good exercise. And a valuable lesson in the importance of the options in R.