An engineers’ dream come true

Project Euler, problem 263 – An engineers’ dream come true

This is, at the time of coding, the highest numbered Project Euler problem I’ve tried to tackle. With a difficulty rating of 75% it is also the most difficult. At least on paper. But An engineers’ dream come true? How can I not, as an engineer, not try to solve it?

We har looking for an n, for which it holds that:

  • n-9 and n-3 must be consecutive primes
  • n-3 and n+3 must also be consecutive primes
  • n+3 and n+9 must also be consecutive primes.

These are primepairs that are “sexy”, that is that have differences of 6.

Also, n, n-8, n-4, n+4 and n+8 must be practical numbers, that is numbers where the numbes 1:n can be written as sums of distinct divisors of n.

So if a number n gives sexy prime pairs, and are very practical – that is what an engineer dreams of – hence the paradise.

The fastest way seems to be to generate a list of primes, sieve those out that conforms to the requirements for consecutive primes, and then test those numbers for practicality.

Lets get started!

The trusty numbers library provides the primes, up to 1000000. Then for each of those primes, return the n-value, if the differences between the sets of primes, are 6.

library(numbers)

primenumbers <- Primes(1000000)
primecandidates <- numeric()
for(i in 1:(length(primenumbers)-4)){
if((primenumbers[i+1] - primenumbers[i] == 6) & (primenumbers[i+2] - primenumbers[i+1]== 6) & (primenumbers[i+3] - primenumbers[i+2]== 6)){
  primcandidates <- c(primecandidates,(primenumbers[i]+9))
  }
}

What we get from this, is not primenumbers, but values of n, that gives the consecutive primes we need.

Now I have a list of candidate n’s based on the requirements for primes. Next I need to check for practicality.

First I tried a naive way. Generate the list of numbers that I need to sum to using distinct divisors, for a given x.
Then get the divisors of x. I dont need to check if I can sum to the numbers that are themselves divisors, so excluding them leaves me with at slightly smaller set. Then I get all the ways I can take two divisors from the set of divisors. Sum them, and exclude them from the list of numbers. I continue until I have taken all the possible combinations of 2, 3, 4 etc divisors, based on how many there are. If there are no numbers left in the vector of numbers that I need to be able to sum to, I was able to express all those numbers as sums of distinct divisors. And then x was a practical number.

practical <- function(x){
  test <- 1:x
  divs <- divisors(x)
  test <- setdiff(test,divs)
  for(i in 2:length(divs)){
    test <- setdiff(test,combn(divs,i,sum))
  }
  !as.logical(length(test))
}

Two problems. One that can be fixed. I continue to generate combinations of divisors and summing them, even if I have already found ways to sum all the numbers. The second problem is more serious. When I have to test a number with really many divisors – it takes a long time. Also, handling a vector containing all numbers 1:1000000 takes a lot of time.

I need a faster way of checking for practicality.

Wikipedia to the rescue. There exists a way of checking. I have no idea why it works. But it does.

For a number x, larger than 1, and where the first primefactor is 2. All primefactors are ordered. Taking each primefactor, that has to be smaller than or equal to the sum of the divisors of the product of all the smaller primefactors. Plus one. Oh! And that sum – if 3 is a primefactor twice, that is if 32 is a factor, I should square 3 in the product.

That sounds simple.

For a number x, get the primefactors. Use table to get the counts of the primefactors, ie that 3 is there twice. Those are the values of the result from the table function. The names of the table function are the primefactors.

For each factor from number 2 to the end of the number of factors, get the names of the primefactors from number 1 to just before that factor we are looking at (as numeric). Exponentiate with the values from the table – that is how many times a primefactor is a primefactor. Generate the product, get the divisors of that product, sum them, and add 1. If the factor we were looking at is larger that that, we have determined that x is not practical – and can return FALSE. If x gets through that process, it is practial.

I need to handle the case where there is only one primefactor – 2. Those numbers are practial, but the way I have done the check breaks when there is only one primefactor. Its simple enough, just check if there is only one distinct primefactor, and return TRUE in that case.

practical <- function(x){
  all_factors <- factors(x)
  all_factors <- table(all_factors)
  n_factors <- length(all_factors)
  res <- TRUE
    if(n_factors ==1){
    return(TRUE)
    break()
  }

  for(i in 2:n_factors){
    if((as.numeric(names(all_factors)[i]))>(sum(divisors(prod(as.numeric(names(all_factors)[1:i-1])**as.numeric(all_factors)[1:i-1])))+1)){
      return(FALSE)
      break()
      }
  }
  return(TRUE)
}

So, quite a bit faster!

Now I can take my candidate n’s based on the requirements for primepairs, and just keep the n’s that are themselves practical. And where n-8, n-4, n+4 and n+8 are also practial:

eng_numbers <- primecandidates %>%
  keep(function(x) practical(x-8)) %>%
  keep(function(x) practical(x-4)) %>%
  keep(function(x) practical(x)) %>%
  keep(function(x) practical(x+4)) %>%
  keep(function(x) practical(x+8))

eng_numbers
## numeric(0)

OK. There was no n’s in this set.

This is kinda problem. The n we are looking for are actually pretty large. I know this, because this writeup is written after I found the solution. So it is not because the code is flawed.

Nu har vi så den udfordring, at vi skal have fat i ret høje tal.

primenumbers <- primes(500000000)
primecandidates <- numeric()
for(i in 1:(length(primenumbers)-4)){
if((primenumbers[i+1] - primenumbers[i] == 6) & (primenumbers[i+2] - primenumbers[i+1]== 6) & (primenumbers[i+3] - primenumbers[i+2]== 6)){
  primecandidates <- c(primecandidates,(primenumbers[i]+9))
  }
}
eng_numbers <- primecandidates %>%
  keep(function(x) practical(x-8)) %>%
  keep(function(x) practical(x-4)) %>%
  keep(function(x) practical(x))   %>%
  keep(function(x) practical(x+4)) %>%
  keep(function(x) practical(x+8))

length(eng_numbers)
## [1] 3

That gives us three. We need four.

Now – I could just test larger and larger sets of primes. I run into memory problems when I try that. Then I could write some code, that generates primenumbers between some two numbers, and increse those numbers until I get number four.

I have not. Instead I just tried again and again, until I found number 4. We need to have an upper limit for primes that are some four times larger than the one I just used. Anyway, I found number four. And got the green tickmark.

Lesson learned

Sometimes you really need to look into the math. This would have been impossible if I had not found a quicker way to check for practicality.

Also, I should refactor this code. The check for practicality could certainly be optimised a bit.

And – I should probably check for primality, rather than generating the list of primes.

Yet another idea for a project

Library of Congress has dumped a lot (like a LOT) of data on the net:

https://catalog.data.gov/dataset?q=organization:library-of-congress+AND+type:dataset&publisher=Library+of+Congress

Including their catalogue. I was wondering… Would it be possible to build a machine learning algorithm, that returns a subject, based on title? And maybe other information?

Something to look into during the long hours in the summer, where the boss is on holiday, the patrons are away, and we have time to do interesting stuff? Not that we are not doing interesting stuff already, but stuff that is interesting in it self.

You have to run very fast to stay in the same place

In a world, changing with increasing speed, it is simply human nature to try to make things stay the way they have always been. We don’t like change. And we are prepared to do a lot of work to prevent the change. Even if it is inevitable.

Because of this, forward thinking managers, not only in libraries, are spending a lot of time changing things. That is usually a good thing.

But sometimes it is not. The idea that we should change, adopt, develop and reform things, because if we don’t the world around us will change, can often lead to spending a lot of time running forward in circles. And not necessarily forward. We are not always certain that a change is actually for the better, but at least it is a change, and we have to change don’t we?

So – one of the cynical lessons from someone who has spent the last 20+ years observing projects, is that far too often, changes are made based on the assumption that change in itself is good. Or:

  1. To improve things, things must change
  2. We are changing things
  3. Therefore, we are improving things.

Before you spend my tax money on changing things, please pause and consider if the change is actually for the better. Or just a change.

Competence development

What is it?

We spend a lot of time talking about it. And to be honest, in my opinion, not very much time actually doing anything about it.

Competence is the ability to do something successfully or efficiently.

It follows logically, that when you develop your competences, you will, afterwards, be able to do something (successfully) that you were not able to do before. Or you will be able to do it more efficiently, ie better or faster.

Usually it is not enough to acquire a skill. “Use it or lose it” they say. I was actually speaking pretty good german when I was 15. Today I’m anything but fluent, and struggling with Duolingo to regain the skills I had 28 years ago. I do read technical german pretty well. And that testifies to the fact, that I have used the part of my “german competences” that relates to technical texts. Whereas I have not used the part that allows me to speak or understand spoken, german.

It’s the same with the skills you need at your job. You may acquire some skills at a course, by reading a book, by getting peer-training. But if you don’t use them afterwards, you loose them.

Far too often we spend time attending a course, get back to work, and not use the things we learned. The net result is that we wasted the time and money spend on the course. Noone made sure that we had the time to use our new skills. Or that procedures in the organization was changed accordingly. Maybe we learned skills that we don’t really need.
It would be fun to learn to weld. And even better if my boss would pay for it. But I would never get to use that skill at work. Or at home. So it would be a bloody waste of time! It would probably be bloody in a literal sense as well…

That is one problem. Developing skills and competences that are not actually useful. Or used. I have had it happen to me several times. I’ve even paid for useless courses and activities out of my own pocket.

The other problem is labelling.

We all agree that competence development is positive. We want more of it. It is nice, and necessary. Therefore it is not surprising, that we would like to label activities that does not really develop any competences, as developing competences. I have been required to register, as competence development, giving an introduction to ebook readers. I agree that the participants in the course, the colleagues subjected to me talking about ebook readers for 1½ hours, had their competences developed. Or at least I hope they had.

But how is giving that introduction developing my competences? As an academic I am able to argue for, or againts, anything, sometimes even at the same time. But my competences regarding ebook readers were definitely not developed. I might have gotten slightly better at giving introductions. But I did not learn anything new about ebook readers.

Why do I care? Well. First of all I abhor newspeak. If something is not developing competences, dont claim that that is what is happening. It might be interesting. It might be beneficial. But if competences are not developed, they are not developed, and please do not claim that they were.

Secondly, it generates the impression that we are spending a lot of resources on developing competences. That is a problem if that impression is false. We will wake up some day, and wonder why we did not learn what was necessary to survive in a changing library landscape. It will be a mystery to us, because we thought we were spending a lot of time developing all sorts of competences. But in reality we did not develop anything.

 

 

 

Stacked barcharts with ggplot2

Barcharts to percentages

Maybe not the most correct headline.

Anyway. Assume we have five cases, with two values for a given treatment, each labelled “AVO” and “INT”. How to get a barchart, where each cases is one bar, and the bars are coloured based on the percentages of the two values?

Lets make some sample data:

set.seed(47)
cases <- 1:5
cases <- c(cases, cases)
treat1 <- rep("AVO", 5)
treat2 <- rep("INT", 5)
treat <- c(treat1, treat2)
val <- runif(10)

df <- data.frame(cases=cases, treatment = treat, value = val, stringsAsFactors = FALSE)

How does that look?

df
##    cases treatment     value
## 1      1       AVO 0.9769620
## 2      2       AVO 0.3739160
## 3      3       AVO 0.7615020
## 4      4       AVO 0.8224916
## 5      5       AVO 0.5735444
## 6      1       INT 0.6914124
## 7      2       INT 0.3890619
## 8      3       INT 0.4689460
## 9      4       INT 0.5433097
## 10     5       INT 0.9248920

OK, nice and tidy data.

What I want is a bar for case 1, filled with percentages. One color with 0.976/(0.976+0.691) and another color for 0.691/(0.976+0.691).

library(ggplot2)
ggplot(df)+
  geom_col(aes(y=value, x=cases, fill=treatment), position = "fill")

plot of chunk unnamed-chunk-3

We’re using geom_col. That is a shorthand for geom_bar with stat=“identity”, and is the shortcut to expressing values in the barchart.

The y-value is the value from df, the x-value is the cases. And we’re colouring the bars based on the value in treatment. The position=“fill” stack the bars, and normalize the heights. If we wanted to just stack the elements, we would use position=“stack”:

ggplot(df)+
  geom_col(aes(y=value, x=cases, fill=treatment), position = "stack")

plot of chunk unnamed-chunk-4

This is a great, like really great, cheatsheet for ggplot:

https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

Project Euler 54

Poker hands. Project Euler number 54

Given 1000 random hands of 5 playing cards dealt to two players, how many hands does player 1 win?

We have the following notation for the cards:

2, 3, 4, 5, 6, 7, 8, 9, T, J, Q, K, A

Where T is ten, J is Jack, Q is Queen, K is King and A is Ace.

And the suits:

H, C, S, D

We have a number of different combinations:

  1. High Card: Highest value card.
  2. One Pair: Two cards of the same value.
  3. Two Pairs: Two different pairs.
  4. Three of a Kind: Three cards of the same value.
  5. Straight: All cards are consecutive values.
  6. Flush: All cards of the same suit.
  7. Full House: Three of a kind and a pair.
  8. Four of a Kind: Four cards of the same value.
  9. Straight Flush: All cards are consecutive values of same suit.
  10. Royal Flush: Ten, Jack, Queen, King, Ace, in same suit.

If a given hand is a Flush, it gets the rank of 6. Which means it beat a Straight, which has the rank of 5.

That should be simple. Write a function that returns the rank of a given hand, and determine if one hand beats another hand.

However, two hands with the rank 2, are not necessarily equal. One pair of fours is better than one pair of twos.

So the function should also return some value that can be used for discriminating between different hands with the same rank.

A hand comes on the form:

5H 5C 6S 7S KD

I think it might be useful to have a function that returns the suits and the values.

library(dplyr)
suits <- function(s){
  delres <- unlist(strsplit(s, split = " "))
  sapply(delres, function(x) substr(x,2,2), USE.NAMES = FALSE)
}

values <- function(s){
  delres <- unlist(strsplit(s, split = " "))
  sapply(delres, function(x) substr(x,1,1), USE.NAMES = FALSE)
}

Testing for a Royal Flush. All values of suits should be identical. And the values should be T, J, Q, K, A. That is simple. Check if all the values that should be in the hand, if it is a Royal Flush are there. Then check if the suit is the same for all cards in the hand. If both are true, return TRUE.

is_royal_flush <- function(testhand){
  testvalue <- c("T", "J", "Q", "K", "A")
  value_match <- all(testvalue %in% values(testhand))
  suits_match <- length(unique(suits(testhand)))==1
  return(as.logical(value_match*suits_match))
}

Straight Flush: All cards are consecutive values of same suit.

Testing for the same suit is copy-paste from above. Testing for consecutive values is a little bit more tricky. I cant just sort the values, since K is before Q in the alphabet. So I’m using dplyr::recode() to recode the T, J, Q, K, A values to numbers. The I can convert them to numeric and sort them. And I’ll just have to figure out if the values are consecutive.

I had to google a bit. But diff() does it. It calculates the difference between the elements. It is possible to define a lag, but here the default of 1 does the trick. It returns the second element in the vector, minus the first element. And the third element minus the second.

When I have five elements, and they have consecutive values, diff() will return a vector of 4 ones.

is_straight_flush <- function(testhand){
  suits_match <- length(unique(suits(testhand)))==1
  testvalues <- values(testhand)
  testvalues <- recode(testvalues, T = "10", J = "11", Q = "12", K = "13", A = "14")
  testvalues <- as.numeric(testvalues)
  testvalues <- sort(testvalues)
  value_match <- all(diff(testvalues) == c(1,1,1,1))
  return(as.logical(value_match*suits_match))
}

Four of a Kind: Four cards of the same value.

We are not interested in the suits here. First I get the values. Then I use table() to get the counts of the different values. Passing that to max() returns 4, if there are four identical values.

is_four_of_a_kind <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- table(testvalues)
  return(max(testvalues)==4)
}

Full House: Three of a kind and a pair. I use the same functionality as in is_four_of_a_kind. The trick here is, that the max of the tabulation of values should be 3. But the minimum should also be 2.

is_full_house <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- table(testvalues)
  (max(testvalues)==3)&(min(testvalues)==2)
}

Flush: All cards of the same suit. In is_straight_flush I test if the suits are the same. That is what I need here, I just skip the part where I test if the values are consecutive.

is_flush <- function(testhand){
  suits_match <- length(unique(suits(testhand)))==1
  return(as.logical(suits_match))
}

Straight: All cards are consecutive values. This is more or less the same function is_straight_flush(). I just skip the test for suits.

is_straight <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- recode(testvalues, T = "10", J = "11", Q = "12", K = "13", A = "14")
  testvalues <- as.numeric(testvalues)
  testvalues <- sort(testvalues)
  value_match <- all(diff(testvalues) == c(1,1,1,1))
  return(as.logical(value_match))
}

Three of a Kind: Three cards of the same value. That is basically the same as the function is_four_of_a_kind() above. The maximum value from the table should just be 3 instead of 4:

is_three_of_a_kind <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- table(testvalues)
  return(max(testvalues)==3)
}

Two Pairs: Two different pairs. First I use values() to get the values of the hand. I’m not interested in the suits.
Then I tabulate the testvalues. I get a count of the number each value occurs in the hand.
But I’m no interested in that number. If there are two pairs in the hand, I will get two values, that occurs two times.

So if I run table() on the result again, I get a count of how many times the count 2 occurs. If the count 2 occurs twice, there are two hands.

is_two_pairs <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- table(testvalues)
  testvalues <- table(testvalues)
  2 %in% testvalues
}

One Pair: Two cards of the same value. Again, basically the same as is_three_of_a_kind(), with 2 instead of 3 in the test:

is_pair <- function(testhand){
  testvalues <- values(testhand)
  testvalues <- table(testvalues)
  return(max(testvalues)==2)
}

It will return FALSE if there are three of a kind. I’m interested in the highest rank a hand can get, so that should not be a problem.

High Card: Highest value card. I don’t need to test for this. If a hand has not gotten a higher rank, it will always have this.

Nice. I now have a set of functions that I can use to figure out what hand has the highest rank.

Lets put that together:

rank <- function(testhand){
  res <- 1 # The hand will never get a rank lower than 1.
  if(is_pair(testhand)){res <- 2}
  if(is_two_pairs(testhand)){res <- 3}
  if(is_three_of_a_kind(testhand)){res <- 4}
  if(is_straight(testhand)){res <- 5}
  if(is_flush(testhand)){res <- 6}
  if(is_full_house(testhand)){res <- 7}
  if(is_four_of_a_kind(testhand)){res <- 8}
  if(is_straight_flush(testhand)){res <- 9}
  if(is_royal_flush(testhand)){res <- 10}
  return(res)
}

That should solve a lot of the cases. However. What if the rank of two hands are the same?

If both hands have rank 9, that is if they are both straight flush, the hand with the highest card wins. There is no reason to test for the rest of the cards.

For all the others, I need functions that breaks the tie. It it should be broken.

Most of those determinations will be based on the value of the cards. So I might as well have a function that recodes ace to 14 etc.

recode_value <- function(values){
  as.numeric(recode(values, T = "10", J = "11", Q = "12", K = "13", A = "14"))
}

I’m probably going to need the reverse function:

rev_recode_value <- function(values){
  recode(as.character(values), "10" = "T", "11" = "J", "12" = "Q", "13" = "K", "14" = "A")
}

For High Card, the hand with the highest value card wins. If the highest value card in both hand 1 and hand 2 is equal, we look at the second highest card in each hand. Etc. Only if the values are identical in the two hands is there a tie.

break_high_card <- function(hand1, hand2){
  v1 <- values(hand1)
  v2 <- values(hand2)
  v1 <- sort(recode_value(v1))
  v2 <- sort(recode_value(v2))
  if((identical(v1,v2))){
    return(0)
    break()
  } else{
  while(v1[length(v1)]==v2[length(v2)]){
    v1 <- v1[-length(v1)]
    v2 <- v2[-length(v2)]
  }  
  }
  if(max(v1)>max(v2)){
    return(1)
  } else {
    return(2)
  }
}

All right. That was not easy… First I get the values for the two hands. Then I recode them to numerical values, and sort them.

If the two hands have equal values, it is a tie. Otherwise, while the last element in the first vector containing the values is equal to the last element in the second vector, I get rid of the last element in both vectors.

When that is done, the last element in the two vectors is compared. The hand with the highest value wins.
The funcion returns 0 if it is a tie, 1 if hand1 wins, and 2 if hand2 wins.

This was actually the hardest part to write.

Next. Breaking the tie between two hands each having one pair.

break_one_pair <- function(hand1, hand2){
  v1 <- values(hand1)  
  v2 <- values(hand2)
  v1 <- recode_value(v1)
  v2 <- recode_value(v2)
  v1 <- table(v1)
  v2 <- table(v2)
  v1_pair <- as.numeric(names(v1)[v1==2])
  v2_pair <- as.numeric(names(v2)[v2==2])
  if(v1_pair>v2_pair){
    return(1)
    break()
  }
  if(v2_pair>v1_pair){
    return(2)
    break()
  }
  if(v2_pair==v1_pair){
    v1 <- names(v1)[!v1==2]
    v2 <- names(v2)[!v2==2]
    v1 <- rev_recode_value(v1)
    v2 <- rev_recode_value(v2)
    return(break_high_card(v1,v2))
    break()
  }
}

First I’m picking out the values of the pairs in the two hands. If hand1 has a pair of 8’s and hand2 has a pair of 7’s, hand1 wins.

If the pairs are of the same value. Then I pick out the values of the rest of the cards. I am making a dangerous assumption here. This will only work if the two hands actually only have one pair. I hope it works, but this is one of the places where it could go wrong.

Two Pairs: Two different pairs.

The hand with the highest value pair wins. If those values are identical, the value of the second pair determines the winner. If that is also equal, the value of the fifth remaining card, determines the winner.
Only if that is also equal, is there a tie. Therefore, there can only be a tie, if all values of all cards are equal.

break_two_pairs <- function(hand1, hand2){
  v1 <- values(hand1)  
  v2 <- values(hand2)
  v1 <- recode_value(v1)
  v2 <- recode_value(v2)
  v1 <- table(v1)
  v2 <- table(v2)
  v1_pair <- sort(as.numeric(names(v1)[v1==2]))
  v2_pair <- sort(as.numeric(names(v2)[v2==2])) # now v1_pair and v2_pair contains the values of each of the pairs in each of the hands.
  if(identical(v1_pair,v2_pair)){        # If the pairs have the same value, pick out the fifth value, and compare them.
    v1 <- as.numeric(names(v1)[!v1==2])
    v2 <- as.numeric(names(v2)[!v2==2])
    if(v1>v2){
      return(1)
      break()
    }
    if(v2>v1){
      return(2)
      break()
    }
    if(v1==v2){
      return(0)
      break()
    }
  } else {
    v1 <- rev_recode_value(v1_pair)
    v2 <- rev_recode_value(v2_pair)
    return(break_high_card(v1,v2))
  }
}

Kinda the same as for one pair. v1_pair and v2_pair ends up with vectors containing the values of the two pairs in the two hands.

If these two vectors are identical, we need to look at the fifth value.

If not, I reverse_recode the values of the pairs, and use break_high_card, to determine which is larger.

Three of a Kind: Three cards of the same value.

Again I’m making the assumption that the two hands passed to this function have three of a kind. And nothing better than that.

This is a bit simpler. There is no way that both hands can have three of a kind with the same value. So I don’t need to consider the remaining cards. The assumption here is that there is only on deck of cards involved. This is another place where things can go wrong. But lets try!

break_three_of_a_kind <- function(hand1, hand2){
  v1 <- values(hand1)
  v2 <- values(hand2)
  v1 <- recode_value(v1)
  v2 <- recode_value(v2)
  v1 <- table(v1)
  v2 <- table(v2)
  v1 <- as.numeric(names(v1)[v1==3])
  v2 <- as.numeric(names(v2)[v2==3])
  if(v1>v2){
    return(1)
    break()
  } else {
    return(2)
    break()
  }
}

I convert to values, recode them, tabulates them. And pick out the values that occur three times. Then I just compare them.

Straight: All cards are consecutive values.

Once more I assume that the two hands passed to this function actually only contains a straigth. It is rather simple. As the values must be consecutive, the highest value in each hand determines the winner.

break_straigth <- function(hand1,hand2){
  v1 <- values(hand1)
  v2 <- values(hand2)
  v1 <- recode_value(v1)
  v2 <- recode_value(v2)
  v1 <- max(v1)
  v2 <- max(v2)
  if(v1>v2){
    return(1)
    break()
  } 
  if(v2>v1){
    return(2)
    break()
  }
  if(v2==v1){
    return(0)
    break()
  }
}

Flush: All cards of the same suit.

Again – I assume that flush is the highest combination in these hands. This is basically the same as high card.

break_flush <- function(hand1, hand2){
  return(break_high_card(hand1,hand2))
}

Full House: Three of a kind and a pair.

The same assumption applies. These two hands both have Full House as the highest rank.

The winner is determined by the higher value of the three of a kind.

break_full_house <- function(hand1,hand2){
  return(break_three_of_a_kind(hand1,hand2))
}

Four of a Kind: Four cards of the same value.

Same as for the full house. The winner is determined by the higher value of the four of a kind.

break_four_of_a_kind <- function(hand1,hand2){
  v1 <- values(hand1)  
  v2 <- values(hand2)
  v1 <- recode_value(v1)
  v2 <- recode_value(v2)
  v1 <- table(v1)
  v2 <- table(v2)
  v1 <- sort(as.numeric(names(v1)[v1==4]))
  v2 <- sort(as.numeric(names(v2)[v2==4]))
  if(v1>v2){
    return(1)
    break()
  } else {
    return(2)
    break()
  }
}

Straight Flush: All cards are consecutive values of same suit.

Again I’m making dangerous assumptions. But, this is functionally the same as high card.

break_straight_flush <- function(hand1,hand2){
  return(break_high_card(hand1,hand2))
}

Royal Flush: Ten, Jack, Queen, King, Ace, in same suit.

If both hand are royal flush – it will always be a tie.

break_royal_flush <- function(hand1,hand2){
  return(0)
}

Phew…

I now have a function that will return the rank of a given hand.

If I compare the rank of two hands, I will either get a clear winner, or need to run one of the break functions to determine the result.

winner <- function(hand1, hand2){
  if(rank(hand1)>rank(hand2)){
    return(1)
    break()
  }
  if(rank(hand2)>rank(hand1)){
    return(2)
    break()
  }
  if(rank(hand2)==rank(hand1)){
    rang <- rank(hand2)
    if(rang==1){
      return(break_high_card(hand1,hand2))
      break()
    }
    if(rang==2){
      return(break_one_pair(hand1,hand2))
      break()
    }
    if(rang==3){
      return(break_two_pairs(hand1,hand2))
      break()
    }
    if(rang==4){
      return(break_three_of_a_kind(hand1,hand2))
      break()
    }
    if(rang==5){
      return(break_straigth(hand1,hand2))
      break()
    }
    if(rang==6){
      return(break_flush(hand1,hand2))
      break()
    }
    if(rang==7){
      return(break_full_house(hand1,hand2))
      break()
    }
    if(rang==8){
      return(break_four_of_a_kind(hand1,hand2))
      break()
    }
    if(rang==9){
      return(break_straight_flush(hand1,hand2))
      break()
    }
    if(rang==10){
      return(break_royal_flush(hand1,hand2))
      break()
    }
  }
}

Phew… Again.

What should be left now, is to load the hands, and test them.

data <- read.csv("https://projecteuler.net/project/resources/p054_poker.txt", header=FALSE, stringsAsFactors = FALSE)
library(tidyr)
library(stringr)

answer <- data %>%
  mutate("h1" = str_sub(V1, 1,14)) %>%
  mutate("h2" = str_sub(V1, 16,30)) %>%
  rowwise() %>%
  mutate("w" = winner(h1,h2)) %>%
  filter(w==1) %>%
  nrow()

I read in the file to data, make a column h1 containing the first 14 characters of each line in the datafile, and a column h2 containing the rest of the line.

rowwise() groups the dataframe by row – so it doesn’t matter that my winner() function is not that vectorized. For each set of hands I calculate the winner, filter the result so I only have the rows where hand 1 wins. And then I get the number of rows – which are the answer.

Lessons learned

Some. rowwise() is a usefull function. I’m certain I dont understand everything about it, but it would seem to solve a fair number of the problems I’ve had using non-vectorized functions.

I should probably also learn to set aside some time for re-factoring my code before publishing it.