Euler 100

Project Euler – problem 100

Back to the hopeless examples of probabilities from school.
In a bag there are 15 black balls and six white ones. Project Euler talks about discs, math-teachers has always used balls as examples, and they where always white and black. So I’ll stick with that.

It you draw two balls from the bag, there is a 50/50 chance of drawing 2 black balls:

(15/21)*(14/20)
## [1] 0.5

I’m told that the next set of balls in the bag with that property, is 85 black balls and 35 white ones:
(85/120)

(85/120)*(84/119)
## [1] 0.5

Find the mix of black and white balls, that gives a probability of 50/50 of drawing 2 black balls, given that there should be more than 10¹² = 1000000000000 balls in the rather large bag.

That should be straight-forward.
Lets call the number of black balls b and the number of white balls w. And lets define the total number of balls in the back as n=w+b
The probability of drawing two black balls is:

(b/n)((b-1)/(n-1)) = ½

n = w + b > 10¹²

Two equations with two unknowns.
The probability can be rearranged:

(b/n)((b-1)/(n-1)) = ½ <=>

b(b-1) / n(n-1) = ½ <=>

(b² – b) / (n² – n) = ½ <=>

b² – b = ½(n² – n) <=>

2b² – 2b = n² – n <=>

2b² – 2b – n² + n = 0

Hm. Maybe it is not that simple after all. First of all I don’t know if n is 100000000000 or 100000000001. That actually makes a pretty big difference:

1000000000001**2 - 1000000000000**2
## [1] 1.999978e+12

Second of all, I need to find integer solutions. An analytical solution might not give integer results. And I can’t have one third of a ball in the bag.

Googling “finding integer solutions to equations” give, as the first result, a link to the wikipedia article on “Diophantine equations”.
Which apparently are equations that should have integer solutions.

All right, a couple of the problems I’ve tackled earlier, and quite a lot of Project Euler problems I’ve given up on appears to be about solving these Diophantine equations.

So. Nice. The last link of the wikipedia page is to https://www.alpertron.com.ar/QUAD.HTM.
I should probably read up on the methods. But that will have to wait.

The point is, that this Diophantine equation can be solved by:

b~n+1~ = 3b~n~ + 2n~n~ -2

n~n+1~ = 4b~n~ + 3n~n~ -3

The idea is that we have a solution (b~n~, n~n~). And these two equations allows us to calculate the next solution, (b~n+1~, n~n+1~)

Lets try that, we was given that (15,21) was a solution. The next should be (85,120). Do we get that?

b <- 15
n <- 21
b_n <- 3*b + 2*n -2
n_n <- 4*b + 3*n -3
print(paste(b_n, n_n, sep=","))
## [1] "85,120"

Qap’la, it works. Nice. Now I just need to run through this until n~n+1~ gets above 10¹².

b <- 15
n <- 21
while(n<10**12){
  b_n <- 3*b + 2*n -2
  n_n <- 4*b + 3*n -3
  b <- b_n
  n <- n_n
}
answer <- b

Lessons learned:

  1. Solving Diophantine equations is at the heart of a lot of these problems. I’ve learned a new tools to handle them!
  2. If you want to subscript stuff in RMarkdown, you place a ~ on each side of what you want subscripted.

Other stuff to note: Maybe it is time someone wrote a new solver for Diophantine equations. The one I found is 19 years old. Something to do in Shiny perhaps?

Euler 80

Problem 80 from Project Euler.

The problem tells us that if the square root of a natural number is not an integer, it is irrational.
Project Euler also claims that it is well known. I did not know it.

We are then told that the square root of 2 is 1.4142135623730950… And that the digital sum of the first 100 digits is 475.

The task is now to take the first 100 natural numbers. And find the total of the digital sums for the first 100 digits for all the irrational square roots.

Lets begin by figuring out how to handle that many digits. R does not support more than around 15 places after the decimal point.

The library Rmpfr can handle arbitrary precision:

library(Rmpfr)
## Loading required package: gmp
## 
## Attaching package: 'gmp'
## The following objects are masked from 'package:base':
## 
##     %*%, apply, crossprod, matrix, tcrossprod
## C code of R package 'Rmpfr': GMP using 64 bits per limb
## 
## Attaching package: 'Rmpfr'
## The following objects are masked from 'package:stats':
## 
##     dbinom, dnorm, dpois, pnorm
## The following objects are masked from 'package:base':
## 
##     cbind, pmax, pmin, rbind
a <- sqrt(mpfr(2,500))

The variable a now contains the square root of 2 with a precision of 500 bytes. I’m not quite sure how many decimal places that actually translates to. But testing the following code allows me to conclude with confidence that it is at least 100.

A thing to note here is, that

a <- mpfr(sqrt(2),500)

and

a <- sqrt(mpfr(2,500))

are not equal. In the first exampel sqrt(2) is evaluated before saving the value with the high precision. Start by converting the number 2 to a high precision representation, before doing math on it.

Next is writing a function that will return the digital sum of the first 100 digits of a number.

digitsum <- function(x){
  s <- 0
  for(i in 1:100){
    s <- s + floor(x)
    x <- (x - floor(x))*10
  }
  s
}

First s is initialized to 0. Then floor(x) gives us the first digit in x. We add that to s, and subtract it from x, and multiply by 10. Repeat that 100 times, and you get the sum of the first 100 digits in x.

Let us test that it works. Project Euler told us what the result for sqrt(2) is:

digitsum(a)
## 1 'mpfr' number of precision  500   bits 
## [1] 475

Nice, the correct result (not that that guarantees that I’ve done everything correctly).

Now, lets find all the irrational square roots we need to look at:

library(purrr)
t <- 1:100
s <- t %>%
  keep(~as.logical(sqrt(.x)%%1))

I need to practice this way of coding a bit more. t contains the first 100 natural numbers. I pass that to the keep()-function, and the predicate function takes the square root of each number, take the modulus 1, and convert it to a logical value. If the modulus of the square root is 0, the square root is an integer, and 0 i false. So we’re keeping all the non-integer squareroots.

Now I’ll convert all the natural numbers to the mpfr-class. The next line takes the square root. The third line calculate the digitalsum. And the final line gives us the result.

s <- mpfr(s,500)
r <- sqrt(s)
r <- digitsum(r)
sum(r)
## 1 'mpfr' number of precision  500   bits 
## [1] Censored

Lessons learned:
Rmpfr allows us to work with (more or less) arbitrary precision.
But we need to convert numbers to the relevant class before doing math on it.

Replacing values in a dataframe – to what a previous value was

Given a set of data, where some values indicate that they are the same as a previous value, how to replace them with the correct value.

Eg, this dataframe:

(m <- data.frame(i=c(1:10,NA), t=c("lorem", "do", "do", "Do", "ipsum", "do", "Do", "(do)", "dolor", NA, "test"), stringsAsFactors=F))
##     i     t
## 1   1 lorem
## 2   2    do
## 3   3    do
## 4   4    Do
## 5   5 ipsum
## 6   6    do
## 7   7    Do
## 8   8  (do)
## 9   9 dolor
## 10 10  <NA>
## 11 NA  test

How to replace the first three “do”s with “lorem” and the next set of “do”s with “ipsum”

Using fill() from the tidyr package is straight forward. It takes a vector, locates all NA, and replaces them with the last, non-NA value.
Simple enough, change all the variations of “do” to NA, run fill(). Done.
One problem, there might be NAs in the dataset, that we do not want to affect.
Solution – there might be a more elegant one, but this works:

  1. Change the NAs to something that do not occur in the data
  2. Change to variations of “do” to NA
  3. Use the fill()-function
  4. Change the NAs from step 1 back to NA
library(tidyr)
rpl <- "replacement"
m[is.na(m$t),]$t <- rpl
doset <- c("do", "Do", "(do)")
m[(m$t %in% doset),]$t <- NA

m <- m %>% fill(t)
m[(m$t == rpl),]$t <- NA
m
##     i     t
## 1   1 lorem
## 2   2 lorem
## 3   3 lorem
## 4   4 lorem
## 5   5 ipsum
## 6   6 ipsum
## 7   7 ipsum
## 8   8 ipsum
## 9   9 dolor
## 10 10  <NA>
## 11 NA  test

Done!

Oh, and by the way, this is my first post generated directly from RStudio!

Der skal mere liv her!

Og et par andre steder. Noget af det jeg bruger en del tid på, både privat og professionelt, er R. Som i det statistiske program R.

Jeg skal derfor snart have taget et kig på denne side:

Og få publiceret hvad jeg alligevel nusser rundt med her og andre steder.

Hide rows, based on value of cell – in Excel

So – you want to hide some rows on a worksheet, based on the value in a cell. Or more than one.
Here’s how to do that with VBA
Find the last row of the range that you want to apply the hiding to.
Get a range of rows, in this case starting at A7, and ending at “LastRow”.
For each value in that range, if the value i column A is equal to the value in cell G1 (that is Cells(1,7), And the value three columns over, eg in colum D (that is c.Offset(0,3)), is equal to the value in cell G2 (Cells(2,7), then set the entire row to be hidden, else set it to be shown.


Private Sub Worksheet_Change(ByVal Target As Range)
Dim LastRow As Long, c As Range
Application.EnableEvents = False
LastRow = Application.WorksheetFunction.CountA(Range("A7:A100000")) + 6
On Error Resume Next
For Each c In Range("A7:A" & LastRow)
If (c.Value = Cells(1, 7).Value And c.Offset(0, 3).Value = Cells(2, 7).Value) Then
c.EntireRow.Hidden = False
Else
c.EntireRow.Hidden = True
End If
Next
On Error GoTo 0
Application.EnableEvents = True
End Sub