Given a set of data in which some values indicate that they are the same as the previous value, how do we replace them with the correct value?
E.g., this data frame:
(m <- data.frame(i=c(1:10,NA), t=c("lorem", "do", "do", "Do", "ipsum", "do", "Do", "(do)", "dolor", NA, "test"), stringsAsFactors=F))
##     i     t
## 1   1 lorem
## 2   2    do
## 3   3    do
## 4   4    Do
## 5   5 ipsum
## 6   6    do
## 7   7    Do
## 8   8  (do)
## 9   9 dolor
## 10 10  <NA>
## 11 NA  test
How do we replace the first three “do”s with “lorem” and the next set of “do”s with “ipsum”?
Using fill() from the tidyr package is straightforward. It takes a data frame and the columns to fill, locates all NAs in those columns, and replaces each with the last non-NA value above it.
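To make the behaviour of fill() concrete, here is a minimal sketch on a toy data frame. The data frame `df` and its column `x` are made up for illustration and are not part of the example above; the default fill direction is downwards.

```r
library(tidyr)

# Hypothetical toy data, only to show what fill() does
df <- data.frame(x = c("a", NA, NA, "b", NA), stringsAsFactors = FALSE)

# Fill each NA in column x with the last non-NA value above it
filled <- fill(df, x)
filled$x
## [1] "a" "a" "a" "b" "b"
```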
Simple enough: change all the variations of “do” to NA and run fill(). Done.
One problem: there might be genuine NAs in the dataset that we do not want to affect.
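A short sketch of that problem, using the same example data: if the variations of “do” are turned into NA and fill() is run straight away, the genuine NA in row 10 gets filled as well. The name `naive` is only used here for illustration.

```r
library(tidyr)

# Same example data as above
m <- data.frame(i = c(1:10, NA),
                t = c("lorem", "do", "do", "Do", "ipsum", "do", "Do",
                      "(do)", "dolor", NA, "test"),
                stringsAsFactors = FALSE)

# Naive approach: markers become NA, then everything that is NA is filled
naive <- m
naive$t[naive$t %in% c("do", "Do", "(do)")] <- NA
naive <- fill(naive, t)

# Row 10 held a genuine NA, but it has now been overwritten
naive$t[10]
## [1] "dolor"
```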
Solution – there might be a more elegant one, but this works:
- Change the NAs to something that does not occur in the data
- Change the variations of “do” to NA
- Use the fill() function
- Change the placeholder values from step 1 back to NA
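library(tidyr)
# Step 1: protect the genuine NAs with a placeholder that does not occur in the data
rpl <- "replacement"
m$t[is.na(m$t)] <- rpl
# Step 2: turn every variation of "do" into NA
doset <- c("do", "Do", "(do)")
m$t[m$t %in% doset] <- NA
# Step 3: fill each NA with the last non-NA value above it
m <- m %>% fill(t)
# Step 4: turn the placeholder back into NA (%in% stays safe if an NA is left over)
m$t[m$t %in% rpl] <- NA
m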
##     i     t
## 1   1 lorem
## 2   2 lorem
## 3   3 lorem
## 4   4 lorem
## 5   5 ipsum
## 6   6 ipsum
## 7   7 ipsum
## 8   8 ipsum
## 9   9 dolor
## 10 10  <NA>
## 11 NA  test
Done!
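For comparison, here is one possible variant in base R that avoids the placeholder. It is only a sketch and not necessarily more elegant. The helper names `is_ditto` and `last_real` are made up for illustration. The idea is to look up, for every row, the index of the most recent row that is not a “do” marker, and copy the value from there.

```r
# Same example data as above
m <- data.frame(i = c(1:10, NA),
                t = c("lorem", "do", "do", "Do", "ipsum", "do", "Do",
                      "(do)", "dolor", NA, "test"),
                stringsAsFactors = FALSE)
doset <- c("do", "Do", "(do)")

# TRUE where the value is a "same as above" marker; a genuine NA gives FALSE
is_ditto <- m$t %in% doset

# For each row, the index of the most recent row that is not a marker
last_real <- cummax(ifelse(is_ditto, 0L, seq_along(m$t)))

# A marker in the very first row has nothing to copy from, so it becomes NA
last_real[last_real == 0L] <- NA

# Copy the values; genuine NAs point at themselves and stay NA
m$t <- m$t[last_real]
m$t
```

Because genuine NAs are never treated as markers, no placeholder value has to be chosen, which also removes the risk of picking a placeholder that already occurs in the data.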
Oh, and by the way, this is my first post generated directly from RStudio!