Info about object dimensions:
length(c(1,2,3))
dim(matrix(1:6, 2, 3))
ncol(matrix(1:6, 2, 3))
nrow(matrix(1:6, 2, 3))
Brief info about any object:
typeof(str)
class(str)
unclass(str)
str(c(1, 2))
str(summary)
Column names of datasets:
names(...)
names(list(colA=1, colB=2))
Column/row names of matrices:
colnames(matrix(...))
rownames(matrix(...))
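The dimension and name helpers above can be tried on one small object. This is only a sketch: the matrix m and its row/column labels are made up for the example.

```r
## Hypothetical 2 x 3 matrix used only for illustration
m <- matrix(1:6, nrow = 2, ncol = 3)
colnames(m) <- c("a", "b", "c")
rownames(m) <- c("r1", "r2")

length(m)    ## total number of elements: 6
dim(m)       ## rows, then columns: 2 3
nrow(m)      ## 2
ncol(m)      ## 3
dimnames(m)  ## list with the row names first, then the column names
typeof(m)    ## storage type: "integer"
```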
List objects in the global environment: ls().
Object size in memory: object.size(1:2)
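A short sketch of reading an object's size; the vector x is an arbitrary example, and the exact byte count can differ slightly between R builds.

```r
x <- numeric(1e6)           ## one million doubles, 8 bytes each plus a small header
sz <- object.size(x)
print(sz)                   ## size in bytes
format(sz, units = "Mb")    ## the same size as a human-readable string
format(sz, units = "auto")  ## let R pick a suitable unit
```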
Controlling output precision:
options(digits=3)
List of all options:
str(options())
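A possible way to change the printing precision temporarily and then restore it. It relies on options() returning the previous values; the variable names old and shown are only illustrative.

```r
old <- options(digits = 3)   ## options() returns the previous values invisibly
print(pi)                    ## 3.14
shown <- format(pi)          ## format() also follows the digits option: "3.14"
options(old)                 ## restore the previous setting
print(pi)                    ## 3.141593 when digits is at its default of 7
```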
To mark a function for debugging, call:
debug(fun, text = "", condition = NULL)
debugonce(fun, text = "", condition = NULL)   ## debug only the next call
To return a function to normal execution:
undebug(fun)
isdebugged(fun)   ## check whether the function is still marked
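The debugging flag can be set and checked without entering the browser, as long as the function is not called. A minimal sketch; f is a hypothetical function defined only for this example.

```r
f <- function(x) x + 1    ## hypothetical function used only for illustration

debug(f)                  ## every later call to f() would open the browser
flagged <- isdebugged(f)  ## TRUE
undebug(f)                ## back to normal execution
cleared <- isdebugged(f)  ## FALSE
f(1)                      ## runs normally and returns 2
```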
You can enter debug mode at any point in the code by calling browser().
traceback prints the function call stack after an error occurs; it does nothing if there was no error.
trace allows you to insert debugging code into a function at specific places.
recover allows you to modify the error behavior so that you can browse the function call stack (enable it with options(error = recover)).
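How long an expression takes to evaluate (coarse resolution, seconds to milliseconds):
system.time(expr, gcFirst = TRUE)
unix.time(expr, gcFirst = TRUE)   ## old alias of system.time; deprecated and no longer available in current R versions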
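A small timing sketch. The loop is just arbitrary work to measure, and the actual numbers depend on the machine.

```r
timing <- system.time({
  total <- 0
  for (i in 1:1e5) total <- total + sqrt(i)
})
timing                 ## user, system and elapsed time in seconds
timing[["elapsed"]]    ## wall-clock time only
```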
Rprof enables profiling of the whole session; summaryRprof summarizes the collected profiling data:
Rprof()               ## start profiling
Rprof(NULL)           ## suspend profiling
Rprof(append = TRUE)  ## resume profiling
Rprof(NULL)           ## end profiling
summaryRprof()        ## investigate profiling report
For each distribution there is a corresponding random generation function, named with the prefix r:
rnorm(n, mean = 0, sd = 1)
rt(n, df, ncp)
rbinom(n, size, prob)
rpois(n, lambda)
runif(n, min = 0, max = 1)
rexp
rchisq
rgamma
In order to generate predictable sequences use:
set.seed(seed, kind = NULL, normal.kind = NULL)
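A quick check that the same seed reproduces the same draws; the seed value 42 is arbitrary.

```r
set.seed(42)
a <- rnorm(5)

set.seed(42)
b <- rnorm(5)
identical(a, b)   ## TRUE: same seed, same sequence

d <- rnorm(5)     ## the generator has moved on, so these are different numbers
identical(a, d)   ## FALSE
```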
Sampling from a vector:
sample(x, size, replace = FALSE, prob = NULL)
sample.int(n, size = n, replace = FALSE, prob = NULL)
sample(1:10, 10)                  ## permutation!!
sample(1:10, 100, replace=TRUE)   ## sampling with replacement
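A sketch of the common sampling patterns, plus one pitfall worth knowing: when x is a single number n >= 1, sample(x, ...) draws from 1:n rather than from x itself. All values here are illustrative.

```r
set.seed(1)

p <- sample(1:10)                 ## size defaults to length(x): a permutation
identical(sort(p), 1:10)          ## TRUE

## Weighted draws with replacement (a biased coin)
draws <- sample(c("H", "T"), 20, replace = TRUE, prob = c(0.7, 0.3))
table(draws)

## Pitfall: a single number is treated as 1:n
x <- 10
three <- sample(x, 3)                  ## three distinct values from 1:10
safe  <- x[sample.int(length(x), 1)]   ## index-based sampling always returns 10 here
```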
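lapply iterates over the data and returns a list with the results of applying the function to each element:
lapply(1:5, function(x) x^2)
lapply(matrix(rnorm(20*10),20,10), mean)   ## a matrix is treated as a plain vector: mean is applied to each of the 200 elements, not to columns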
Usually you don't need a list but a vector. sapply works like lapply but also tries to convert the result to a vector or matrix if the dimensions and element types permit this:
lapply(list(1:5), mean)
[[1]]
[1] 3
sapply(list(1:5), mean)
[1] 3
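The shape of the sapply result depends on what the function returns, which the sketch below illustrates. vapply is not mentioned above; it is included here as a stricter alternative that fails if a result does not match the declared type and length.

```r
squares_list <- lapply(1:3, function(x) x^2)       ## list of length 3
squares_vec  <- sapply(1:3, function(x) x^2)       ## numeric vector: 1 4 9
pairs        <- sapply(1:3, function(x) c(x, x^2)) ## 2 x 3 matrix, one column per input
ragged       <- sapply(1:3, seq_len)               ## result lengths differ, so it stays a list

## vapply declares the expected result: one numeric value per element
checked <- vapply(1:3, function(x) x^2, numeric(1))
```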
apply works along a specific dimension of the data, so it is useful for matrices and data frames:
apply(matrix(1:6, 2, 3), 1, min)
[1] 1 2
apply(matrix(1:6, 2, 3), 2, max)
[1] 2 4 6
apply(array(rnorm(2*2*10), c(2, 2, 10)), c(1, 2), mean)
           [,1]       [,2]
[1,] -0.2733804  0.3154234
[2,]  0.1830982 -0.5889010
colSums, rowSums, colMeans and rowMeans are optimized equivalents of:
rowSums(x)    ## apply(x, 1, sum)
colSums(x)    ## apply(x, 2, sum)
rowMeans(x)   ## apply(x, 1, mean)
colMeans(x)   ## apply(x, 2, mean)
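The equivalence can be checked on a small matrix; x is an arbitrary example. all.equal is used because the optimized functions return doubles, while sum over integers returns integers.

```r
x <- matrix(1:6, 2, 3)

rowSums(x)    ## 9 12
colSums(x)    ## 3 7 11
rowMeans(x)   ## 3 4
colMeans(x)   ## 1.5 3.5 5.5

all.equal(rowSums(x),  apply(x, 1, sum))    ## TRUE
all.equal(colMeans(x), apply(x, 2, mean))   ## TRUE
```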
split partitions the data by a factor (analogous to SQL GROUP BY):
data<-data.frame(rnorm(10),rbinom(10,1,prob=.7))
sdata<-split(data[,1],data[,2])
lapply(sdata,mean)
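The same grouping idea with fixed values, so the result is easy to verify by hand. The column names value and group are made up for this sketch; tapply is shown as a one-step way to get the same group means.

```r
data <- data.frame(
  value = c(1, 2, 3, 4, 5, 6),
  group = c("a", "b", "a", "b", "a", "b")
)

sdata <- split(data$value, data$group)   ## list: a -> 1 3 5, b -> 2 4 6
group_means <- sapply(sdata, mean)       ## named vector: a = 3, b = 4

## split + sapply in a single call
tapply(data$value, data$group, mean)
```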
Check Inspecting objects section.
Investigating unique values:
sapply(data, unique)
sapply(data$col, unique)
sapply(data[,c("col1","col2")], unique)
sapply(data[,5:10], unique)
table(data$col)
tapply(data$what, data$by, unique)
tapply(data$what, data$by, summary)
tapply(data$what, data$by, range)
tapply(data$what, data$by, mean)
tapply(data$what, data$by, sd)
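A small worked example of these calls on a made-up data frame (the columns by and what mirror the placeholders above). For a single column, unique(data$by) is probably what is wanted, since sapply over a vector visits one element at a time.

```r
data <- data.frame(
  by   = c("x", "y", "x", "y", "x"),
  what = c(10, 20, 10, 40, 30)
)

lapply(data, unique)                  ## unique values of every column
uniq_by <- unique(data$by)            ## "x" "y"
counts  <- table(data$by)             ## x: 3, y: 2
means   <- tapply(data$what, data$by, mean)
tapply(data$what, data$by, range)     ## min and max of what within each group
```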
Brief info about vectors and matrices:
summary(1:8)
summary(matrix(1:20, 4, 5))
Simple plots:
i<-1:100
x<-i/10
y<-x^2
plot(x,y)
hist(rpois(100,10))
hist(rpois(100,10),breaks=20)
Rename columns of a data frame:
names(d)[names(d)=="beta"] <- "two"   # by name
names(d)[2] <- "two"                  # by position
library(plyr)
newd <- rename(d, c("beta"="two", "gamma"="three"))
Remove row/column names:
rownames(dt) <- NULL
colnames(dt) <- NULL
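A base R sketch of renaming a column and resetting row names. The data frame d and its column names are illustrative. For a data frame, assigning NULL to rownames restores the default 1..n numbering.

```r
d <- data.frame(alpha = 1:3, beta = 4:6, gamma = 7:9)

names(d)[names(d) == "beta"] <- "two"
names(d)                 ## "alpha" "two" "gamma"

sub <- d[d$alpha > 1, ]  ## subsetting keeps the original row names
rownames(sub)            ## "2" "3"
rownames(sub) <- NULL    ## reset to default numbering
rownames(sub)            ## "1" "2"
```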
Drop column from data frame by number:
dfnew <- df[-1]               # first
dfnew <- df[-ncol(df)]        # last
dfnew <- df[-c(1, 3:4, 7)]    # several columns at once
Drop column from data frame by name:
newdf <- df[ , !(names(df) %in% c("lat", "long"))]
df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
df <- subset(df, select = c(a,c))    # keep only a and c
df <- subset(df, select = -c(a,c))   # alternatively, drop a and c
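One thing to watch with the df[ , ...] form: if only a single column remains, the result is simplified to a plain vector unless drop = FALSE is given. A minimal sketch on a made-up data frame:

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
keep <- !(names(df) %in% c("a", "b"))   ## only column c survives

as_vector <- df[, keep]                 ## simplified to an integer vector
is.data.frame(as_vector)                ## FALSE

as_frame <- df[, keep, drop = FALSE]    ## stays a one-column data frame
is.data.frame(as_frame)                 ## TRUE

df2 <- df
df2$b <- NULL                           ## another way to drop a single column
names(df2)                              ## "a" "c"
```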