Written by Oleksandr Gavenko (AKA gavenkoa), compiled on 2024-04-01 from rev 052223c22317.

R

Inspecting objects

Info about object dimensions:

length(c(1,2,3))
dim(matrix(1:6, 2, 3))
ncol(matrix(1:6, 2, 3))
nrow(matrix(1:6, 2, 3))

Brief info about any object:

typeof(str)
class(str)
unclass(str)
str(c(1, 2))
str(summary)
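
A small sketch of how these functions differ when applied to a data frame. The data frame df and its columns are made up for illustration only.

```r
# Hypothetical data frame, used only for this example
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

typeof(df)          # storage type: "list"
class(df)           # S3 class: "data.frame"
names(unclass(df))  # the underlying list of columns: "id" "label"
str(df)             # compact, human-readable structure
```

typeof reports how R stores the object, while class reports how methods will treat it, so the two often differ.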

Column names of datasets:

names(...)
names(list(colA=1, colB=2))

Column/row names of matrices:

colnames(matrix(...))
rownames(matrix(...))

List objects in the global environment: ls().

Object size in memory: object.size(1:2)
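
As a rough sketch, object.size can be used to compare the memory footprint of two objects. The variable names are arbitrary, and exact byte counts depend on the platform and R version.

```r
ints    <- integer(1000)   # 4 bytes per element plus a header
doubles <- numeric(1000)   # 8 bytes per element plus a header

object.size(ints)
object.size(doubles)
format(object.size(doubles), units = "Kb")  # human-readable form
```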

Interactive session

Controlling output precision:

options(digits=3)

List of all options:

str(options())
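
A minimal sketch of changing an option temporarily. options returns the previous values, so they can be restored afterwards; the variable name old is arbitrary.

```r
old <- options(digits = 3)  # returns the previous setting invisibly
print(pi)                   # printed with 3 significant digits: 3.14
options(old)                # restore the earlier precision
getOption("digits")         # back to the value stored in old
```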

Debugging

To mark a function for debugging, call:

debug(fun, text = "", condition = NULL)
debugonce(fun, text = "", condition = NULL)

To return a function to normal execution, or to check whether it is currently marked:

undebug(fun)
isdebugged(fun)
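
A short sketch of the debug flag life cycle, using a hypothetical function f. The function is never called here, so no browser session is opened.

```r
f <- function(x) x + 1     # hypothetical function to debug

debug(f)
flag_on <- isdebugged(f)   # TRUE: the next call to f() would enter the browser
undebug(f)
flag_off <- isdebugged(f)  # FALSE: f runs normally again
```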

You can enter debug mode at any point in your code by calling browser().

traceback prints out the function call stack after an error occurs; does nothing if there's no error.

trace allows you to insert debugging code into a function at specific places.

recover allows you to modify the error behavior so that you can browse the function call stack.

Profiling

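Measure how long an expression takes to execute (the resolution is coarse, from seconds down to milliseconds):

system.time(expr, gcFirst = TRUE)
## unix.time(expr, gcFirst = TRUE) was an alias of system.time; it is defunct in current R versions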
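
A minimal sketch of timing a block of code; the loop is arbitrary work chosen only for illustration.

```r
timing <- system.time({
  total <- 0
  for (i in 1:100000) total <- total + sqrt(i)
})

timing                # user, system and elapsed times in seconds
timing[["elapsed"]]   # wall-clock time only
```

Wrapping several statements in braces lets one call time the whole block.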

The Rprof function enables global profiling. The summaryRprof function summarizes the collected profiling data:

Rprof()       ## start profiling
Rprof(NULL)   ## suspend profiling
Rprof(append = TRUE)  ## resume profiling
Rprof(NULL)   ## end profiling
summaryRprof() ## investigate profiling report

Generating random numbers

For each distribution there is a corresponding generator function, named with the prefix r:

rnorm(n, mean = 0, sd = 1)
rt(n, df, ncp)
rbinom(n, size, prob)
rpois(n, lambda)
runif(n, min = 0, max = 1)
rexp
rchisq
rgamma

To generate reproducible sequences, use:

set.seed(seed, kind = NULL, normal.kind = NULL)
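
A small sketch showing that resetting the seed reproduces the same draws. The seed value 42 is arbitrary.

```r
set.seed(42)                 # arbitrary seed
first <- rnorm(3)

set.seed(42)                 # same seed again
second <- rnorm(3)
identical(first, second)     # TRUE: same seed, same numbers

third <- rnorm(3)            # continues the stream, so it differs from first
```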

Sampling from a vector:

sample(x, size, replace = FALSE, prob = NULL)
sample.int(n, size = n, replace = FALSE, prob = NULL)

sample(1:10, 10)  ## permutation!!
sample(1:10, 100, replace=TRUE)
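
A hedged sketch of the two common uses, permutation and weighted sampling with replacement. The seed, the labels and the weights are arbitrary.

```r
set.seed(1)                         # arbitrary seed, for reproducibility

perm <- sample(1:10)                # size defaults to length(x): a permutation
is_perm <- all(sort(perm) == 1:10)  # TRUE: every value appears exactly once

# Weighted sampling with replacement from two hypothetical labels
coin <- sample(c("H", "T"), 20, replace = TRUE, prob = c(0.9, 0.1))
table(coin)
```

Note that when x is a single positive number n, sample(x) draws from 1:n rather than from the one-element vector.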

Looping over data

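lapply iterates over data and returns a list with the results of applying the function to each element:

lapply(1:5, function(x) x^2)
lapply(as.data.frame(matrix(rnorm(20*10),20,10)), mean)  ## mean of each column; a bare matrix would be traversed element by element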

Usually you don't need a list but a vector. sapply works like lapply but also tries to convert the result to a vector or matrix if the dimensions and element types permit this:

lapply(list(1:5), mean)
[[1]]
[1] 3
sapply(list(1:5), mean)
[1] 3
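
A short sketch of how the shape of the sapply result depends on what the function returns. vapply is shown as a stricter alternative that states the expected result type up front; the variable names are arbitrary.

```r
squares <- sapply(1:3, function(i) i^2)        # each result has length 1: numeric vector 1 4 9
pairs   <- sapply(1:3, function(i) c(i, i^2))  # each result has length 2: 2 x 3 matrix
strict  <- vapply(1:3, function(i) i^2, numeric(1))  # errors if a result is not one number

dim(pairs)   # 2 3
```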

apply works along a specific dimension of the data, so it is useful for matrices and data frames:

apply(matrix(1:6, 2, 3), 1, min)
[1] 1 2

apply(matrix(1:6, 2, 3), 2, max)
[1] 2 4 6

apply(array(rnorm(2*2*10), c(2, 2, 10)), c(1, 2), mean)
           [,1]       [,2]
[1,] -0.2733804  0.3154234
[2,]  0.1830982 -0.5889010

colSums, rowSums, colMeans and rowMeans are optimized equivalents of the following apply calls:

rowSums(x)   ## same as apply(x, 1, sum)
colSums(x)   ## same as apply(x, 2, sum)
rowMeans(x)  ## same as apply(x, 1, mean)
colMeans(x)  ## same as apply(x, 2, mean)
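
A quick check of this equivalence on a small matrix, as a sketch. all.equal is used rather than identical because rowSums always returns doubles, while sum over an integer matrix returns integers.

```r
x <- matrix(1:6, 2, 3)

rowSums(x)    # 9 12
colMeans(x)   # 1.5 3.5 5.5

same_rows <- isTRUE(all.equal(rowSums(x), apply(x, 1, sum)))
same_cols <- isTRUE(all.equal(colMeans(x), apply(x, 2, mean)))
```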

split partitions data by a factor (analogous to SQL GROUP BY):

data<-data.frame(rnorm(10),rbinom(10,1,prob=.7))
sdata<-split(data[,1],data[,2])
lapply(sdata,mean)
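
The same grouping idea on deterministic data, as a sketch, so the result is easy to verify by hand. The vectors values and group are made up; tapply is shown as a one-step alternative to split followed by sapply.

```r
values <- c(1, 2, 3, 4, 5, 6)
group  <- c("a", "b", "a", "b", "a", "b")

groups <- split(values, group)   # list: a = 1 3 5, b = 2 4 6
means  <- sapply(groups, mean)   # named vector: a = 3, b = 4

by_tapply <- tapply(values, group, mean)  # same means in a single call
```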

Exploring data

See the Inspecting objects section.

Investigating unique values:

sapply(data, unique)
unique(data$col)
sapply(data[,c("col1","col2")], unique)
sapply(data[,5:10], unique)

table(data$col)

tapply(data$what, data$by, unique)
tapply(data$what, data$by, summary)
tapply(data$what, data$by, range)
tapply(data$what, data$by, mean)
tapply(data$what, data$by, sd)
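
A small sketch of these calls on a made-up data frame d; the column names by and what simply mirror the placeholders above.

```r
# Hypothetical data, for illustration only
d <- data.frame(by   = c("x", "y", "x", "y", "x"),
                what = c(10, 20, 30, 40, 50))

unique(d$by)                                          # "x" "y"
n_unique <- sapply(d, function(col) length(unique(col)))  # by = 2, what = 5
counts   <- table(d$by)                               # x: 3, y: 2
```

Counting distinct values per column is often a quick way to tell categorical columns from continuous ones.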

Brief info about vectors and matrices:

summary(1:8)
summary(matrix(1:20, 4, 5))

Simple plots:

i<-1:100
x<-i/10
y<-x^2
plot(x,y)

hist(rpois(100,10))
hist(rpois(100,10),breaks=20)

Renaming columns

names(d)[names(d)=="beta"] <- "two"
names(d)[2] <- "two"

library(plyr)
newd <- rename(d, c("beta"="two", "gamma"="three"))
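
If plyr is not available, several columns can also be renamed in base R. This is a sketch using match; the data frame d and the helper vectors old_names and new_names are hypothetical.

```r
d <- data.frame(alpha = 1:2, beta = 3:4, gamma = 5:6)

old_names <- c("beta", "gamma")
new_names <- c("two", "three")

# match() finds the positions of the old names, in the same order as new_names
names(d)[match(old_names, names(d))] <- new_names
names(d)   # "alpha" "two" "three"
```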

Removing names for rows and columns

rownames(dt) <- NULL
colnames(dt) <- NULL

Filtering rows and columns

TODO

Dropping rows and columns

Drop columns from a data frame by position:

dfnew <- df[-1]         # first
dfnew <- df[-ncol(df)]  # last
dfnew <- df[-c(1, 3:4, 7)]  # several columns

Drop columns from a data frame by name:

newdf <- df[ , !(names(df) %in% c("lat", "long"))]

df <- data.frame( a = 1:10, b = 2:11, c = 3:12 )
kept <- subset(df, select = c(a,c))      # keep only columns a and c
dropped <- subset(df, select = -c(a,c))  # drop columns a and c, leaving b
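
Rows can be dropped with the same negative-index idea, placed before the comma. The sketch below uses a made-up data frame and also shows drop = FALSE, which keeps the result a data frame when only one column remains.

```r
df <- data.frame(a = 1:5, b = 6:10, c = 11:15)

no_first <- df[-1, ]         # drop the first row
no_some  <- df[-c(2, 4), ]   # drop rows 2 and 4
big_only <- df[df$a > 2, ]   # keep only rows where a > 2

# Without drop = FALSE a single remaining column collapses to a plain vector
one_col <- df[, !(names(df) %in% c("b", "c")), drop = FALSE]
```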