R language

Synopsis

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

Vector

Creating Vectors

| Initialization | Vector | Definition |
|---|---|---|
| c(2, 4, 6) | 2 4 6 | Join elements into a vector |
| 2:6 | 2 3 4 5 6 | An integer sequence |
| seq(2, 3, by=0.5) | 2.0 2.5 3.0 | A complex sequence |
| rep(1:2, times=3) | 1 2 1 2 1 2 | Repeat a vector |
| rep(1:2, each=3) | 1 1 1 2 2 2 | Repeat elements of a vector |
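A minimal sketch of the constructors listed above, run together. The variable names are illustrative and not part of the original.

```r
evens <- c(2, 4, 6)               # join elements into a vector
ints <- 2:6                       # integer sequence 2 3 4 5 6
halves <- seq(2, 3, by = 0.5)     # 2.0 2.5 3.0
cycled <- rep(1:2, times = 3)     # 1 2 1 2 1 2
stretched <- rep(1:2, each = 3)   # 1 1 1 2 2 2

print(cycled)
print(stretched)
```

Note the difference between times (repeat the whole vector) and each (repeat every element in place).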

Vector Functions

sort(x)
#Returns x sorted.

rev(x)
#Returns x reversed.

table(x)
#See counts of values.

unique(x)
#See unique values.
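As a quick illustration, the four functions can be applied to one small vector. The data below are made up for the example.

```r
x <- c(3, 1, 2, 3, 1)    # illustrative data, not from the original

sorted <- sort(x)        # 1 1 2 3 3
reversed <- rev(x)       # 1 3 2 1 3
counts <- table(x)       # value 1 appears twice, 2 once, 3 twice
distinct <- unique(x)    # 3 1 2 (order of first appearance)

print(counts)
```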

Selecting Vector Elements

By Position

| Example | Definition |
|---|---|
| x[4] | The fourth element. |
| x[-4] | All but the fourth. |
| x[2:4] | Elements two to four. |
| x[-(2:4)] | All elements except two to four. |
| x[c(1, 5)] | Elements one and five. |

By Value

| Example | Definition |
|---|---|
| x[x == 10] | Elements which are equal to 10. |
| x[x < 0] | All elements less than zero. |
| x[x %in% c(1, 2, 5)] | Elements in the set 1, 2, 5. |

Named Vectors

| Example | Definition |
|---|---|
| x['apple'] | Element with name 'apple'. |
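The three selection styles (by position, by value, by name) can be tried on a single named vector. The vector below is hypothetical and chosen so that each selection returns something.

```r
# Hypothetical named vector used only for this sketch.
x <- c(apple = 10, pear = -2, plum = 5, fig = 10, kiwi = 1)

fourth <- x[4]                    # fig
not_fourth <- x[-4]               # everything except fig
ends <- x[c(1, 5)]                # apple and kiwi
tens <- x[x == 10]                # apple and fig
in_set <- x[x %in% c(1, 2, 5)]    # plum and kiwi
by_name <- x['apple']             # element named 'apple'

print(tens)
```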

Getting Help

Accessing the help files

?mean
#Get help on a particular function.

help.search('weighted mean')
#Search the help files for a word or phrase.

help(package = 'dplyr')
#Find help for a package.

More about an object

str(iris)
#Get a summary of an object’s structure.

class(iris)
#Find the class an object belongs to.

Working Directory

getwd()
#Find the current working directory (where inputs are found and outputs are sent).

setwd('C://file/path')
#Change the current working directory.

Use projects in RStudio to set the working directory to the folder you are working in.

Using Libraries

install.packages('dplyr')
#Download and install a package from CRAN.

library(dplyr)
#Load the package into the session, making all its functions available to use.

dplyr::select
#Use a particular function from a package.

data(iris)
#Load a built-in dataset into the environment.

Programming

For Loop

for (variable in sequence)
{
 Do something
}

#Example:
for (i in 1:4)
{
 j <- i + 10
 print(j)
}

While Loop

while (condition)
{
 Do something
}

#Example:
i <- 1
while (i < 5)
{
 print(i)
 i <- i + 1
}

If statements

if (condition)
{
 Do something
} else {
 Do something different
}

#Example:
if (i > 3)
{
 print('Yes')
} else {
 print('No')
}

Functions

function_name <- function(var)
{
Do something
return(new_variable)
}

#Example:
square <- function(x)
{
squared <- x*x
return(squared)
}
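A short usage sketch for the square example above. It is redefined here only so the block runs on its own.

```r
square <- function(x) {
  squared <- x * x
  return(squared)
}

single <- square(4)      # 16
several <- square(1:3)   # 1 4 9, because * works element-wise on vectors

print(several)
```

Because arithmetic in R is vectorised, the same function works on a single number or a whole vector without a loop.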

Conditions

| Condition | Definition |
|---|---|
| a == b | Are equal |
| a != b | Not equal |
| a > b | Greater than |
| a < b | Less than |
| a >= b | Greater than or equal to |
| a <= b | Less than or equal to |
| is.na(a) | Is missing |
| is.null(a) | Is null |
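One point worth illustrating is why is.na() exists alongside ==: comparing against a missing value yields NA rather than TRUE or FALSE. A small sketch with made-up values:

```r
a <- c(1, NA, 3)          # illustrative values

equal_one <- a == 1       # TRUE NA FALSE: comparison with NA gives NA
missing <- is.na(a)       # FALSE TRUE FALSE
null_check <- is.null(a)  # FALSE: a exists, it just contains a missing value

print(equal_one)
print(missing)
```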

Reading and Writing Data

| Input | Output | Description |
|---|---|---|
| df <- read.table('file.txt') | write.table(df, 'file.txt') | Read and write a delimited text file. |
| df <- read.csv('file.csv') | write.csv(df, 'file.csv') | Read and write a comma separated value file. This is a special case of read.table/write.table. |
| load('file.RData') | save(df, file = 'file.RData') | Read and write an R data file, a file type special for R. |
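A round-trip sketch, assuming temporary files created with tempfile() rather than the file names shown above, so that nothing in the working directory is touched. The data frame is illustrative.

```r
df <- data.frame(x = 1:3, y = c('a', 'b', 'c'))

# CSV round trip; row.names = FALSE avoids writing an extra index column.
csv_path <- tempfile(fileext = '.csv')
write.csv(df, csv_path, row.names = FALSE)
df_back <- read.csv(csv_path)

# RData round trip; load() restores the object under its original name.
rdata_path <- tempfile(fileext = '.RData')
save(df, file = rdata_path)
rm(df)
load(rdata_path)

print(df_back)
```

Unlike read.csv(), load() is not assigned to a variable: it recreates df directly in the environment.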

Data Type Conversions

Converting between common data types in R. You can always convert from a type higher in the table to one lower in the table.

| Data Type | Variables | Definition |
|---|---|---|
| as.logical | TRUE, FALSE, FALSE | Boolean values (TRUE or FALSE). |
| as.numeric | 1, 0, 0 | Integers or floating point numbers. |
| as.character | '1', '0', '0' | Character strings. Generally preferred to factors. |
| as.factor | '1', '0', '1', levels: '1', '0' | Character strings with preset levels. Needed for some statistical models. |
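A sketch of walking down the table, from logical to numeric to character to factor, followed by one conversion in the opposite direction that fails. The values are illustrative.

```r
flags <- c(TRUE, FALSE, FALSE)

nums <- as.numeric(flags)       # 1 0 0
chars <- as.character(nums)     # "1" "0" "0"
fac <- as.factor(chars)         # factor; as.factor sorts the levels: "0" "1"

# Going back up the table is not guaranteed to work:
# a string that is not a number becomes NA (with a warning, suppressed here).
bad <- suppressWarnings(as.numeric('abc'))

print(levels(fac))
print(bad)
```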

Matrices

Selecting Segments from a Matrix

m <- matrix(x, nrow = 3, ncol = 3)
#Create a matrix from x.

m[2, ]
#Select a row.

m[ ,1]
#Select a column.

m[2,3]
#Select an element.

Matrix Functions

t(m)
#Transpose

m %*% n
#Matrix Multiplication

solve(m, n)
#Find x in: m %*% x = n
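A small worked example of the three operations, using a hypothetical 2 x 2 system chosen so the solution is easy to check by hand.

```r
# matrix() fills column by column, so m is
#   2 0
#   1 4
m <- matrix(c(2, 1, 0, 4), nrow = 2)
n <- c(2, 9)

mt <- t(m)             # transpose: rows and columns swapped
x <- solve(m, n)       # solves the linear system m %*% x = n, giving 1 2
check <- m %*% x       # matrix product; reproduces n as a one-column matrix

print(x)
```

Here %*% is the matrix product, whereas * multiplies element by element.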

Strings

paste(x, y, sep = ' ')
#Join multiple vectors together.

paste(x, collapse = ' ')
#Join elements of a vector together.

grep(pattern, x)
#Find regular expression matches in x.

gsub(pattern, replace, x)
#Replace matches in x with a string.

toupper(x)
#Convert to uppercase.

tolower(x)
#Convert to lowercase.

nchar(x)
#Number of characters in a string.

Also see the stringr library.
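The string functions above, applied to two made-up strings. The variable names are illustrative.

```r
x <- c('red apple', 'green pear')        # illustrative strings

joined <- paste(x, 'pie', sep = ' ')     # "red apple pie" "green pear pie"
collapsed <- paste(x, collapse = ', ')   # "red apple, green pear"
hits <- grep('apple', x)                 # 1: index of the matching element
replaced <- gsub('e', 'E', x)            # "rEd applE" "grEEn pEar"
upper <- toupper(x)                      # "RED APPLE" "GREEN PEAR"
n_chars <- nchar(x)                      # 9 10

print(collapsed)
```

Note that grep() returns positions of matching elements, not the matched text.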

Factors

factor(x)
#Turn a vector into a factor. Can set the levels of the factor and the order.

cut(x, breaks = 4)
#Turn a numeric vector into a factor by 'cutting' into sections.
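A brief sketch of both functions, assuming made-up inputs: factor() with explicit level order, and cut() splitting a numeric range into equal-width bins.

```r
# Explicit levels fix the order; without them the levels are sorted alphabetically.
sizes <- factor(c('low', 'high', 'low'), levels = c('low', 'high'))

# Three equal-width bins over the range of the data.
bins <- cut(c(1, 5, 9), breaks = 3)
bin_index <- as.integer(bins)    # 1 2 3: each value falls in its own bin

print(levels(sizes))
print(bins)
```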

Math Functions

log(x) 			- 	Natural log.
sum(x) 			- 	Sum.
exp(x) 			- 	Exponential.
mean(x) 		- 	Mean.
max(x) 			- 	Largest element.
median(x) 		-	Median.
min(x) 			- 	Smallest element.
quantile(x) 	- 	Percentage quantiles.
round(x, n) 	- 	Round to n decimal places.
rank(x) 		- 	Rank of elements.
signif(x, n) 	-	Round to n significant figures.
var(x) 			-	The variance.
cor(x, y) 		- 	Correlation.
sd(x) 			- 	The standard deviation.

Lists

l <- list(x = 1:5, y = c('a', 'b'))
#A list is a collection of elements which can be of different types.

l[[2]]
#Second element of l.

l[1]
#New list with only the first element.

l$x
#Element named x.

l['y']
#New list with only element named y.
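The difference between single and double brackets is easiest to see by checking what each returns. A minimal sketch using the list defined above:

```r
l <- list(x = 1:5, y = c('a', 'b'))

second <- l[[2]]      # the element itself: character vector "a" "b"
first_list <- l[1]    # still a list, containing only x
x_values <- l$x       # the element itself: integer vector 1 2 3 4 5
y_list <- l['y']      # still a list, containing only y

print(class(first_list))   # "list"
print(class(x_values))     # "integer"
```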

Statistics

| Model | Definition |
|---|---|
| lm(x ~ y, data=df) | Linear model. |
| glm(x ~ y, data=df) | Generalised linear model. |
| summary() | Get more detailed information out of a model. |
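A sketch of fitting both models with the same formula as above. The data frame is hypothetical (x is roughly 2 * y + 1 with a little noise) and is used only to have something to fit.

```r
# Hypothetical data, not from the original.
df <- data.frame(y = c(1, 2, 3, 4, 5),
                 x = c(3.1, 4.9, 7.2, 8.8, 11.1))

fit <- lm(x ~ y, data = df)      # ordinary least squares
gfit <- glm(x ~ y, data = df)    # default family is gaussian, so it matches lm here

print(coef(fit))                 # intercept and slope for y
print(summary(fit))              # coefficient table, residuals, R-squared
```

With glm(), a different family argument (for example binomial) is what makes the model differ from lm().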

Variable Assignment

> a <- 'apple'
> a
[1] "apple"

The Environment

| Function | Definition |
|---|---|
| ls() | List all variables in the environment. |
| rm(x) | Remove x from the environment. |
| rm(list = ls()) | Remove all variables from the environment. |

You can use the environment panel in RStudio to browse variables in your environment.

Data Frames

df <- data.frame(x = 1:3, y = c('a', 'b', 'c'))
#A special case of a list where all elements are the same length.

| x | y |
|---|---|
| 1 | a |
| 2 | b |
| 3 | c |

Understanding a data frame

View(df)
#See the full data frame.

head(df)
#See the first 6 rows.

List subsetting

df$x
#Select the column named x.

df[[2]]
#Select the second column.

Matrix subsetting

df[ ,2]
#Select the second column.

df[2, ]
#Select the second row.

Data Frame Functions

nrow(df)
#Number of rows.

ncol(df)
#Number of columns.

dim(df)
#Number of rows and columns.
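A short sketch tying the data frame functions and the two subsetting styles together, using the same small data frame as earlier in this section.

```r
df <- data.frame(x = 1:3, y = c('a', 'b', 'c'))

shape <- dim(df)          # 3 2: rows first, then columns
col_by_name <- df$y       # list-style: column y as a vector
col_by_pos <- df[[2]]     # list-style: same column by position
col_matrix <- df[, 2]     # matrix-style: same column again
row_two <- df[2, ]        # matrix-style: second row, still a data frame

print(shape)
print(row_two)
```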

Also see the dplyr library.

Distributions

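| | Random Variates | Density Function | Cumulative Distribution | Quantile |
|---|---|---|---|---|
| Normal | rnorm | dnorm | pnorm | qnorm |
| Poisson | rpois | dpois | ppois | qpois |
| Binomial | rbinom | dbinom | pbinom | qbinom |
| Uniform | runif | dunif | punif | qunif |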
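A sketch of the four function families for the standard normal distribution. The r/d/p/q prefixes are base R naming; the seed is an arbitrary choice made only for reproducibility.

```r
set.seed(1)                  # arbitrary seed, only so the draws are repeatable
draws <- rnorm(5)            # r: five random variates from N(0, 1)

dens <- dnorm(0)             # d: density at 0, about 0.3989
cum <- pnorm(0)              # p: cumulative probability at 0, exactly 0.5
quant <- qnorm(0.5)          # q: quantile for probability 0.5, which is 0
round_trip <- qnorm(pnorm(1.5))   # q inverts p, so this returns 1.5

print(draws)
```

The same pattern applies to the other rows: for example rpois, dpois, ppois and qpois for the Poisson distribution.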

Plotting

plot(x)
#Values of x in order.

plot(x, y)
#Values of x against y.

hist(x)
#Histogram of x.