readr, dplyr, and haven. (Run install.packages(c("readr", "dplyr", "haven")).)
psych. (Run install.packages("psych").)
Follow up:
dir.create() function. E.g. dir.create("TestFolder") should create a new folder named “TestFolder” in your current directory. To see if it worked, type dir(). You can also check whether a folder already exists using the dir.exists() function, e.g. dir.exists("TestFolder").
As we discussed previously, when you load or create data in R, you create objects. As you could probably guess, there are different types of objects in R. For the most part, we will use four types of objects in this course:[1]
tibble)
Each of these object types is a different way to store data. Each can take numbers or characters. And you can generally change the type of an object fairly easily. Plus, they all follow similar indexing rules, which is quite nice.
While these object types have much in common, they each act slightly differently, so it is important to keep track of your object types. Luckily, R provides us with the class() function, as well as dim() and length(), which together help us figure out what kind of data object we have and how big it is.
First, check the class of a number:
class(2)
## [1] "numeric"
Now, let’s check the class of a string of characters:
class("Max Auffhammer")
## [1] "character"
Vectors are more or less the basic building block in R. Vectors are one-dimensional, and they need only one element. One way to create a vector is using the c() (combine) function.
# Create a vector called 'vec'
vec <- c(1, 2, 3, 4, 5)
# Print the vector named 'vec'
vec
## [1] 1 2 3 4 5
To create sequences of numbers, you can use the colon (:), e.g. 1:5.[2]
# Create a vector of the sequence from 1 to 5 and store it as 'vec2'
vec2 <- 1:5
# Print 'vec2'
vec2
## [1] 1 2 3 4 5
# Are the two vectors' elements equal?
vec2 == vec
## [1] TRUE TRUE TRUE TRUE TRUE
# Are the two vectors equal?
all.equal(vec, vec2)
## [1] TRUE
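Notice that the double-equal sign (==) tests pairwise (element-by-element) equality, while the function all.equal() tests whether the two vectors as a whole are equal, up to a small numerical tolerance.[3] When the two objects differ, all.equal() returns a description of the difference rather than FALSE.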
# Create another numeric vector
vec3 <- c(1, 2, 8:10)
# Print it
vec3
## [1] 1 2 8 9 10
# Check element-wise equality
vec == vec3
## [1] TRUE TRUE FALSE FALSE FALSE
# Check overall equality
all.equal(vec, vec3)
## [1] "Mean relative difference: 1.25"
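Because all.equal() returns a character string when its arguments differ, it can be awkward to use directly in a condition. The following is a minimal sketch of one common pattern, wrapping the call in isTRUE(), with identical() shown for contrast. The vectors are recreated so the block runs on its own; the names same_12, same_13, and strict_12 are made up for this illustration.

```r
# Recreate the vectors from above so this block is self-contained
vec  <- c(1, 2, 3, 4, 5)   # stored as doubles
vec2 <- 1:5                # stored as integers
vec3 <- c(1, 2, 8:10)

# isTRUE() turns the result of all.equal() into a single TRUE or FALSE
same_12 <- isTRUE(all.equal(vec, vec2))   # TRUE: same values
same_13 <- isTRUE(all.equal(vec, vec3))   # FALSE instead of a message

# identical() also compares storage type, so it is stricter
strict_12 <- identical(vec, vec2)         # FALSE: double versus integer

c(same_12, same_13, strict_12)
```

The last comparison is FALSE even though the values match, because c(1, 2, 3, 4, 5) is stored as doubles while 1:5 is stored as integers.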
You can also create vectors out of vectors.
# Create a new vector by combining 'vec2' and 'vec3'
vec23 <- c(vec2, vec3)
# Print the new vector
vec23
## [1] 1 2 3 4 5 1 2 8 9 10
Recall that vectors are one-dimensional. Let’s explore our vector a bit using the functions class(), is.vector(), dim(), and length().[4]
# The class function
class(vec23)
## [1] "numeric"
# The is.vector function
is.vector(vec23)
## [1] TRUE
# The dimension function
dim(vec23)
## NULL
# The length function
length(vec23)
## [1] 10
Because vectors are one-dimensional, indexing them requires only one input. Let’s grab the seventh element of the vector vec23:
vec23[7]
## [1] 2
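The single input used for indexing can itself be a vector, which lets you grab several elements at once. The sketch below is only an illustration: vec23 is rebuilt so the block is self-contained, and the names first_three, picked, and without_first are invented for this example.

```r
# Rebuild vec23 (same values as above) so this block runs on its own
vec23 <- c(1:5, 1, 2, 8:10)

# A sequence of positions returns several elements
first_three <- vec23[1:3]
first_three
# 1 2 3

# Any vector of positions works, not just sequences
picked <- vec23[c(7, 10)]
picked
# 2 10

# A negative position drops that element instead of selecting it
without_first <- vec23[-1]
length(without_first)
# 9
```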
Finally, we can also create vectors of characters.[5]
# Create the string vector using quotation marks
str_vec <- c("Aren't", "vectors", "exciting", "?")
# Print it
str_vec
## [1] "Aren't" "vectors" "exciting" "?"
# Check its class
class(str_vec)
## [1] "character"
# Check if it is a vector
is.vector(str_vec)
## [1] TRUE
# Grab the third element
str_vec[3]
## [1] "exciting"
What happens when you combine a vector of characters with a vector of numbers (or create a vector of both numbers and characters)?
# Create a vector of the numeric vector 'vec' and the character vector 'str_vec'
mix_vec <- c(vec, str_vec)
# Print the result
mix_vec
## [1] "1" "2" "3" "4" "5" "Aren't"
## [7] "vectors" "exciting" "?"
# Check the class of the new vector
class(mix_vec)
## [1] "character"
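The result above reflects that a vector holds only one type of data, so R quietly converts (coerces) the numbers to characters. As a rough sketch of what that means in practice, the block below converts the numeric-looking elements back with as.numeric(); the name back_to_num is made up for this example.

```r
# Recreate the vectors so this block runs on its own
vec     <- c(1, 2, 3, 4, 5)
str_vec <- c("Aren't", "vectors", "exciting", "?")
mix_vec <- c(vec, str_vec)

# The first five elements look like numbers but are stored as text,
# so convert them explicitly before doing arithmetic
back_to_num <- as.numeric(mix_vec[1:5])
class(back_to_num)
# "numeric"
sum(back_to_num)
# 15

# Coercion happens for other types too: TRUE becomes 1 next to a number
c(TRUE, 2)
# 1 2
```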
It’s matrix time.
matrix()
R’s matrix function is aptly named matrix(). In practice, the matrix function needs two arguments:
data: the stuff inside the matrix
ncol: the number of columns[6]
To learn more about the matrix function, type ?matrix in your console (or search in the “Help” tab of RStudio). This help lookup works for any function in a loaded package. If you do not find what you are looking for, try using two question marks, ??matrix, which will perform a fuzzy search for the word.
Let’s make a 3 \(\times\) 2 matrix filled with the integers from 1 to 6. Following Max’s notation, let’s call this matrix \(\mathbf{A}\).
# Create a 3x2 matrix filled with the sequence 1:6
A <- matrix(data = 1:6, ncol = 2)
# Print it
A
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
What are the dimensions of A? What about its length? Is A a vector or a matrix?
# The dimension of A
dim(A)
## [1] 3 2
# The length of A
length(A)
## [1] 6
# Check if A is a vector
is.vector(A)
## [1] FALSE
# Check if A is a matrix
is.matrix(A)
## [1] TRUE
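Since matrix() can infer one dimension from the length of the data, the same matrix can be built by giving nrow instead of ncol, or by giving both. A short sketch, where A_nrow and A_both are names invented for this comparison:

```r
# Three ways to build the same 3x2 matrix
A      <- matrix(data = 1:6, ncol = 2)
A_nrow <- matrix(data = 1:6, nrow = 3)
A_both <- matrix(data = 1:6, nrow = 3, ncol = 2)

# All three are exactly the same object
identical(A, A_nrow)
identical(A, A_both)

# nrow() and ncol() return the two dimensions separately
nrow(A)
# 3
ncol(A)
# 2
```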
Finally, you can index any element of a matrix using its row and column. The way R prints matrices actually shows you exactly how to reference the row, the column, or the exact element.
# Print the matrix A
A
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
# Print the second row of A
A[2,]
## [1] 2 5
# Print the second and third rows of A
A[2:3,]
##      [,1] [,2]
## [1,]    2    5
## [2,]    3    6
# Print the first column of A
A[,1]
## [1] 1 2 3
# Print the element in the second row and first column of A
A[2,1]
## [1] 2
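One detail worth noticing in the output above: when you pull out a single row or column, R hands back a plain vector rather than a 1 \(\times\) 2 or 3 \(\times\) 1 matrix. If you need to keep the matrix shape (for example, before matrix multiplication), the drop = FALSE argument to the indexing brackets does that. A minimal sketch, with row2_vec and row2_mat as illustrative names:

```r
# Recreate A so this block runs on its own
A <- matrix(data = 1:6, ncol = 2)

# Default indexing drops the matrix structure for a single row
row2_vec <- A[2, ]
is.matrix(row2_vec)
# FALSE
dim(row2_vec)
# NULL

# drop = FALSE keeps the result as a 1x2 matrix
row2_mat <- A[2, , drop = FALSE]
dim(row2_mat)
# 1 2
```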
Define \(\mathbf{B} = \mathbf{A}'\). How can we manually make the transpose of \(\mathbf{A}\)? It should be 2 \(\times\) 3. Let’s try:
# Print A
A
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
# First attempt at B
B <- matrix(data = 1:6, ncol = 3)
# Print B
B
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
Nope. The dimensions are correct, but we want the first row to be 1, 2, 3, not 1, 3, 5. To tell R to fill the matrix by row, rather than by column (the default), use the argument byrow.
# Using the byrow option
B <- matrix(data = 1:6, ncol = 3, byrow = TRUE)
# Print B
B
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
That’s better.
R has a much simpler way to transpose matrices: the t() function. Feed a matrix to the t() function, and the output will be the transpose of the matrix. Let’s verify.
# Check element by element
t(A) == B
##      [,1] [,2] [,3]
## [1,] TRUE TRUE TRUE
## [2,] TRUE TRUE TRUE
# Check if all the element-by-element comparisons are TRUE
all(t(A) == B)
## [1] TRUE
# Check if the transpose of A is identical to B
identical(t(A), B)
## [1] TRUE
In R, the matrix multiplication operator is %*%. Admittedly, this notation is a little strange, but * is already taken for scalar multiplication. Don’t get lazy here: R will still let you use * with matrices, but it will perform elementwise multiplication, not matrix multiplication.
As an example, here is the elementwise multiplication of \(\mathbf{A}\) and itself, using the * operator:
# Print A as a reminder
A
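##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
# Elementwise multiplication of A and A
A * A
##      [,1] [,2]
## [1,]    1   16
## [2,]    4   25
## [3,]    9   36
The matrix multiplication of \(\mathbf{A}\) and itself is not defined because \(\mathbf{A}\) does not conform with itself: \(\mathbf{A}\) is 3 \(\times\) 2, so the number of columns of the first matrix (2) does not equal the number of rows of the second (3).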
# Matrix multiplication of A and itself
A %*% A
## Error in A %*% A: non-conformable arguments
You can also multiply matrices by scalars using the * operator:
A * 3
##      [,1] [,2]
## [1,]    3   12
## [2,]    6   15
## [3,]    9   18
To see a matrix multiplication that actually works, let’s define a 2 \(\times\) 2 matrix with the elements 1 through 4. Call it \(\mathbf{C}\). Calculate \(\mathbf{A}\times\mathbf{C}\).
# Create C
C <- matrix(data = 1:4, ncol = 2)
# Multiply A and C
A %*% C
##      [,1] [,2]
## [1,]    9   19
## [2,]   12   26
## [3,]   15   33
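The product above works because the number of columns of \(\mathbf{A}\) matches the number of rows of \(\mathbf{C}\). One way to check conformability before multiplying is to compare ncol() and nrow() directly. The sketch below is illustrative only; AC and AtA are names chosen for this example.

```r
# Recreate A and C so this block runs on its own
A <- matrix(data = 1:6, ncol = 2)
C <- matrix(data = 1:4, ncol = 2)

# A %*% C is defined when ncol(A) equals nrow(C)
ncol(A) == nrow(C)
# TRUE
# A %*% A is not defined: 2 columns versus 3 rows
ncol(A) == nrow(A)
# FALSE

# The product has nrow(A) rows and ncol(C) columns
AC <- A %*% C
dim(AC)
# 3 2

# Transposing the first A makes the dimensions conform
AtA <- t(A) %*% A
dim(AtA)
# 2 2
```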
No discussion of matrices would be complete without identity matrices and inverses.
In R, we create identity matrices using the function diag(). The argument to diag() is the number of rows/columns of the identity matrix. Because identity matrices are square, this argument is a single number.
For a 5 \(\times\) 5 identity matrix,
diag(5)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    0    0    0    0
## [2,]    0    1    0    0    0
## [3,]    0    0    1    0    0
## [4,]    0    0    0    1    0
## [5,]    0    0    0    0    1
Notice that if you apply the function diag() to an already-created matrix, it will grab the diagonal elements of that matrix.
# Grab the diagonal elements of the matrix C
diag(C)
## [1] 1 4
R’s built-in function to take the inverse of a matrix is solve(). Give solve() an invertible matrix, and it will return the inverse of the matrix.
# Take the inverse of C
solve(C)
##      [,1] [,2]
## [1,]   -2  1.5
## [2,]    1 -0.5
# Multiply C by its inverse
C %*% solve(C)
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
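Inverses are computed numerically, so in general the product of a matrix and its inverse may differ from the identity by tiny rounding errors. A hedged sketch of how you might verify an inverse with a tolerance, and what happens when a matrix is not invertible. The matrix S and the names C_inv, product, and singular_result are made up for this illustration.

```r
# Recreate C so this block runs on its own
C <- matrix(data = 1:4, ncol = 2)

# Compare C times its inverse to the identity, allowing for rounding error
C_inv   <- solve(C)
product <- C %*% C_inv
isTRUE(all.equal(product, diag(2)))
# TRUE

# A singular matrix (second column is twice the first) has no inverse,
# so solve() stops with an error; tryCatch() lets us handle it gracefully
S <- matrix(data = c(1, 2, 2, 4), ncol = 2)
singular_result <- tryCatch(
  solve(S),
  error = function(e) "not invertible"
)
singular_result
# "not invertible"
```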
The function det() will give you the determinant of a (square) matrix. To calculate the trace of a matrix, you can load the psych package. Within the psych package is a function called tr() that will calculate the trace.
# The determinant of a matrix
det(C)
## [1] -2
# The trace of a matrix using the psych package's tr()
library(psych)
tr(C)
## [1] 5
What is another way to calculate the trace of a matrix using tools we’ve already learned?
sum(diag(C))
## [1] 5
Combining sum() and diag() will give the trace of a matrix.
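If you use this combination often, you could wrap it in a small function of your own. The sketch below is just one way to do it; trace_of() is a hypothetical helper written for this example, not a function from base R or psych.

```r
# A hypothetical helper: the trace is the sum of the diagonal elements
trace_of <- function(m) {
  # The trace is only defined for square matrices
  stopifnot(is.matrix(m), nrow(m) == ncol(m))
  sum(diag(m))
}

# Recreate C so this block runs on its own
C <- matrix(data = 1:4, ncol = 2)
trace_of(C)
# 5

# The trace of an n x n identity matrix is n
trace_of(diag(5))
# 5
```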
You will often want to add an additional row or column to your matrix. rbind() and cbind() bind rows and columns onto existing matrices, respectively.
Remember the matrix \(\mathbf{A}\)?
A
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
Let’s add a column of ones in front of the matrix.
# Bind a column of ones in front of the matrix A
cbind(c(1, 1, 1), A)
##      [,1] [,2] [,3]
## [1,]    1    1    4
## [2,]    1    2    5
## [3,]    1    3    6
What happens if we get a little lazy and type 1 instead of c(1, 1, 1)?
cbind(1, A)
##      [,1] [,2] [,3]
## [1,]    1    1    4
## [2,]    1    2    5
## [3,]    1    3    6
R recycles the data given to make the dimensions match. This feature can be very helpful, but it might occasionally trip you up too. Be careful being lazy.
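To make the recycling behavior more concrete, here is a small sketch of how it plays out in vector arithmetic and in elementwise operations on a matrix. The names recycled_sum, with_ones, and scaled are invented for this illustration.

```r
# Recreate A so this block runs on its own
A <- matrix(data = 1:6, ncol = 2)

# A shorter vector is repeated until it matches the longer one
recycled_sum <- c(1, 2, 3, 4) + c(10, 20)
recycled_sum
# 11 22 13 24

# The single 1 is recycled into a full column of ones
with_ones <- cbind(1, A)
with_ones[, 1]
# 1 1 1

# Recycling runs down the columns, so this scales the ROWS of A
# by 1, 10, and 100 (which may or may not be what you intended)
scaled <- A * c(1, 10, 100)
scaled[, 2]
# 4 50 600
```

The last case is the kind that can trip you up: R does not complain, because the vector length divides the number of elements evenly.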
Now, let’s bind the vector c(3, 6) to the end (bottom?) of A.
rbind(A, c(3, 6))
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
## [4,]    3    6
Finally, notice that you can bind matrices together (as long as the dimensions match).
rbind(A, C)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
## [4,]    1    3
## [5,]    2    4
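What "the dimensions match" means depends on the function: rbind() needs the same number of columns, while cbind() needs the same number of rows. The sketch below illustrates both cases; I3, bind_attempt, and wide are names chosen for this example.

```r
# Recreate A and C so this block runs on its own
A  <- matrix(data = 1:6, ncol = 2)
C  <- matrix(data = 1:4, ncol = 2)
I3 <- diag(3)

# rbind() stacks rows, so the column counts must agree
ncol(A) == ncol(C)
# TRUE
ncol(A) == ncol(I3)
# FALSE

# Stacking A (2 columns) on I3 (3 columns) is an error
bind_attempt <- tryCatch(
  rbind(A, I3),
  error = function(e) "dimension mismatch"
)
bind_attempt
# "dimension mismatch"

# cbind() works here because A and I3 both have 3 rows
wide <- cbind(A, I3)
dim(wide)
# 3 5
```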
We do not have time to cover every function related to vectors and matrices, but here are a few more functions (or uses of functions we covered) that may prove useful.
crossprod() and tcrossprod() take cross products: crossprod(A, B) computes \(\mathbf{A}'\times\mathbf{B}\), and tcrossprod(A, B) computes \(\mathbf{A}\times\mathbf{B}'\).
solve() can also take two arguments, a and b, and returns the solution for x in the equation a %*% x = b.
eigen() gives the eigenvectors and eigenvalues of a square matrix. eigen(A)$vectors will give you just the eigenvectors of \(\mathbf{A}\), and eigen(A)$values will give you only the eigenvalues of \(\mathbf{A}\) (here \(\mathbf{A}\) stands for any square matrix, not the 3 \(\times\) 2 matrix defined above).
Give matrix() a single number and specify both ncol and nrow to fill a matrix with the same number. For example, matrix(data = 1, nrow = 3, ncol = 5) creates a 3 \(\times\) 5 matrix of ones.
sample() will randomly sample a number of elements from a vector with or without replacement (without replacement is the default). For example, sample(x = 1:100, size = 5, replace = TRUE) will randomly draw five numbers from 1 to 100 with replacement. See ?sample for more information. You should set your seed to make sampling replicable. The function set.seed() sets the seed.
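The functions listed above can be tried out on the small matrices from this section. The block below is a rough sketch rather than a complete tour: the vector b, the seed value 12345, and the names x, eig, ones, draw1, and draw2 are all arbitrary choices made for this example.

```r
# Recreate A and C so this block runs on its own
A <- matrix(data = 1:6, ncol = 2)
C <- matrix(data = 1:4, ncol = 2)

# crossprod(A) gives the same result as t(A) %*% A
isTRUE(all.equal(crossprod(A), t(A) %*% A))

# solve(a, b) returns the x that satisfies a %*% x = b
b <- c(5, 6)
x <- solve(C, b)
C %*% x          # reproduces b (as a one-column matrix)

# eigen() needs a square matrix, so use C rather than A
eig <- eigen(C)
eig$values       # the eigenvalues
eig$vectors      # the eigenvectors, one per column

# A 3x5 matrix of ones
ones <- matrix(data = 1, nrow = 3, ncol = 5)
dim(ones)

# Setting the same seed before each draw makes the draws identical
set.seed(12345)
draw1 <- sample(x = 1:100, size = 5, replace = TRUE)
set.seed(12345)
draw2 <- sample(x = 1:100, size = 5, replace = TRUE)
identical(draw1, draw2)
```

A quick sanity check on eigen(): the eigenvalues of a square matrix sum to its trace, so sum(eig$values) should be (numerically) equal to sum(diag(C)).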
[1] We will focus on vectors and matrices today.
[2] You can also use the seq() function, e.g., seq(from = 1, to = 5) or seq(from = 1, to = 5, by = 1).
[3] You can also use the identical() function, which is probably more appropriate, but beware: it is very picky. For instance, try as.integer(1) == as.numeric(1) and all.equal(as.integer(1), as.numeric(1)), then compare the results to identical(as.integer(1), as.numeric(1)). Picky! It might not always give you what you want.
[4] Most object types/classes have an associated function like the is.vector() function. For example, is.numeric(), is.integer(), is.character(), is.matrix(), and is.data.frame(), to name a few.
[5] Recall that we use quotation marks to create string/character vectors.
[6] Alternatively, you can use nrow and omit ncol. Or use both.