Saturday, 10 December 2016
Which environment you are using in R matters, but we generally ignore this fact. Here is a pretty simple example of when the environment does matter in a context where such problems often arise—where we define an object that will be called by a function.
The main function (ending in “a”) takes three arguments, x, y, and z, and calls a second function (ending in “b”). This “b” function accepts only x as an argument, but its body uses y and z in addition to x (not ideal function writing).
The first pair of functions (1a and 1b) are not nested, meaning test1b is not defined within test1a.
# Define the first function:
# This function has three arguments and calls the next function
test1a <- function(x, y, z){
  test1b(x)
}
# Define the second function:
# This function uses x, y, and z, but it only takes x as an argument
test1b <- function(x){
  print(rep(x, y * z))
}
# Run the function 'test1a'
test1a(1, 2, 3)
## Error in print(rep(x, y * z)): object 'y' not found
And we get an error: R cannot find y (or z). The error occurs because R uses lexical scoping: when a function refers to a variable it does not define itself, R looks for it in the environment where the function was defined, not in the environment it was called from. The variables y and z exist only inside the call to test1a, but test1b is defined outside of test1a, so it only has access to the variable x, which we pass it through its x argument.
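To see that the lookup really depends on where test1b was defined, here is a minimal sketch. It repeats the definitions above so it runs on its own, with one change made for illustration: test1b returns the vector instead of printing it. The names err_msg and out are just for this example.

```r
# Same definitions as in the first example, except that test1b returns the
# vector instead of printing it (a small change for this sketch)
test1a <- function(x, y, z) {
  test1b(x)
}
test1b <- function(x) {
  rep(x, y * z)  # y and z are free variables: not arguments, not defined here
}

# Calling test1a fails, because y and z exist only inside the call to test1a
err_msg <- tryCatch(test1a(1, 2, 3), error = function(e) conditionMessage(e))
print(err_msg)

# Define y and z in the environment where test1b was defined
y <- 10
z <- 1

# Now the call succeeds, but it uses these outer values (10 and 1),
# not the 2 and 3 that were passed to test1a
out <- test1a(1, 2, 3)
length(out)  # 10
```

The second call runs, but it quietly ignores the y and z given to test1a, which is arguably worse than the error.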
One solution: nest the functions…
Simply define the second (b) function inside of the first (a) function.
# Define the first function:
# This function has three arguments and calls the next function
test2a <- function(x, y, z){
  # Define the second function:
  # This function uses x, y, and z, but it only takes x as an argument
  test2b <- function(x){
    print(rep(x, y * z))
  }
  test2b(x)
}
# Run the function 'test2a'
test2a(1, 2, 3)
## [1] 1 1 1 1 1 1
It works (no errors).
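Why does nesting help? A rough way to check is to ask R which environment the inner function is attached to. The sketch below is a variation on test2a written for this purpose (the names visible, here, and out are made up for the example, and the inner function returns its result rather than printing it).

```r
test2a <- function(x, y, z) {
  test2b <- function(x) {
    rep(x, y * z)
  }
  # Names that live in the environment test2b was defined in
  visible <- ls(environment(test2b))
  # The environment of the current call to test2a
  here <- environment()
  list(
    result   = test2b(x),
    same_env = identical(environment(test2b), here),
    visible  = visible
  )
}

out <- test2a(1, 2, 3)
out$same_env  # TRUE: test2b's enclosing environment is the call to test2a
out$visible   # "test2b" "x" "y" "z"
out$result    # 1 1 1 1 1 1
```

Because y and z sit in the environment that encloses test2b, the lookup that failed in the first example now succeeds.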
There are other ways to fix the error in the first example. The one you would probably hear from StackOverflow is to write better functions (i.e., name all the arguments the function needs). Errors within user-defined functions often come from users (me) getting lazy and calling objects within the function that have not been passed to the function. This tendency is a bit sloppy and, as the first example demonstrates, error-prone.
The fix: pass all three arguments from the first (a) function to the second (b) function.
# Define the first function:
# This function has three arguments and calls the next function
test3a <- function(x, y, z){
  test3b(x, y, z)
}
# Define the second function:
# This function uses x, y, and z, and it takes all three as arguments
test3b <- function(x, y, z){
  print(rep(x, y * z))
}
# Run the function 'test3a'
test3a(1, 2, 3)
## [1] 1 1 1 1 1 1
Success. Plus, you won’t get reprimanded for sloppy scripting.
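If you want a quick check for this kind of sloppiness, one option (not something used in the examples above) is findGlobals from the codetools package, which ships with standard R installations. The sketch below assumes codetools is installed and compares the "b" functions from the first and third examples; free_1b and free_3b are names made up here.

```r
# The "b" functions from the first and third examples
test1b <- function(x) {
  print(rep(x, y * z))
}
test3b <- function(x, y, z) {
  print(rep(x, y * z))
}

# With merge = FALSE, findGlobals returns a list that separates the global
# functions a closure uses from the global variables it uses
free_1b <- codetools::findGlobals(test1b, merge = FALSE)$variables
free_3b <- codetools::findGlobals(test3b, merge = FALSE)$variables

print(free_1b)  # should list y and z, the variables test1b never receives
print(free_3b)  # should be empty, since test3b receives everything it uses
```

Any name that shows up in the variables list is something the function expects to find outside itself.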
Environments matter. Sloppy coding can remind you how they matter. Loops will also remind you how they matter (for instance, lapply vs. for loops while updating data.tables).
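As a small illustration of the loop case, here is a sketch using a plain number rather than a data.table (so it only hints at the data.table situation, which has its own update-by-reference behavior). The names total_for and total_lapply are made up for the example.

```r
# A for loop runs in the current environment, so the update sticks
total_for <- 0
for (i in 1:3) {
  total_for <- total_for + i
}
total_for  # 6

# The function given to lapply has its own environment: each assignment
# creates a local total_lapply that disappears when the function returns
total_lapply <- 0
invisible(lapply(1:3, function(i) {
  total_lapply <- total_lapply + i
}))
total_lapply  # still 0
```

The body of the loop is the same in both cases; only the environment it runs in differs.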
Enjoy!