Saturday, 10 December 2016

Intro

The environment you are working in matters in R, though we usually ignore it. Here is a simple example of when it does matter, in a context where such problems often arise: a function that relies on an object defined somewhere else.

The two-function problem

The main function (ending in “a”) takes three arguments: x, y, z and feeds them to a second function (ending in “b”). This “b” function only accepts x as an argument, but the function uses y and z, in addition to x (not ideal function writing).

Option 1: Non-nested functions

The first pair of functions (1a and 1b) are not nested, meaning test1b is not defined within test1a.

# Define the first function:
# This function has three arguments and calls the next function
test1a <- function(x, y, z){
    test1b(x)
}
# Define the second function:
# This function uses x, y, and z, but it only takes x as an argument
test1b <- function(x){
    print(rep(x, y * z))
}
# Run the function 'test1a'
test1a(1, 2, 3)

## Error in print(rep(x, y * z)): object 'y' not found

And we get an error: R cannot find y (or z). This error occurs because R uses lexical scoping: when test1b needs a variable that is not one of its own arguments, it looks in the environment where test1b was defined (here, the global environment), not in the environment of the function that called it. The variables x, y, and z exist only inside the call to test1a. Because test1b is defined outside of test1a, it only has access to x, which we pass through its x argument.
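To make the lookup rule concrete, here is a minimal sketch using hypothetical names (caller and callee are mine, not from the example above) and returning values instead of printing them. It assumes no y or z already exists in your session. It also shows a tempting non-fix: defining y and z next to the function silences the error, but the inner function then ignores whatever was passed to the outer one.

```r
# Hypothetical names; same shape as test1a/test1b, but returning values
callee <- function(x) {
  rep(x, y * z)  # y and z are free variables: neither arguments nor locals
}
caller <- function(x, y, z) {
  callee(x)      # caller's y and z are not visible inside callee
}

# Capture the error message instead of stopping the script
msg <- tryCatch(caller(1, 2, 3), error = function(e) conditionMessage(e))
print(msg)

# Defining y and z where callee was defined makes the error go away,
# but callee now uses these values, not the ones passed to caller
y <- 10
z <- 1
res <- caller(1, 2, 3)
print(length(res))  # 10, not 2 * 3 = 6
```

The second half is the reason I would not treat stray global objects as a solution: the code runs, but it quietly computes the wrong thing.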

One solution: nest the functions…

Option 2: Nested functions

Simply define the second (b) function inside of the first (a) function.

# Define the first function:
# This function has three arguments and calls the next function
test2a <- function(x, y, z){
  # Define the second function:
  # This function uses x, y, and z, but it only takes x as an argument
  test2b <- function(x){
      print(rep(x, y * z))
  }
  test2b(x)
}
# Run the function 'test2a'
test2a(1, 2, 3)

## [1] 1 1 1 1 1 1

It works (no errors). Because test2b is defined inside test2a, it can see the y and z that belong to test2a.
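If you want to see where a nested function finds y and z, you can inspect its enclosing environment. The sketch below uses hypothetical names (make_repeater and repeat_six are mine) and returns the inner function rather than calling it right away, which is a small variation on Option 2.

```r
# Hypothetical helper: builds and returns a nested function
make_repeater <- function(y, z) {
  # The inner function is defined here, so it can see y and z
  function(x) rep(x, y * z)
}

repeat_six <- make_repeater(2, 3)
out <- repeat_six(1)
print(out)

# The enclosing environment of the inner function holds y and z
enclosed <- ls(environment(repeat_six))
print(enclosed)
```

Here environment() returns the environment in which the inner function was created, and ls() lists the objects it contains.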

Option 3: Write clearer functions

There are other ways to fix the error in Option 1. The one you would probably hear from StackOverflow is to write better functions (i.e. name all the arguments the function needs). Errors within user-defined functions often come from users (me) getting lazy and calling objects within the function that have not been passed to the function. This tendency is a bit sloppy and, as Option 1 demonstrates, error-prone.

The fix: pass all three arguments from the first (a) function to the second (b) function.

# Define the first function:
# This function has three arguments and passes all of them to the next function
test3a <- function(x, y, z){
    test3b(x, y, z)
}
# Define the second function:
# This function uses x, y, and z, and takes all three as arguments
test3b <- function(x, y, z){
    print(rep(x, y * z))
}
# Run the function 'test3a'
test3a(1, 2, 3)

## [1] 1 1 1 1 1 1

Success. Plus, you won’t get reprimanded for sloppy scripting.
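One way to catch this kind of sloppiness before it bites is to ask R which variables a function uses without defining them. The sketch below is only a suggestion, not part of the original example: it assumes the codetools package is available (it normally ships with R as a recommended package), and sloppy and explicit are hypothetical stand-ins for test1b and test3b.

```r
# Hypothetical stand-ins for test1b and test3b
sloppy <- function(x) {
  print(rep(x, y * z))
}
explicit <- function(x, y, z) {
  print(rep(x, y * z))
}

# findGlobals() with merge = FALSE separates functions from variables;
# the variables element lists names used but not defined in the function
free_sloppy <- codetools::findGlobals(sloppy, merge = FALSE)$variables
free_explicit <- codetools::findGlobals(explicit, merge = FALSE)$variables

print(free_sloppy)    # should list y and z
print(free_explicit)  # should be empty
```

Anything that shows up in the variables list has to come from somewhere outside the function, which is exactly the situation Option 1 ran into.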

Wrap up

Environments matter. Sloppy coding can remind you how they matter. Loops will also remind you how they matter (for instance, lapply vs. for loops while updating data.tables).
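A minimal sketch of the loop difference, using a plain number rather than a data.table (that substitution is my assumption, to keep the example free of extra packages): a for loop runs in the current environment, while the function passed to lapply gets its own environment on every call.

```r
# A for loop assigns in the environment where it runs
total_for <- 0
for (i in 1:3) {
  total_for <- total_for + i
}

# The function given to lapply assigns to a local copy instead
total_apply <- 0
invisible(lapply(1:3, function(i) {
  total_apply <- total_apply + i  # local to this call; the outer object is untouched
}))

print(total_for)    # 6
print(total_apply)  # 0
```

The lapply version can read total_apply from the outer environment, but its assignment creates a new local object, so the original never changes.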

Enjoy!