Sunday, 30 July 2017
The other day, a friend and I were wondering how many lines of code we had written.1 Really, the conversation started with me wondering if I had hit a million lines of R code yet.2 We proposed that the R code necessary for a quick-and-dirty calculation for the number of lines-of-code-written would be fairly short. We were correct!
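After loading the requisite packages, the following script finds every file ending in .R in the /Users/ directory and in all of its subdirectories, counts the lines in each file, and sums the counts: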
    # Load two packages
    library(dplyr)
    library(stringr)
    # Count your lines of R code
    list.files(path = "/Users/", recursive = TRUE, full.names = TRUE) %>%
      str_subset("[.][R]$") %>%
      sapply(function(x) x %>% readLines() %>% length()) %>%
      sum()

    ## [1] 95349
Boom! Not bad, but also not a million. Yet.
The list.files() part of the script takes some time: it is essentially finding all of the files on your computer. In my case, the list.files() call on my /Users/ directory returns approximately 3.2 million (full) filenames.
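One way to avoid hauling millions of filenames into R is to let list.files() do the filtering through its pattern argument. The sketch below is an assumption on my part rather than the script I actually ran: count_r_lines() is a hypothetical helper name, and the directory walk itself still has to visit every folder, so it may not be much faster. It mainly keeps the intermediate vector small and drops the stringr step.

```r
# Hypothetical helper (not part of the original script): count the lines in
# every .R file under `path`, letting list.files() filter the filenames.
count_r_lines <- function(path) {
  r_files <- list.files(
    path = path,
    pattern = "[.]R$",  # same regular expression idea as the script above
    recursive = TRUE,
    full.names = TRUE
  )
  # One integer per file; warn = FALSE silences the "incomplete final line"
  # warning for files that lack a trailing newline.
  line_counts <- vapply(
    r_files,
    function(f) length(readLines(f, warn = FALSE)),
    integer(1)
  )
  sum(line_counts)
}
```

Calling count_r_lines("/Users/") should then give the same kind of tally as the pipeline above, in base R only.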
The two main problems with this script result from updates/version control:
If you keep your files under version control (e.g., git), then this script will only count the number of lines of code in the most recent version of each of your files. E.g., if you have re-written a file 10,000,000 times, you’ll miss 9,999,999 of the versions (and their lines of code) in your tally. You could probably fix this issue by grabbing the version histories of your R files from GitHub and then finding the unique lines for a given file.
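A rough sketch of that version-history idea, for one file at a time. Everything here is an assumption rather than something I have run on my own repositories: it supposes the repository is already cloned locally and that git is on your PATH, and unique_lines_ever_added() is a hypothetical helper name. It asks git for every patch that touched the file and keeps the distinct lines that were ever added.

```r
# Hypothetical helper: the unique lines ever added to `file` (a path relative
# to the repository root) across the history of a local git clone at `repo`.
unique_lines_ever_added <- function(repo, file) {
  # `git log --follow -p` prints the patch for each commit touching the file;
  # `--format=` suppresses the commit headers so only diff text remains.
  diff_text <- system2(
    "git",
    c("-C", shQuote(repo), "log", "--follow", "-p", "--format=",
      "--", shQuote(file)),
    stdout = TRUE
  )
  # Added lines start with "+". Lines starting with "+++ " are diff file
  # headers; this is a heuristic, so a code line beginning with "++ " would
  # be dropped by mistake.
  is_added <- startsWith(diff_text, "+") & !startsWith(diff_text, "+++ ")
  # Strip the leading "+" and de-duplicate
  unique(substring(diff_text[is_added], 2))
}
```

Taking length() of the result for each .R file and summing would give a tally that includes lines since deleted or rewritten, though it still counts each distinct line only once per file.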
In my case, I think issue (1) is my biggest problem, but I’ll leave the remedy for future Ed.