Sunday, 30 July 2017
The other day, a friend and I were wondering how many lines of code we had written. Really, the conversation started with me wondering if I had hit a million lines of R code yet. We proposed that the R code necessary for a quick-and-dirty count of lines-of-code-written would be fairly short. We were correct!
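After loading the requisite packages, the following script
1. finds all of the files in your Users directory and in all of its subdirectories,
2. keeps the files whose names end with .R (R scripts),
3. counts the lines in each of these .R scripts,
4. sums the line counts across all of the .R scripts.
# Load two packages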
library(dplyr)
library(stringr)
# Count your lines of R code
list.files(path = "/Users/", recursive = TRUE, full.names = TRUE) %>%
str_subset("[.][R]$") %>%
sapply(function(x) x %>% readLines() %>% length()) %>%
sum()
## [1] 95349
Boom! Not bad, but also not a million. Yet.
Note: The list.files() part of the script takes some time: it is essentially finding all of the files on your computer. In my case, the list.files() call on my /Users/ directory returns approximately 3.2 million (full) filenames.
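One way to avoid carrying millions of filenames around is to let list.files() do the filtering itself through its pattern argument. The sketch below is only an illustration, not the script I ran: count_r_lines() is a hypothetical helper name, and it uses base R alone. It still has to walk the whole directory tree, so I would not assume it is much faster.

```r
# Hypothetical helper (not part of the original script): count lines in
# all .R files under `path`, filtering by file name inside list.files().
count_r_lines <- function(path) {
  # `pattern` is a regular expression matched against file names,
  # so only names ending in ".R" are returned
  files <- list.files(path, pattern = "[.]R$", recursive = TRUE, full.names = TRUE)
  # Count the lines of each file; warn = FALSE silences the warning
  # readLines() gives when a file lacks a final newline
  sum(vapply(files, function(f) length(readLines(f, warn = FALSE)), integer(1)))
}
```

Calling count_r_lines("/Users/") should then give the same kind of total as the pipeline above.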
The two main problems with this script result from updates/version control:
2. If you use version control (e.g., git), then this script will only count the number of lines of code in the most recent version of each of your files. E.g., if you have re-written a file 10,000,000 times, you’ll miss 9,999,999 of the versions (and their lines of code) in your tally. You could probably fix this issue by grabbing the version histories of your R files from GitHub and then finding the unique lines for a given file.

In my case, I think issue (1) is my biggest problem, but I’ll leave the remedy for future Ed.
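For the curious, the version-history idea might look something like the sketch below. It is untested on a real repository and rests on several assumptions: git is installed and on your path, the file lives in a local clone, and both function names are hypothetical. The first function shells out to git log -p to get every patch that touched a file; the second counts the distinct non-blank lines that were ever added.

```r
# Hypothetical helper: return the full patch history of one file as a
# character vector. Assumes `git` is installed and `repo` is a local clone.
git_file_history <- function(repo, file) {
  system2(
    "git",
    c("-C", shQuote(repo), "log", "-p", "--follow", "--format=", "--", shQuote(file)),
    stdout = TRUE
  )
}

# Hypothetical helper: count the distinct non-blank lines ever added,
# given the lines of `git log -p` output for a single file.
count_unique_added_lines <- function(patch_lines) {
  # Added lines start with "+"; drop the "+++ b/file" patch headers
  is_added <- grepl("^[+]", patch_lines)
  is_header <- grepl("^[+]{3} (b/|/dev/null)", patch_lines)
  added <- sub("^[+]", "", patch_lines[is_added & !is_header])
  # Ignore blank lines, then count each distinct line once
  added <- added[nzchar(trimws(added))]
  length(unique(added))
}
```

Something like count_unique_added_lines(git_file_history(repo, file)), summed over your files, would give a rough "lines ever written" tally. Note that it counts a line once per file no matter how many times you re-typed it.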
Enjoy!