Sunday, 30 July 2017
The other day, a friend and I were wondering how many lines of code we had written.[1] Really, the conversation started with me wondering if I had hit a million lines of R code yet.[2] We proposed that the R code necessary for a quick-and-dirty calculation for the number of lines-of-code-written would be fairly short. We were correct!
After loading the requisite packages, the script finds every file in Users and in all of its subdirectories, keeps the files ending in .R (R scripts), and counts their lines:
##  95349
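For anyone who wants to try this at home, here is a minimal base-R sketch of those steps. It is an approximation, not the script that produced the number above: it assumes no extra packages, and the function name count_r_lines is made up for illustration.

```r
# Hypothetical helper (not the original script):
# count the lines in every .R file under `dir`.
count_r_lines <- function(dir) {
  # Recursively list every file under `dir`, keeping full paths
  all_files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  # Keep only the files whose names end in ".R"
  r_files <- all_files[grepl("\\.R$", all_files)]
  # Count the lines in each file, then add up the counts
  line_counts <- vapply(
    r_files,
    function(f) length(readLines(f, warn = FALSE)),
    integer(1)
  )
  sum(line_counts)
}
```

Calling count_r_lines("/Users") would then give a tally for the whole directory, blank lines and comments included.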
Boom! Not bad, but also not a million. Yet.
The list.files() part of the script takes some time: it is essentially finding all of the files on your computer. In my case, the list.files() call on my /Users/ directory returns approximately 3.2 million (full) filenames.
The two main problems with this script result from updates/version control:
1. If you revise your files or track them with version control (e.g., git), then this script will only count the number of lines of code in the most recent version of each of your files. E.g., if you have re-written a file 10,000,000 times, you’ll miss 9,999,999 of the versions (and their lines of code) in your tally. You could probably fix this issue by grabbing the version histories of your R files from GitHub and then finding the unique lines for a given file.
In my case, I think issue (1) is my biggest problem, but I’ll leave the remedy for future Ed.
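As a head start for future Ed, the "unique lines" half of that remedy might look something like the sketch below. It assumes the versions of a file have already been pulled out of the history somehow (for example with git log and git show) and read into R as character vectors. The function name count_unique_lines is hypothetical.

```r
# Hypothetical helper: `versions` is a list of character vectors,
# one per saved version of a single file, each holding that version's lines.
count_unique_lines <- function(versions) {
  # Pool the lines from every version, then drop exact duplicates
  all_lines <- unlist(versions, use.names = FALSE)
  length(unique(all_lines))
}

# Two toy versions of the same file
v1 <- c("x <- 1", "y <- 2")
v2 <- c("x <- 1", "y <- 3", "z <- x + y")
count_unique_lines(list(v1, v2))  # returns 4
```

This counts a line once no matter how many versions it survives in, so it also undercounts lines that legitimately repeat within a file. It is quick-and-dirty, much like the original tally.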