Months ago, I asked the community a question: how should I organize my R research projects? After writing that post, doing some reading, and then putting a plan into practice, I now have my own answer.
Last week I announced the first release of MCHT, an R package that facilitates bootstrap and Monte Carlo hypothesis testing. In this article, I will elaborate on some important technical details about making MCHTest objects, explaining in the process how closures and R environments work.
In Data Science from Scratch, a book introducing data science using Python, Joel Grus said the following about R (pg. 302):
Although you can totally get away with not learning R, a lot of data scientists and data science projects use it, so it’s worth getting familiar with it.
In part, this is so that you can understand people’s R-based blog posts and examples and code; in part, this is to help you better appreciate the (comparatively) clean elegance of Python; and in part, this is to help you be a more informed participant in the never-ending “R versus Python” flamewars.
At the University of Utah, I teach the R lab that accompanies MATH 3070, “Applied Statistics I.” None of my students are presumed to have any programming experience, and they never hesitate to remind me of that fact, especially when they are starting out. When I create assignments and pick problems, I can often write a one- or three-line solution in thirty seconds to a problem that students will sometimes spend four hours trying to solve. They then see my solution and slap their foreheads at its simplicity. I can be tricky with my solutions. For example, suppose you wish to find the sample proportion for a certain property. A common approach (or at least the one used in the textbook our course uses, Using R for Introductory Statistics by John Verzani) looks like this:
CapitalOne contacted me a few months ago and requested that I apply for an internship with them in a data science-related position. I never got the job (nor did I really want it; I had already agreed to teach during the summer, and I was apprehensive about leaving people hanging and about moving), but I did go through part of the interview process. CapitalOne had me complete their data science challenge, which had some problems that were supposedly common tasks in data science. Some of it I was not well equipped for, such as regression; I was used to regression from an econometric point of view, not a computer science or data science point of view, and I was still learning. But there was one part of the challenge that I remember very well, and I was very happy with my solution to it.