On Programming Languages; Why My Dad Went From Programming to Driving a Bus

In Data Science from Scratch, a book introducing data science using Python, Joel Grus said the following about R (pg. 302):

Although you can totally get away with not learning R, a lot of data scientists and data science projects use it, so it’s worth getting familiar with it.

In part, this is so that you can understand people’s R-based blog posts and examples and code; in part, this is to help you better appreciate the (comparatively) clean elegance of Python; and in part, this is to help you be a more informed participant in the never-ending “R versus Python” flamewars.

From the wording he uses to the content of his message, Mr. Grus demonstrates how programmers can be passionate about the languages they use. Every year, websites publish articles about the popularity of programming languages and what the current trends are. AppDynamics, an American application performance management and IT operations analytics company, recently published an article on the most popular languages of 2017 on their website, projecting what languages they believe will remain dominant and which up-and-comers people should keep an eye on (in a broad sense). Meanwhile, statisticians and data scientists regularly write articles tracking the horse race between R, Python (the front runners), SAS, and the rest. Bob Muenchen, in a recent article, tracked the popularity of languages in job postings and found that R recently surpassed SAS, but both are less popular than Python. There’s also the “doomsayer” genre that’s regularly published, where people argue that such-and-such languages are “dying.” Quora questions regularly pop up asking “Is [insert language here] dying?” Dice.com has a 2014 article of languages marked for death, with a 2016 follow-up. In my personal life, my brother’s fiance, a computer programmer, continues to insist that Python is a dying language (a statement I disagree with).

Programmers, or those who write programs regularly, love talking about languages. No one would question how important languages are in general, but many love to compare the popularity and merit of languages in use. The Python people believe Python is best for data science; it’s “simpler than R” (cough cough) and, unlike R, is a general-purpose programming language. The R people believe R is better; it is built specifically for data analysis, has a larger universe of packages devoted to data analysis, and supports functional programming better. But wait, the Pythonistas say: R’s approach to OOP is… bizarre, to put it mildly. Well, the R crowd replies, at least we can switch text editors without fearing screwing up all the white space and our script becoming one giant IndentationError for reasons unknown (curly braces for the win!). And so on.

Dad’s Experience

The battle of the languages, though, is no laughing matter. Set aside for a second what language(s) to choose when starting a project. What languages practitioners do and don’t know directly corresponds to their professional success. Failing to keep on top of industry trends can lead to losing a job and never getting another to replace it.

I know this very well.

When I was born, my dad was the editor of a local newspaper, but lost his job for reasons not fully in his control (I hear it amounts to workplace politics). So he went through an intensive two-year education to get an associate degree in computer science. When Dad graduated, I walked up the stage with him, a toddler at the time. (Read my Dad’s blog post, written when I graduated with my BS in Mathematics and HBS in Economics.)

Graduation with Dad

So for most of my life (I’d guess around twenty years), my dad was a computer programmer, writing COBOL code. We had a decent middle-class living through my life. Unfortunately, Dad did not expand his skills. COBOL was losing popularity. Dad thought that knowing a rare language was an asset, but being proficient only in a rare language turned out to be a liability. He did eventually realize this and sought employers who would train him, but none of them, from Discover to Wencor to the State of Utah, trained him in more popular modern languages (even when they promised they would). He tried participating in a training program offered by the Utah Department of Workforce Services, but the company they hired for the training was… well, terrible. He didn’t learn anything. (Why the State of Utah chose to have a private company provide training instead of, say, Salt Lake Community College, is beyond me. I’m convinced that this company has a parasitic deal with the state government, where they get money from the state to provide crappy services to people down on their luck looking to better their lives, fueling my skepticism of education from the private sector. But I digress.)

The conclusion of the story: my parents declared bankruptcy, the house I grew up in from 1996 up until 2015 was foreclosed on, and my dad is a bus driver now. He’s kept looking for computer programming work, and bought a book on Java, but the life of the poor is hard, and erratic bus driving schedules coupled with living paycheck-to-paycheck makes learning programming hard, especially without a decent computer.

Choosing What to Use

So yes, programming languages matter. They make or break careers. Furthermore, no one can depend on employers to give their employees all the skills they need to stay relevant in the labor market; one should stay on top of trends and be prepared to take the initiative oneself.

But what language to learn?

I’m not going to provide any data; I’ve linked to at least two good articles that could give a good description of the “lay of the land”. I’m merely going to note what I’ve noticed.

As much as people like to talk about the features of this or that programming language and how that makes them better or worse, one of the key factors that determines what language will be used in a project is its existing user base. One reason why the user base matters is that it determines how easy it is to find answers to the questions that invariably arise. In the day and age of programming using the Google+StackExchange method, one cannot overstate how important this is. If you’re using a popular programming language in your field, chances are that any problem you encounter has already been solved.

Another reason that plays into the former is that packages are, in many ways, more important than the languages themselves, and the user base determines what packages will be written for the language. This means that not only are users more likely to find existing code that does what they want to do, that code will likely be better supported since there are more eyes looking at it to identify undesirable behavior, meaning those packages will be of better quality. If healthcare analysts prefer R, you will likely find lots of high-quality R packages for healthcare-related applications, while if data scientists prefer Python, you can expect lots of excellent machine learning packages for Python (as a hypothetical example).

I learned this lesson from experience. In my last year as an undergraduate at the University of Utah, I worked for a non-profit policy advocacy group called Voices for Utah Children studying Utah’s gender gap in wages. I wrote two reports. One was a basic study where I sliced and diced the gender gap in different ways. The second was my Honors thesis, where I did a more advanced econometric study of the pay gap.

For the first study, I used R because it was a language I had learned while taking statistics courses, and it’s a free, open-source language. I was an inexperienced programmer and I was working alone (my supervisor at Voices for Utah Children did not do anything programming-related), but using the well-known Google+StackExchange method, I was able to learn a lot, enough to do the job. Granted, these days I don’t even want to look at the scripts I wrote then, they were so terrible, but I still managed to learn a lot and get the job done.

For my Honors thesis, I still wanted to keep using R, but now I needed to work with a faculty member in the Economics department. He used Stata for his work; in fact, the vast majority of econometricians and those doing statistical work in policy or social science use Stata. Relevant data was provided in Stata-friendly formats. Not only were the packages I needed for my project best supported in Stata, the equivalent R packages were not just inflexible and not user friendly, they may not have even worked at all (or at least for how I needed to use them)! And even communicating what was being done with my thesis adviser was difficult, leading to perhaps months of wasted time and effort. On one fateful day, when there were discrepancies between how R and Stata were subsetting the same data set, I was in my adviser’s office trying to work things out. I was repeatedly producing errors in my code, slowing the process down, trying to work with code that looked unnecessarily complicated, and my adviser eventually said, “I know one thing; I will never use R.”[1] The day he said that, I went home and paid $200 for Stata. R may have been the superior language (and I still believe so), but I was swimming upstream trying to use it.

An unfortunate consequence of what I’ve just described is that better tools or languages may not see use simply because they’re not popular (an example of a network effect). Those few brave souls who try to use better tools are in for a rough ride. That said, unless the benefit of using a “better” language surpasses the cost of going against the consensus, it’s better to stick with what’s popular.

Learning to Learn

That said, what’s more important than learning any particular programming language is learning how to learn programming languages. If there are two popular programming languages in a field (say, R and Python), learn both. Learning lots of programming languages is surprisingly easy. Eventually, familiar patterns appear that make learning new languages easier. Feel free to specialize in a few, but a broad skill set is more valuable than a narrow one.

The best reason to keep learning new languages, though, is that technology is always changing. Once upon a time, hardware was more important than software. Low-level programming languages were key since one had to optimize heavily for low speed and storage space. Later, Moore’s Law led to lots of processing power and hard-drive space, so less efficient languages, starting with C but going so far as Python, became popular; programmer time was more valuable than computer time. Future innovations are likely to render the existing order obsolete as well. One can easily imagine quantum computing revolutionizing software again, bringing in a new set of programming languages that anyone worth her salt will need to learn to stay relevant.

People are welcome to continue debating the merits of this language or that. What matters most, though, is what people are actually using, and the field is always changing, from fads to reactions to truly revolutionary technology. So one must be continually learning to stay relevant today and stay sharp for what’s ahead. I at least enjoy the process.

EDIT: Well this blog post has set a new personal record in views (+5,000), and the day’s not over yet! Guess it sparked a conversation!

I’ve been following discussions on Reddit and Hacker News, and I’ve been seeing a recurring comment: “[Dad] is not actually interested in computer programming. If he were, he would have kept up his technical skills to stay relevant.”

This is not untrue per se; Dad never dreamed of being a computer programmer. He entered the industry to support his family, and at the time it was well-paying. To the extent Dad actually is interested in programming, it’s to provide the family with a decent living. If you were to ask Dad what his passion is, he’d likely say it’s writing and music. He started a blog when he lost his job in 2011 in hopes that it might turn into a career. You can follow his blog here, which talks a lot about his experience as a programmer and losing his jobs. (Dad is genuinely a good writer, and I don’t say that just because I’m his son. I remember reading and loving his old newspaper columns as a kid. You should check his blog out!)

Additionally, Dad does have somewhat of an aversion to the industry, perhaps stemming from some of the people he’s had to work for (at multiple companies), with anti-social and anal-retentive personalities, and there is a belief that his line of work has a propensity to attract similar people. He’d rather not have to deal with those types again.

That said, I remember when I was ten years old expressing an interest in computer programming. Dad hooked me up with resources for learning and playing with QBASIC, and I remember him enthusiastically explaining variables, control statements, and loops to me, and I would show him the code I came up with. He was supportive of my endeavors to play with code, and while I may not have been a high-school hacker (my interest in coding fluctuated up until college), his support gave me a leg up when I entered college and needed to learn R coding, the first time I had a coding class.

True, Dad does not have the passion for programming to motivate him to read a new O’Reilly book every month and go home and do “for fun” what he does at work eight hours a day, five days a week. Some people are like that; Dad is not one of them. But hindsight is 20/20; at the time, it was not obvious that this is what it took. Perhaps this article can serve as a warning to those who don’t love programming about what they should expect in order to stay relevant in their career. And if what it takes is a turn-off, maybe they should pursue a different line of work.


This blog post was inspired by “The Most Popular Programming Languages for 2017,” by Jordan Bach, which was brought to my attention by Bethany Emerson at Ghergich & Co. If you’re interested in investigating new programming languages to learn, consider the infographic below. Click the image below to read the full article.

The Most Popular Programming Languages for 2017


  1. Granted, I did not know then what I do about R now, and a lot of what I’ve learned since then would likely have led to that experience in my adviser’s office going better. At the time, I was unaware of Hadley Wickham, dplyr, or the tidyverse in general; I was subsetting using the abominable which() function. A few years of following R-Bloggers has done wonders for my R skills. But even if I had known of dplyr, I would eventually have been forced to switch to Stata anyway. I needed to do Oaxaca-Blinder decomposition on CPS survey data, using regressions robust to heteroskedasticity. With Stata, doing this is almost trivially easy, but with R, I would need to use the sandwich, survey, and oaxaca packages and combine them together in a way they refuse to combine. survey is extremely difficult to understand, isn’t very flexible, and does not combine with sandwich to get heteroskedasticity-robust standard errors. oaxaca’s principal function, oaxaca(), has a terrible interface that, first, uses the parameter name weight that the function lm() would need to use in an entirely different way, and second, refuses to allow custom functions to compute regressions in a way that alleviates the first problem. With this in mind, I’m not shocked at all that econometricians use Stata instead of R. The tidyverse revolution has yet to touch R econometrics in a way that would make it remotely usable even for a task as basic as computing a linear regression with heteroskedasticity-robust standard errors on survey data. Oaxaca decompositions are also very common, yet they are not practically doable in R right now without re-writing the function. Someone needs to take a look at this. End of rant.
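
For readers unfamiliar with the technique named in the footnote, here is a minimal sketch of a two-fold Oaxaca-Blinder decomposition in plain Python. It is only an illustration of the idea, not the analysis from the thesis: it uses a single regressor, unweighted least squares, and made-up numbers, and it leaves out exactly the parts the footnote complains about (survey weights and heteroskedasticity-robust standard errors). The function names `ols` and `oaxaca_twofold` and the choice of group A's coefficients as the reference are assumptions made for this example.

```python
# Illustrative two-fold Oaxaca-Blinder decomposition with one regressor.
# All numbers below are made up for demonstration; they are not CPS data.
from statistics import mean


def ols(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def oaxaca_twofold(x_a, y_a, x_b, y_b):
    """Split mean(y_a) - mean(y_b) into explained and unexplained parts,
    using group A's coefficients as the reference (an assumption here)."""
    a_a, b_a = ols(x_a, y_a)
    a_b, b_b = ols(x_b, y_b)
    gap = mean(y_a) - mean(y_b)
    # Part of the gap due to different average characteristics.
    explained = (mean(x_a) - mean(x_b)) * b_a
    # Part of the gap due to different intercepts and returns.
    unexplained = (a_a - a_b) + mean(x_b) * (b_a - b_b)
    return gap, explained, unexplained


# Hypothetical years of experience and log hourly wages for two groups.
exp_a = [2.0, 5.0, 8.0, 12.0, 15.0]
wage_a = [2.6, 2.9, 3.1, 3.4, 3.6]
exp_b = [1.0, 4.0, 6.0, 9.0, 11.0]
wage_b = [2.4, 2.6, 2.8, 3.0, 3.1]

gap, explained, unexplained = oaxaca_twofold(exp_a, wage_a, exp_b, wage_b)
print(f"gap={gap:.4f} explained={explained:.4f} unexplained={unexplained:.4f}")
```

Because a least-squares line passes through the group means, the explained and unexplained parts always sum to the raw gap. A real analysis would presumably need multiple regressors, survey weights, and robust standard errors on top of this.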

27 thoughts on “On Programming Languages; Why My Dad Went From Programming to Driving a Bus”

  1. This is a great article, but I don’t understand why your father didn’t train himself. Doing courses on his own time rather than looking for an employer to train him. For example in my life I have gone from Clipper > Delphi > C++ > C# > Javascript and none of it was because of employer training initiatives.

    • There were signs at the time he should have, but Dad was not paying attention to the job market while he was employed and didn’t see signs of trouble until he was unemployed. It was a mistake, acknowledged now. You could say my post is a warning about making the same mistake again.

  2. This is an age-old story, relevant as always. There’s a wider lesson here than just programming languages, about economics, age discrimination etc. too. Coming back to programming languages, though:

    1. For a variety of reasons R and Python are riding high at the moment. But the “next generation” will be JavaScript / ES6 / WebAssembly. It’s inevitable – all that’s missing to do data science and machine learning in the browser is some decent math libraries written in WebAssembly.

    2. I have mixed feelings about Julia. As a long-time scientific applications programmer I’m learning it, but I think it has little chance of becoming as widely used as R. The only thing Julia has going for it is that it’s compiled / fast. Speed of *development* has trumped speed of *execution* for decades and I doubt if that will be reversed by Julia. I like it because, like S and R, it mixes the good parts of FORTRAN and LISP.

    3. Don’t forget typesetting / LaTeX and its ilk! Scientific computing requires typesetting as much as it requires both symbolic and numeric computing. And don’t forget IDEs – a Jupyter notebook is *not* an IDE, no matter how much the Jupyter community would like it to be.

    • I really like this comment! Thank you for sharing! I agree; there’s a lot that could be said about the stories I tell here.

      I’ll also keep in mind your distinction between Jupyter notebooks and an IDE. I’m creating a course on data analysis with Python and use primarily Jupyter (although highlight alternatives), and I’ll be sure to make the distinction.

  3. I think the article, especially the rant, has neglected a very important detail in comparing R and Stata: you have paid $200 for Stata. How much have you paid for R? Also note that the problem you have met would cost more than $200 to solve, and to make R reach feature parity with Stata (such as GUI) would cost even more. Stata could charge only $200 because of economy of scale. But there’s no economy of scale for open source projects. The problem with open source projects remains that they are underfunded for the crucial roles they play. The vast positive externalities that contributors to open source software create are not compensated proportionately.

    • The fact that R is free makes it more sad that the econometrics packages are in such poor shape. I’m fairly certain that many Stata packages were written not to improve the program (and sell it) but because the authors themselves needed that functionality (and they were willing to do it for free). The poor quality of the equivalent R packages really just means that not enough econometricians/social scientists/policy wonks (etc.) are using the packages to create better ones.

  4. When I read this line : “learning programming is hard, especially without a decent computer”, it became obvious to me that you have no idea what you’re talking about. Learning programming doesn’t take a decent computer, this is an excuse that could only be believable by a non-programmer.

    Learning programming takes passion, commitment, and humility, whether it’s on your Android or a $10 raspberry pi zero—not necessarily an expensive computer. Most programs, especially programs written for the purpose of education, occupy mere kilobytes of drive space (& memory). If you hadn’t heard, most modern computers come with hundreds of gigabytes of drive space & several gigs of memory to boot. With all of those economics pennies you’ve been saving up, you can buy your dad a Windows machine for ~$150 and replace the OS with some Linux flavor that’s been picked out of a hat.

    • I never say “learning programming is hard.” I said: “…the life of the poor is hard, and erratic bus driving schedules coupled with living paycheck-to-paycheck makes learning programming hard, especially without a decent computer.”

      I have thought about the Raspberry Pi and how it might be a good resource for learning what he wants to learn; I just don’t know much about it myself, beyond the fact that it’s a minicomputer with little power when, as a statistician, I’m usually looking for MORE power, not less.

      I’d also caution you about claiming learning new skills is easy under literally any circumstances, including a job where the hours are, literally, whenever the boss says they are and he’ll tell you them a day in advance (hint: they’re not yesterday’s). Oh, and you can’t go home during your four hours of (unpaid) downtime, because you’ll need to be driving another bus. Then there’s life’s other problems. When it rains, it pours, and that can make learning more difficult. (I did not say impossible; just difficult. Bear in mind, I was also telling my dad he could learn on his own too, but there was a lot on his mind.)

  5. Look, not to strip you of your victimhood status, but your dad didn’t lose his job because no one gave him the right training, or because his computer wasn’t nice enough. The key is never resources. It’s resourcefulness. I pulled a B average in highschool, but knew 10 languages at some level of fluency before I dropped out of college. I am _always_ reading and learning. Your dad thought there was a shortcut by knowing a rare language, but there aren’t any shortcuts. My dad knew zero programming languages and is also a bus driver. I make sick money. No shortcuts bro. Srsly. Get out there and hustle. The reward of the good worker is to be replaced by the miracle worker. Read this twice, then read it again, slowly.

    • You should be careful before extrapolating to me. I said my dad had trouble. I, however, am fine. I don’t make a glorious living, but that’s because I’m a PhD student. I’ve got opportunities coming out the wazoo, coming in to my inbox on a regular basis and leading me to overwhelm myself by taking on too many commitments (I have a hard time saying no). I said my dad didn’t like reading O’Reilly books. Not only do I love them, I’m making my own courses to sell. To the extent I was hurt by my dad’s situation, it was as a dependent, which I no longer am.

      • That’s a pretty solid approach. I’m sure you’ll do well. I think during our fathers’ prime, there wasn’t an expectation that people be relentless about pushing themselves. Now that we’re globalized, I’m sure you feel it just like I feel it. The upside is that instead of being the smartest person on the block, if you’re successful, you’re probably one of the smartest people in the world. There’s a silver lining.

  6. Interesting read, I’ve always been of the opinion that a good Software Engineer can pick up almost any language, and you’re better off hiring someone with really good problem solving and design skills. In every job I’ve had, I’ve had either no or little experience with the main languages they use, but have learned them. And at my current job we’ve expanded into other languages that we didn’t know because they were made for what we do.

  7. I love your Dad’s story and feel a little bit the same when I look at my past as a COBOL programmer in the 80/90’s.

    I really understand what happened to your father and lots of my colleagues when no company would develop anymore a special company software. It was time to buy a ready to use software, and it was time to find another job quickly for programmers.

    Anyway, I just understood just a little bit earlier than your father did and oriented my career toward Supply Chain and then Operations. Not an easy step, but it saved me a lot of years of struggles for nothing.

    Sad story, but never forget that time is never on your side, always go ahead, learn to learn, use all the tools you know to progress.

    Even if I have nothing to do with computer science now, I can still make the difference among others by being able to develop my own KPIs with no help using VBA, MsAccess, Python, Perl, R… though it is not my “real” job anymore.

  8. I think R is not so easy for econometrics largely because economists use Stata, although StataCorp does add a few nice things like the -margins- command. I find that I have to use both R and Stata in my day-to-day work.

    I bet though now that you know dplyr you probably enjoy R more for wrangling data than Stata. I find the one dataframe limitation of Stata to be especially annoying. But then again R doesn’t have a good join function like stata’s merge 1:1 (maybe I’ll adopt statar::joinby eventually).

    For some of your work you might want to look into the “lfe” package, I think it is a pretty awesome one-function-only implementation of the core applied economics toolbox (fixed-effects, IV, clustered standard errors). It’s really hard to get the standard errors from lfe::felm to match -reghdfe-, though it can be done…

    Overall I second your philosophy, I think in data analysis it pays to just be pragmatic, and know as many languages as possible and be ready to shift to a new one in a heartbeat.

  9. I liked your story and shared it with my girlfriend; an economist as well. She has done plenty of her graduate work in Stata as well.

    About a year ago, she became interested in some Data Science courses I was following on Coursera and decided to join me in doing them. Long story short, she’s an R evangelist now.

    She’s basically the chief R programmer at her job and one can be sure that any future hires will be expected to either know or learn R.
