28 Continuing your education
It is my hope that the lab component of ECON 41 has taught you that empirics and programming are an indispensable combination. To accomplish anything with either of these skillsets, you must use them together, and this is true despite the traditional underemphasis on programming in introductory Statistics courses.
Another outcome that I was aiming for with this aspect of the course was to deepen your interest in both of these topics. I really hope that you want to learn more, especially as a college student and maybe even later as a graduate student, and I have some general advice about how to continue.
28.0.1 Further courses in Mathematics and Statistics
If you’re interested in any subject that requires you to use math, there are a few courses you absolutely must take while you’re in college: Calculus 3, Linear Algebra and Regression Analysis. Another course you should probably take in Mathematics is Introduction to Proof Techniques.
In Mathematics, you must take multivariable calculus with vectors (Calculus 3) and linear algebra. Sometimes people on the street or in industry will tell you that you don’t really need the knowledge from these classes because they never took one or both of them and they don’t use this knowledge in their jobs today. But think about this claim for a second. Should you really trust an opinion like this? Someone who hasn’t taken these courses can’t tell you how valuable they are, because they can’t apply knowledge they never acquired. You will meet lots of quacks like this throughout your life. Don’t let them knock you off course.
Multivariable calculus and linear algebra are in fact very important classes for further study in lots of subjects, including Economics, Statistics, Engineering, Computer Science and so on. If you don’t take these classes, you can’t fulfill the prerequisites for lots of graduate programs or even complete requirements for lots of undergraduate majors. You also can’t learn about machine learning in any meaningful way because so much of the basic material is based in knowledge from these two classes. Please do not listen to anybody who tells you that you do not need these courses, because they are wrong.
Another course you should take in Mathematics is one that introduces proof writing. Computer Science majors usually have to take a course called Discrete Mathematics which focuses on this. But I recommend taking the version of this course for Mathematics majors because it covers pretty much the same material, but tends to be more rigorous. This is a course that Mathematics majors have to take before they can take upper division courses in Mathematics. The prerequisites vary depending on the university. For example, UCSB requires students to take Calculus 3 before enrolling in this course. UIUC only requires Calculus 2. UCSD requires Calculus 2 and Matrix Algebra. The version that I took online required Calculus 3. Depending on your major, you may have to take a course like this, but you should still consider it even if it’s not required.
Why should you take a course in proof writing? Because it is the bridge that you need to cross in order to start studying Mathematics at an advanced level. That is something you may need to do in order to enter a top graduate program in a subject that interests you, and something you may also need to do in order to publish research. Before taking a course like this, you probably haven’t learned how to write proofs in any systematic way, even if you’ve had math instructors who used them in your courses or made you write proofs on tests. Your classes so far have mostly been about finding numeric answers to tricky questions that apply the concepts you’re learning. Advanced courses (upper division ones) in Mathematics still involve computation, but they also involve a great deal of proof writing, and in order to write proofs you need to be introduced to basic techniques and get lots of practice with them first. A course in proof writing is where this is done. Do not tell yourself that you can just figure out how to write proofs in a class like Modern Algebra while you’re taking it. You will be sorry.
As for Statistics, you should take at least one more course: Regression Analysis. In this course you’ll learn more about inference and how to build regression models. This is very important basic knowledge for further study in anything that involves Statistics, including machine learning, and regression techniques remain a very important tool in academic research. Don’t let anybody tell you that this stuff is not important.
28.0.2 College majors and minors
The absolute best place for you to develop stronger quantitative and empirical skills is as a university student. Depending on your level of interest, this could mean majoring in a subject that focuses on empirics and programming or minoring in a related subject. Some obvious choices for either are Statistics and Computer Science. Of course you will have to take more courses in Mathematics as well.
However, depending on where you end up going for college, there may be majors with narrower focuses and obscure names that may fit your interests even better than those more well known ones. Please do not take the advice I give here as a substitute for speaking with professors and counselors at your school about how to plan your education. This is merely meant to be some food for thought.
At UCLA, there is a major called Mathematics of Computation which provides rigorous education in Mathematics. In addition to this, students have to take courses in Computer Science and Physics, and they can take elective courses in Statistics too. This is basically the closest thing that UCLA currently has to a major in Data Science.
http://catalog.registrar.ucla.edu/ucla-catalog19-20-965.html
UCLA also has a program called Program in Computing which offers Computer Science courses to students majoring in subjects other than Computer Science who need to learn how to code in order to do their coursework and engage in academic research.
At UC Berkeley, there is an undergraduate major and minor in Data Science. They also have a 5 year BS/MS program in Data Science.
https://data.berkeley.edu/academics/undergraduate-programs
https://www.ischool.berkeley.edu/programs/5th-year-mids
If you’re attending a different school and you want to find programs that are similar to the ones I linked above from UCLA and UC Berkeley, the best place to start would probably be the websites for the departments of Statistics, Computer Science and Mathematics at your school. You should also ask questions online on places like the reddit page for your school. But the best place to get answers to questions you have about this stuff is from counselors and professors in those departments.
28.0.3 Online non-credit courses
Regardless of which major you pick in college, even if it’s a highly technical one like Data Science or Computer Science, chances are there are going to be some gaps in the curriculum you’re learning that could make it hard for you to get a job in the field that interests you if they go unfilled. Recently there was a controversy about the practicality of UCLA’s Computer Science curriculum that was covered in the Daily Bruin. That article is linked below.
So do problems like the ones above mean that you shouldn’t bother getting a degree in something like Computer Science or Data Science? Absolutely not.
If you’re hoping to get a job which requires you to use skills like those we learned in the lab component of this class, you’re going to have to be a self-starter, and you’re going to have to learn continuously throughout your career. You cannot expect your professors to teach you absolutely everything that you will ever need to know, because what you need to know will change over time as your field advances. This is the main reason why I pretty much always tell you to use Google to try to solve your own programming problems: because this is what you will have to do in future related coursework and in your career, and you should show up to future endeavors ready to do that.
So where should you go to try to plug the gaps in your knowledge?
There are two big websites that collaborate with universities around the world to offer low cost courses in lots of different subjects, but especially computer programming. These two websites are coursera.org and edx.org.
Single courses on these websites usually cost about $50, though the cost sometimes varies. Regardless of how much any particular course or package of courses is, they will always be much cheaper than any of the courses you’re taking while you’re in college, though they are also a lot less rigorous. Still, if the choice is between taking a relatively easy course in something you want to learn about and not being able to learn about it at all, the former is pretty much always better than the latter. And like courses you take in college, the quality of courses on coursera.org and edx.org also varies, but there are some out there that are truly outstanding.
Another website you should consider joining is DataCamp.com. If you only took the courses that were assigned for credit as part of our class, you’ve hardly scraped the surface of what’s available there.
28.0.4 R programming
Before proceeding with your R programming education, I recommend getting a lot more practice in R on DataCamp.com. In particular I recommend pursuing the Data Scientist with R certification that they offer there. This certification will not turn you into a fully fledged data scientist. But it will expose you to many, many other uses of the R language and help you develop some more specific goals. I already made you take a few of the courses in this program as part of our class. Take more and add this line to your CV!
https://www.datacamp.com/tracks/data-scientist-with-r
Those of you who are interested in business and finance should also consider the Quantitative Analyst with R specialization. This won’t turn you into some kind of quant either. Only your college education can really do that. However, this certification will introduce you to lots of capabilities inside of R that you can use to put what you’re learning about in college to work, which can help you get a great job.
https://www.datacamp.com/tracks/quantitative-analyst-with-r
In my opinion the Statistics with R Specialization on Coursera is probably the best set of introductory Statistics courses that exists today regardless of the fact that these are non-credit courses. These courses teach basic (descriptive, inferential, regression) and introductory Bayesian statistics with the same rigor and quality that you’d expect to experience at a top university. But not only do they accomplish that, they do it in an R programming environment. Each course culminates in a peer graded project which requires the student to dig through a big dataset to answer their own related research questions, and there is a capstone project for the final course that requires the student to apply everything they’ve learned. I took the first three of these courses myself a couple of years ago to review this subject in order to TA for this class and this Specialization is definitely the main inspiration for the lab component of our course. And one interesting thing about the lead instructor, Mine Çetinkaya-Rundel, is that she got her PhD in Statistics from UCLA! Go Bruins!
https://www.coursera.org/specializations/statistics
Another great series of courses is the one that makes up the Professional Certificate in Data Science from HarvardX on edx.org. This certificate covers lots of the same topics as the Statistics with R specialization, but it also covers a lot of ground that the specialization doesn’t. For example, this certificate will teach you about basic machine learning and how to use GitHub. The latter will be extremely important later when you need to show off examples of your work to potential employers and graduate departments that are reviewing your applications. You will almost certainly not be taught how to do this as part of your standard college education, but that doesn’t mean that you don’t need to figure out how to do it anyway. I just gave you two great reasons to do it.
https://www.edx.org/professional-certificate/harvardx-data-science
Can you learn R programming at your university? Maybe. But you should be aware that if they do offer courses in this, it is possible that the curriculum will be rather outdated. Maybe they’ll make you use base R graphics instead of ggplot2, for instance. If this is the case, then you’re probably better off just studying it on your own.
28.0.5 Python programming
In the introduction to this book I was careful to point out that although our course is not Python-based, this does not mean that Python is not worth learning, and that you must learn it sooner or later if you’re aiming for some sort of a data-driven career. So where can you do that?
Some universities like UC Berkeley have Data Science majors that are Python-based. But the courses in these programs generally make use of libraries (called “packages” in R) that are not popular outside of UC Berkeley. For instance, in Python there is a library called pandas that is widely used for data manipulation. It is analogous to the dplyr library in R. However, at UC Berkeley, at least in Data 8 - Introduction to Data Science, they use their own library called datascience instead. I have also seen other examples of custom libraries for other courses. It really seems that if you don’t teach yourself pandas as a UC Berkeley student, no one else will. This may be true at other universities as well, assuming they even offer courses in Python programming that are focused on data analysis.
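To give a rough sense of how pandas mirrors dplyr, here is a minimal sketch that filters a small table, groups it and summarises each group. The data and the column names (section, score) are made up for illustration, and the dplyr pipeline in the comment is only an approximate equivalent.

```python
import pandas as pd

# Hypothetical exam scores for two lab sections (made-up numbers).
scores = pd.DataFrame(
    {
        "section": ["A", "A", "B", "B", "B"],
        "score": [80, 90, 70, 75, 65],
    }
)

# Approximate dplyr equivalent:
#   scores %>%
#     filter(score >= 70) %>%
#     group_by(section) %>%
#     summarise(mean_score = mean(score), n = n())
summary = (
    scores[scores["score"] >= 70]          # keep rows with score >= 70
    .groupby("section", as_index=False)    # one output row per section
    .agg(mean_score=("score", "mean"), n=("score", "count"))
)

print(summary)
```

The method chain plays the role that the pipe plays in dplyr: each step takes a table and hands a new table to the next step.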
Another thing I said at the beginning of this book is that Python was not developed specifically for data analysis like R was and that learning enough of it to use it in the same way we use R takes more time. My list of recommendations reflects that.
The first thing I recommend you complete before embarking on further education in Python is the Data Scientist with Python certificate on DataCamp. I recommend it for the same reasons I recommend its R counterpart: it provides a broad, interactive overview of the Python language and its analytical capabilities, and it will help you start to develop related goals.
https://www.datacamp.com/tracks/data-scientist-with-python
After you complete this, you should dive back into Python programming, this time with a Computer Science focus. This is because in order to become comfortable with using Python as a tool for data analysis, you need to become comfortable with Python’s syntax and data structures.
The first recommendation I have for this is Coursera’s Python For Everybody specialization. It’s not a terribly difficult specialization, and you can probably complete it without watching the lecture videos as long as you read the textbook. This will set you up for success in the following specialization, which is significantly more challenging.
https://www.coursera.org/specializations/python
The second recommendation I have is essential if you’re an aspiring data scientist or you simply think you are interested in machine learning. The Python 3 Specialization from the University of Michigan covers some of the same topics as Python For Everybody, but in significantly greater depth. This is not a specialization for which you can skip all the lecture videos and expect to get by with the textbook. The purpose of this specialization is to teach Python programming to people who want to enter the University of Michigan’s Masters in Applied Data Science degree program, an online degree that is offered on Coursera. The last two projects are extremely challenging and have a clear data science focus. The final project involves using text and facial recognition algorithms. You don’t have to write these algorithms yourself, but you do have to figure out how to use them, and doing so successfully requires very strong basic knowledge of Python. For that reason, you should not skip directly to this specialization without completing the previous two recommendations first, because you probably won’t be able to finish it if you do.
https://www.coursera.org/specializations/python-3-programming
The third specialization I recommend is a continuation of the previous one. Basically, it consists of more projects like the final project of the Python 3 Specialization. Great extra practice!
https://www.coursera.org/specializations/data-science-python
After you have taken multivariable calculus with vectors (Calculus 3) and linear algebra (preferably the version for math majors) and you have gotten very good at Python programming, then you’ll be ready to start learning about machine learning. A great course for this is the Machine Learning course by Andrew Ng offered in collaboration with Stanford University on Coursera. Machine learning is hot stuff right now, but in order to learn how to do anything meaningful using these tools, you absolutely must have a strong background in Python programming, Mathematics and Statistics. If you try to skip the prerequisites and go directly to machine learning because you’ve convinced yourself that you can just figure it all out while you’re doing it, you won’t have a good time.
https://www.coursera.org/learn/machine-learning
After you complete the above course, you’ll probably want to keep learning. Coursera has a series of courses in deep learning (also by Andrew Ng) which are meant to follow the above course.
https://www.coursera.org/specializations/deep-learning
And for those of you who are more business and finance focused, machine learning can be useful for you too. Coursera has a specialization that focuses on using machine learning in finance. Make sure you have fulfilled the prerequisites before taking a crack at it.
https://www.coursera.org/specializations/machine-learning-reinforcement-finance
28.0.6 A few last pieces of advice
The sites coursera.org and edx.org are good places to get practice with coding and to learn how to use specific packages and libraries that may not be emphasized in programming courses at your university, if such courses are offered at all. But please understand that when it comes to instruction in Mathematics and Statistics, there is absolutely no good alternative to credit bearing courses in these subjects from your university. It is possible to take non-credit versions of classes like calculus, linear algebra, probability and so on, but these are generally inferior to the credit bearing versions. Of course some graduate programs will accept courses from edx.org and coursera.org to fulfill certain admission requirements, but this is typically only done at the Master’s level and not at the PhD level. Such courses will pretty much never prepare you as well as a traditional alternative will, which at the very least will set you up for a tougher graduate school experience than you would otherwise have. Also, if you take a credit bearing version of a course, there’s pretty much never any question about whether or not it will fulfill an admission requirement as long as it’s the right course. But if it’s a non-credit version, its acceptability will be very much in doubt from school to school even if it is the “right” course.
It is also extremely important to get a good Mathematics and Statistics education because no matter how good you get at programming, your skills as a programmer won’t be very valuable if you don’t have the theoretical knowledge from a strong education in Mathematics and Statistics that gives your programming skills analytical power. (Assuming you’re pursuing goals that require such knowledge, of course.) For example, your ability to conduct a t test using t.test() is worthless if you can’t interpret the result of such a test. The same is true for anything else you figure out how to do in an R or Python programming environment but cannot fundamentally understand.
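To make the point about interpretation concrete, here is a minimal sketch of the arithmetic behind a two-sample t test, written with only the Python standard library. It computes the Welch version of the test, which is what R’s t.test() runs by default for two samples. The helper name welch_t and the data are hypothetical, and the sketch stops at the t statistic and degrees of freedom rather than computing a p-value.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)

    # Each group's contribution to the squared standard error of the difference.
    se2_a, se2_b = var_a / n_a, var_b / n_b

    # t = (difference in sample means) / (standard error of that difference).
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical exam scores for two lab sections (made-up numbers).
section_a = [72, 85, 78, 90, 66, 81]
section_b = [70, 75, 68, 80, 64, 73]

t_stat, df = welch_t(section_a, section_b)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

Read this way, the t statistic is just the observed difference in means measured in units of its own standard error. Knowing that is what lets you judge whether a large or small value is meaningful.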
My point is this: the time for you to really fill out your math education is now. It will be much harder for you to return to school later. If you need to, there are programs that exist and I’ve linked a couple below, but it will be hard to balance the demands of such coursework with other obligations (job, family, etc.) you may have later in life.