Stock Data Analysis with Python (Second Edition)


This is a lecture for MATH 4100/CS 5160: Introduction to Data Science, offered at the University of Utah, introducing time series data analysis applied to finance. It also updates my earlier blog posts on the same topic and combines them into one. I strongly advise referring to this post instead of the previous ones (which I am leaving unaltered to preserve a record). The code should work as of July 7th, 2018. (And sorry for some of the formatting; the free version of this blogging platform doesn’t play nice with code or tables.)



Replication Intervals

At the University of Utah I’ve taught MATH 1070 and MATH 3070. Both are introductory statistics classes, but I call MATH 1070 “Introductory Statistics for People Who Don’t Like Math” and MATH 3070 “Introductory Statistics for People Who Do Like Math”, since the latter requires calculus and uses far more probability. In both classes, though, students need to learn what confidence intervals (CIs) do and don’t say, and I spend a lot of time debunking common misconceptions about what a confidence interval says.
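To make the frequentist reading of a CI concrete, here is a minimal simulation sketch. It is not taken from the course material: the function name `ci_coverage`, the normal population, and all parameter values are assumptions chosen for illustration. It repeatedly draws samples, builds a 95% z-interval around each sample mean (treating the population standard deviation as known), and counts how often the interval contains the true mean.

```python
import random
import statistics

def ci_coverage(mu=10.0, sigma=2.0, n=30, reps=2000, z=1.96, seed=0):
    """Fraction of simulated 95% z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Half-width of the interval; sigma is treated as known (an assumption).
        half_width = z * sigma / n ** 0.5
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

coverage = ci_coverage()
print(f"Share of intervals covering the true mean: {coverage:.3f}")
```

The 95% describes the procedure over many repeated samples, which is why the share printed should land near 0.95. It does not say that any single computed interval has a 95% chance of containing the mean.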


Learn Basic Python and scikit-learn Machine Learning Hands-On with My Course: Training Your Systems with Python Statistical Modelling

This post is actually months late, but like with my last video course announcement, it’s better late than never. And besides, of my video courses, I had the most fun writing this one.


A Recession Before 2020 Is Likely; On the Distribution of Time Between Recessions

I recently saw a Reddit thread in r/PoliticalDiscussion asking the question “If the economy is still booming 2020, how should the Democratic address this?” This gets to an issue that’s been on my mind since at least 2016, maybe even 2014: when will the current period of economic growth end?


R Function for Simulating Gaussian Processes

This semester my studies all involve one key mathematical object: Gaussian processes. I’m taking a course on stochastic processes (which will cover Wiener processes, a type of Gaussian process and arguably the most common) and a course on mathematical finance, which involves stochastic differential equations (SDEs) used for derivative pricing, including in the Black-Scholes-Merton equation. I’m also involved in a Gaussian process and stochastic calculus reading group. So these processes will take up a lot of my attention.
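As a rough illustration of the simplest case mentioned above, a Wiener process on a grid can be simulated by summing independent normal increments whose variance equals the time step. The sketch below is in Python and is not the R function this post goes on to present; the name `simulate_wiener` and its defaults are assumptions made for the example.

```python
import random

def simulate_wiener(n_steps=1000, t_max=1.0, seed=0):
    """Simulate one Wiener process path on [0, t_max], starting at W(0) = 0."""
    rng = random.Random(seed)
    dt = t_max / n_steps
    path = [0.0]
    for _ in range(n_steps):
        # Each increment is N(0, dt), independent of the past.
        path.append(path[-1] + rng.gauss(0.0, dt ** 0.5))
    return path

path = simulate_wiener()
print(f"W(1) for this simulated path: {path[-1]:.4f}")
```

A general Gaussian process needs a full covariance function rather than independent increments, so this covers only the Wiener special case.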


Start Getting and Working with Data with “Data Acquisition and Manipulation with Python”

This news is a few weeks late, but better late than never!


Problems In Estimating GARCH Parameters in R

UPDATE (11/2/17 3:00 PM MDT): I got the following e-mail from Brian Peterson, a well-known R finance contributor, over R’s finance mailing list:

I would strongly suggest looking at rugarch or rmgarch. The primary
maintainer of the RMetrics suite of packages, Diethelm Wuertz, was
killed in a car crash in 2016. That code is basically unmaintained.

I will see if this solves the problem. Thanks Brian! I’m leaving this post up, though, as a warning to others to avoid fGarch in the future. This was news to me; books often refer to fGarch, so this post can serve as a resource explaining why those looking to work with GARCH models in R should not use fGarch.

UPDATE (11/2/17 11:30 PM MDT): I tried a quick experiment with rugarch and it appears to be plagued by this problem as well. Below is some quick code I ran. I may post a full study as soon as tomorrow.


library(rugarch)

# Simulate a GARCH(1,1) series with known parameters
spec = ugarchspec(variance.model = list(garchOrder = c(1, 1)), mean.model = list(armaOrder = c(0, 0), include.mean = FALSE), fixed.pars = list(alpha1 = 0.2, beta1 = 0.2, omega = 0.2))
ugarchpath(spec = spec, n.sim = 1000, n.start = 1000) -> x
srs = x@path$seriesSim
# Fit an unrestricted GARCH(1,1) to the full series, then to the first 100 observations
spec1 = ugarchspec(variance.model = list(garchOrder = c(1, 1)), mean.model = list(armaOrder = c(0, 0), include.mean = FALSE))
ugarchfit(spec = spec1, data = srs)
ugarchfit(spec = spec1, data = srs[1:100])

These days my research focuses on change point detection methods. These are statistical tests and procedures to detect a structural change in a sequence of data. An early example, from quality control, is detecting whether a machine became uncalibrated when producing a widget. There may be some measurement of interest, such as the diameter of a ball bearing, that we observe. The machine produces these widgets in sequence. Under the null hypothesis, the ball bearing’s mean diameter does not change, while under the alternative, at some unknown point in the manufacturing process the machine became uncalibrated and the mean diameter of the ball bearings changed. The test then decides between these two hypotheses.
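One classical way to carry out such a test is the CUSUM statistic, sketched below in Python. This is a generic textbook version, not necessarily the procedure studied in this post. The function name `cusum_statistic` and the simulated diameters are assumptions for illustration, and scaling by the full-sample standard deviation is a simplification that costs some power when a change is present.

```python
import random
import statistics

def cusum_statistic(data):
    """Return (max scaled CUSUM, index k where the maximum occurs)."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # simple scale estimate (assumption)
    running = 0.0
    best, best_k = 0.0, 0
    for k, x in enumerate(data[:-1], start=1):
        # Partial sum of deviations from the overall mean, up to observation k.
        running += x - mean
        stat = abs(running) / (sd * n ** 0.5)
        if stat > best:
            best, best_k = stat, k
    return best, best_k

# Hypothetical ball bearing diameters: the mean shifts from 5.0 to 5.2
# after the 100th bearing.
rng = random.Random(42)
diameters = [rng.gauss(5.0, 0.1) for _ in range(100)]
diameters += [rng.gauss(5.2, 0.1) for _ in range(100)]

stat, k_hat = cusum_statistic(diameters)
print(f"CUSUM statistic: {stat:.2f}, estimated change after observation {k_hat}")
```

Under the null hypothesis, with independent observations, this statistic behaves asymptotically like the supremum of the absolute value of a Brownian bridge, whose 5% critical value is roughly 1.358. A value well above that favors the alternative, and the maximizing index serves as an estimate of where the change happened.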
