Months ago, I asked the community a question: how should I organize my R research projects? After writing that post, doing some reading, and putting a plan into practice, I now have my own answer.

First, some background. In the early months of 2016 I began a research project with my current Ph.D. advisor that involved extensive coding and spanned at least two years. My code was poorly organized and thus problematic: managing the chaos and extending the code became difficult. Meanwhile, I was reading articles by programmers and researchers about ways to organize R code so that research results are reproducible, distributable, and extensible. I identified two different approaches to organizing a project to meet these goals: one centered around makefiles, and another around package development. Given these competing approaches and their differing advantages, I was unsure what to do.

Since writing that post, I did more reading. First, I read two of Hadley Wickham’s (excellent) books: *R Packages* and *Advanced R*. (I loved *R Packages* so much I bought a physical copy.) To learn about good coding practices, I also read *Code Craft: The Practice of Writing Excellent Code* by Pete Goodliffe, a book I picked up in a Humble Bundle sale. Finally, I read a good portion of the GNU `make` manual.

I also spent *months* restructuring the project to comply with what I learned. Many, *many* hours were spent just fixing the mess I had made by not doing things right in the first place.

The result is **CPAT**, an R package implementing statistical tests for change point analysis. What **CPAT** does will be the subject of a future post (to be published when the accompanying paper is made available online); what I want to focus on in this article is how I learned to organize an R research project, and how that culminated in **CPAT**.

In the earlier article I presented two approaches that I suggested were “competing” approaches to organizing a research project: the *project as executable* approach of Jon Zelner and the *project as package* approach of Csefalvay and Flight. Both approaches, in my view, possessed unique advantages, but seemed to be at odds.

They are not at odds. **CPAT** demonstrates that it is possible to view an R project as both an executable and as a package. That said, the package development approach becomes dominant; making the package executable (from the command line) is an additional feature that makes the project even more portable and extensible.

If one adopts the package development approach, one must use the directory hierarchy R packages require. That means:

- R code that defines the package (mostly just functions) is placed in the `R/` directory.
- Documentation is placed in the `man/` directory (though if you’re using **roxygen2** and **devtools** like a sane human being, this isn’t something you’ll write yourself).
- Project data goes in the `data/` directory.
- Compiled code from other languages (such as C++ when using **Rcpp**) goes in the `src/` directory.
- Code tests—*which are not optional and must be written!*—go in the `tests/` directory (though if you’re using **testthat** for your testing, the tests you actually wrote go in `tests/testthat/`).
- Long-form documentation goes in `vignettes/`. This could be the paper itself, if written in the form of a vignette.
- Other important files should be placed in a reasonably organized `inst/` directory, to be installed with the package, along with other files that should be installed into the package’s base directory (such as a `Makefile`). For example, I put all my plots in `inst/plots/`, and this would also be a good directory for the paper that accompanies the project.
- Executable scripts, including R scripts, go in `exec/`.
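Put together, a research package following these conventions might look something like the sketch below (illustrative only, not CPAT’s exact tree):

```
mypackage/
├── DESCRIPTION
├── NAMESPACE
├── Makefile
├── R/
├── man/
├── data/
├── src/
├── tests/
│   └── testthat/
├── vignettes/
├── inst/
│   └── plots/
└── exec/
```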

The approach championed by Zelner doesn’t require a particular organizational style, only that the project have a coherent organization. R package development not only has a coherent structure but even *enforces* it. If that structure doesn’t quite work, one can add other files and directories as needed and list them in the `.Rbuildignore` file, so they’re ignored when the package is built.
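For example, a few `.Rbuildignore` entries might look like this (each line is a Perl-style regular expression matched against file paths; these particular entries are invented for illustration):

```
^notes/
^scratch/
^.*\.Rproj$
^\.Rproj\.user$
```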

When writing an R package, the relevant R tools essentially enforce key points of style, such as documenting objects. The developer-researcher also starts to think of the project’s important functionality in terms of reusable functions that should be added to the package—with documentation and everything else—and called by the scripts that actually execute the analysis. Having well-documented functions, even minor ones, makes the project much easier to understand and extend, not only for others but for the original author as well. In my case, since I wrote **CPAT** almost exclusively with Vim, I wrote an UltiSnips snippet creating a function skeleton that not only defines the function but automatically adds the framework of its documentation.
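Roughly, such a snippet expands into a function definition together with the skeleton of its roxygen2 documentation. The sketch below is a placeholder version (the function name and fields are invented, not CPAT code):

```r
#' Title Goes Here
#'
#' A longer description of what the function does goes here.
#'
#' @param x A description of the parameter
#' @return A description of the return value
#' @examples
#' my_function(1)
my_function <- function(x) {
  # Function body to be filled in; here it just returns its input
  x
}
```

Filling in the fields as you write the function means the documentation is never an afterthought.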

While package development does place (helpful) constraints, it does not specify everything; there is room for style. I essentially define *style* to mean any aspect of programming in which a choice is made that is not determined by the programming language or software. Examples of style include naming conventions, indentation, and so on. Consistent style makes for understandable code; having consistent style is arguably more important than the particular stylistic decisions made. So I decided to codify my stylistic preferences in a style guide, and when I did my code rewrite I made the new code comply with that guide, even if it added more time to editing. Whenever I encountered a new “decision point” (such as, say, dataset naming conventions), I committed my decision to the style guide.

As I mentioned above, the package development approach turns out not to be mutually exclusive with the project-as-executable approach. While documentation on R package development (including Dr. Wickham’s book) seems to mention a package’s `exec/` directory only in passing, I found it a good place for executable R scripts. Similarly, `make` can still be used to automate analysis tasks; R packages allow for including makefiles.

So in addition to the files that essentially define the package, I wrote stand-alone, command-line-executable R scripts and placed them in the `exec/` directory (which causes them to be flagged as executable when the package is installed). I wrote a Vim template file for R scripts that provides a skeleton for making the package executable from the command line. That template is listed below:

```r
#!/usr/bin/Rscript
################################################################################
# MyFile.R
################################################################################
# 2018-12-31 (last modified date)
# John Doe (author)
################################################################################
# This is a one-line description of the file.
################################################################################

# optparse: A package for handling command line arguments
if (!suppressPackageStartupMessages(require("optparse"))) {
  install.packages("optparse")
  require("optparse")
}

################################################################################
# MAIN FUNCTION DEFINITION
################################################################################

main <- function(foo, bar, help = FALSE) {
  # This function will be executed when the script is called from the command
  # line; the help parameter does nothing, but is needed for do.call() to work
  quit()
}

################################################################################
# INTERFACE SETUP
################################################################################

if (sys.nframe() == 0) {
  cl_args <- parse_args(OptionParser(
    description = "This is a template for executable R scripts.",
    option_list = list(
      make_option(c("--foo", "-f"), type = "integer", default = 0,
                  help = "A command-line argument"),
      make_option(c("--bar", "-b"), type = "character",
                  help = "Another command-line argument")
    )
  ))

  do.call(main, cl_args)
}
```

Converting my scripts into modularized, executable programs was, not surprisingly, very time-consuming, and the transition was not perfect; some scripts just could not be modularized well. Nevertheless, the end result was likely worth it, and I could then write a Makefile defining how the pieces fit together. This tamed the complexity of the project and made it more reproducible; someone looking to repeat my analysis should only have to type `make` in a Linux terminal^{1} to see the results themselves.
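A much-simplified sketch of what such a Makefile can look like (the targets and file names here are invented for illustration; CPAT’s actual Makefile is more involved):

```make
# Rebuild all simulation results and figures; `make` runs the first target
all: inst/plots/power_plot.pdf

# Figures depend on the simulation results and the plotting script
inst/plots/power_plot.pdf: data/sim_results.Rda exec/MakePlots.R
	Rscript exec/MakePlots.R

# Simulation results depend on the script that generates them
data/sim_results.Rda: exec/RunSimulations.R
	Rscript exec/RunSimulations.R

clean:
	rm -f data/sim_results.Rda inst/plots/power_plot.pdf
```

Because `make` compares timestamps, only the pieces whose inputs changed get rerun, which matters when simulations take hours.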

While I did make my project modular and executable, I did not try to make it stand-alone with, say, **packrat** or Docker. I did try to use **packrat**, even setting it up to work with my package, but I ran into severe problems when I tried to work with my package at the University of Utah Mathematics Department: the computer system’s R installation is almost four years old as of this writing and highly temperamental due to how the system administrator set it up. **packrat** made working with the department computers even more complicated, and I disabled it in a huff one day and never looked back. As for Docker or GitLab, I did not want my project tied up with proprietary or web-based services, and I felt that the end result Zelner was seeking when using these services is overkill; once you’ve added **packrat** (which I didn’t, because of the complications, but still) and defined how the project pieces fit together with `make`, you’ve mostly conquered the reproducibility problem, in my view. So I never missed these services.

The end result of this work can be seen in the `paper` branch of CPAT, also permanently available in this tarball. The directory tree is also informative.

In some sense the end goal is to have an R package that could be distributed to others via, say, CRAN, so they can *use* the methods you employed and developed, not just reproduce your research; at least, that’s the case for me, a mathematical statistician interested in analyzing and developing statistical tests and procedures. When a package is written to contain research and not just for software distribution, it comes with a lot of files that aren’t needed for the package to function; just look at the directory tree!

The solution is to just delete the files that can be recreated—perhaps with `make clean`, if you set it up right—and consider adding other files to `.Rbuildignore` when you want to distribute the package for others to use. So this isn’t actually a big problem.

Another issue I encountered, and am still unsure about, is functions that are useful to the project but not outside of it. If you look through the `paper` branch manual or even the public version manual, you will find functions that were useful only for the project, perhaps for converting data structures created by scripts or for making plots that make sense only for the paper. They’re all private functions that need to be accessed via the `:::` operator, yet they’re still in the manual.

I’m undecided whether this is good style. On the one hand, it’s nice that when others read your code there are manual entries even for functions local to the project, further documenting what was done and how the code works. Even when distributing the software, having every function documented, including ones “private” to the package, seems in concordance with the spirit of open-source software, making the source code more easily understood by users who need and want to know how your software works. It could also serve as a good way to modularize documentation: a statistical formula is kept with the function that computes it rather than with the interface to that function (whose documentation likely links to that underlying function). Having examples for those internal functions also provides an additional layer of testing and helps when others want to extend the package.

On the other hand… most of the pages of the manual are devoted to functions the user isn’t supposed to be calling directly in their work. Of all those functions, maybe five are functions the user is expected to use. Should all that documentation space be devoted to something the user doesn’t use?

While I’m not set in my opinion, I lean toward having more documentation rather than less, even if most of it is for private functions. After all, it’s useful to me when I’m developing the project and package.
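For what it’s worth, roxygen2 lets you take a middle path: document an internal helper but keep it unexported by omitting `@export` and tagging it `@keywords internal`, so it stays out of the main index while remaining documented. A hypothetical sketch (the function name and body are invented, not CPAT’s):

```r
#' Reshape simulation output into a data frame
#'
#' An internal helper; not part of the package's public interface, so it
#' would be reached via the ::: operator.
#'
#' @param sims A list of numeric vectors produced by simulation scripts
#' @return A data.frame with one row per simulation
#' @keywords internal
sims_to_df <- function(sims) {
  # rbind the simulation vectors into a single data frame
  as.data.frame(do.call(rbind, sims))
}
```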

I feel like spending those months to make my project logical and reproducible was time well spent. Not only did I learn a lot in the process, I ended up with a useful product that is now available on CRAN. Additionally, this project is not over; my advisor and I are continuing to work on extending the results that led to the creation of this package in the first place, which will call for more simulation experiments. Now that I’ve organized my work, I have a good base for continuing it.

I hope this article inspires others thinking about how to organize their R research projects. Judging from the reactions to my previous article, this is an unfortunately underappreciated topic. Having a plan for managing package complexity and organization goes a long way toward keeping your work under control and helps others appreciate what you’ve done. It can also give your work greater impact, since others can use it as well.

I got a lot of good feedback from my previous article. I look forward to hearing what the community has to say now. I’m always open to suggestion.

Packt Publishing published a book for me entitled *Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

- Sadly my project is tied closely to the Unix/Linux setup; I have no idea how well it would work on Windows, and I’m not very interested in making the project easy to use on Windows (despite having Windows 10 installed on my primary laptop). This means the goal of full reproducibility isn’t met for Windows users, a large group. That said, if you’re a Windows user, just download VirtualBox for free, download and install Ubuntu or some other Linux distribution you like, install R and the needed packages, and you can reproduce my work. You may even discover for yourself why I prefer to work in a Linux environment. ↩

Now here is a blog post that has been sitting on the shelf far longer than it should have. Over a year ago I wrote an article about problems I was having when estimating the parameters of a GARCH(1,1) model in R. I documented the behavior of parameter estimates (with a focus on ) and perceived pathological behavior when those estimates are computed using **fGarch**. I called for help from the R community, including sending out the blog post over the R Finance mailing list.

I was not disappointed in the feedback. You can see some mailing list feedback, and there were some comments on Reddit that were helpful, but I think the best feedback I got was through my own e-mail.

Dr. Brian G. Peterson, a member of the R finance community, sent some thought-provoking e-mails. The first informed me that **fGarch** is no longer the go-to package for working with GARCH models. The RMetrics suite of packages (which includes **fGarch**) was maintained by Prof. Diethelm Würtz at ETH Zürich; he was killed in a car accident in 2016.

Dr. Peterson recommended I look into two more modern packages for GARCH modelling: **rugarch** (for univariate GARCH models) and **rmgarch** (for multivariate GARCH models). I had not heard of these packages before (I was aware of **fGarch** only because it was referred to in the time series textbook *Time Series Analysis and Its Applications: With R Examples* by Shumway and Stoffer), so I’m very thankful for the suggestion. Since I’m interested in univariate time series for now, I only looked at **rugarch**. The package appears to have more features and power than **fGarch**, which may explain why it seems more difficult to use. However, the package’s vignette is helpful and worth printing out.

Dr. Peterson also had interesting comments about my proposed applications. He argued that intraday data should be preferred to daily data and that simulated data (including simulated GARCH processes) has idiosyncrasies not seen in real data. The ease of getting daily data (particularly for USD/JPY around the time of the Asian financial crisis, which was an intended application of a test statistic I’m studying) motivated my interest in daily data. His comments, though, may lead me to reconsider this application.^{1} (I might try to detect the 2010 eurozone financial crises via EUR/USD instead; I can get free intraday data from HistData.com for this.) However, if standard error estimates cannot be trusted for small sample sizes, our test statistic would still be in trouble, since it involves estimating parameters even for small sample sizes.

He also warned that simulated data exhibits behaviors not seen in real data. That may be true, but simulated data is important since it can be considered a statistician’s best-case scenario. Additionally, the properties of the process that generated simulated data are known *a priori*, including the values of the generating parameters and whether certain hypotheses (such as whether there is a structural change in the series) are true. This allows for sanity checks of estimators and tests, which is impossible for real-world data, since we lack the needed *a priori* knowledge.

Prof. André Portela Santos asked that I repeat the simulations but with since these values are supposedly more common than my choice of . It’s a good suggestion and I will consider parameters in this range in addition to in this post. However, my simulations seemed to suggest that when , the estimation procedures nevertheless seem to want to be near the range of large . I’m also surprised since my advisor gave me the impression that GARCH processes with either or large are more difficult to work with. Finally, if the estimators are strongly biased, we might expect most estimated parameters to lie in that range, though that does not mean the “correct” values lie in that range. My simulations suggest **fGarch** struggles to discover even when those parameters are “true.” Prof. Santos’ comment makes me want a metastudy of the common estimates of GARCH parameters on real-world data. (There may or may not be one; I haven’t checked. If anyone knows of one, please share.)

My advisor contacted another expert on GARCH models and got some feedback. Supposedly the standard error for is large, so there should be great variation in parameter estimates. Some of my simulations agreed with this behavior even for small sample sizes, but at the same time showed an uncomfortable bias towards and . This might be a consequence of the optimization procedures, as I hypothesized.

So given this feedback, I will be conducting more simulation experiments. I won’t be looking at **fGarch** or **tseries** anymore; I will be working exclusively with **rugarch**. I will explore different optimization procedures supported by the package. I won’t be creating plots like I did in my first post; those plots were meant only to show the existence of a problem and its severity. Instead I will be looking at properties of the resulting estimators produced by different optimization procedures.

As mentioned above, **rugarch** is a package for working with GARCH models; a major use case is estimating their parameters, obviously. Here I will demonstrate how to specify a GARCH model, simulate data from the model, and estimate parameters. After this we can dive into simulation studies.

```r
library(rugarch)
```

```
## Loading required package: parallel
## 
## Attaching package: 'rugarch'
## The following object is masked from 'package:stats':
## 
##     sigma
```

To work with a GARCH model we need to specify it. The function for doing this is `ugarchspec()`. I think the parameters `variance.model` and `mean.model` are the most important parameters.

`variance.model` is a list with named entries, perhaps the two most interesting being `model` and `garchOrder`. `model` is a string specifying which type of GARCH model is being fitted. Many major classes of GARCH models (such as EGARCH, IGARCH, etc.) are supported; for the “vanilla” GARCH model, set this to `"sGARCH"` (or just omit it; the standard model is the default). `garchOrder` is a vector giving the order of the ARCH and GARCH components of the model.

`mean.model` allows for fitting ARMA-GARCH models, and functions like `variance.model` in that it accepts a list of named entries, the most interesting being `armaOrder` and `include.mean`. `armaOrder` is like `garchOrder`; it’s a vector specifying the order of the ARMA model. `include.mean` is a boolean that, if true, allows the ARMA part of the model to have a non-zero mean.

When simulating a process, we need to set the values of our parameters. This is done via the `fixed.pars` parameter, which accepts a list of named elements, the elements of the list being numeric. The names need to fit the conventions the function uses for parameters; for example, if we want to set the parameters of a model, the names of our list elements should be `"alpha1"` and `"beta1"`. If the plan is to simulate a model, every parameter in the model should be set this way.

There are other parameters interesting in their own right but I focus on these since the default specification is an ARMA-GARCH model with ARMA order of with non-zero mean and a GARCH model of order . This is not a vanilla model as I desire, so I almost always change this.

```r
spec1 <- ugarchspec(mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
                    fixed.pars = list("omega" = 0.2, "alpha1" = 0.2,
                                      "beta1" = 0.2))
spec2 <- ugarchspec(mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
                    fixed.pars = list("omega" = 0.2, "alpha1" = 0.1,
                                      "beta1" = 0.7))

show(spec1)
```

```
## 
## *---------------------------------*
## *       GARCH Model Spec          *
## *---------------------------------*
## 
## Conditional Variance Dynamics
## ------------------------------------
## GARCH Model        : sGARCH(1,1)
## Variance Targeting : FALSE
## 
## Conditional Mean Dynamics
## ------------------------------------
## Mean Model    : ARFIMA(0,0,0)
## Include Mean  : FALSE
## GARCH-in-Mean : FALSE
## 
## Conditional Distribution
## ------------------------------------
## Distribution    : norm
## Includes Skew   : FALSE
## Includes Shape  : FALSE
## Includes Lambda : FALSE
```

```r
show(spec2)
```

```
## 
## *---------------------------------*
## *       GARCH Model Spec          *
## *---------------------------------*
## 
## Conditional Variance Dynamics
## ------------------------------------
## GARCH Model        : sGARCH(1,1)
## Variance Targeting : FALSE
## 
## Conditional Mean Dynamics
## ------------------------------------
## Mean Model    : ARFIMA(0,0,0)
## Include Mean  : FALSE
## GARCH-in-Mean : FALSE
## 
## Conditional Distribution
## ------------------------------------
## Distribution    : norm
## Includes Skew   : FALSE
## Includes Shape  : FALSE
## Includes Lambda : FALSE
```

The function `ugarchpath()` simulates GARCH models specified via `ugarchspec()`; it needs a specification object created by `ugarchspec()` first. The parameters `n.sim` and `n.start` specify the size of the process and the length of the burn-in period, respectively (with defaults 1000 and 0; I strongly recommend setting the burn-in period to at least 500, but I go for 1000). The function creates an object that contains not only the simulated series but also the simulated residuals and conditional standard deviations.

The `rseed` parameter controls the random seed the function uses for generating data. Be warned that `set.seed()` is effectively ignored by this function, so if you want consistent results, you will need to set this parameter.

The `plot()` method accompanying these objects is not completely transparent; there are a few plots it can create, and when calling `plot()` on a `uGARCHpath` object at the command line, users are prompted to input a number corresponding to the desired plot. This is sometimes a pain, so don’t forget to pass the desired plot’s number to the `which` parameter to avoid the prompt; setting `which = 2` gives the plot of the series proper.

```r
old_par <- par()
par(mfrow = c(2, 2))

x_obj <- ugarchpath(spec1, n.sim = 1000, n.start = 1000, rseed = 111217)
show(x_obj)
```

```
## 
## *------------------------------------*
## *     GARCH Model Path Simulation    *
## *------------------------------------*
## Model: sGARCH
## Horizon: 1000
## Simulations: 1
##                 Seed Sigma2.Mean Sigma2.Min Sigma2.Max Series.Mean
## sim1          111217       0.332      0.251      0.915    0.000165
## Mean(All)          0       0.332      0.251      0.915    0.000165
## Unconditional     NA       0.333         NA         NA    0.000000
##               Series.Min Series.Max
## sim1               -1.76       1.62
## Mean(All)          -1.76       1.62
## Unconditional         NA         NA
```

```r
for (i in 1:4) {
  plot(x_obj, which = i)
}
```

```r
par(old_par)
```

```
## Warning in par(old_par): graphical parameter "cin" cannot be set
## Warning in par(old_par): graphical parameter "cra" cannot be set
## Warning in par(old_par): graphical parameter "csi" cannot be set
## Warning in par(old_par): graphical parameter "cxy" cannot be set
## Warning in par(old_par): graphical parameter "din" cannot be set
## Warning in par(old_par): graphical parameter "page" cannot be set
```

```r
# The actual series
x1 <- x_obj@path$seriesSim
plot.ts(x1)
```

The `ugarchfit()` function fits GARCH models. The function needs a specification and a dataset. The `solver` parameter accepts a string stating which numerical optimizer to use to find the parameter estimates. Most of the function’s parameters manage interfacing with the numerical optimizer; in particular, `solver.control` can be given a list of arguments to pass to the optimizer. We will be looking at this in more detail later.

The specification used for generating the simulated data won’t be appropriate for `ugarchfit()`, since it contains fixed values for its parameters. In my case I will need to create a second specification object.

```r
spec <- ugarchspec(mean.model = list(armaOrder = c(0, 0), include.mean = FALSE))
fit <- ugarchfit(spec, data = x1)
show(fit)
```

```
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : norm
## 
## Optimal Parameters
## ------------------------------------
##        Estimate  Std. Error    t value Pr(>|t|)
## omega  0.000713    0.001258    0.56696  0.57074
## alpha1 0.002905    0.003714    0.78206  0.43418
## beta1  0.994744    0.000357 2786.08631  0.00000
## 
## Robust Standard Errors:
##        Estimate  Std. Error    t value Pr(>|t|)
## omega  0.000713    0.001217    0.58597  0.55789
## alpha1 0.002905    0.003661    0.79330  0.42760
## beta1  0.994744    0.000137 7250.45186  0.00000
## 
## LogLikelihood : -860.486
## 
## Information Criteria
## ------------------------------------
## 
## Akaike       1.7270
## Bayes        1.7417
## Shibata      1.7270
## Hannan-Quinn 1.7326
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      3.998 0.04555
## Lag[2*(p+q)+(p+q)-1][2]     4.507 0.05511
## Lag[4*(p+q)+(p+q)-1][5]     9.108 0.01555
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      29.12 6.786e-08
## Lag[2*(p+q)+(p+q)-1][5]     31.03 1.621e-08
## Lag[4*(p+q)+(p+q)-1][9]     32.26 1.044e-07
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]     1.422 0.500 2.000  0.2331
## ARCH Lag[5]     2.407 1.440 1.667  0.3882
## ARCH Lag[7]     2.627 2.315 1.543  0.5865
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.9518
## Individual Statistics:
## omega  0.3296
## alpha1 0.2880
## beta1  0.3195
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:      0.846 1.01 1.35
## Individual Statistic: 0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value      prob sig
## Sign Bias           0.3946 6.933e-01
## Negative Sign Bias  3.2332 1.264e-03 ***
## Positive Sign Bias  4.2142 2.734e-05 ***
## Joint Effect       28.2986 3.144e-06 ***
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     20.28       0.3779
## 2    30     26.54       0.5965
## 3    40     36.56       0.5817
## 4    50     47.10       0.5505
## 
## 
## Elapsed time : 2.60606
```

```r
par(mfrow = c(3, 4))
for (i in 1:12) {
  plot(fit, which = i)
}
```

```
## 
## please wait...calculating quantiles...
```

```r
par(old_par)
```

```
## Warning in par(old_par): graphical parameter "cin" cannot be set
## Warning in par(old_par): graphical parameter "cra" cannot be set
## Warning in par(old_par): graphical parameter "csi" cannot be set
## Warning in par(old_par): graphical parameter "cxy" cannot be set
## Warning in par(old_par): graphical parameter "din" cannot be set
## Warning in par(old_par): graphical parameter "page" cannot be set
```

Notice the estimated parameters and standard errors? The estimates are nowhere near the “correct” numbers even for a sample size of 1000, and there is no way a reasonable confidence interval based on the estimated standard errors would contain the correct values. It looks like the problems I documented in my last post have not gone away.

Out of curiosity, what would happen with the other specification, one in the range Prof. Santos suggested?

```r
x_obj <- ugarchpath(spec2, n.start = 1000, rseed = 111317)
x2 <- x_obj@path$seriesSim
fit <- ugarchfit(spec, x2)
show(fit)
```

```
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : norm
## 
## Optimal Parameters
## ------------------------------------
##        Estimate  Std. Error    t value Pr(>|t|)
## omega  0.001076    0.002501    0.43025  0.66701
## alpha1 0.001992    0.002948    0.67573  0.49921
## beta1  0.997008    0.000472 2112.23364  0.00000
## 
## Robust Standard Errors:
##        Estimate  Std. Error    t value Pr(>|t|)
## omega  0.001076    0.002957    0.36389  0.71594
## alpha1 0.001992    0.003510    0.56767  0.57026
## beta1  0.997008    0.000359 2777.24390  0.00000
## 
## LogLikelihood : -1375.951
## 
## Information Criteria
## ------------------------------------
## 
## Akaike       2.7579
## Bayes        2.7726
## Shibata      2.7579
## Hannan-Quinn 2.7635
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.9901  0.3197
## Lag[2*(p+q)+(p+q)-1][2]    1.0274  0.4894
## Lag[4*(p+q)+(p+q)-1][5]    3.4159  0.3363
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      3.768 0.05226
## Lag[2*(p+q)+(p+q)-1][5]     4.986 0.15424
## Lag[4*(p+q)+(p+q)-1][9]     7.473 0.16272
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.2232 0.500 2.000  0.6366
## ARCH Lag[5]    0.4793 1.440 1.667  0.8897
## ARCH Lag[7]    2.2303 2.315 1.543  0.6686
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.3868
## Individual Statistics:
## omega  0.2682
## alpha1 0.2683
## beta1  0.2669
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:      0.846 1.01 1.35
## Individual Statistic: 0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.5793 0.5625
## Negative Sign Bias  1.3358 0.1819
## Positive Sign Bias  1.5552 0.1202
## Joint Effect        5.3837 0.1458
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     24.24       0.1871
## 2    30     30.50       0.3894
## 3    40     38.88       0.4753
## 4    50     48.40       0.4974
## 
## 
## Elapsed time : 2.841597
```

That’s no better. Now let’s see what happens when we use different optimization routines.

`ugarchfit()`’s default parameters did a good job of finding appropriate parameters for what I will refer to as model 2 but not for model 1. What I want to know is when one solver seems to beat another.

As pointed out by Vivek Rao^{2} on the R-SIG-Finance mailing list, the “best” estimate is the estimate that maximizes the likelihood function (or, equivalently, the log-likelihood function), and I omitted inspecting the log likelihood function’s values in my last post. Here I will see which optimization procedures lead to the maximum log-likelihood.
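The selection criterion itself is easy to demonstrate outside of **rugarch**. Below is a minimal Python sketch (purely illustrative, not part of the R workflow; the Normal model and starting point are my own choices) that fits a model by maximum likelihood with several `scipy` optimizers and keeps whichever fit attains the largest log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data with known mean and standard deviation (hypothetical values).
rng = np.random.default_rng(111317)
x = rng.normal(loc=1.0, scale=2.0, size=500)

def nll(theta):
    # Negative log-likelihood of an i.i.d. Normal sample; sigma is
    # log-parameterized so every optimizer stays in the feasible region.
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    n = x.size
    return (0.5 * n * np.log(2 * np.pi) + n * log_sigma
            + np.sum((x - mu) ** 2) / (2 * sigma ** 2))

# Run several optimizers from the same starting point and keep every result.
methods = ["Nelder-Mead", "Powell", "L-BFGS-B"]
fits = {m: minimize(nll, x0=np.array([0.0, 0.0]), method=m) for m in methods}

# The "best" fit is the one with the largest log-likelihood (smallest NLL).
best = min(fits, key=lambda m: fits[m].fun)
mu_hat, sigma_hat = fits[best].x[0], float(np.exp(fits[best].x[1]))
```

For this easy, unimodal problem every method agrees; the point of this post is that for GARCH likelihoods they often do not, which is exactly when the comparison matters.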

Below is a helper function that simplifies the process of fitting a GARCH model’s parameters and extracting the log-likelihood, parameter values, and standard errors, while allowing different values to be passed to `solver` and `solver.control`.

```r
evalSolverFit <- function(spec, data, solver = "solnp",
                          solver.control = list()) {
  # Calls ugarchfit(spec, data, solver, solver.control), and returns a vector
  # containing the log likelihood, parameters, and parameter standard errors.
  # Parameters are equivalent to those seen in ugarchfit(). If the solver
  # fails to converge, NA will be returned
  vec <- NA
  tryCatch({
    fit <- ugarchfit(spec = spec, data = data, solver = solver,
                     solver.control = solver.control)
    coef_se_names <- paste("se", names(fit@fit$coef), sep = ".")
    se <- fit@fit$se.coef
    names(se) <- coef_se_names
    robust_coef_se_names <- paste("robust.se", names(fit@fit$coef),
                                  sep = ".")
    robust.se <- fit@fit$robust.se.coef
    names(robust.se) <- robust_coef_se_names
    vec <- c(fit@fit$coef, se, robust.se)
    vec["LLH"] <- fit@fit$LLH
  }, error = function(w) { NA })
  return(vec)
}
```
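The `tryCatch()` pattern above, returning NA instead of aborting when a solver fails, is worth isolating, since any long simulation loop needs it. In Python the same failure-tolerant wrapper might look like this sketch (`fit_fun` and its return shape are hypothetical, standing in for `ugarchfit()`):

```python
import math

def eval_solver(fit_fun, *args, **kwargs):
    # Run a fitting routine; return its result, or NaN fields if it raises.
    # A failed solver thus yields a recognizable "missing" record rather
    # than killing the whole simulation.
    try:
        fit = fit_fun(*args, **kwargs)
        return {"coef": fit["coef"], "llh": fit["llh"]}
    except Exception:
        return {"coef": math.nan, "llh": math.nan}

# A hypothetical solver that works, and one that fails to converge.
def good_solver(data):
    return {"coef": 1.0, "llh": -10.0}

def bad_solver(data):
    raise RuntimeError("no convergence")
```

Downstream code then filters on NaN log-likelihoods instead of wrapping every call site in its own error handler.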

Below I list out all optimization schemes I will consider. I only fiddle with `solver.control`, but there may be other parameters that could help the numerical optimization routines, notably `numderiv.control`, which contains control arguments passed to the numerical routines responsible for standard error computation. (These rely on the **numDeriv** package, which performs numerical differentiation.)

```r
solvers <- list(  # A list of lists where each sublist contains parameters
                  # to pass to a solver
  list("solver" = "nlminb",  "solver.control" = list()),
  list("solver" = "solnp",   "solver.control" = list()),
  list("solver" = "lbfgs",   "solver.control" = list()),
  list("solver" = "gosolnp", "solver.control" = list("n.restarts" = 100,
                                                     "n.sim" = 100)),
  list("solver" = "hybrid",  "solver.control" = list()),
  list("solver" = "nloptr", "solver.control" = list("solver" = 1)),  # COBYLA
  list("solver" = "nloptr", "solver.control" = list("solver" = 2)),  # BOBYQA
  list("solver" = "nloptr", "solver.control" = list("solver" = 3)),  # PRAXIS
  list("solver" = "nloptr", "solver.control" = list("solver" = 4)),  # NELDERMEAD
  list("solver" = "nloptr", "solver.control" = list("solver" = 5)),  # SBPLX
  list("solver" = "nloptr", "solver.control" = list("solver" = 6)),  # AUGLAG+COBYLA
  list("solver" = "nloptr", "solver.control" = list("solver" = 7)),  # AUGLAG+BOBYQA
  list("solver" = "nloptr", "solver.control" = list("solver" = 8)),  # AUGLAG+PRAXIS
  list("solver" = "nloptr", "solver.control" = list("solver" = 9)),  # AUGLAG+NELDERMEAD
  list("solver" = "nloptr", "solver.control" = list("solver" = 10))  # AUGLAG+SBPLX
)

tags <- c(  # Names for the above list
  "nlminb", "solnp", "lbfgs", "gosolnp", "hybrid",
  "nloptr+COBYLA", "nloptr+BOBYQA", "nloptr+PRAXIS", "nloptr+NELDERMEAD",
  "nloptr+SBPLX", "nloptr+AUGLAG+COBYLA", "nloptr+AUGLAG+BOBYQA",
  "nloptr+AUGLAG+PRAXIS", "nloptr+AUGLAG+NELDERMEAD", "nloptr+AUGLAG+SBPLX"
)
names(solvers) <- tags
```

Now let’s run the gauntlet of optimization choices and see which produces the estimates with the largest log likelihood for data generated by model 1. The `lbfgs` method (the low-storage version of the Broyden-Fletcher-Goldfarb-Shanno algorithm, provided in **nloptr**) unfortunately does not converge for this series, so I omit it.

```r
optMethodCompare <- function(data, spec, solvers) {
  # Runs all solvers in a list for a dataset
  #
  # Args:
  #   data: An object to pass to ugarchfit's data parameter containing the
  #         data to fit
  #   spec: A specification created by ugarchspec to pass to ugarchfit
  #   solvers: A list of lists containing strings of solvers and a list for
  #            solver.control
  #
  # Return:
  #   A matrix containing the result of the solvers (including parameters,
  #   se's, and LLH)
  model_solutions <- lapply(solvers, function(s) {
    args <- s
    args[["spec"]] <- spec
    args[["data"]] <- data
    res <- do.call(evalSolverFit, args = args)
    return(res)
  })
  model_solutions <- do.call(rbind, model_solutions)
  return(model_solutions)
}

round(optMethodCompare(x1, spec, solvers[c(1:2, 4:15)]), digits = 4)
```

```
##                           omega   alpha1  beta1   se.omega  se.alpha1  se.beta1  robust.se.omega  robust.se.alpha1  robust.se.beta1  LLH
## ------------------------  ------  ------  ------  --------  ---------  --------  ---------------  ----------------  ---------------  ---------
## nlminb                    0.2689  0.1774  0.0000  0.0787    0.0472     0.2447    0.0890           0.0352            0.2830           -849.6927
## solnp                     0.0007  0.0029  0.9947  0.0013    0.0037     0.0004    0.0012           0.0037            0.0001           -860.4860
## gosolnp                   0.2689  0.1774  0.0000  0.0787    0.0472     0.2446    0.0890           0.0352            0.2828           -849.6927
## hybrid                    0.0007  0.0029  0.9947  0.0013    0.0037     0.0004    0.0012           0.0037            0.0001           -860.4860
## nloptr+COBYLA             0.0006  0.0899  0.9101  0.0039    0.0306     0.0370    0.0052           0.0527            0.0677           -871.5006
## nloptr+BOBYQA             0.0003  0.0907  0.9093  0.0040    0.0298     0.0375    0.0057           0.0532            0.0718           -872.3436
## nloptr+PRAXIS             0.2689  0.1774  0.0000  0.0786    0.0472     0.2444    0.0888           0.0352            0.2823           -849.6927
## nloptr+NELDERMEAD         0.0010  0.0033  0.9935  0.0013    0.0039     0.0004    0.0013           0.0038            0.0001           -860.4845
## nloptr+SBPLX              0.0010  0.1000  0.9000  0.0042    0.0324     0.0386    0.0055           0.0536            0.0680           -872.2736
## nloptr+AUGLAG+COBYLA      0.0006  0.0899  0.9101  0.0039    0.0306     0.0370    0.0052           0.0527            0.0677           -871.5006
## nloptr+AUGLAG+BOBYQA      0.0003  0.0907  0.9093  0.0040    0.0298     0.0375    0.0057           0.0532            0.0718           -872.3412
## nloptr+AUGLAG+PRAXIS      0.1246  0.1232  0.4948  0.0620    0.0475     0.2225    0.0701           0.0439            0.2508           -851.0547
## nloptr+AUGLAG+NELDERMEAD  0.2689  0.1774  0.0000  0.0786    0.0472     0.2445    0.0889           0.0352            0.2826           -849.6927
## nloptr+AUGLAG+SBPLX       0.0010  0.1000  0.9000  0.0042    0.0324     0.0386    0.0055           0.0536            0.0680           -872.2736
```

According to the maximum likelihood criterion, the “best” result is achieved by `gosolnp`. That result has the unfortunate property that the estimated beta1 is exactly zero, which is certainly not true, though at least the standard error for beta1 would create a confidence interval containing its true value. Of these, my preferred estimates are the ones produced by AUGLAG+PRAXIS: they seem reasonable and are in fact close to the truth (at least in the sense that the confidence intervals contain the true values), but unfortunately they do *not* maximize the log likelihood, even though they are the most reasonable.

What do we see if we look at model 2? Again, `lbfgs` does not converge, so I omit it. Unfortunately, `nlminb` does not converge either, so it too must be omitted.

```r
round(optMethodCompare(x2, spec, solvers[c(2, 4:15)]), digits = 4)
```

```
##                           omega   alpha1  beta1   se.omega  se.alpha1  se.beta1  robust.se.omega  robust.se.alpha1  robust.se.beta1  LLH
## ------------------------  ------  ------  ------  --------  ---------  --------  ---------------  ----------------  ---------------  ----------
## solnp                     0.0011  0.0020  0.9970  0.0025    0.0029     0.0005    0.0030           0.0035            0.0004           -1375.951
## gosolnp                   0.0011  0.0020  0.9970  0.0025    0.0029     0.0005    0.0030           0.0035            0.0004           -1375.951
## hybrid                    0.0011  0.0020  0.9970  0.0025    0.0029     0.0005    0.0030           0.0035            0.0004           -1375.951
## nloptr+COBYLA             0.0016  0.0888  0.9112  0.0175    0.0619     0.0790    0.0540           0.2167            0.2834           -1394.529
## nloptr+BOBYQA             0.0010  0.0892  0.9108  0.0194    0.0659     0.0874    0.0710           0.2631            0.3572           -1395.310
## nloptr+PRAXIS             0.5018  0.0739  0.3803  0.3178    0.0401     0.3637    0.2777           0.0341            0.3225           -1373.632
## nloptr+NELDERMEAD         0.0028  0.0026  0.9944  0.0028    0.0031     0.0004    0.0031           0.0035            0.0001           -1375.976
## nloptr+SBPLX              0.0029  0.1000  0.9000  0.0146    0.0475     0.0577    0.0275           0.1108            0.1408           -1395.807
## nloptr+AUGLAG+COBYLA      0.0016  0.0888  0.9112  0.0175    0.0619     0.0790    0.0540           0.2167            0.2834           -1394.529
## nloptr+AUGLAG+BOBYQA      0.0010  0.0892  0.9108  0.0194    0.0659     0.0874    0.0710           0.2631            0.3572           -1395.310
## nloptr+AUGLAG+PRAXIS      0.5018  0.0739  0.3803  0.3178    0.0401     0.3637    0.2777           0.0341            0.3225           -1373.632
## nloptr+AUGLAG+NELDERMEAD  0.0001  0.0000  1.0000  0.0003    0.0003     0.0000    0.0004           0.0004            0.0000           -1375.885
## nloptr+AUGLAG+SBPLX       0.0029  0.1000  0.9000  0.0146    0.0475     0.0577    0.0275           0.1108            0.1408           -1395.807
```

Here it was PRAXIS and AUGLAG+PRAXIS that gave the “optimal” result, and only those two methods did; the other optimizers gave visibly bad results. This time, though, the “optimal” solution is also the preferred one, with the parameters being nonzero and their confidence intervals containing the correct values.

What happens if we restrict the sample to size 100? (`lbfgs` still does not work.)

```r
round(optMethodCompare(x1[1:100], spec, solvers[c(1:2, 4:15)]), digits = 4)
```

```
##                           omega   alpha1  beta1   se.omega  se.alpha1  se.beta1  robust.se.omega  robust.se.alpha1  robust.se.beta1  LLH
## ------------------------  ------  ------  ------  --------  ---------  --------  ---------------  ----------------  ---------------  --------
## nlminb                    0.0451  0.2742  0.5921  0.0280    0.1229     0.1296    0.0191           0.0905            0.0667           -80.6587
## solnp                     0.0451  0.2742  0.5921  0.0280    0.1229     0.1296    0.0191           0.0905            0.0667           -80.6587
## gosolnp                   0.0451  0.2742  0.5921  0.0280    0.1229     0.1296    0.0191           0.0905            0.0667           -80.6587
## hybrid                    0.0451  0.2742  0.5921  0.0280    0.1229     0.1296    0.0191           0.0905            0.0667           -80.6587
## nloptr+COBYLA             0.0007  0.1202  0.8798  0.0085    0.0999     0.0983    0.0081           0.1875            0.1778           -85.3121
## nloptr+BOBYQA             0.0005  0.1190  0.8810  0.0085    0.0994     0.0992    0.0084           0.1892            0.1831           -85.3717
## nloptr+PRAXIS             0.0451  0.2742  0.5921  0.0280    0.1229     0.1296    0.0191           0.0905            0.0667           -80.6587
## nloptr+NELDERMEAD         0.0451  0.2742  0.5920  0.0281    0.1230     0.1297    0.0191           0.0906            0.0667           -80.6587
## nloptr+SBPLX              0.0433  0.2740  0.5998  0.0269    0.1237     0.1268    0.0182           0.0916            0.0648           -80.6616
## nloptr+AUGLAG+COBYLA      0.0007  0.1202  0.8798  0.0085    0.0999     0.0983    0.0081           0.1875            0.1778           -85.3121
## nloptr+AUGLAG+BOBYQA      0.0005  0.1190  0.8810  0.0085    0.0994     0.0992    0.0084           0.1892            0.1831           -85.3717
## nloptr+AUGLAG+PRAXIS      0.0451  0.2742  0.5921  0.0280    0.1229     0.1296    0.0191           0.0905            0.0667           -80.6587
## nloptr+AUGLAG+NELDERMEAD  0.0451  0.2742  0.5921  0.0280    0.1229     0.1296    0.0191           0.0905            0.0667           -80.6587
## nloptr+AUGLAG+SBPLX       0.0450  0.2742  0.5924  0.0280    0.1230     0.1295    0.0191           0.0906            0.0666           -80.6587
```

```r
round(optMethodCompare(x2[1:100], spec, solvers[c(1:2, 4:15)]), digits = 4)
```

```
##                           omega   alpha1  beta1   se.omega  se.alpha1  se.beta1  robust.se.omega  robust.se.alpha1  robust.se.beta1  LLH
## ------------------------  ------  ------  ------  --------  ---------  --------  ---------------  ----------------  ---------------  ---------
## nlminb                    0.7592  0.0850  0.0000  2.1366    0.4813     3.0945    7.5439           1.7763            11.0570          -132.4614
## solnp                     0.0008  0.0000  0.9990  0.0291    0.0417     0.0066    0.0232           0.0328            0.0034           -132.9182
## gosolnp                   0.0537  0.0000  0.9369  0.0521    0.0087     0.0713    0.0430           0.0012            0.0529           -132.9124
## hybrid                    0.0008  0.0000  0.9990  0.0291    0.0417     0.0066    0.0232           0.0328            0.0034           -132.9182
## nloptr+COBYLA             0.0014  0.0899  0.9101  0.0259    0.0330     0.1192    0.0709           0.0943            0.1344           -135.7495
## nloptr+BOBYQA             0.0008  0.0905  0.9095  0.0220    0.0051     0.1145    0.0687           0.0907            0.1261           -135.8228
## nloptr+PRAXIS             0.0602  0.0000  0.9293  0.0522    0.0088     0.0773    0.0462           0.0015            0.0565           -132.9125
## nloptr+NELDERMEAD         0.0024  0.0000  0.9971  0.0473    0.0629     0.0116    0.0499           0.0680            0.0066           -132.9186
## nloptr+SBPLX              0.0027  0.1000  0.9000  0.0238    0.0493     0.1308    0.0769           0.1049            0.1535           -135.9175
## nloptr+AUGLAG+COBYLA      0.0014  0.0899  0.9101  0.0259    0.0330     0.1192    0.0709           0.0943            0.1344           -135.7495
## nloptr+AUGLAG+BOBYQA      0.0008  0.0905  0.9095  0.0221    0.0053     0.1145    0.0687           0.0907            0.1262           -135.8226
## nloptr+AUGLAG+PRAXIS      0.0602  0.0000  0.9294  0.0523    0.0090     0.0771    0.0462           0.0014            0.0565           -132.9125
## nloptr+AUGLAG+NELDERMEAD  0.0000  0.0000  0.9999  0.0027    0.0006     0.0005    0.0013           0.0004            0.0003           -132.9180
## nloptr+AUGLAG+SBPLX       0.0027  0.1000  0.9000  0.0238    0.0493     0.1308    0.0769           0.1049            0.1535           -135.9175
```

The results are not thrilling. The “best” result for the series generated by model 1 was attained by multiple solvers, and the roughly 95% confidence interval (CI) for omega would not contain omega’s true value, though the CIs for the other parameters would contain theirs. For the series generated by model 2, the best result was attained by the `nlminb` solver; the parameter values are not plausible and the standard errors are huge. At least the resulting CIs, being so wide, would contain the correct values.

From here we should no longer stick to two series but instead examine the performance of these methods on many simulated series generated by both models. The simulations in this post are too computationally intensive for my laptop, so I will use my department’s supercomputer to perform them, taking advantage of its many cores for parallelization.
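The structure of such a study, one independent seed per replication mapped over a pool of workers, is the same in any language. Here is a small Python sketch (illustrative only; `one_replication` is a stand-in for a full simulate-then-fit step, and a thread pool keeps the sketch self-contained, where CPU-bound fits would use a process pool or a cluster):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def one_replication(seed):
    # Stand-in for one simulate-then-fit step; here it just estimates the
    # variance of a simulated standard-normal series of length 200.
    rng = np.random.default_rng(seed)
    x = rng.normal(size=200)
    return float(np.mean(x ** 2))

def run_sims(m_sim=8, base_seed=111317):
    # One deterministic seed per replication (compare rugarch's rseed
    # argument), so results are reproducible under any scheduling order.
    seeds = [base_seed + i for i in range(m_sim)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(one_replication, seeds))
```

Seeding per replication rather than per worker is the design choice that matters: it makes the whole experiment reproducible no matter how the work gets scheduled.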

```r
library(foreach)
library(doParallel)

logfile <- ""
# logfile <- "outfile.log"
# if (!file.exists(logfile)) {
#   file.create(logfile)
# }
cl <- makeCluster(detectCores() - 1, outfile = logfile)
registerDoParallel(cl)

optMethodSims <- function(gen_spec, n.sim = 1000, m.sim = 1000,
                          fit_spec = ugarchspec(
                            mean.model = list(armaOrder = c(0, 0),
                                              include.mean = FALSE)),
                          solvers = list("solnp" = list(
                            "solver" = "solnp",
                            "solver.control" = list())),
                          rseed = NA, verbose = FALSE) {
  # Performs simulations in parallel of GARCH processes via rugarch and
  # returns a list with the results of different optimization routines
  #
  # Args:
  #   gen_spec: The specification for generating a GARCH sequence, produced
  #             by ugarchspec
  #   n.sim: An integer denoting the length of the simulated series
  #   m.sim: An integer for the number of simulated sequences to generate
  #   fit_spec: A ugarchspec specification for the model to fit
  #   solvers: A list of lists containing strings of solvers and a list for
  #            solver.control
  #   rseed: Optional seeding value(s) for the random number generator. For
  #          m.sim > 1, it is possible to provide either a single seed to
  #          initialize all values, or one seed per separate simulation
  #          (i.e. m.sim seeds). However, in the latter case this may result
  #          in some slight overhead depending on how large m.sim is
  #   verbose: Boolean for whether to write data tracking the progress of
  #            the loop into an output file
  #
  # Return:
  #   A list containing the result of calling optMethodCompare on each
  #   generated sequence
  fits <- foreach(i = 1:m.sim, .packages = c("rugarch"),
                  .export = c("optMethodCompare",
                              "evalSolverFit")) %dopar% {
    if (is.na(rseed)) {
      newseed <- NA
    } else if (is.vector(rseed)) {
      newseed <- rseed[i]
    } else {
      newseed <- rseed + i - 1
    }
    if (verbose) {
      cat(as.character(Sys.time()), ": Now on simulation ", i, "\n")
    }
    sim <- ugarchpath(gen_spec, n.sim = n.sim, n.start = 1000, m.sim = 1,
                      rseed = newseed)
    x <- sim@path$seriesSim
    optMethodCompare(x, spec = fit_spec, solvers = solvers)
  }
  return(fits)
}

# Specification 1 first
spec1_n100  <- optMethodSims(spec1, n.sim = 100,  m.sim = 1000,
                             solvers = solvers, verbose = TRUE)
spec1_n500  <- optMethodSims(spec1, n.sim = 500,  m.sim = 1000,
                             solvers = solvers, verbose = TRUE)
spec1_n1000 <- optMethodSims(spec1, n.sim = 1000, m.sim = 1000,
                             solvers = solvers, verbose = TRUE)

# Specification 2 next
spec2_n100  <- optMethodSims(spec2, n.sim = 100,  m.sim = 1000,
                             solvers = solvers, verbose = TRUE)
spec2_n500  <- optMethodSims(spec2, n.sim = 500,  m.sim = 1000,
                             solvers = solvers, verbose = TRUE)
spec2_n1000 <- optMethodSims(spec2, n.sim = 1000, m.sim = 1000,
                             solvers = solvers, verbose = TRUE)
```

Below is a set of helper functions I will use for the analytics I want.

```r
optMethodSims_getAllVals <- function(param, solver, reslist) {
  # Get all values for a parameter obtained by a certain solver after
  # getting a list of results via optMethodSims
  #
  # Args:
  #   param: A string for the parameter to get (such as "beta1")
  #   solver: A string for the solver for which to get the parameter (such
  #           as "nlminb")
  #   reslist: A list created by optMethodSims
  #
  # Return:
  #   A vector of values of the parameter for each simulation
  res <- sapply(reslist, function(l) {
    return(l[solver, param])
  })
  return(res)
}

optMethodSims_getBestVals <- function(reslist, opt_vec = TRUE,
                                      reslike = FALSE) {
  # A function that gets the optimizer that maximized the likelihood
  # function for each entry in reslist
  #
  # Args:
  #   reslist: A list created by optMethodSims
  #   opt_vec: A boolean indicating whether to return a vector with the
  #            name of the optimizers that maximized the log likelihood
  #   reslike: A boolean indicating whether the resulting list should
  #            consist of matrices of only one row labeled "best" with a
  #            structure like reslist
  #
  # Return:
  #   If opt_vec is TRUE, a list of lists, where each sublist contains a
  #   vector of strings naming the optimizers that maximized the likelihood
  #   function and a matrix of the parameters found. Otherwise, just the
  #   matrix (resembles the list generated by optMethodSims)
  res <- lapply(reslist, function(l) {
    max_llh <- max(l[, "LLH"], na.rm = TRUE)
    best_idx <- (l[, "LLH"] == max_llh) & (!is.na(l[, "LLH"]))
    best_mat <- l[best_idx, , drop = FALSE]
    if (opt_vec) {
      return(list("solvers" = rownames(best_mat), "params" = best_mat))
    } else {
      return(best_mat)
    }
  })
  if (reslike) {
    res <- lapply(res, function(l) {
      mat <- l$params[1, , drop = FALSE]
      rownames(mat) <- "best"
      return(mat)
    })
  }
  return(res)
}

optMethodSims_getCaptureRate <- function(param, solver, reslist,
                                         multiplier = 2, spec,
                                         use_robust = TRUE) {
  # Gets the rate a confidence interval for a parameter captures the true
  # value
  #
  # Args:
  #   param: A string for the parameter being worked with
  #   solver: A string for the solver used to estimate the parameter
  #   reslist: A list created by optMethodSims
  #   multiplier: A floating-point number for the multiplier to the
  #               standard error, appropriate for the desired confidence
  #               level
  #   spec: A ugarchspec specification with the fixed parameters containing
  #         the true parameter value
  #   use_robust: Use robust standard errors for computing CIs
  #
  # Return:
  #   A float for the proportion of times the confidence interval captured
  #   the true parameter value
  se_string <- ifelse(use_robust, "robust.se.", "se.")
  est <- optMethodSims_getAllVals(param, solver, reslist)
  moe_est <- multiplier * optMethodSims_getAllVals(
    paste0(se_string, param), solver, reslist)
  param_val <- spec@model$fixed.pars[[param]]
  contained <- (param_val <= est + moe_est) & (param_val >= est - moe_est)
  return(mean(contained, na.rm = TRUE))
}

optMethodSims_getMaxRate <- function(solver, maxlist) {
  # Gets how frequently a solver found a maximal log likelihood
  #
  # Args:
  #   solver: A string for the solver
  #   maxlist: A list created by optMethodSims_getBestVals with entries
  #            containing vectors naming the solvers that maximized the log
  #            likelihood
  #
  # Return:
  #   The proportion of times the solver maximized the log likelihood
  maxed <- sapply(maxlist, function(l) {
    solver %in% l$solvers
  })
  return(mean(maxed))
}

optMethodSims_getFailureRate <- function(solver, reslist) {
  # Computes the proportion of times a solver failed to converge
  #
  # Args:
  #   solver: A string for the solver
  #   reslist: A list created by optMethodSims
  #
  # Return:
  #   Numeric proportion of times a solver failed to converge
  failed <- sapply(reslist, function(l) {
    is.na(l[solver, "LLH"])
  })
  return(mean(failed))
}

# Vectorization
optMethodSims_getCaptureRate <- Vectorize(optMethodSims_getCaptureRate,
                                          vectorize.args = "solver")
optMethodSims_getMaxRate <- Vectorize(optMethodSims_getMaxRate,
                                      vectorize.args = "solver")
optMethodSims_getFailureRate <- Vectorize(optMethodSims_getFailureRate,
                                          vectorize.args = "solver")
```

I first create tables containing, for a fixed sample size and model:

- The rate at which a solver attains the highest log likelihood among all solvers for a series
- The rate at which a solver failed to converge
- The rate at which a roughly 95% confidence interval based on the solver’s solution managed to contain the true parameter value for each parameter (referred to as the “capture rate”, and using the robust standard errors)
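The capture rate is simple enough to pin down precisely. Here is a small Python sketch of the metric (illustrative only; estimates and standard errors are plain arrays, with NaN marking failed fits):

```python
import numpy as np

def capture_rate(estimates, ses, truth, multiplier=2.0):
    # Fraction of (estimate +/- multiplier * SE) intervals containing
    # `truth`, skipping failed fits marked with NaN; this mirrors the
    # capture-rate metric used in the tables in this post.
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(ses, dtype=float)
    ok = ~np.isnan(est) & ~np.isnan(se)
    lo = est[ok] - multiplier * se[ok]
    hi = est[ok] + multiplier * se[ok]
    return float(np.mean((lo <= truth) & (truth <= hi)))
```

When the estimator really is Normal with the stated standard error, this rate should sit near 95% for `multiplier = 2`; the tables measure how far reality falls from that.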

```r
solver_table <- function(reslist, tags, spec) {
  # Creates a table describing important solver statistics
  #
  # Args:
  #   reslist: A list created by optMethodSims
  #   tags: A vector with strings naming all solvers to include in the
  #         table
  #   spec: A ugarchspec specification with the fixed parameters containing
  #         the true parameter value
  #
  # Return:
  #   A matrix containing metrics describing the performance of the solvers
  params <- names(spec@model$fixed.pars)
  max_rate <- optMethodSims_getMaxRate(tags,
                                       optMethodSims_getBestVals(reslist))
  failure_rate <- optMethodSims_getFailureRate(tags, reslist)
  capture_rate <- lapply(params, function(p) {
    optMethodSims_getCaptureRate(p, tags, reslist, spec = spec)
  })
  return_mat <- cbind("Maximization Rate" = max_rate,
                      "Failure Rate" = failure_rate)
  capture_mat <- do.call(cbind, capture_rate)
  colnames(capture_mat) <- paste(params, "95% CI Capture Rate")
  return_mat <- cbind(return_mat, capture_mat)
  return(return_mat)
}
```

```r
as.data.frame(round(solver_table(spec1_n100, tags, spec1) * 100, digits = 1))
```

```
##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                    16.2               20.0          21.8                       29.2                        24.0
## solnp                     0.1                0.0           13.7                       24.0                        15.4
## lbfgs                     15.1               35.2          56.6                       67.9                        58.0
## gosolnp                   20.3               0.0           20.3                       32.6                        21.9
## hybrid                    0.1                0.0           13.7                       24.0                        15.4
## nloptr+COBYLA             0.0                0.0           6.3                        82.6                        19.8
## nloptr+BOBYQA             0.0                0.0           5.4                        82.1                        18.5
## nloptr+PRAXIS             15.8               0.0           42.1                       54.5                        44.1
## nloptr+NELDERMEAD         0.4                0.0           5.7                        19.3                        8.1
## nloptr+SBPLX              0.1                0.0           7.7                        85.7                        24.1
## nloptr+AUGLAG+COBYLA      0.0                0.0           6.1                        84.5                        19.9
## nloptr+AUGLAG+BOBYQA      0.1                0.0           6.5                        83.2                        19.4
## nloptr+AUGLAG+PRAXIS      22.6               0.0           41.2                       54.6                        44.1
## nloptr+AUGLAG+NELDERMEAD  11.1               0.0           7.5                        18.8                        9.7
## nloptr+AUGLAG+SBPLX       0.6                0.0           7.9                        86.5                        23.0
```

```r
as.data.frame(round(solver_table(spec1_n500, tags, spec1) * 100, digits = 1))
```

```
##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                    21.2               0.4           63.3                       67.2                        63.8
## solnp                     0.1                0.2           32.2                       35.6                        32.7
## lbfgs                     4.5                41.3          85.0                       87.6                        85.7
## gosolnp                   35.1               0.0           69.0                       73.2                        69.5
## hybrid                    0.1                0.0           32.3                       35.7                        32.8
## nloptr+COBYLA             0.0                0.0           3.2                        83.3                        17.8
## nloptr+BOBYQA             0.0                0.0           3.5                        81.5                        18.1
## nloptr+PRAXIS             18.0               0.0           83.9                       87.0                        84.2
## nloptr+NELDERMEAD         0.0                0.0           16.4                       20.7                        16.7
## nloptr+SBPLX              0.1                0.0           3.7                        91.4                        15.7
## nloptr+AUGLAG+COBYLA      0.0                0.0           3.2                        83.3                        17.8
## nloptr+AUGLAG+BOBYQA      0.0                0.0           3.5                        81.5                        18.1
## nloptr+AUGLAG+PRAXIS      21.9               0.0           80.2                       87.4                        83.4
## nloptr+AUGLAG+NELDERMEAD  0.6                0.0           20.0                       24.0                        20.5
## nloptr+AUGLAG+SBPLX       0.0                0.0           3.7                        91.4                        15.7
```

```r
as.data.frame(round(solver_table(spec1_n1000, tags, spec1) * 100, digits = 1))
```

```
##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                    21.5               0.1           88.2                       86.1                        87.8
## solnp                     0.4                0.2           54.9                       53.6                        54.6
## lbfgs                     1.1                44.8          91.5                       88.0                        91.8
## gosolnp                   46.8               0.0           87.2                       85.1                        87.0
## hybrid                    0.5                0.0           55.0                       53.6                        54.7
## nloptr+COBYLA             0.0                0.0           4.1                        74.5                        15.0
## nloptr+BOBYQA             0.0                0.0           3.6                        74.3                        15.9
## nloptr+PRAXIS             17.7               0.0           92.6                       90.2                        92.2
## nloptr+NELDERMEAD         0.0                0.0           30.5                       29.6                        30.9
## nloptr+SBPLX              0.0                0.0           3.0                        82.3                        11.6
## nloptr+AUGLAG+COBYLA      0.0                0.0           4.1                        74.5                        15.0
## nloptr+AUGLAG+BOBYQA      0.0                0.0           3.6                        74.3                        15.9
## nloptr+AUGLAG+PRAXIS      13.0               0.0           83.4                       93.9                        86.7
## nloptr+AUGLAG+NELDERMEAD  0.0                0.0           34.6                       33.8                        35.0
## nloptr+AUGLAG+SBPLX       0.0                0.0           3.0                        82.3                        11.6
```

```r
as.data.frame(round(solver_table(spec2_n100, tags, spec2) * 100, digits = 1))
```

```
##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                    8.2                24.2          22.3                       34.7                        23.9
## solnp                     0.3                0.0           21.1                       32.6                        21.3
## lbfgs                     11.6               29.5          74.9                       73.2                        70.4
## gosolnp                   19.0               0.0           31.9                       41.2                        30.8
## hybrid                    0.3                0.0           21.1                       32.6                        21.3
## nloptr+COBYLA             0.0                0.0           20.5                       94.7                        61.7
## nloptr+BOBYQA             0.2                0.0           19.3                       95.8                        62.2
## nloptr+PRAXIS             16.0               0.0           70.2                       57.2                        52.8
## nloptr+NELDERMEAD         0.2                0.0           7.8                        27.8                        14.1
## nloptr+SBPLX              0.1                0.0           24.9                       91.0                        65.0
## nloptr+AUGLAG+COBYLA      0.0                0.0           21.2                       95.1                        62.5
## nloptr+AUGLAG+BOBYQA      0.9                0.0           20.1                       96.2                        62.5
## nloptr+AUGLAG+PRAXIS      38.8               0.0           70.4                       57.2                        52.7
## nloptr+AUGLAG+NELDERMEAD  14.4               0.0           10.7                       26.0                        16.1
## nloptr+AUGLAG+SBPLX       0.1                0.0           25.8                       91.9                        65.5
```

```r
as.data.frame(round(solver_table(spec2_n500, tags, spec2) * 100, digits = 1))
```

```
##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                    1.7                1.6           35.0                       37.2                        34.2
## solnp                     0.1                0.2           46.2                       48.6                        45.3
## lbfgs                     2.2                38.4          85.2                       88.1                        82.3
## gosolnp                   5.2                0.0           74.9                       77.8                        72.7
## hybrid                    0.1                0.0           46.1                       48.5                        45.2
## nloptr+COBYLA             0.0                0.0           8.2                        100.0                       40.5
## nloptr+BOBYQA             0.0                0.0           9.5                        100.0                       41.0
## nloptr+PRAXIS             17.0               0.0           83.8                       85.1                        81.0
## nloptr+NELDERMEAD         0.0                0.0           26.9                       38.2                        27.0
## nloptr+SBPLX              0.0                0.0           8.2                        100.0                       40.2
## nloptr+AUGLAG+COBYLA      0.0                0.0           8.2                        100.0                       40.5
## nloptr+AUGLAG+BOBYQA      0.0                0.0           9.5                        100.0                       41.0
## nloptr+AUGLAG+PRAXIS      77.8               0.0           84.4                       85.4                        81.3
## nloptr+AUGLAG+NELDERMEAD  1.1                0.0           32.5                       40.3                        32.3
## nloptr+AUGLAG+SBPLX       0.0                0.0           8.2                        100.0                       40.2
```

```r
as.data.frame(round(solver_table(spec2_n1000, tags, spec2) * 100, digits = 1))
```

```
##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                    2.7                0.7           64.1                       68.0                        63.8
## solnp                     0.0                0.0           70.1                       73.8                        69.8
## lbfgs                     0.0                43.4          90.6                       91.5                        89.9
## gosolnp                   3.2                0.0           87.5                       90.3                        86.9
## hybrid                    0.0                0.0           70.1                       73.8                        69.8
## nloptr+COBYLA             0.0                0.0           2.3                        100.0                       20.6
## nloptr+BOBYQA             0.0                0.0           2.5                        100.0                       22.6
## nloptr+PRAXIS             14.1               0.0           89.1                       91.3                        88.5
## nloptr+NELDERMEAD         0.0                0.0           46.3                       55.6                        45.4
## nloptr+SBPLX              0.0                0.0           2.2                        100.0                       19.5
## nloptr+AUGLAG+COBYLA      0.0                0.0           2.3                        100.0                       20.6
## nloptr+AUGLAG+BOBYQA      0.0                0.0           2.5                        100.0                       22.6
## nloptr+AUGLAG+PRAXIS      85.5               0.0           89.1                       91.3                        88.5
## nloptr+AUGLAG+NELDERMEAD  0.3                0.0           51.9                       58.2                        51.3
## nloptr+AUGLAG+SBPLX       0.0                0.0           2.2                        100.0                       19.5
```

These tables already reveal a lot of information. In general it seems that the AUGLAG+PRAXIS method (the augmented Lagrangian method using the principal axis solver) provided in NLopt does best for model 2, especially for large sample sizes, while for model 1 the `gosolnp` method, which uses the `solnp` solver by Yinyu Ye but with random initializations and restarts, seems to win out for larger sample sizes.

The bigger story, though, is the failure of any method to be the “best”, especially for smaller sample sizes. While some optimizers consistently fail to attain the maximum log-likelihood, no optimizer can claim to consistently obtain the best result, and different optimizers seem to perform better with different models. The implication for real-world data, where the true model parameters are never known, is to try every optimizer (or at least those that have a chance of maximizing the log-likelihood) and pick the results that yield the largest log-likelihood. No algorithm is trustworthy enough to be the go-to algorithm.
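The bookkeeping behind the maximization rate is worth pinning down. A small Python sketch (illustrative only; the log-likelihood matrix and method names are hypothetical inputs) that counts ties for every tied method and treats NaN as a failed fit:

```python
import numpy as np

def maximization_rates(llh_matrix, method_names, tol=1e-8):
    # Given a (series x method) matrix of log-likelihoods (NaN = failed
    # fit), return how often each method attains the per-series maximum;
    # ties count for every tied method.
    llh = np.asarray(llh_matrix, dtype=float)
    best = np.nanmax(llh, axis=1, keepdims=True)
    wins = np.abs(llh - best) <= tol  # NaN comparisons come out False
    return dict(zip(method_names, wins.mean(axis=0)))
```

With two series where methods "a" and "c" tie on the first and "b" wins the second (with "c" failing), every method ends up with a 50% maximization rate, which is exactly the kind of muddle the tables above exhibit.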

Let’s now look at plots of the estimated distribution of the parameters. First comes a helper function.

```r
library(ggplot2)

solver_density_plot <- function(param, tags, list_reslist, sample_sizes,
                                spec) {
  # Given a parameter, creates a density plot for each solver's
  # distribution at different sample sizes
  #
  # Args:
  #   param: A string for the parameter to plot
  #   tags: A character vector containing the solver names
  #   list_reslist: A list of lists created by optMethodSims, one for each
  #                 sample size
  #   sample_sizes: A numeric vector identifying the sample size
  #                 corresponding to each object in the above list
  #   spec: A ugarchspec object containing the specification that generated
  #         the datasets
  #
  # Return:
  #   A ggplot object containing the plot generated
  p <- spec@model$fixed.pars[[param]]
  nlist <- lapply(list_reslist, function(l) {
    optlist <- lapply(tags, function(t) {
      return(na.omit(optMethodSims_getAllVals(param, t, l)))
    })
    names(optlist) <- tags
    df <- stack(optlist)
    names(df) <- c("param", "optimizer")
    return(df)
  })
  ndf <- do.call(rbind, nlist)
  ndf$n <- rep(sample_sizes, times = sapply(nlist, nrow))
  ggplot(ndf, aes(x = param)) +
    geom_density(fill = "black", alpha = 0.5) +
    geom_vline(xintercept = p, color = "blue") +
    facet_grid(optimizer ~ n, scales = "free_y")
}
```

Now for plots.

```r
solver_density_plot("omega", tags, list(spec1_n100, spec1_n500, spec1_n1000),
                    c(100, 500, 1000), spec1)
```

```r
solver_density_plot("alpha1", tags, list(spec1_n100, spec1_n500, spec1_n1000),
                    c(100, 500, 1000), spec1)
```

```r
solver_density_plot("beta1", tags, list(spec1_n100, spec1_n500, spec1_n1000),
                    c(100, 500, 1000), spec1)
```

Bear in mind that there are only 1,000 simulated series and every optimizer produces a solution for each of them, so in principle the optimizers’ results should not be independent; yet the only time these density plots look the same is when the optimizers perform terribly. Even when an optimizer isn’t performing terribly (as is the case for the `gosolnp`, `PRAXIS`, and `AUGLAG+PRAXIS` methods), there’s evidence of artifacts near 0 in the estimates of omega and alpha1 and near 1 in the estimates of beta1, and these artifacts are more pronounced for smaller sample sizes. That said, for the better optimizers the estimators look almost unbiased, though their spread is large even for large sample sizes. That’s not the case for the `AUGLAG+PRAXIS` optimizer, though; it appears to produce biased estimates.

Let’s look at plots for model 2.

```r
solver_density_plot("omega", tags, list(spec2_n100, spec2_n500, spec2_n1000),
                    c(100, 500, 1000), spec2)
```

```r
solver_density_plot("alpha1", tags, list(spec2_n100, spec2_n500, spec2_n1000),
                    c(100, 500, 1000), spec2)
```

```r
solver_density_plot("beta1", tags, list(spec2_n100, spec2_n500, spec2_n1000),
                    c(100, 500, 1000), spec2)
```

The estimators don’t struggle as much for model 2, but the picture is still hardly rosy. The `PRAXIS` and `AUGLAG+PRAXIS` methods seem to perform well, but far from spectacularly for small sample sizes.

So far, my experiments suggest practitioners should not rely on any one optimizer but instead try different ones and choose the results that have the largest log-likelihood. Suppose we call this optimization routine the “best” optimizer. How does this optimizer perform?

Let’s find out.

```r
as.data.frame(round(solver_table(
  optMethodSims_getBestVals(spec1_n100, reslike = TRUE),
  "best", spec1) * 100, digits = 1))
```

```
##       Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ----  -----------------  ------------  -------------------------  --------------------------  -------------------------
## best  100                0             49.5                       63.3                        52.2
```

```r
as.data.frame(round(solver_table(
  optMethodSims_getBestVals(spec1_n500, reslike = TRUE),
  "best", spec1) * 100, digits = 1))
```

## Maximization Rate Failure Rate omega 95% CI Capture Rate alpha1 95% CI Capture Rate beta1 95% CI Capture Rate
## ----- ------------------ ------------- -------------------------- --------------------------- --------------------------
## best 100 0 86 88.8 86.2

as.data.frame(round(solver_table( optMethodSims_getBestVals(spec1_n1000, reslike = TRUE), "best", spec1) * 100, digits = 1))

## Maximization Rate Failure Rate omega 95% CI Capture Rate alpha1 95% CI Capture Rate beta1 95% CI Capture Rate
## ----- ------------------ ------------- -------------------------- --------------------------- --------------------------
## best 100 0 92.8 90.3 92.4

as.data.frame(round(solver_table( optMethodSims_getBestVals(spec2_n100, reslike = TRUE), "best", spec2) * 100, digits = 1))

## Maximization Rate Failure Rate omega 95% CI Capture Rate alpha1 95% CI Capture Rate beta1 95% CI Capture Rate
## ----- ------------------ ------------- -------------------------- --------------------------- --------------------------
## best 100 0 55.2 63.2 52.2

as.data.frame(round(solver_table( optMethodSims_getBestVals(spec2_n500, reslike = TRUE), "best", spec2) * 100, digits = 1))

## Maximization Rate Failure Rate omega 95% CI Capture Rate alpha1 95% CI Capture Rate beta1 95% CI Capture Rate
## ----- ------------------ ------------- -------------------------- --------------------------- --------------------------
## best 100 0 83 86.3 80.5

as.data.frame(round(solver_table( optMethodSims_getBestVals(spec2_n1000, reslike = TRUE), "best", spec2) * 100, digits = 1))

## Maximization Rate Failure Rate omega 95% CI Capture Rate alpha1 95% CI Capture Rate beta1 95% CI Capture Rate
## ----- ------------------ ------------- -------------------------- --------------------------- --------------------------
## best 100 0 88.7 91.4 88.1

Bear in mind that we evaluate the performance of the “best” optimizer by the CI capture rate, which should be around 95%. The “best” optimizer clearly performs well but does not outperform all of the individual optimizers. This is disappointing; I had hoped the “best” optimizer would have the highly desirable property of a 95% capture rate, but performance is nowhere near that except for larger sample sizes. Either the standard errors are being underestimated, or for small sample sizes the Normal distribution poorly describes the actual distribution of the estimators (which means multiplying by two does not lead to intervals with the desired confidence level).
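For readers unfamiliar with the metric: the capture rate is the fraction of simulated series whose approximate 95% confidence interval (estimate ± 1.96 standard errors) contains the true parameter. Here is a self-contained sketch of the computation using ordinary sample-mean estimation rather than GARCH, purely to illustrate how a capture rate is tallied (not the `solver_table` code itself):

```python
import random
import statistics

random.seed(2018)
true_mu = 0.1       # the parameter the CIs should capture
n_sims, n = 1000, 500

captured = 0
for _ in range(n_sims):
    sample = [random.gauss(true_mu, 1.0) for _ in range(n)]
    est = statistics.fmean(sample)              # point estimate
    se = statistics.stdev(sample) / n ** 0.5    # standard error
    if abs(est - true_mu) <= 1.96 * se:         # does the 95% CI capture mu?
        captured += 1

capture_rate = captured / n_sims
print(capture_rate)  # close to 0.95 when the Normal approximation holds
```

When the estimator’s sampling distribution is far from Normal, or the standard errors are wrong, this rate drifts away from 95%, which is exactly what the tables above show for small samples.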

Interestingly, there is no noticeable difference in performance between the two models for this “best” estimator. This suggests to me that the seemingly better results one often sees for models fit to actual data might be exploiting the bias of the optimizers.

Let’s look at the distribution of the estimated parameters.

solver_density_plot("omega", "best", lapply(list(spec1_n100, spec1_n500, spec1_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec1)

solver_density_plot("alpha1", "best", lapply(list(spec1_n100, spec1_n500, spec1_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec1)

solver_density_plot("beta1", "best", lapply(list(spec1_n100, spec1_n500, spec1_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec1)

solver_density_plot("omega", "best", lapply(list(spec2_n100, spec2_n500, spec2_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec2)

solver_density_plot("alpha1", "best", lapply(list(spec2_n100, spec2_n500, spec2_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec2)

solver_density_plot("beta1", "best", lapply(list(spec2_n100, spec2_n500, spec2_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec2)

The plots suggest that the “best” estimator still shows some pathologies, even though it behaves less poorly than the other estimators. I don’t see evidence of bias in the parameter estimates regardless of choice of model, but I’m not convinced the “best” estimator truly maximizes the log-likelihood, especially for smaller sample sizes. The estimates for β are especially bad. Even if the standard error for β should be large, I don’t think its estimator should show the propensity for being zero or one that these plots reveal.

I initially wrote this article over a year ago and didn’t publish it until now. The hang-up was that I wanted to complete a literature review of alternative ways to estimate the parameters of a GARCH model. Unfortunately I never finished that review, and I’ve decided to release this article regardless.

That said, I’ll share what I was reading. One article by Gilles Zumbach tried to explain why estimating GARCH parameters is hard. He noted that the quasi-likelihood function solvers try to maximize has bad properties, such as being non-concave and having “flat” regions in which algorithms can become stuck. He suggested an alternative procedure for finding the parameters of GARCH models, in which one finds the best fit in an alternative parameter space (which supposedly has better properties than the original parameter space of GARCH models) and estimates one of the parameters using, say, the method of moments, without any optimization algorithm. Another article, by Fiorentini, Calzolari, and Panattoni, showed that analytic gradients for GARCH models can be computed explicitly, so the gradient-free methods used by the optimization algorithms seen here are not actually necessary. Since numerical differentiation is generally a difficult problem, analytic gradients could help ensure that no additional numerical error is introduced that causes these algorithms to fail to converge. I also wanted to explore other estimation methods to see if they can avoid numerical optimization altogether or have better numerical properties, such as estimation via the method of moments. I wanted to read an article by Andersen, Chung, and Sørensen to learn more about this approach to estimation.

Life happens, though, and I didn’t complete this review. The project moved on and the problem of estimating GARCH model parameters well was essentially avoided. That said, I want to revisit this point, perhaps exploring how techniques such as simulated annealing do for estimating GARCH model parameters.

So for now, if you’re a practitioner, what should you do when estimating a GARCH model? I would say don’t take for granted that the default estimation procedure your package uses will work. You should explore different procedures and different parameter choices and go with the results that lead to the largest log-likelihood value. I showed how this can be done in an automated fashion, but you should be prepared to *manually* pick the model with the best fit (as determined by the log-likelihood). If you don’t do this, the model you estimated may not actually be the one for which the theory works.
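Comparing fits this way requires the log-likelihood itself. For reference, here is a minimal sketch of the Gaussian quasi-log-likelihood of a zero-mean GARCH(1,1) model in Python; initializing the variance recursion at the sample second moment is one common convention, not necessarily what `rugarch` does:

```python
import math

def garch11_loglik(params, x):
    """Gaussian quasi-log-likelihood of a zero-mean GARCH(1,1) model.

    params: (omega, alpha, beta); x: list of observed returns.
    """
    omega, alpha, beta = params
    # One common convention: start the variance recursion at the sample
    # second moment (an assumption; packages differ on initialization)
    sigma2 = sum(xt ** 2 for xt in x) / len(x)
    ll = 0.0
    for t, xt in enumerate(x):
        if t > 0:
            # sigma_t^2 = omega + alpha * x_{t-1}^2 + beta * sigma_{t-1}^2
            sigma2 = omega + alpha * x[t - 1] ** 2 + beta * sigma2
        ll += -0.5 * (math.log(2 * math.pi * sigma2) + xt ** 2 / sigma2)
    return ll
```

An optimizer then maximizes this function over (ω, α, β) subject to ω > 0, α, β ≥ 0, and α + β < 1; the point of this article is that different optimizers can return very different maximizers of this same function.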

I will say it again, one last time, in the last sentence of this article for extra emphasis: *don’t take numerical techniques and results for granted!*

sessionInfo()

## R version 3.4.2 (2017-09-28)
## Platform: i686-pc-linux-gnu (32-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods
## [8] base
##
## other attached packages:
## [1] ggplot2_2.2.1 rugarch_1.3-8 printr_0.1
##
## loaded via a namespace (and not attached):
##  [1] digest_0.6.16               htmltools_0.3.6
##  [3] SkewHyperbolic_0.3-2        expm_0.999-2
##  [5] scales_0.5.0                DistributionUtils_0.5-1
##  [7] Rsolnp_1.16                 rprojroot_1.2
##  [9] grid_3.4.2                  stringr_1.3.1
## [11] knitr_1.17                  numDeriv_2016.8-1
## [13] GeneralizedHyperbolic_0.8-1 munsell_0.4.3
## [15] pillar_1.3.0                tibble_1.4.2
## [17] compiler_3.4.2              highr_0.6
## [19] lattice_0.20-35             labeling_0.3
## [21] Matrix_1.2-8                KernSmooth_2.23-15
## [23] plyr_1.8.4                  xts_0.10-0
## [25] spd_2.0-1                   zoo_1.8-0
## [27] stringi_1.2.4               magrittr_1.5
## [29] reshape2_1.4.2              rlang_0.2.2
## [31] rmarkdown_1.7               evaluate_0.10.1
## [33] gtable_0.2.0                colorspace_1.3-2
## [35] yaml_2.1.14                 tools_3.4.2
## [37] mclust_5.4.1                mvtnorm_1.0-6
## [39] truncnorm_1.0-7             ks_1.11.3
## [41] nloptr_1.0.4                lazyeval_0.2.1
## [43] crayon_1.3.4                backports_1.1.1
## [45] Rcpp_1.0.0

Packt Publishing published a book for me entitled *Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

- When I wrote this article initially, my advisor and a former student of his developed a test statistic that should detect early or late change points in a time series, including a change in the parameters of a GARCH model. My contribution to the paper we were writing included demonstrating that the test statistic detects structural change sooner than other test statistics when applied to real-world data. To be convincing to reviewers, our test statistic should detect a change that another statistic won’t detect until it sees more data. This means that the change should be present but not so strong that both statistics immediately detect the change with minuscule *p*-values. ↩
- The profile on LinkedIn I linked to may or may not be the correct person; I’m guessing it is based on the listed occupations and history. If I got the wrong person, I’m sorry. ↩

The *Forgotten Age* cycle of Arkham Horror has come to a close, and Fantasy Flight Games has already announced the next cycle, *The Circle Undone*. Not only that, they’ve announced two mythos packs at a rate that… surprised me. A new cycle announcement and two mythos pack announcements in less than two months? Am I the only one who finds the new pace of announcements surprising? Perhaps that means they want to get product out at a faster pace?

Eh, enough speculation. I wrote about Arkham Horror before, analyzing Olive McBride specifically. This analysis (despite errors in the initial publication) was well received, even earning me a shoutout from my favorite Arkham-related YouTube channel.

In the announcement of the mythos pack *The Wages of Sin*, another mathematically interesting card was spoiled: Henry Wan, seen below.

Designing new allies for Arkham Horror is very hard because there can effectively be only one ally in a deck and there are many good allies already released, many of them in the core set. Henry Wan, specifically, is competing with Leo de Luca, who competes with Dr. Milan Christopher for the title of “Best Ally”. Cards like Charisma help with the problem, but only if you plan on running multiple allies and are willing to pay the experience points for it.

Can Henry Wan compete with Leo de Luca? That strongly depends on how good his ability is. Actions are a precious commodity in Arkham Horror; this is why Leo de Luca is considered such a great card. Card draw and resource gain *can* help action economy, especially in a spendthrift class such as the Rogue (green) class, but it often takes many resources to compensate for a lost action.

Consider, for instance, Father Mateo’s Elder Sign ability: gain an extra action, or a card and a resource. As a point of reference, players can draw a card *or* gain a resource for one of their actions, so a raw evaluation would say that drawing a card *and* gaining a resource is worth two actions and thus better than just getting a free action. But I feel most of the time people use Father Mateo’s Elder Sign effect to gain the additional action rather than the card and resource (though choosing the latter effect is far from rare). In fact, I think a single action could be valued at *three* resources, based only on the fact that when a player draws Emergency Cache they will eagerly play it. When viewed from this perspective, Leo de Luca pays for himself after about three turns, and drawing him early gives an investigator a major boost in a scenario.

Henry Wan will thus live or die based on how strong his ability is. That said, “strong” depends on how well a player can use his ability, which is not a trivial task.

Make no mistake: Henry Wan is a gambler’s card (which fits the Rogue theme very well). Not only does a player gamble the resources spent on him, they gamble the action spent to trigger his ability; heck, using a deck slot on him is a gamble! A player thus will gain value from him *only* if they use him optimally.

Optimal play is not trivially determined, but fortunately Henry Wan’s ability is easy to model mathematically if you’re familiar with Markov chains. Wait, are most people *not* familiar with Markov chains? Oh, I didn’t know that. Oh well, maybe they’ll learn something from what follows. I’ll do my best to make it simple.

From here on, I consider drawing a card or gaining a resource with Henry Wan as equivalent; I’ll simply imagine that we’re trying to gain resources using his ability. Henry Wan’s ability calls on players to institute a policy for playing him of the following form:

**After X draws, take your winnings; do not draw anymore.**

Our job is thus to choose X so that we maximize the *expected* resource gain (in the probabilistic sense of expectation).

I’m going to call utilizing Henry Wan’s ability a single “game”. Here’s how I view the game: the chaos bag is filled with tokens labeled either “S” or “F”, with every “F” being one of the icon tokens mentioned in Henry Wan’s ability. When an “S” is drawn, the game continues, while the game ends the moment an “F” is drawn. Every time we draw an additional “S”, there is one fewer “S” in the bag, and the odds of drawing an “F” increase; that said, our total winnings increase with each “S” we draw.

The game ends when either an “F” is drawn or the policy is triggered. Our winnings depend on which of these outcomes occurs: if the former, our winnings are 0; if the latter, our winnings are X. Thus it’s easy to see (if you’re familiar with probability) that the expected winnings for any given policy is X times the probability of winning with the chosen policy: X·p_X, if you prefer (with p_X being the probability of not failing when using the policy of ending after X draws). We thus want to pick the X that maximizes X·p_X.

Calculating p_X calls for the Markov chain. Below is the chain I imagine:

- The initial state is state 0, representing zero draws. There are also states numbered 1 to X, and a state F.
- If the chain is at state i (for i < X), the chain moves to state i + 1 with probability (s − i)/(s + f − i) or to state F with probability f/(s + f − i), where s and f are the number of “S” and “F” tokens initially in the bag.
- Both state X and state F are absorbing states. (Once entered, the chain does not leave the state; in other words, the "game" ends.)

The problem now is to calculate the probability that the chain is absorbed into state X. The probability of ending in a particular absorbing state is well known (and given in the above link to Wikipedia).
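As a concrete worked example (a hypothetical bag with s = 11 non-icon tokens and f = 6 icon tokens, policy X = 2, not the article’s R code), the standard absorbing-chain formula B = (I − Q)⁻¹R gives the absorption probabilities:

```python
import numpy as np

s, f = 11, 6  # hypothetical chaos bag: 11 "S" tokens, 6 "F" tokens

# Transient states: 0 and 1; absorbing states: 2 (the win state, X = 2) and F.
# Q: transitions among transient states (0 -> 1 with probability s/(s+f))
Q = np.array([[0.0, s / (s + f)],
              [0.0, 0.0]])
# R: transitions from transient states into the absorbing states (win, F)
R = np.array([[0.0,                    f / (s + f)],
              [(s - 1) / (s + f - 1), f / (s + f - 1)]])

N = np.linalg.inv(np.eye(2) - Q)  # fundamental matrix (I - Q)^-1
B = N @ R                          # B[i, j]: absorption prob. in state j from i
print(B[0, 0])                     # probability of winning with X = 2
```

Starting from state 0, the chain wins only by surviving both draws, so B[0, 0] works out to (11/17)·(10/16) ≈ 0.40, and the two absorption probabilities sum to one.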

No special trick for finding a maximizing X is necessary once we know how to solve this problem for any X: just list all possible policies (there are only finitely many to worry about, and the number rarely exceeds 20), compute the expected winnings of each, and pick the X maximizing this number.
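Since each draw either continues or ends the game, the absorption probability also reduces to a simple product of survival probabilities, p_X = ∏_{i=0}^{X−1} (s − i)/(s + f − i), which makes the brute-force search easy to sketch (a plain-Python illustration, not the article’s R implementation):

```python
def best_policy(s, f):
    """Brute-force the optimal stopping policy for a bag with
    s "S" tokens and f "F" tokens.  Returns (X, p_X, X * p_X)."""
    best = (0, 1.0, 0.0)
    p = 1.0  # p_X built up incrementally as a product of survival odds
    for X in range(1, s + 1):
        p *= (s - (X - 1)) / (s + f - (X - 1))  # survive the X-th draw
        if X * p > best[2]:
            best = (X, p, X * p)
    return best

X, p, ev = best_policy(11, 6)  # a "typical" chaos bag
print(X, round(p, 2), round(ev, 2))
```

For s = 11 and f = 6 this yields X = 2 with expected winnings of about 0.81, which you can check against the tables generated below.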

The maximizing policy depends on what's in the chaos bag. Shocking, right? Still, this is an important point: each campaign/scenario/difficulty level has its own chaos bag, and thanks to cards with the **seal** keyword, the chaos bag can change *during* a scenario, perhaps to the benefit or detriment of Henry Wan. Fortunately, the "S" and "F" language makes modelling the contents of the chaos bag so simple that we can create two-dimensional tables depending only on the number of "S"s and "F"s in the bag, and those tables will cover nearly every scenario an investigator will encounter.

The script below (which can be made executable on Unix systems with R installed) can be used for generating such tables.

```r
#!/usr/bin/Rscript
################################################################################
# ArkhamHorrorHenryWanTableGenerator.R
################################################################################
# 2018-12-02
# Curtis Miller
################################################################################
# This is a one-line description of the file.
################################################################################

# optparse: A package for handling command line arguments
if (!suppressPackageStartupMessages(require("optparse"))) {
  install.packages("optparse")
  require("optparse")
}

################################################################################
# FUNCTIONS
################################################################################

#' Henry Wan Policy Calculator
#'
#' Calculates important quantities for optimal play with Henry Wan
#'
#' @param s The number of "S" (or "success") tokens in the bag
#' @param f The number of "F" (or "failure") tokens in the bag
#' @param olive If \code{TRUE}, the first draw is done with Olive McBride
#' @param out If \code{"X"}, return the optimal stopping time (default); if
#'            \code{"EV"}, return the expected winnings of the optimal policy;
#'            if \code{"P"}, return the probability of success of the optimal
#'            policy
#' @return Numeric depending on the value of the parameter \code{out}
#' @examples
#' wan_policy_calculator(11, 5)
wan_policy_calculator <- function(s, f, olive = FALSE,
                                  out = c("X", "EV", "P")) {
  out <- out[[1]]
  policies <- (ifelse(olive, 2, 1)):s  # Candidate X values
  policy_probs <- sapply(policies, function(X) {
    # Set up transition matrix of Markov chain
    P <- 0 * diag(X + 2)
    rownames(P) <- c(0:X, "F")
    colnames(P) <- rownames(P)
    P[c(X, "F"), c(X, "F")] <- diag(2)
    transient_states <- ifelse(X > 1, list(c("0", 1:(X - 1))), "0")[[1]]
    P[transient_states, "F"] <- f/(s + f - (0:(X - 1)))
    if (olive) {
      if (s + f < 3 | X == 1) {
        stop("X or chaos bag doesn't make sense with Olive!")
      }
      # Failure with Olive is modeled with a hypergeometric RV, with drawing
      # one or fewer "S's"
      P["0", "F"] <- phyper(1, m = s, n = f, k = 3)
      # The state 1 is effectively removed when Olive is used
      transient_states <- transient_states[-2]
      P <- P[-2, -2]
      # TODO: curtis: OLIVE IMPLEMENTATION -- Sun 02 Dec 2018 11:05:17 PM MST
    }
    if (X > 1) {
      if (olive & X == 2) {
        P["0", "2"] <- 1 - P["0", "F"]
      } else {
        P[transient_states, as.character((ifelse(olive, 2, 1)):X)] <- diag(
          c(1 - P[transient_states, "F"]))
      }
    } else {
      P["0", "1"] <- 1 - P["0", "F"]
    }
    # Compute absorption probability
    R <- P[transient_states, c(X, "F")]
    Q <- P[transient_states, transient_states, drop = FALSE]
    N <- solve(diag(nrow(Q)) - Q)
    B <- N %*% R
    B[1, 1][[1]]
  })
  X <- which.max(policy_probs * policies)
  if (out == "X") {
    policies[[X]]
  } else if (out == "EV") {
    policies[[X]] * policy_probs[[X]]
  } else if (out == "P") {
    policy_probs[[X]]
  } else {
    stop(paste("Don't know how to handle out =", out))
  }
}

wan_policy_calculator <- Vectorize(wan_policy_calculator, c("s", "f"))

################################################################################
# MAIN FUNCTION DEFINITION
################################################################################

main <- function(olive = FALSE, value = FALSE, prob = FALSE, digits = 2,
                 lower_s = 5, upper_s = 20, lower_f = 0, upper_f = 8,
                 help = FALSE) {
  # This function will be executed when the script is called from the command
  # line; the help parameter does nothing, but is needed for do.call() to work
  library(pander)

  sl <- lower_s
  su <- upper_s
  fl <- lower_f
  fu <- upper_f
  out <- "X"
  if (value) {out <- "EV"}
  if (prob) {out <- "P"}

  wan_table <- outer(sl:su, fl:fu, FUN = function(r, c) {
    wan_policy_calculator(r, c, olive = olive, out = out)
  })
  rownames(wan_table) <- sl:su
  colnames(wan_table) <- fl:fu
  wan_table <- round(wan_table, digits = digits)

  pandoc.table(wan_table, style = "rmarkdown")
}

################################################################################
# INTERFACE SETUP
################################################################################

if (sys.nframe() == 0) {
  cl_args <- parse_args(OptionParser(
    description = paste("Generates tables describing optimal policies",
                        "for playing with the card Henry Wan in",
                        "Arkham Horror: The Card Game (number of icon",
                        "tokens in bag are columns; non-icon rows)."),
    option_list = list(
      make_option(c("--olive", "-o"), action = "store_true", default = FALSE,
                  help = "The first draw is done with Olive"),
      make_option(c("--value", "-v"), action = "store_true", default = FALSE,
                  help = paste("Report expected value rather than",
                               "optimal stopping policy")),
      make_option(c("--prob", "-p"), action = "store_true", default = FALSE,
                  help = paste("Report success probability of optimal",
                               "stopping policy rather than the",
                               "optimal stopping policy itself")),
      make_option(c("--digits", "-d"), type = "integer", default = 2,
                  help = "Number of digits for rounding"),
      make_option(c("--lower-s", "-s"), type = "integer", default = 5,
                  help = "Lowest considered number of non-icon tokens"),
      make_option(c("--upper-s", "-w"), type = "integer", default = 20,
                  help = "Highest considered number of non-icon tokens"),
      make_option(c("--lower-f", "-f"), type = "integer", default = 0,
                  help = "Lowest considered number of icon tokens"),
      make_option(c("--upper-f", "-r"), type = "integer", default = 8,
                  help = "Highest number of icon tokens")
    )))

  cl_args <- cl_args[c("olive", "value", "prob", "digits", "lower-s",
                       "upper-s", "lower-f", "upper-f", "help")]
  names(cl_args) <- c("olive", "value", "prob", "digits", "lower_s",
                      "upper_s", "lower_f", "upper_f", "help")

  do.call(main, cl_args)
}
```

With the above script I can make the following three tables. The columns represent the number of (bad) icon tokens in the bag, while rows represent the number of other tokens in the bag. The first table is the optimal stopping policy; the second, the probability of success of the optimal stopping policy; and the third, the expected winnings of the optimal policy (which is the product of the previous two tables).

|    | 0  | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----|----|----|---|---|---|---|---|---|---|
| 5  | 5  | 3  | 2 | 2 | 1 | 1 | 1 | 1 | 1 |
| 6  | 6  | 3  | 2 | 2 | 2 | 1 | 1 | 1 | 1 |
| 7  | 7  | 4  | 3 | 2 | 2 | 2 | 1 | 1 | 1 |
| 8  | 8  | 4  | 3 | 3 | 2 | 2 | 2 | 1 | 1 |
| 9  | 9  | 5  | 3 | 3 | 2 | 2 | 2 | 2 | 1 |
| 10 | 10 | 5  | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| 11 | 11 | 6  | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| 12 | 12 | 6  | 4 | 4 | 3 | 3 | 2 | 2 | 2 |
| 13 | 13 | 7  | 5 | 4 | 3 | 3 | 2 | 2 | 2 |
| 14 | 14 | 7  | 5 | 4 | 3 | 3 | 3 | 2 | 2 |
| 15 | 15 | 8  | 6 | 4 | 3 | 3 | 3 | 2 | 2 |
| 16 | 16 | 8  | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| 17 | 17 | 9  | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| 18 | 18 | 10 | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| 19 | 19 | 10 | 7 | 5 | 4 | 4 | 3 | 3 | 3 |
| 20 | 20 | 11 | 7 | 5 | 5 | 4 | 3 | 3 | 3 |

|    | 0 | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    |
|----|---|------|------|------|------|------|------|------|------|
| 5  | 1 | 0.5  | 0.48 | 0.36 | 0.56 | 0.5  | 0.45 | 0.42 | 0.38 |
| 6  | 1 | 0.57 | 0.54 | 0.42 | 0.33 | 0.55 | 0.5  | 0.46 | 0.43 |
| 7  | 1 | 0.5  | 0.42 | 0.47 | 0.38 | 0.32 | 0.54 | 0.5  | 0.47 |
| 8  | 1 | 0.56 | 0.47 | 0.34 | 0.42 | 0.36 | 0.31 | 0.53 | 0.5  |
| 9  | 1 | 0.5  | 0.51 | 0.38 | 0.46 | 0.4  | 0.34 | 0.3  | 0.53 |
| 10 | 1 | 0.55 | 0.42 | 0.42 | 0.33 | 0.43 | 0.38 | 0.33 | 0.29 |
| 11 | 1 | 0.5  | 0.46 | 0.45 | 0.36 | 0.46 | 0.4  | 0.36 | 0.32 |
| 12 | 1 | 0.54 | 0.49 | 0.36 | 0.39 | 0.32 | 0.43 | 0.39 | 0.35 |
| 13 | 1 | 0.5  | 0.43 | 0.39 | 0.42 | 0.35 | 0.46 | 0.41 | 0.37 |
| 14 | 1 | 0.53 | 0.46 | 0.42 | 0.45 | 0.38 | 0.32 | 0.43 | 0.39 |
| 15 | 1 | 0.5  | 0.4  | 0.45 | 0.47 | 0.4  | 0.34 | 0.45 | 0.42 |
| 16 | 1 | 0.53 | 0.43 | 0.38 | 0.38 | 0.42 | 0.36 | 0.32 | 0.43 |
| 17 | 1 | 0.5  | 0.46 | 0.4  | 0.4  | 0.44 | 0.38 | 0.34 | 0.45 |
| 18 | 1 | 0.47 | 0.48 | 0.42 | 0.42 | 0.46 | 0.4  | 0.35 | 0.47 |
| 19 | 1 | 0.5  | 0.43 | 0.44 | 0.44 | 0.36 | 0.42 | 0.37 | 0.33 |
| 20 | 1 | 0.48 | 0.45 | 0.46 | 0.36 | 0.38 | 0.44 | 0.39 | 0.35 |

|    | 0  | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    |
|----|----|------|------|------|------|------|------|------|------|
| 5  | 5  | 1.5  | 0.95 | 0.71 | 0.56 | 0.5  | 0.45 | 0.42 | 0.38 |
| 6  | 6  | 1.71 | 1.07 | 0.83 | 0.67 | 0.55 | 0.5  | 0.46 | 0.43 |
| 7  | 7  | 2    | 1.25 | 0.93 | 0.76 | 0.64 | 0.54 | 0.5  | 0.47 |
| 8  | 8  | 2.22 | 1.4  | 1.02 | 0.85 | 0.72 | 0.62 | 0.53 | 0.5  |
| 9  | 9  | 2.5  | 1.53 | 1.15 | 0.92 | 0.79 | 0.69 | 0.6  | 0.53 |
| 10 | 10 | 2.73 | 1.7  | 1.26 | 0.99 | 0.86 | 0.75 | 0.66 | 0.59 |
| 11 | 11 | 3    | 1.85 | 1.36 | 1.09 | 0.92 | 0.81 | 0.72 | 0.64 |
| 12 | 12 | 3.23 | 1.98 | 1.45 | 1.18 | 0.97 | 0.86 | 0.77 | 0.69 |
| 13 | 13 | 3.5  | 2.14 | 1.57 | 1.26 | 1.05 | 0.91 | 0.82 | 0.74 |
| 14 | 14 | 3.73 | 2.29 | 1.68 | 1.34 | 1.13 | 0.96 | 0.87 | 0.79 |
| 15 | 15 | 4    | 2.43 | 1.78 | 1.41 | 1.2  | 1.03 | 0.91 | 0.83 |
| 16 | 16 | 4.24 | 2.59 | 1.88 | 1.5  | 1.26 | 1.09 | 0.95 | 0.87 |
| 17 | 17 | 4.5  | 2.74 | 2    | 1.59 | 1.32 | 1.15 | 1.01 | 0.91 |
| 18 | 18 | 4.74 | 2.87 | 2.11 | 1.67 | 1.38 | 1.21 | 1.06 | 0.94 |
| 19 | 19 | 5    | 3.03 | 2.21 | 1.75 | 1.46 | 1.26 | 1.12 | 0.99 |
| 20 | 20 | 5.24 | 3.18 | 2.3  | 1.82 | 1.53 | 1.32 | 1.17 | 1.04 |

I view column 6, row 11 as the "typical" scenario, and the conclusion is this: *you'd be better off just grabbing a resource/drawing a card the usual way than by using Henry Wan!* Not only is Henry Wan worse than Leo de Luca, *he's worse than gaining resources with a regular action!*

Granted, there are cards with the *seal* keyword that can help improve the odds. But one must ask whether the opportunity cost of playing those cards is worth it. Perhaps the benefits of a favorable chaos bag for skill tests plus better Henry Wan games would give the investigators a *teeny tiny* edge… after a hell of a lot of work and a lucky draw. That said, I'm sure there are much easier ways to play the game that are also more fun.

When Henry Wan was announced, people considered pairing him with Olive McBride, whose ability triggers "when you would reveal a chaos token". Any investigator who can take both Mystic (purple) and Rogue (green) cards (including Sefina Rousseau and all Dunwich investigators; I don't count Lola Hayes since, while she can include both cards in her deck, using them together may not be possible) can include these two cards in the same deck.

I'll always assume that Olive's ability is utilized on the first draw. When using Olive with Henry, one can get two tokens drawn without either of them being a bad icon that ends the "game". Thus Olive boosts the success rate and the ultimate payout.
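With Olive, the first "draw" reveals three tokens and resolves two, so the first step fails only when at most one of the three revealed tokens is an "S" — a hypergeometric probability, which the R script computes with `phyper(1, m = s, n = f, k = 3)`. A quick Python check of the same quantity (the bag of 11 "S" and 6 "F" tokens is a hypothetical example):

```python
from math import comb

def olive_first_step_fail(s, f):
    """Probability that at most one of three revealed tokens is an "S"
    (hypergeometric CDF at 1), i.e. Olive's first step fails."""
    total = comb(s + f, 3)
    return sum(comb(s, k) * comb(f, 3 - k) for k in (0, 1)) / total

# Hypothetical "typical" bag: 11 "S" tokens, 6 "F" tokens
print(round(1 - olive_first_step_fail(11, 6), 2))  # success prob. of the start
```

For this bag the two-token start succeeds about 73% of the time, consistent with the Olive tables below (where the optimal policy for such bags is often to stop right after that first step).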

Having Olive and Henry out at the same time is extremely difficult; first, you'd have to have Charismas to accommodate them, then draw them both at reasonable times in a game. The likelihood of getting the combo out is low, and it comes with significant opportunity costs.

That said, when Olive is out, she provides Henry enough of a boost to make him playable. The following tables account for Olive's effect (see the code for how) on the first draw but otherwise match up with the earlier tables.

|    | 0  | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----|----|----|---|---|---|---|---|---|---|
| 5  | 5  | 3  | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 6  | 6  | 3  | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 7  | 7  | 4  | 3 | 2 | 2 | 2 | 2 | 2 | 2 |
| 8  | 8  | 4  | 3 | 2 | 2 | 2 | 2 | 2 | 2 |
| 9  | 9  | 5  | 3 | 3 | 2 | 2 | 2 | 2 | 2 |
| 10 | 10 | 6  | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| 11 | 11 | 6  | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| 12 | 12 | 7  | 4 | 3 | 3 | 3 | 2 | 2 | 2 |
| 13 | 13 | 7  | 5 | 4 | 3 | 3 | 2 | 2 | 2 |
| 14 | 14 | 7  | 5 | 4 | 3 | 3 | 3 | 2 | 2 |
| 15 | 15 | 8  | 6 | 4 | 3 | 3 | 3 | 2 | 2 |
| 16 | 16 | 8  | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| 17 | 17 | 9  | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| 18 | 18 | 10 | 6 | 5 | 4 | 3 | 3 | 3 | 3 |
| 19 | 19 | 10 | 7 | 5 | 4 | 4 | 3 | 3 | 3 |
| 20 | 20 | 11 | 7 | 5 | 4 | 4 | 3 | 3 | 3 |

|    | 0 | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    |
|----|---|------|------|------|------|------|------|------|------|
| 5  | 1 | 0.75 | 0.86 | 0.71 | 0.6  | 0.5  | 0.42 | 0.36 | 0.31 |
| 6  | 1 | 0.8  | 0.89 | 0.77 | 0.67 | 0.58 | 0.5  | 0.44 | 0.38 |
| 7  | 1 | 0.67 | 0.65 | 0.82 | 0.72 | 0.64 | 0.56 | 0.5  | 0.45 |
| 8  | 1 | 0.71 | 0.7  | 0.85 | 0.76 | 0.69 | 0.62 | 0.55 | 0.5  |
| 9  | 1 | 0.62 | 0.74 | 0.61 | 0.8  | 0.73 | 0.66 | 0.6  | 0.55 |
| 10 | 1 | 0.56 | 0.59 | 0.65 | 0.55 | 0.76 | 0.7  | 0.64 | 0.59 |
| 11 | 1 | 0.6  | 0.63 | 0.68 | 0.59 | 0.79 | 0.73 | 0.67 | 0.62 |
| 12 | 1 | 0.55 | 0.66 | 0.71 | 0.62 | 0.54 | 0.75 | 0.7  | 0.66 |
| 13 | 1 | 0.58 | 0.56 | 0.56 | 0.64 | 0.57 | 0.78 | 0.73 | 0.68 |
| 14 | 1 | 0.62 | 0.59 | 0.59 | 0.67 | 0.6  | 0.53 | 0.75 | 0.71 |
| 15 | 1 | 0.57 | 0.51 | 0.61 | 0.69 | 0.62 | 0.56 | 0.77 | 0.73 |
| 16 | 1 | 0.6  | 0.54 | 0.51 | 0.54 | 0.64 | 0.58 | 0.53 | 0.75 |
| 17 | 1 | 0.56 | 0.56 | 0.53 | 0.57 | 0.66 | 0.6  | 0.55 | 0.77 |
| 18 | 1 | 0.53 | 0.59 | 0.55 | 0.59 | 0.68 | 0.62 | 0.57 | 0.52 |
| 19 | 1 | 0.56 | 0.52 | 0.57 | 0.6  | 0.53 | 0.64 | 0.59 | 0.54 |
| 20 | 1 | 0.53 | 0.55 | 0.59 | 0.62 | 0.55 | 0.66 | 0.61 | 0.56 |

|    | 0  | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    |
|----|----|------|------|------|------|------|------|------|------|
| 5  | 5  | 2.25 | 1.71 | 1.43 | 1.19 | 1    | 0.85 | 0.73 | 0.63 |
| 6  | 6  | 2.4  | 1.79 | 1.55 | 1.33 | 1.15 | 1    | 0.87 | 0.77 |
| 7  | 7  | 2.67 | 1.96 | 1.63 | 1.44 | 1.27 | 1.13 | 1    | 0.89 |
| 8  | 8  | 2.86 | 2.1  | 1.7  | 1.53 | 1.37 | 1.23 | 1.11 | 1    |
| 9  | 9  | 3.12 | 2.21 | 1.83 | 1.59 | 1.45 | 1.32 | 1.2  | 1.09 |
| 10 | 10 | 3.33 | 2.38 | 1.95 | 1.65 | 1.52 | 1.39 | 1.28 | 1.18 |
| 11 | 11 | 3.6  | 2.52 | 2.04 | 1.76 | 1.57 | 1.46 | 1.35 | 1.25 |
| 12 | 12 | 3.82 | 2.64 | 2.12 | 1.85 | 1.62 | 1.51 | 1.41 | 1.31 |
| 13 | 13 | 4.08 | 2.8  | 2.24 | 1.93 | 1.71 | 1.56 | 1.46 | 1.37 |
| 14 | 14 | 4.31 | 2.95 | 2.36 | 2.01 | 1.79 | 1.6  | 1.51 | 1.42 |
| 15 | 15 | 4.57 | 3.07 | 2.45 | 2.07 | 1.86 | 1.67 | 1.55 | 1.46 |
| 16 | 16 | 4.8  | 3.24 | 2.54 | 2.17 | 1.93 | 1.75 | 1.58 | 1.5  |
| 17 | 17 | 5.06 | 3.38 | 2.66 | 2.26 | 1.99 | 1.81 | 1.65 | 1.54 |
| 18 | 18 | 5.29 | 3.51 | 2.77 | 2.34 | 2.04 | 1.87 | 1.71 | 1.57 |
| 19 | 19 | 5.56 | 3.67 | 2.87 | 2.42 | 2.12 | 1.92 | 1.77 | 1.63 |
| 20 | 20 | 5.79 | 3.82 | 2.96 | 2.49 | 2.2  | 1.97 | 1.82 | 1.69 |

Notice that when Olive is being used, it's usually optimal to use her to get two tokens out (which you can pick) and then end the "game". It seems that Olive does make Henry's ability profitable… albeit mildly. If we're using the heuristic that a card needs to pay for its own resource cost plus three resources for each action involved, I'd say the combo would need at least eight turns to be profitable in a typical game… which is terrible.

Henry Wan is an expensive way to attempt to milk a little more value from Olive. Even with Olive I don’t think he’s worth the trouble.

It's as true in Arkham as it is in real life: gambling is better for the house than the gambler (the house being the forces of the mythos, in this case). If you're looking to have fun gambling, Henry Wan is your card. If you're looking to win… look elsewhere.


These past few weeks I’ve been writing about a new package I created, **MCHT**. Those blog posts were basically tutorials demonstrating how to use the package. (Read the first in the series here.) I’m done, for now, explaining the technical details of the package. Now I’m going to use the package for the purpose I initially had in mind: exploring the distribution of time separating U.S. economic recessions.

I wrote about this before. I suggested that the distribution of times between recessions can be modeled with a Weibull distribution, and based on this, a recession was likely to occur prior to the 2020 presidential election.

This claim raised eyebrows, and I want to respond to some of the comments made. Now, I would not be surprised to find this post the subject of an R1 on r/badeconomics, and I hope that no future potential employer finds this (or my previous) post, reads it, and then decides I’m an idiot and denies me a job. I don’t know enough to dogmatically subscribe to the idea but I do want to explore it. Blog posts are not journal articles, and I think this is a good space for me to make arguments that could be wrong and then see how others more intelligent than myself respond. The act of keeping a blog is good for me and my learning (which never ends).

My previous post on the distribution of times between recessions was… controversial. Have a look at the comments section of the original article and the comments of this reddit thread. Here is my summary of some of the responses:

- There was no statistical test for the goodness-of-fit of the Weibull distribution.
- No data generating process (DGP) was proposed, in the sense that there’s no explanation for *why* the Weibull distribution would be appropriate, or for the economic processes that produce memory in the distribution of times between recessions.
- Isn’t it strange to suggest that other economic variables are irrelevant to when a recession occurs? That seems counterintuitive.
- MAGA! (actually there were no MAGAs, thankfully)

Then there was this comment, by far the harshest one, by u/must_not_forget_pwd:

> The idea that recessions are dependent on time is genuinely laughable. It is an idea that seems to be getting some traction in the chattering classes, who seem more interested in spewing forth political rantings rather than even the semblance of serious analysis. This also explains why no serious economist talks about the time and recession relationship.
>
> The lack of substance behind this time and recession idea is revealed by asking some very basic questions and having a grasp of some basic data. If recessions were so predictable, wouldn’t recessions be easy to prevent? Monetary and fiscal policies could be easily manipulated so as to engineer a persistent boom.
>
> Also, if investors could correctly predict the state of the economy it would be far easier for them to determine when to invest and to capture the subsequent boom. That is, invest in the recession, when goods and services are cheaper and have the project come on stream during the following boom and make a massive profit. If enough investors acted like this, there would be no recession to begin with due to the increase in investment.
>
> Finally, have a look at the growth of other countries. Australia hasn’t had two consecutive quarters of negative growth since the 1990-91 recession. Sure there have been hiccups along the way for Australia, such as the Asian Financial Crisis, the introduction of the GST, a US recession in the early 2000s, and more recently the Global Financial Crisis. Yet, Australia has managed to persist without a recession despite the passage of time. No one in Australia would take you seriously if you said that recessions were time dependent.
>
> If these “chattering classes” were interested in even half serious analysis of the US economy, while still wanting to paint a bleak picture, they could very easily look at what is going on right now. Most economists have the US economy growing above trend. This can be seen in the low unemployment rate and that inflation is starting to pickup. Sure wages growth is subdued, but wages growth should be looking to pickup anytime now.
>
> However, during this period the US government is injecting a large amount of fiscal stimulus into the US economy through tax cuts. Pumping large amounts of cash into the economy during a boom isn’t exactly a good thing to do and is a great way to overheat the economy and bring about higher inflation. This higher inflation would then cause the US Federal Reserve to react by increasing interest rates. This in turn could spark a US recession.
>
> Instead of this very simple and defensible story that requires a little bit of homework, we get subjected to this nonsense that recessions are linked to time. I think it’s time that people call out as nonsense the “analysis” that this blog post has.
>
> TL;DR: The idea that recessions are dependent on time is dumb, and if recessions were so easy to predict would mean that recessions wouldn’t exist. This doesn’t mean that a US recession couldn’t happen within the next few years, because it is easy to see how one could occur.

I think that the tone of this message could have been… nicer. That said, I generally welcome direct, harsh criticism, as I often learn a lot from it, or at least am given a lot to think about.

So let’s discuss these comments.

First, a statistical test for the goodness of fit of the Weibull distribution. I personally was satisfied looking at the plots I made, but some people want a statistical test. The test that comes to mind is the Kolmogorov-Smirnov test, and R supports the simplest version of it via `ks.test()`. However, when you don’t know all of the parameters of the distribution assumed under the null hypothesis, you cannot use `ks.test()`. The test was derived assuming there were no unknown parameters; when nuisance parameters are present and need to be estimated, the distribution used to compute p-values is no longer appropriate.
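To see the problem concretely, here's a quick base-R simulation (not using **MCHT**; the setup is my own illustration). If we draw exponential samples, estimate the rate from each sample, and feed that estimate to `ks.test()`, the p-values are far from uniform and the test almost never rejects at the nominal 5% level:

```r
# Demonstration: plugging estimated parameters into ks.test() makes the
# test far too conservative. Under a valid test, p-values computed under
# the null would be uniform, so about 5% would fall below 0.05.
set.seed(123)

pvals <- replicate(1000, {
  x <- rexp(13)                               # same sample size as the recessions data
  ks.test(x, pexp, rate = 1/mean(x))$p.value  # rate estimated from the sample itself
})

mean(pvals < 0.05)  # far below the nominal 0.05
```

The estimated rate adapts to the sample, shrinking the Kolmogorov-Smirnov statistic relative to its tabulated null distribution, which is exactly why a resampling approach is needed.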

Good news, though; **MCHT** allows us to do the test properly! First, let’s get set up.

```r
library(MCHT)
library(doParallel)
library(fitdistrplus)

recessions <- c( 4+ 2/12,  6+ 8/12,  3+ 1/12,  3+ 9/12,  3+ 3/12,
                 2+ 0/12,  8+10/12,  3+ 0/12,  4+10/12,  1+ 0/12,
                 7+ 8/12, 10+ 0/12,  6+ 1/12)

registerDoParallel(detectCores())
```

I already demonstrated how to perform a bootstrap version of the Kolmogorov-Smirnov test in one of my blog posts about **MCHT**, and the code below is basically a direct copy of that code. While the test is not exact, it should be asymptotically appropriate.

```r
ts <- function(x) {
  param <- coef(fitdist(x, "weibull"))
  shape <- param[['shape']]
  scale <- param[['scale']]
  ks.test(x, pweibull, shape = shape, scale = scale,
          alternative = "two.sided")$statistic[[1]]
}

rg <- function(x) {
  n <- length(x)
  param <- coef(fitdist(x, "weibull"))
  shape <- param[['shape']]
  scale <- param[['scale']]
  rweibull(n, shape = shape, scale = scale)
}

b.wei.ks.test <- MCHTest(test_stat = ts, stat_gen = ts, rand_gen = rg,
                         seed = 123, N = 1000,
                         method = paste("Goodness-of-Fit Test for Weibull",
                                        "Distribution"))

b.wei.ks.test(recessions)
```

```
## 
##  Goodness-of-Fit Test for Weibull Distribution
## 
## data:  recessions
## S = 0.11318, p-value = 0.94
```

The test does not reject the null hypothesis; there isn’t evidence that the data is not following a Weibull distribution (according to that test; read on).

Compare this to the Kolmogorov-Smirnov test checking whether the data follows the exponential distribution.

```r
ts <- function(x) {
  mu <- mean(x)
  ks.test(x, pexp, rate = 1/mu,
          alternative = "two.sided")$statistic[[1]]
}

rg <- function(x) {
  n <- length(x)
  mu <- mean(x)
  rexp(n, rate = 1/mu)
}

b.ks.exp.test <- MCHTest(ts, ts, rg, seed = 123, N = 1000,
                         method = paste("Goodness-of-Fit Test for Exponential",
                                        "Distribution"))

b.ks.exp.test(recessions)
```

```
## 
##  Goodness-of-Fit Test for Exponential Distribution
## 
## data:  recessions
## S = 0.30074, p-value = 0.023
```

Here, the null hypothesis is rejected; there is evidence that the data wasn’t drawn from an exponential distribution.

What do the above two results signify? If we assume that the time between recessions is independent and identically distributed, then there is no evidence against the Weibull distribution, but there is evidence against the exponential distribution. (The exponential distribution is actually a special case of the Weibull distribution, so the second test effectively rules out that special case.) The exponential distribution has the *memoryless* property; if the time between events follows an exponential distribution, then knowing how long it’s been since the last event occurred tells us *nothing* about when the next event will occur. The Weibull distribution, however, has *memory* when the shape parameter is not 1. That is, knowing how long it’s been since the last event occurred does change how likely the event is to occur in the near future. (For the parameter estimates I found, a recession seems to become more likely the longer it’s been since the last one.)
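A small base-R calculation makes the distinction tangible: compute the probability of an event within the next year, given that some number of years has already passed. (The rate, shape, and scale values below are made up for illustration; they are not the estimates from the recessions data.)

```r
# Conditional probability of an event in the next year, given t years
# have already elapsed: (F(t + 1) - F(t)) / (1 - F(t)).
cond_prob <- function(t, pfun, ...) {
  (pfun(t + 1, ...) - pfun(t, ...)) / (1 - pfun(t, ...))
}

t <- c(1, 5, 10)
cond_prob(t, pexp, rate = 1/5)                  # constant across t: memoryless
cond_prob(t, pweibull, shape = 1.5, scale = 5)  # grows with t: memory
```

For the exponential, the conditional probability is identical no matter how long you've waited; for a Weibull with shape greater than 1, it climbs as time passes.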

We will revisit the goodness of fit later, though.

I do have some personal beliefs about what causes recessions to occur that would lead me to think that the time between recessions does exhibit some form of memory and would also address the point raised by u/must_not_forget_pwd about Australia not having had a recession in decades. This perspective is primarily shaped by two books, [1] and [2].

In short, I agree with the aforementioned reddit user; recessions are not inevitable. The stability of an economy is a characteristic of that economy and some economies are more stable than others. [1] notes that the Canadian economy had a dearth of banking crises in the 19th and 20th centuries, with the most recent one effectively due to the 2008 crisis in the United States. Often the stability of the financial sector (and probably the economy as a whole) is strongly related to the political coalition responsible for drafting the *de facto* rules that the financial system follows. In some cases the financial sector is politically weak and continuously plundered by the government. Sometimes it’s politically weak and allowed to exist unmolested by the government but is well whipped. Financiers are allowed to make money and the government repays its debts but if the financial sector steps out of line and takes on too much risk it will be punished. And then there’s the situation where the financial sector is politically powerful and able to get away with bad behavior, perhaps even being rewarded for that behavior by government bailouts. That’s the financial system the United States has.

So let’s consider the latter case, where the financial sector is politically powerful. This is where the Minsky narrative (see [2]) takes hold. Minsky describes a boom-and-bust cycle where, critically, the cause of the bust is built into the boom. After a bust, many in the financial sector “learn their lesson” and become more conservative risk-takers. In this regime the economy recovers and some growth resumes. Over time, the financial sector “forgets” the lessons it learned from the previous bust and begins to take greater risks. Eventually these risks become so great that a systemic risk emerges and the financial sector, as a whole, stands on shaky ground. Something goes wrong (the bottom falls out of the housing market, say, or the Russian government defaults), the bets taken by the financial sector go the wrong way, and a crisis ensues. The extra wrinkle in the American financial system is that the financial sector not only isn’t punished for the risks it has taken, it gets rewarded with a bailout financed by taxpayers, and the executives who made those decisions get golden parachutes (though there may be a trivial fine).

If the Minsky narrative is correct, then economic booms do die of “old age”, as eventually the boom is driven by increasingly risky behavior that eventually leads to collapse. When the government is essentially encouraging this behavior with blank-check guarantees, the risks taken grow (risky contracts become lotto tickets paid for by someone else when you lose, but you get all the winnings). Taken together, one can see why there could be some form of memory in the time between recessions. Busts are an essential feature of such an economy.

So what about the Australian economy, as u/must_not_forget_pwd brought up? In short, I think the Australian economy is better prototyped by the Canadian economy as described in [1], and thus doesn’t follow the rules driving the boom/bust cycle in America. The Australian economy is the Australian economy and the American economy is the American economy. One is stable, the other is not. I’m studying the unstable one, not trying to explain the stability of the other.

First, does time matter to when a recession occurs? The short answer is “Yes, duh!” If you’re going to have any meaningful discussion about when a recession will occur you have to account for the time frame you’re considering. A recession within the next 30 years is much more likely than a recession in the next couple months (if only because one case covers the other, but in general a recession should be more likely to occur within a longer period of time than a shorter one).

But I think the question about “does time matter” is more a question about whether an economy essentially remembers how long it has been since the last recession or not. That’s both an economic and statistical question.

What about other variables? Am I saying that other variables don’t matter when I use only time to predict when the next recession occurs? No, that’s not what I’m saying.

Let’s consider regression equations, often of the form

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon.$$
I think economists are used to thinking about equations like this as essentially causal statements, but that’s not what a regression equation is, and when we estimate a regression equation we are not automatically estimating a function that needs to be interpreted causally. If a regression equation tells us something about causality, that’s great, but that’s not what they do.

Granted, economics students are continuously reminded that correlation is not causation, but I think many then start to think that we should not compute a regression equation unless the relationship expressed can be interpreted causally. However, knowing that two variables are correlated, and how they are correlated, is often useful.

When we compute a regression function from data, we are computing a function that estimates *conditional expectations*. This function, when given the value of one variable, tells us what value we can expect for the other variable. That relationship may or may not be due to causality, but the fact that the two variables are not independent of each other can be, in and of itself, a useful fact.

My favorite example in the “correlation is not causation” discussion (probably first mentioned in some econometrics textbook or by my econometrics professor) is the relationship between the damage caused by a fire and the number of firefighters at the scene. Suppose that $y$ is the amount of damage in a fire (in thousands of dollars), $x$ is the number of firefighters, and from data we estimated a relationship of the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ with $\hat{\beta}_1 > 0$.

There is a positive relationship between the number of firefighters at the scene of the fire and the damage done by the fire. Does this mean that firefighters make fires worse? No, it does not. But if you’re a spectator and you see ten firefighters running the scene of a fire, can you expect the fire to be more damaging than fires where there are five firefighters and not as damaging as fires with fifteen firefighters? Sure, this is reasonable. Not only that, it’s a useful fact to know.
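The firefighter story is easy to simulate in base R. In the toy model below (entirely made up for illustration), an unobserved fire severity drives both the damage and the number of firefighters dispatched; the fitted slope on firefighters comes out positive and is genuinely predictive, even though firefighters don't cause damage:

```r
# Confounded toy model: severity causes both the damage and the
# number of firefighters sent, so the two are positively correlated.
set.seed(123)
severity     <- rexp(200)                            # unobserved common cause
firefighters <- rpois(200, lambda = 2 + 3 * severity)
damage       <- 10 * severity + rnorm(200)           # thousands of dollars

coef(lm(damage ~ firefighters))[["firefighters"]]    # positive slope, yet not causal
```

The regression correctly summarizes the conditional expectation of damage given firefighter count; it just says nothing about what would happen if we dispatched more firefighters.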

Importantly, when we choose the variables to include in a regression equation, we are deciding what variables we want to use for conditioning. That choice could be motivated by a causal model (because we care about causality), or by model fit (making the smallest error in our predictions while being sufficiently simple), or simply by what’s available. Some models may do better than others at predicting a variable but they all do the same thing: compute conditional expectations.

My point is this: when I use time as the only variable of interest when attempting to predict when a recession occurs, I’m essentially making a prediction based on a model that conditions only on time and nothing else. That’s not the same thing as saying that excluded variables don’t matter. Rather, a variable excluded from the model is effectively treated as part of the random soup that generated the data I observe. I’m not conditioning on its values to make predictions. Could my prediction be refined by including that information? Perhaps. But that doesn’t make the prediction automatically useless. In fact, I think we should *start* with predictions that condition on little, to see if conditioning on more variables adds any useful information, generally preferring the simple to the complex given equal predictive value. This is essentially what the $F$-tests automatically reported by statistical software do; they check whether a regression model involving possibly multiple parameters does any better than one that uses only the mean of the data to predict values.

I never looked at a model that uses more information than just time, though. I wouldn’t be shocked if using more variables would lead to a better model. But I don’t have that data, and to be completely honest, I don’t want to spend the time to try and get a “great” prediction for when the next recession will occur. My numbers are essentially a back-of-the-envelope calculation. It could be improved, but just because there’s (perhaps significant) room for improvement doesn’t render the calculation useless, and I think I may have evidence that shows the calculation has some merit.

The reddit user had a long discussion about how well the economy would function if predicting the time between recessions only depended on time, that the Federal Reserve would head off every recession and investors would be adjusting their behavior in ways that render the calculation useless. My response is this: I’m not a member of the Fed. I have no investments. My opinion doesn’t matter to the economy. Thus, it’s okay for me to treat the decisions of the Fed, politicians, bank presidents, other investors, and so forth, as part of that random soup producing the economy I’m experiencing, because my opinions do not invalidate the assumptions of the calculation.

There is a sense in which statistics are produced with an audience in mind. I remember Nate Silver making this point in a podcast (don’t ask me which) when discussing former FBI director James Comey’s decision, just days before the 2016 presidential election, to announce a reopening of an investigation into Hillary Clinton’s e-mails, a decision apparently driven at least in part by the belief that Clinton was very likely to win. Silver said that Comey did not account for the fact that he was a key actor in the process being predicted and that his decisions could change the likelihood of Clinton winning. He invalidated the numbers by acting on them. He was not the target audience of the numbers Nate Silver was producing.

I think a similar argument can be made here. If my decisions and beliefs mattered to the economy, then I should account for them in predictions, conditioning on them. But they don’t matter, so I’ve invalidated nothing, and the people who do matter likely are (or should be) reaching conclusions in a much more sophisticated way.

I’m a statistician. Statistics is my hammer. Everything looks like a nail to me. You know why? Because hammering nails is fun.

When I read u/must_not_forget_pwd’s critique, I tried to formulate it in a mathematical way, because that’s what I do. Here’s my best way to describe it in mathematical terms:

- The time between recessions are all independent of one another.
- Each period of growth follows its own distribution, with its own unique parameters.
- The time separating recessions is memoryless. Knowing how long it has been since the last recession tells us nothing about how much longer we have till the next recession.

I wanted a model that one might call “maximum unpredictability”. So if $T_1, \ldots, T_n$ are the times separating recessions, then points 1, 2, and 3 together say that $T_1, \ldots, T_n$ are independent random variables with $T_i \sim \text{EXP}(\lambda_i)$, and there’s no known relationship between $\lambda_1, \ldots, \lambda_n$. If this is true, we have no idea when the next recession will occur because there’s no pattern we can extract.

My claim is essentially that $T_i \sim \text{WEI}(k, \lambda)$, with $k > 1$, and there’s only one pair $(k, \lambda)$ shared by all the $T_i$. If I were to formulate these as statistical hypotheses, they would be:

$$H_0: T_1, \ldots, T_n \overset{\text{i.i.d.}}{\sim} \text{WEI}(k, \lambda) \quad \text{vs.} \quad H_1: T_i \sim \text{EXP}(\lambda_i) \text{ independently, with the } \lambda_i \text{ unrestricted.}$$
Is it possible to decide between these two hypotheses? They’re not nested, and it’s not really possible to use the generalized likelihood ratio test because the parameter space that includes both $H_0$ and $H_1$ is too big (you’d have to estimate $n$ parameters using $n$ data points). That said, both hypotheses suggest likelihood functions that, individually, can be maximized, and you might consider using the ratio between these two maximized likelihoods as a test statistic. (Well, actually, the negative log likelihood ratio, which I won’t write down in math or try to explain unless asked, but you can see the end result in the code below in the definition of `ts()`.)

Could that statistic be used to decide between the two hypotheses? I tried searching through literature (in particular, see [3]) and my conclusion is… *maybe?* To be completely honest, by this point we’ve left the realm of conventional statistics and are now turning into mad scientists, because not only are the hypotheses we’re testing and the statistic we’re using to decide between them just *wacky*, how the hell are we supposed to know the distribution of this test statistic under the null hypothesis when there are *two* nuisance parameters that likely aren’t going anywhere? Oh, and while we’re at it, the sample size of the data set of interest is really small, so don’t even *think* about using asymptotic reasoning!

I think you can see how this descent into madness would end up with me discovering the maximized Monte Carlo test (see [4]) and then writing **MCHT** to implement it. I’ll try anything once, so the product of all that sweat and labor is below.

```r
ts <- function(x) {
  n <- length(x)
  params <- coef(fitdist(x, "weibull"))
  k <- params[["shape"]]
  l <- params[["scale"]]

  (n * k - n + 1) * log(l) - log(k) + sum(l * (-k) * x^k - k * log(x)) - n
}

mcsg <- function(x, shape = 2, scale = 1) {
  x <- qweibull(x, shape = shape, scale = scale)
  test_stat(x)
}

brg <- function(x) {
  n <- length(x)
  params <- coef(fitdist(x, "weibull"))
  k <- params[["shape"]]
  l <- params[["scale"]]

  rweibull(n, shape = k, scale = l)
}

mc.mem.test <- MCHTest(ts, mcsg, seed = 123,
                       nuisance_params = c("shape", "scale"), N = 1000,
                       optim_control = list(
                         "lower" = c("shape" = 0, "scale" = 0),
                         "upper" = c("shape" = 100, "scale" = 100),
                         "control" = list("max.time" = 60)),
                       threshold_pval = 0.2, localize_functions = TRUE,
                       method = "MMC Test for IID With Memory")

b.mem.test <- MCHTest(ts, ts, brg, seed = 123, N = 1000,
                      method = "Bootstrap Test for IID With Memory")

b.mem.test(recessions)
```

```
## 
##  Bootstrap Test for IID With Memory
## 
## data:  recessions
## S = -4601.9, p-value = 0.391
```

```r
mc.mem.test(recessions)
```

```
## Warning in mc.mem.test(recessions): Computed p-value is greater than
## threshold value (0.2); the optimization algorithm may have terminated early
```

```
## 
##  MMC Test for IID With Memory
## 
## data:  recessions
## S = -4601.9, p-value = 0.962
```

Both tests failed to reject the null hypothesis. Unfortunately, that doesn’t seem to say much. Failing to reject doesn’t show the null hypothesis is correct; it’s just not *obviously* incorrect. That’s always the case with a non-rejection, but the bizarre test I’m implementing here is severely underpowered, perhaps to the point of being useless. The alternative hypothesis (which I assigned to my “opponent”) is severely disadvantaged.

The conclusion of the above results isn’t in fact that I’m right. Given the severe lack of power of the test, I would say that the results of the test above are essentially inconclusive.

I’m going to be straight with you: if you read this whole article, I probably wasted your time, and for that I am truly sorry.

I suppose you got to enjoy some stream-of-consciousness thoughts about a controversial blog post I wrote, in which I made a defense that may or may not be convincing. You then watched as I developed a strange statistical test that probably didn’t even work, to settle a debate with some random guy on reddit, attributed to him a claim he would honestly likely deny, and ended that imaginary argument inconclusively.

But hey, at least I satisfied my curiosity. And I’m pretty proud of **MCHT**, which I created to help me write this blog post. Maybe if I hadn’t spent three straight days writing nothing but blog posts, this one would have been better, but the others seemed pretty good. So something good came out of this trip… right?

Maybe I can end like this: do I still think that a recession before the 2020 election is likely? Yes. Do I think that a Weibull describes the time between recessions decently? Conditioning on nothing else, I think so. I still think that my previous work has some merit as a decent back-of-the-envelope calculation. Do I think that the time between recessions has a memory? In short, yes. And while we’re on the topic, I’m not the Fed, so my opinions don’t matter.

All that said, though, smarter people than me may have different opinions and their contributions to this discussion are probably more valuable than mine. For instance, the people at Goldman Sachs believe a recession soon is unlikely; but the people at J.P. Morgan Chase believe a recession could strike in 2020. I’m certainly persuadable on the above points, and as I’ve said before, I think the simple analysis could enhance the narrative advanced by better predictions.

Now that I’ve written this post, we will return to our regular scheduled programming. Thanks for reading! (Please don’t judge me.)

- [1] C. Calomiris and S. Haber, *Fragile by design: the political origins of banking crises and scarce credit* (2014), Princeton University Press, Princeton
- [2] H. P. Minsky, *Stabilizing an unstable economy* (1986), Yale University Press, New Haven
- [3] D. R. Cox, *Tests of separate families of hypotheses*, Proc. Fourth Berkeley Symp. on Math. Stat. and Prob., vol. 1 (1961), pp. 105-123
- [4] J-M Dufour, *Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics*, Journal of Econometrics, vol. 133 no. 2 (2006), pp. 443-477


Over the past few weeks I’ve published articles about my new package, **MCHT**, starting with an introduction, a further technical discussion, demonstrating maximized Monte Carlo (MMC) hypothesis testing, bootstrap hypothesis testing, and last week I showed how to handle multi-sample and multivariate data. This is the final article where I explain the capabilities of the package. I show how **MCHT** can handle time series data.

I should mention that I’m not focused on the merits of the procedures I use as examples in these posts, and that’s going to be the case here. It’s possible (perhaps even likely) that there’s a better way to decide between the hypotheses than what I show here. In these articles, I’m more interested in showing what *can* be done rather than what *should* be done. In particular, I like simple examples that many can understand, even if they may not be the best tool for the task at hand.

So far I don’t think this has been a serious issue; that is, I don’t think the procedures I’ve shown so far could be considered controversial (I think the most controversial would be the permutation test example). But the example I want to use here could be argued with; I personally would not use it. That said, I’m still willing to demonstrate it because it doesn’t take much to understand what’s going on and it does demonstrate how time series data can be handled.

Suppose we want to perform a test for the location of the mean, deciding between the hypotheses

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu \neq \mu_0.$$

There is the usual $t$-statistic, $t = \frac{\sqrt{n}(\bar{x} - \mu_0)}{s}$, and as mentioned before, this statistic assumes that the data came from a Normal distribution. That’s not all the test assumes, though. It also assumes that the data is independent and identically distributed.

In cross-sectional contexts this is fine, but it’s not okay when the data could depend on time and thus is not independent and identically distributed. Suppose instead that our data was generated according to a first-order autoregressive process (AR(1)):

$$x_t = \mu + \varepsilon_t, \qquad \varepsilon_t = \rho \varepsilon_{t-1} + \sigma u_t.$$

In this context, assume $u_t \sim N(0, 1)$ and that the $u_t$ are independent and identically distributed. It’s no longer given that the conventional $t$-test will work as marketed, since the data is no longer independent or identically distributed. Additionally, we have two nuisance parameters, $\rho$ and $\sigma$, that need to be accounted for.
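A quick base-R simulation shows how badly the plain $t$-test can misbehave under this kind of dependence; the $\rho = 0.7$ below is an arbitrary choice of mine for illustration. With a true mean of 0, the test rejects far more often than the nominal 5%:

```r
# Size distortion of the usual t-test under AR(1) errors with true mean 0.
# Positive autocorrelation inflates the variance of the sample mean, so
# the t-statistic is too large too often.
set.seed(123)

reject <- replicate(2000, {
  eps <- filter(rnorm(600), 0.7, method = "recursive")  # AR(1) with rho = 0.7
  x <- as.numeric(eps[-(1:500)])                        # keep 100 terms after burn-in
  t.test(x, mu = 0)$p.value < 0.05
})

mean(reject)  # well above the nominal 0.05
```

This is precisely the failure mode the Monte Carlo approach below is meant to repair: simulate from the actual dependent process rather than lean on the i.i.d. Normal theory.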

We will view $\rho$ and $\sigma$ as nuisance parameters and use MMC testing to handle them. That leaves the question of how to simulate an AR(1) process. With **MCHT**, if you can simulate a process, you can test with it.

The time series model above has a stationary solution when $-1 < \rho < 1$. It's not possible to simulate a series of infinite length, but one can get close by simulating a very long series. In particular, one can simulate, say, 500 extra terms of the series starting from a fixed value, followed by the actual number of terms wanted, then throw away the first 500 terms. This is known as burn-in, and it's very common practice in time series simulation.

Fortunately, `MCHTest()` allows for burn-in. Suppose that the sample size of the actual dataset is $n$ and we've decided that we want a burn-in period of 500 terms. Then we can do the following:

- Generate $n + 500$ random numbers to represent $u_t$ (except possibly for the scaling factor, as we're treating that as a nuisance parameter).
- Apply the recursive formula described above to the series after scaling it by $\sigma$ and using a chosen $\rho$, and add $\mu$ to it.
- Keep only the last $n$ terms of the series; throw away the rest. This is your simulated dataset.
- After having obtained the simulated dataset, proceed with the Monte Carlo test as usual.
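The steps above can be sketched in base R alone. (The function name `sim_ar1` and the parameter defaults are mine for illustration, not part of **MCHT**.)

```r
# Simulate n terms of x_t = mu + eps_t, eps_t = rho * eps_{t-1} + sigma * u_t,
# using a 500-term burn-in period so the kept series is close to stationary.
sim_ar1 <- function(n, mu = 0, rho = 0.5, sigma = 1, burn = 500) {
  u <- rnorm(n + burn)                                 # step 1: raw innovations
  eps <- filter(sigma * u, rho, method = "recursive")  # step 2: apply the recursion
  mu + as.numeric(eps[-(1:burn)])                      # step 3: drop burn-in, shift by mu
}

set.seed(123)
x <- sim_ar1(100, mu = 2, rho = 0.7)
length(x)  # 100
```

`stat_gen` in the test below does essentially this, except the innovations are supplied to it by **MCHT** and held fixed across the optimization over the nuisance parameters.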

With MMC, the unscaled series is fixed after we generate it, and we use optimization to adversarially choose $\rho$ and $\sigma$ so that we maximize the $p$-value of the test.

When using `MCHTest()`, the `rand_gen` function does not need to produce a dataset of the same length as the original dataset; this allows for burn-in. However, if you're going to do this, then the `stat_gen` function needs to know the sample size of the original dataset. All you need to do is give the `stat_gen` function the parameter `n`; it will be given the sample size of the original dataset. And of course, the `test_stat` function won't care whether the data came from a time series or not.

Putting this all together, we create the following test.

```r
library(MCHT)
library(doParallel)

registerDoParallel(detectCores())

ts <- function(x, mu = 0) {
  sqrt(length(x)) * (mean(x) - mu)/sd(x)
}

rg <- function(n) {
  rnorm(n + 500)  # Extra terms for a burn-in period
}

sg <- function(x, n, mu = 0, rho = 0, sigma = 1) {
  x <- sigma * x
  if (abs(rho) >= 1) {stop("Bad rho given!")}
  eps <- filter(x, rho, "recursive")  # Apply the recursion
  eps <- eps[-(1:500)]  # Throw away first 500 observations; they're burn-in
  dat <- eps + mu
  test_stat(dat, mu = mu)  # Will be localizing
}

mc.ar1.t.test <- MCHTest(ts, sg, rg, N = 1000, seed = 123,
                         test_params = "mu",
                         nuisance_params = c("rho", "sigma"),
                         optim_control = list(
                           lower = c("rho" = -0.999, "sigma" = 0),
                           upper = c("rho" = 0.999, "sigma" = 100),
                           control = list("max.time" = 10)),
                         threshold_pval = 0.2, localize_functions = TRUE,
                         lock_alternative = FALSE)

dat <- c(-1.02, -1.13, 0.53, 0.21, 1.76, 1.79, 1.42, -0.31, -0.28, -0.44)

mc.ar1.t.test(dat, mu = 0, alternative = "two.sided")
```

## Warning in mc.ar1.t.test(dat, mu = 0, alternative = "two.sided"): Computed ## p-value is greater than threshold value (0.2); the optimization algorithm ## may have terminated early

## ## Monte Carlo Test ## ## data: dat ## S = 0.73415, p-value = 0.264 ## alternative hypothesis: true mu is not equal to 0

mc.ar1.t.test(dat, mu = 3, alternative = "two.sided")

## Warning in mc.ar1.t.test(dat, mu = 3, alternative = "two.sided"): Computed ## p-value is greater than threshold value (0.2); the optimization algorithm ## may have terminated early

## ## Monte Carlo Test ## ## data: dat ## S = -7.9712, p-value = 0.504 ## alternative hypothesis: true mu is not equal to 3

t.test(dat, mu = 3, alternative = "two.sided") # For reference

## ## One Sample t-test ## ## data: dat ## t = -7.9712, df = 9, p-value = 2.278e-05 ## alternative hypothesis: true mean is not equal to 3 ## 95 percent confidence interval: ## -0.5265753 1.0325753 ## sample estimates: ## mean of x ## 0.253

I have now covered what I consider the essential technical functionality of **MCHT**. All of the functionality I described in these posts is functionality that I want this package to have. Thus I personally am quite happy this package exists, which is good; I'm the package's primary audience, after all. All I can hope is that others find the package useful too.

I wrote this article more than a month before it was published, so perhaps I have made an update that isn't being accounted for here, but as of this version (0.1.0), I'd say the package is in a beta stage of stability; it's usable, but features could be added or removed and there could be unknown bugs.

The following is a list of possible areas of expansion. This list exists mostly because I think it needs to exist; it gives me something to aim for before making a 1.0 release. That said, they could be useful features.

- A function for making diagnostic-type plots for tests, such as a function creating a plot of the rejection probability function (RPF) as described in [1].
- A function that accepts a `MCHTest`-class object and returns a function that, rather than returning an `htest`-class object, returns a list containing the test statistic, the simulated test statistics, and the p-value; this could be useful for diagnostic work.
- Real-world datasets that can be used for examples.
- Functions with a simpler interface than `MCHTest()`, perhaps with more restrictions on inputs.
- Pre-made `MCHTest` objects, perhaps implementing common Monte Carlo or bootstrap tests.

I also welcome community requests and collaboration. If you want a feature, consider opening an issue or submitting a pull request on GitHub.

Do you want more documentation? More examples? More background? Let me know! I'd be willing to write more on this subject. Perhaps if I amass enough content I could write a book documenting **MCHT** and Monte Carlo/bootstrap testing.

These blog posts together extend beyond 10,000 words, so I'm thinking I have enough material to submit an article to, say, *J. Stat. Soft.* or the *R Journal* and thus get my first publication where I'm the sole author. But this is something I'm still considering; I'm an insecure person at heart.

Next week I will still be using this package in a blog post, but I won't be writing about how to use it anymore; instead, I'll be using it to revisit a proposition I made many months ago. (It was because of that article I created this package.) Stay tuned, and thanks for reading!

- R. Davidson and J. G. MacKinnon, *The size distortion of bootstrap tests*, Econometric Theory, vol. 15 (1999), pp. 361-376

*Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

My grandfather (to me, “Grandpa”) died on Friday, November 2nd, 2018, a few days before this post was written. As I write this I am with my family in Blackfoot, Idaho, staying in his and my grandmother’s (or “Grandma’s”) house; she has survived him. I will be staying with them until the funeral, on Saturday, November 10th, 2018 at the Hawker Funeral Home, is over. (The funeral starts at 2:00 PM.)

My Grandpa was 90 years old, born on April 16th, 1928 (sharing his birthday with my brother, who was born in 1993). He died in a car accident while driving from Idaho Falls, Idaho to his home in Blackfoot, after having called my family and told them “I’m loaded up and I’m coming home.” He rear-ended a semi truck stopped to turn into Love’s truck stop on the highway. We believe that he must not have seen the truck in time, because there were long skid marks indicating that he slammed the brakes of his truck, which was towing a heavy trailer. (So we also know that he did not fall asleep at the wheel.) There was little damage to either the trailer or his vehicle, but the airbags in his car went off. He was awake for a moment after the accident since he unbuckled himself, but soon lost consciousness. He had a pacemaker, and we think that it failed and he effectively had a heart attack since his heart had stopped. (It’s possible that the airbag dealt trauma to his chest and head, thus prompting the failure of the pacemaker.) He died in the ambulance. No member of our family was with him.

That Friday I was planning to have dinner with a graduate student friend of mine. I had canceled the week earlier due to a surprise visit from my sister, but I told him “Nothing could possibly interfere with our dinner tonight except for possibly my brother, and I’ll just tell him that I canceled on you before and I’m not going to do it again.” That was some time around noon. At 4:30, my friend and I started reading the rules for a game we were going to play before having dinner, then got into a long discussion with another student about issues relating to privilege, the state of minorities, microaggressions, and so on, which lasted till about 6:00. I checked my phone just to make sure that no one had tried to contact me and I discovered there were at least eight calls and a bunch of messages. So I called my brother and he told me to cancel my plans because something happened. After some prodding he told me that Grandpa died. So I agreed to meet him at my apartment. I told my friend, “I have to go; my Grandpa’s dead.” He said “Okay,” and I left. (So I canceled on him again; I’ll have to remember not to tell someone “Nothing could possibly cause me to cancel” because I might just kill someone else for calling down the karma.)

That’s all I want to say about my Grandpa’s death for now. The rest of this article is my memorial of him and the impact he had on my life.

My Grandpa Douglas was born and raised in Blackfoot, Idaho. His father was George Wareing, his mother was Amelia Hansen, and his brother was LaVere. He was born in 1928, so he was able to remember the depression. He remembered hobos coming to his parents’ door and his mother making them sandwiches.

I don’t know much about his relationships except for his relationship with his brother. LaVere was the eldest and I remember my Grandpa not particularly caring for games when he was older since LaVere would get very angry with Grandpa when Grandpa won. (But Grandpa definitely knew how to play Go Fish; I remember him occasionally cleaning someone out in games since he better remembered who had what in their hand. He enjoyed lightly competitive family games as well.) My understanding was that for most of his life my Grandpa had a poor relationship with his brother. That said, I heard that before LaVere died after the passing of his wife, their relationship improved.

I think the most distinct story I remember of Grandpa’s childhood was him riding trains. He would go to the train yard and the workers would bring him into the locomotive and he’d ride around with them. He loved trains all through his life.

Grandpa had dyslexia. He was an intelligent person but he struggled with reading and writing. He was told repeatedly that he was stupid and he took those words to heart; throughout his life he would belittle himself, which his loved ones did not like hearing. The military would disagree with this idea that he was “stupid”; Grandpa said he was told he scored high on a military IQ test and the military wanted to have him join an intelligence division. The thought of a sea of documents Grandpa would have to read led him to turn the offer down; while in the Air Force in the Korean War, he worked on a boat tasked with rescuing downed pilots. (Grandpa enlisted because he didn’t want to get drafted.)

My Grandpa met my Grandma when a friend of his said he met a girl from Chicago with a sister and invited my Grandpa to take the sister on a date with him. Grandpa accepted, and while he was at the house waiting, he saw my Grandma, Barbara Marshall, come into the room. She was a very beautiful girl, and Grandpa enjoyed the chat he had with her. He said to himself “I want that one,” and he courted and eventually married her. (She was not the sister he was supposed to meet.) Next summer would have been their 70th wedding anniversary. Grandpa never ceased to adore Grandma. To me, the best evidence of his love for her were the love notes he would stick to her mirror. When I would visit my Grandparents I made sure to see what new notes were on the mirror.

Grandpa taught me what true love is and that while marriage may take work it’s worth it. When I envision the ideal marriage I picture Grandma and Grandpa.

Grandma and Grandpa had seven children in total, four boys and three girls. David, Nancy, and Michael (“Mike”) were the first three and the eldest (in that order); Grandpa called them “the first family.” Grant, Paul, Amy (my mother) and Karen (in birth order again) were “the second family”.

While Grandpa did a variety of jobs and sometimes the money was very tight, he was a teacher by profession, and he loved his work. After his period in the military, he went to college at Idaho State University. He majored in political science and minored in economics. He learned to teach music. While a teacher he taught government classes, history classes, and music. He loved working with kids and many of his students remember him fondly.

Grandpa was once a Republican, even running for state office as a Republican and leading the local Republican party. I think he turned against the party because he opposed their foreign policy (Grandpa hated wars), and I don’t think he could ever be considered a conservative.

Grandpa lived on a farm that, to my understanding, had been in the family. The farm was divided when the I-15 was built, and two branches of the family now live on two separate parts of that original family farm. Grandpa was not a farmer, but he knew how to run farm equipment and took good care of the property for his whole life. In recent years he leased the property out to others, allowing their herds of cattle to graze before being sold to slaughter.

Grandpa himself hated killing anything. He once worked on a feed lot, which of course fed cattle that were intended to be slaughtered. (Grandpa was a real cowboy.) He hated that idea. He would rescue the spiders in the house before Grandma found and squashed them. Recently his property became infested with marmots, which were destroying his equipment. He shot them to death, but with quivering hands; he didn’t want to kill them.

My Grandpa didn’t have a mean bone in his body. In fact, the word I remember him most saying was “love”. He meant not only love for his wife but love for everyone. He would tell the nurse in a hospital or a stranger in a shopping line that they were wonderful and special and loved. He twisted a common poem with Christian overtones to read

One life to live
t’will soon be passed;
only what’s done
in kindness will last

Grandpa was one of the kindest people I knew.

My Mom and Dad initially lived in a single-wide trailer on my Grandparents’ property, when my Dad was first editor of the local newspaper, the *Morning News*. At that age my Grandpa got to know me. He gave me my first nickname, “destructo”, since I often left a big mess.

I obviously don’t remember much of this early time of my life, but my Grandpa did. He remembered driving me around in his pickup while playing jazz on the radio (Grandpa always loved jazz, and he gave me my taste for it). This was one of his favorite memories. He would carry me around while he did his chores, even when operating tractors on the property.

My Dad lost his job as editor of the *Morning News* and went to school for two years to become a computer programmer. When he graduated, he moved the family to Salt Lake City, and I grew up in a suburb of the city, West Jordan. Yet I still managed to grow close to my Grandpa. I think this is primarily because when I was in elementary school, we were on the track schedule: there were four tracks (A, B, C, and D), which rotated through a three-week vacation throughout the school year so that we had a schedule of three weeks off, nine weeks on. Frequently during those breaks my Mom would take my brother and me (and later my sister, when she was born in 1999) to spend a week with my Grandparents while my Dad stayed home to work.

Grandpa was known for giving nicknames to people. After “destructo”, I was “Colonel”, then “decum” after I turned ten. I think I’ve gotten other names too; he recently would call me “the professor”. But of the names he gave me, I think “Colonel” is my favorite. It’s also the first name I remember.

Grandpa liked trains and perhaps it was from him that I learned to like trains, especially the old steam trains. My Grandpa’s children bought for him a fancy HO-scale electric train, the locomotive a 4-6-6-4 *Challenger*. Around that time one of the remaining *Challengers* traveled through our area and we were able to see the real locomotive. But I was enamored with the model; I loved watching it drive around the track. Apparently I was the person who broke the model; it was never fixed. Nevertheless, my Grandpa got me liking trains. (To this day, my favorite locomotive is the *Challenger*.)

One Christmas my parents bought me a Life-Like HO-scale electric train set intended not just for children but for model railroad enthusiasts as well. This spawned an ill-advised hobby in my childhood around model trains; no one knew what they were doing but we were going to try to build a layout complete with scenics and landscaping. My Grandpa encouraged this. One of my favorite memories of him was driving from Blackfoot to Pocatello to a hobby shop to buy model trains and accessories. He bought me a wonderful little locomotive which could even puff smoke as it drove. We set up the tracks in the basement area and he and I would drive the trains.

I can’t remember when I stopped my pursuit of the model train hobby; I had a big wooden board in my bedroom with tracks on it that just turned into a giant table with train parts strewn about, lacking any sense of direction. In the end, I gave all of my trains, tracks, scenics, etc., to my Grandpa. We promised to one year build a layout at his home together. He had more space, including extra buildings to store the train, so it could be a great layout without inconveniencing anyone.

We never built that layout.

Another hobby that Grandpa tried to support me in was model airplanes. He helped my parents buy a plane for me. We tried to fly it, but no matter how many times we tried we could not get the plane to stay in the air, whether we were at an elementary school or at Grandpa’s expansive property. I can only remember one successful flight, and Grandpa was there to see it.

I feel that I can attribute my interest in politics to my Grandpa. My Dad, being a newspaper man, was interested in politics too, but I think the initial political conversations I would have were with my Grandpa. The first major political event I recall was the 2000 presidential election; it was the first year I discovered my family was a political minority—Democrats—in the states where we lived (Utah and Idaho). But my Grandpa and I would talk for a long time about politics. There was once a time where I would say “I like politics”; that was largely my Grandpa’s doing (even though that’s not how I would say it today).

Grandpa strongly opposed the war in Iraq. I remember mornings with Meet the Press on TV (back then Tim Russert was in charge, and the show has not been the same since he died in 2008), and the case for weapons of mass destruction (WMDs, which is a dumb word if only because it’s so poorly defined) being in Iraq was pushed. My Grandpa said there were none there unless we gave them to Saddam Hussein, since Iraq did not have the ability to acquire such weapons. Grandpa was right.

He hated Republican economic policy and feared they would try to gut Social Security and Medicaid. He disliked the loss of manufacturing jobs and feared automation putting people out of work. He wanted church and state separated and thus wasn’t too sympathetic for anti-gay and anti-abortion laws. He wanted stronger gun control. He was skeptical of capitalism, saying we needed a little socialism for the country to run. He hated the Idaho government’s approach to education (cut the budget) and a general unwillingness among conservatives to pay taxes, especially the rich. He was concerned about wealth and income inequality. And so on.

Grandpa’s views were powerful, and at family reunions it seems that the family’s political opinions are very homogeneous. Few at those reunions, which drew perhaps around 50 people, were sympathetic to Republicans. There was once a time when I was in community college I thought I might be a Libertarian or a Republican, but that view did not survive the University of Utah. Grandpa was skeptical of this potential change in attitude but he loved me regardless of what I believed.

I believe that Grandpa inspiring me to care about politics set me on the track that led me to where I am today. When I was a kid I didn’t care for math; I could understand it but I had no love for it. I cared about politics, government, and social studies. When I was in high school, while I was taking math classes, I cared about debate (more on that later) and the school literary magazine. I *hated* physics. To this day I care only for broad descriptions of physics concepts, not for the details. (Grandpa didn’t understand physics but he was fascinated by it, as well as how people can discover things using just mathematics.) But I felt that with my mathematics background and my interest in politics I should try and get a degree in economics. That led me to take more math classes and a statistics class (a subject I once thought was likely the driest of mathematical subjects, not extending beyond using means and proportions for baseball statistics). I fell in love with these subjects and now I’m pursuing a Ph.D. in mathematics, studying mathematical statistics.

You can now see the line of thought that led me to where I am, and I thank my Grandpa for planting that seed. I still am very interested in current affairs and politics and likely will be for the rest of my life no matter what I do.

I have hayfever and my grandparents lived on a farm, so often when I visited I couldn’t breathe through my nose and my eyes would become itchy and inflamed. Sleeping at night was hard since I couldn’t breathe. One night was particularly bad and I think I got up to try and find some nasal spray. Grandpa was awake too (Grandpa struggled to sleep; more on that later), and when he saw me we got into the car together and drove into town. It was very early in the morning so most of the town’s stores were closed, but we managed to find a convenience store that was open. He bought a nasal spray for me, we rode back home, and the spray helped my nose clear up.

I played piano (and one year tried the clarinet) as a kid. I was never a great piano player, but I did develop some skill. My Grandpa loved music and wanted me to study it as well. I remember painful sessions of Grandpa sitting me down and giving me his version of a piano lesson. He was highly critical of me and mistakes I would make. These lessons would always turn into a lecture about how valuable a skill like playing piano would be (not from a financial perspective but more from a civic one). He’d berate me for spending time playing with toys or computer games and not spending more time practicing piano.

For all his talk of my possibly enjoying piano, I don’t think I ever loved it as much as I did other things. When I started college I dropped piano, and my Grandpa always reminded me of that decision. He wished I kept it up. I may return to practicing piano some day when I have more time, but I wish I could have played for him one more time. As painful as his lessons were, I liked being with my Grandpa and I put up with them.

I enjoyed Grandpa’s music, though. He led a jazz band all his life, whether it was a high school band or a volunteer community band. I remember as a kid going to his room at the Eastern Idaho Technical College to listen to his bands rehearse, then attending his concerts in Idaho Falls parks. My favorite Fourth of July was when I was very young, when the day started with one of his jazz concerts. The day ended with fireworks over the Snake River while we sat on the banks. Days like that were beautiful.

He and his band was featured by a local television station; you can see them play here.

Grandpa was not one to mince words. He never physically hit anyone (except once when he whopped my Aunt Karen on the butt when she and my Mom were teenagers after she made a rude comment to my Mom while lying on the bed; I don’t know what she said but I bet she deserved what she got). But Grandpa’s lectures were legendary. I think every child and grandchild got at least one lecture. I got my fair share. And he would tell you what he thought, and nothing less.

Grandpa was not always right, but he was a wise man and I always listened to what he said. I never got upset when I got a lecture. I knew he loved me and wanted to tell me something he thought I needed to hear in order to be the best and happiest person I could be.

Grandpa exhibited many virtues, but I don’t see “patience” as one of them, at least from my experience. There were the aforementioned piano lessons. I also remember when Grandpa was teaching me to drive. He was the first person to put me behind the wheel of a vehicle and tell me to drive. He would get after me for many things while driving. I did learn, but not until after a good verbal whipping for my mistakes. (I know very well *never* to cross my arms when turning the steering wheel.)

I remember going to his property so many times to “build fences”. I never once remember building a fence. When we would go to his place for “building fences” we often did something else, perhaps having nothing to do with fences or even work. We would clean up grass, tear down old buildings and fencing, dig holes, and many other non-fence-building things. I remember one year *after* I learned how to drive we were towing old vehicles. I drove the towing truck while Grandpa steered the vehicle being towed. This went well until we tried to tow a very old, rusty yellow car. My brother was in the back of the pickup truck I was driving, directing me. I pressed the gas and was having a hard time getting the truck going, so I hit the gas too hard and pulled the car’s bumper off. Grandpa got out and kicked the car and gave me a verbal tongue lashing. I felt terrible, but Grandpa forgave me and gave me a hug. My pulling off the bumper prompted him to decide that the car was beyond refurbishing anyway.

Grandpa cared a great deal for his property. I remember him trudging off in his irrigation boots to start irrigating the property. I loved when he irrigated; I would run through the watery half-acre lawn and swim in a particularly deep divot in the lawn, deep enough to reach my neck when I was little. He changed irrigation technique later, and flooding the lawn no longer occurred. (I missed this.) Even though he was in his late 70s or even early 80s he would carry several large metal pipes on his shoulders with sprinklers on them. In the evening the sprinklers would be running. He mowed his massive lawns by hand for years but in his later years he learned to appreciate the riding lawnmower.

The lawns of his house are beautiful. My Dad wanted Grandpa to show him how to take care of his property and run his machines after my parents moved back in a couple months ago. Dad got to run the lawnmower but Grandpa died before he showed Dad what else needs to be done to keep the place in good condition. If Dad plans to learn on his own, it will be a heavy lift to keep the place in the same condition Grandpa did without his guidance.

Grandpa was a hyper person; he had ADHD and could not stand sitting around. Whenever he caught a child or grandchild sitting around he would give them something to do. He would sometimes ask “Are you bored?” I learned to answer “no” when he asked this question, because otherwise he would give me some chore to do. Grandpa valued hard work.

I think that Grandpa telling me to come stay with him to help build fences was just an excuse to have me around. I was fine with this. This was more time to spend with Grandpa. We often did some work, but we also did fun things. I don’t think Grandpa would say that he spoiled his grandchildren (in fact I think I once mentioned that most Grandparents spoil their grandchildren and he said “too bad for you”). I remember Grandpa buying us ice cream, soda, and candy bars, even as recently as a few months ago.

Grandpa lived on a farm in a very rural area. He took advantage of this. He would go on a walk every morning. When a big truck on the freeway would drive by, he would motion for the truck to blow its horn, and often the truck drivers obliged. Grandpa became known among the trucker community, always being spotted on his walks in the morning.

I remember night walks with my Grandpa, too. My family would put on their jackets and walk through the night in the area. The stars were bright and truck lights passed by on crisp evenings, sometimes in the winter, sometimes with a distant thunderstorm lighting up the sky. I remember watching dogs walk with us while Grandpa led us in fun walking and marching songs. I still remember some of those songs.

I had such good times with my Grandparents as a child that one year, when we had to end our vacation and return home, I was completely beside myself. I didn’t want to leave them. I may have cried the whole way home. I loved being with my Grandparents. I was very close to my Grandpa.

I end this section with a story: Grandpa and I were driving to Pocatello to visit a hobby shop when I was interested in model railroading. The drive is about 30 minutes. He bought me a PayDay bar, the first time I remember having one of those bars. He asked me what I was thinking about. I said “nothing.”

“Nothing?” he replied. “You mean your mind is a void? Nothing going on?”

“I guess so,” I said.

“But there’s so much to think about. You should always be thinking about *something*.”

I’ve always been thinking since. I’m almost never bored.

I think Grandpa was on the debate team in high school; he recalled competing in extemp. Grandpa and my Mom encouraged me to join the debate team, and I did so. This was important to how I developed as a person. Prior to debate I was an incredibly shy person; giving a presentation in front of a class was an act of great courage. Debate helped pull me out of my bubble. I was a debater during all of high school, and I did well, placing and winning in several events, one of which was extemp. Today, while I struggle to develop non-professional relationships with people (especially women), I can confidently teach a class of any size and give a presentation with basically no notes to a crowded theater without breaking a sweat.

One year I wanted my Grandpa to be a judge in a debate tournament. He agreed but somehow he got the impression that he would be watching me compete. At the time a debate only included the debaters and the judge, with some exceptions when there were multiple people competing in the same room, but I did not want to challenge the norm. I felt pressured not by Grandpa but by my family to allow him to watch, and I was upset; eventually they relented.

However my Grandpa was involved in my debate career. I remember demonstrating speeches for him that I had rehearsed extensively. One day my Grandpa even taught my debate class. I remember it was in 2008, since he was about to turn 80 years old.

I got my first girlfriend, Andrea, in December 2009 and we were together until January 2011, with a one-month break. Grandpa liked Andrea when he met her, and he invited her to the 2010 family reunion. I appreciated that.

He did meet my second girlfriend, Jasmin, years later in December 2014, but he didn’t get to see her for long. Jasmin broke up with me in May 2015 and I was greatly hurt by this. With all respect, Jasmin was my favorite girlfriend, even though I was with her only for nine months. I was very happy with her and basically saw her as the girlfriend I always wanted, ever since I was a teenager praying to God for a particular girl. Losing her hurt me deeply and I think that break-up changed me. As an undergrad I was largely confident and even becoming more friendly, but as a grad student (post-Jasmin) I’ve become less confident, more pessimistic, and more withdrawn.

2016 was a harder year for me and at the family reunion I was still struggling with my grief. I had moved out of my parents’ house and I was feeling lonely; I missed Jasmin a lot. I was studying out of a real analysis textbook at the time; I saw my mathematical abilities as one of the few things that gave me value.

I was alone at the reunion when Grandpa came up to me. I was working problems in the analysis book and he asked me if I was happy. I broke down in tears and said “No.” He put his hand on my arm to comfort me. I told him that I missed Jasmin a lot. He wanted to help me. He wanted me to move back in with my parents and he wanted to arrange for me to use one of his cars (I rely on transit, which makes it very difficult for me to get out and meet people). I refused to move back no matter how much he protested, and I never got that car even when he recommitted to trying to get me one when my parents moved back into his home in Idaho. That said, his caring meant a lot to me and it helped me to seek out help from a professional.

As a kid, one thing that I wanted was for Grandpa’s jazz band to play at my wedding, with Grandpa conducting. That was going to be his wedding gift to me. I don’t think I ever told him this. Within the last couple of years, as my romantic life turned into an even greater failure, I lost hope that this would happen. Grandpa tried to help me, giving me tips on how to talk to girls and where to meet them. I did try for a little while, but I couldn’t overcome myself. Now Grandpa is dead, and with him my dream.

Grandpa went on a trip to Peru with my Aunt Karen and my Aunt Dalena. He didn’t know any Spanish but he wanted to help the people there in any way he could. I heard that the people in the villages he visited were amazed by him; they had never seen anyone as old as he was. Yet he was still a capable individual. The story I remember the most was him telling the children about steam trains like he knew from when he was a kid; they were very poor with a weak education so they were not familiar with trains. As he got onto the plane to leave, he gave a “Toot, toot!” for the kids, with tears in his eyes. I heard it was a touching moment.

When I was in college Grandpa grew more frail. He hated it. He didn’t want to be weak. He would sometimes say he was “not a man anymore”, when nothing could be further from the truth. He couldn’t stand up straight like he used to. He had problems with his knees, his feet, his heart. He once reached underneath his lawnmower and lost his fingertips since the machine was still running. He was very angry and distraught over this mistake and his now maimed hands. (In my opinion the wound wasn’t noticeable.) My aunts and uncles wanted to do more for him in order to prevent him from straining himself or getting into a dangerous situation. He resisted help. He would even refuse my offers to do the dishes for him; that was his job and I wasn’t taking it from him.

When my brother lived with Grandma and Grandpa he had to help them in a few emergencies. They were resilient but Grandpa was growing weaker. Doctors visits became more common. He was feeling less well. More surgeries were needed. His heart grew weaker, and a pacemaker had to be implanted. Grandpa was starting to get very old.

Grandpa feared old age and resisted it. He didn’t want to be deprived of his independence. He told me in car rides with just him and me that he wanted to die in his house, not in a nursing home. His home would be his only home, and nowhere else. And he was uneasy about the prospect of death. As much as he said that he wanted to see important life events for his grandchildren (even great-grandchildren; he pointed to my baby nephew Ayven and said “I want to see *him* get his Ph.D”), he spoke of his own life as if he believed he would not be around for much longer.

He said he never doubted there was a creator; life looked very created to him. He did question Christianity in general, and the Seventh-day Adventist flavor of it in particular. But while he questioned the details he embraced the idea of loving everyone and living what one could call a Christian life. He attended church regularly with Grandma until his death, and my brother tells me that in Sabbath school class he would end a discussion by saying that he loved everyone in the room and they all were special.

Grandpa also decided it was highly unlikely that this life was the end and death was eternal oblivion. I agreed with him.

Grandpa wanted to attend my college graduation ceremony, but he had a shingles outbreak and could not go. He and Grandma were heartbroken, and I wished they had been able to see me walk. (I’m glad that my Uncle Mike and Aunt Donna managed to come, though; it meant a lot to me that they did.) After that I decided that I really wanted to have Grandpa see me walk to get my Ph.D. He was proud of my studies and always enjoyed talking to me and seeing how I thought. (Sometimes I would start to feel like a freak when so much attention was paid to my mathematical ability, but I knew that any attention was out of love and pride in his grandson.) I started to picture a dinner the day before my Ph.D. graduation where my family, including Grandma and Grandpa, would meet my adviser—Lajos Horváth—for the first time over dinner at a nice restaurant. Then the next day my family (including Grandpa) would see my adviser put the sash over my neck that made me Dr. Curtis Miller. I felt as if this vision could be attained.

Grandpa is dead now, and I don’t have my Ph.D. Another vision I really wanted that will never come true.

Last year Grandma spent several days in the hospital in Salt Lake City for a heart surgery. I spent a lot of time with Grandma and Grandpa and the aunts and uncles who came with them. In addition to enjoying a pancake breakfast every morning in the hospital cafeteria, I spent many hours just about every day I could with them, talking with them. This was the first time I saw Grandpa with a cane. But he seemed to take to it well.

There were other times throughout last year that I intermittently saw my Grandparents. Sometimes they came to Salt Lake City for medical reasons, sometimes it was for good things like the birth of my nephew, Ayven, or my sister’s graduation. I missed the last family reunion because it was planned to be around my Grandparents’ 69th wedding anniversary and I was already arranged to travel to San Francisco on a grant to attend an MSRI workshop. (I would have gladly missed the workshop if I was aware there would be a conflict, but by the time the date was announced I felt that I could not cancel. But I was there in spirit since half of the reunion’s attendees caught the stomach flu that I caught from the wild and then spread to the Utah branch of the family.) I saw him shortly before the semester started, though, along with the weekend I helped my family move to Idaho back into Grandma and Grandpa’s house, and also the week of fall break this semester.

I was actually debating whether to visit during fall break this year, but I decided in favor of visiting, and I’m so glad that I did. That visit was the last time I would see my Grandpa, and he always loved to see me. He was hoping that next summer I would spend a length of time in Idaho with them. I think Grandpa was becoming increasingly doubtful of his longevity and wanted to see me as much as he could before he was gone.

The night I arrived after taking a Salt Lake Express bus to Pocatello (my sister Alicia picked me up from there) I went into Grandma and Grandpa’s bedroom, where he was sitting. He was recovering from another knee surgery so he didn’t want to leave his room. I sat beside him on the bed and he and I talked for a very long time about current events, how science works, the world, and many other things. These were the conversations he loved to have with me, the kind of conversations he and I would have over periods ranging from my childhood to my teenage years to my college years. He didn’t understand everything I said, because despite my best efforts I sometimes struggle to make myself understood, no matter how much I fear talking over anyone’s head. But he loved it. As he always loved talking to me.

Sometimes I wonder if anyone enjoyed talking to me as much as my Grandpa did.

Grandpa looked miserable the last time I saw him. He couldn’t sleep at night. He got only a few hours of sleep, then he was awake. I remember one night, while I was trying to fall asleep on the couch, seeing him walk to the chair behind me and just sit down and stare over me, effectively alone (because I was going to just pretend I was asleep to try and go to sleep; perhaps I should have talked to him). These sleepless nights were increasingly common for him; I heard from my Mom that one night he went to his car and turned on jazz music so he didn’t disturb anyone while he dealt with being awake. He felt very lonely in these times.

To be completely honest, during my last trip, Grandpa did not seem happy anymore. He seemed miserable. He looked more haggard and frail. He couldn’t do anything because he was tired all day since he didn’t sleep at night. This was the reason he (and thus Grandma, too; she was not leaving him) missed my nephew’s first birthday party.

He would still try to work. He wanted to get the tin building on his property ready for winter and cleaned out so that my family could store their stuff in there and their belongings would be safe from mice. We went to a local lumber yard to buy wood, then to C.A.L. Ranch for rat traps. While there, he bought me a candy bar. Then we returned to his home and tried to put the boards he bought into the doorway, only to find that the lumber yard had cut them to our *exact* specifications; this apparently was not what we wanted because the boards were too snug to slide in. Grandpa found a buzz saw in the shed and turned it on. I held the boards while he cut. He was too weak to hold the saw up so it ended up hanging right next to his leg while it was still running. I saw this enormous safety hazard and wanted to say something to him, perhaps offering to take control of the saw instead, but as before I could never bring myself to question my Grandpa, even when I *really* should have.

My Grandpa’s last advice to me was about regret. He questioned whether I was living a healthy lifestyle. I don’t work out all that much these days; that seems like time better spent studying or at least doing something I personally find fun. He said that regret is a hard thing to deal with later in life as you deal with the consequences of bad decisions. One should try to minimize regret as much as one can.

You can see in this article a number of regrets I have.

Before I left, Grandpa asked me to promise him that I would go to the gym. He was adamant about me making that promise, so I did. (I still haven’t gone.)

Grant took me home and while I did say goodbye to Grandma and Grandpa I didn’t have that final goodbye hug I’m used to getting from them. We had already pulled out and Grant wanted to get home soon; he was already late according to his schedule. So I called my Mom and I told Grandma and Grandpa that I forgot that hug but I loved them and I would see them soon at this year’s Thanksgiving dinner.

I never saw my Grandpa again. I should have told Grant to turn around so I could get that last goodbye. There’s another regret.

I have yet to see Grandpa’s body, but I will be a pall bearer at his funeral. I’m planning on wearing one of the outfits my sister helped me pick, which I wore regularly when teaching: a black coat and a black sweater vest over a black-and-white plaid shirt, with black jeans and sneakers. When Grandpa saw me wearing this outfit he would call me “the professor”. It’s an outfit I like and I think it looks sharp, so it seems fitting.

Many people remember Grandpa and were touched by him. It was just a couple months ago that one of his students came to visit him, spending an hour at his house with her family; she remembered him fondly. When he died and the news was released, an unusual number of people called to ask when the funeral was scheduled to take place. Our family thinks there could be many people at his funeral. We’re pleased he touched so many lives.

I want to place a copy of this article in his casket. I won’t print it on any fancy paper, but after posting it I’ll print it out with all the metadata associated with web pages that browsers print out. I thought about this and like it; it shows where I made my memories of the only Grandpa I knew and who I loved dearly public, on my personal website, along with the time and date.

Since Grandpa’s death, there’s been talk about whether we would bring him back if we could, or whether he died at a good time. I’m entitled to whatever opinion I want because my opinion doesn’t change anything.

I’m happy that Grandpa avoided the worst of aging. In some ways his death was a mercy. He did not lose his independence. He did not lose his home. He did not see his health decline even further and he was doing what he loved.

But I feel that there was a lot of unfinished business, things we wanted him to see. My brother wanted Grandpa to see him become an electrician. And I wish so badly that he could have at least seen me get my Ph.D.

I wish he could have seen Ayven get *his* Ph.D.

I will never have another Grandpa in my life. I had a damn good Grandpa though, one of the finest men who’s lived. I will never stop missing him.

I love you Grandpa.

While Grandpa loved jazz, he also loved classical music. He said that if he had to pick one composer to listen to for the rest of his life, it would be Beethoven. So below is the last piano piece I played for Grandpa, “Moonlight Sonata,” by Beethoven.

I’ve spent the past few weeks writing about **MCHT**, my new package for Monte Carlo and bootstrap hypothesis testing. After discussing how to use **MCHT** safely, I discussed how to use it for maximized Monte Carlo (MMC) testing, then bootstrap testing. One may think I’ve said all I want to say about the package, but in truth, I’ve only barely passed the halfway point!

Today I’m demonstrating how general **MCHT** is, allowing one to use it for multiple samples and on non-univariate data. I’ll be doing so with two examples: a permutation test and the test for significance of a regression model.

The idea of the permutation test dates back to Fisher (see [1]) and it forms the basis of computational testing for difference in mean. Let’s suppose that we have two samples with respective means $\mu_X$ and $\mu_Y$. Suppose we wish to test

$H_0: \mu_X = \mu_Y$

against

$H_A: \mu_X \neq \mu_Y$

using samples $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$, respectively.

If the null hypothesis is true and we also make the stronger assumption that the two samples were drawn from distributions that could differ only in their means, then the labelling of the two samples is artificial, and if it were removed the two samples would be indistinguishable. Relabelling the data and artificially calling one sample the $x$ sample and the other the $y$ sample would produce statistics highly similar to the one we actually observed. This observation suggests the following procedure:

- Generate $N$ new datasets by randomly assigning labels to the combined sample of $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$.
- Compute the test statistic on each of the new samples, yielding $N$ simulated statistics; suppose that the test statistic used is the difference in means, $\bar{x} - \bar{y}$.
- Compute the test statistic on the actual sample and compare it to the simulated statistics. If the actual statistic is relatively large compared to the simulated statistics, then reject the null hypothesis in favor of the alternative; otherwise, don’t reject.

In practice step 3 is done by computing a $p$-value representing the proportion of simulated statistics larger than the one actually computed.
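The steps above can be sketched in base R without any packages (an illustrative sketch only; `perm_diff_test` is a hypothetical helper name and is not part of **MCHT**):

```r
# Sketch of a permutation test for difference in means; perm_diff_test
# is a hypothetical illustration, not MCHT code
perm_diff_test <- function(x, y, N = 1000) {
  obs <- mean(x) - mean(y)             # test statistic on the actual sample
  combined <- c(x, y)
  n <- length(x)
  sims <- replicate(N, {
    shuffled <- sample(combined)       # randomly reassign the labels
    mean(shuffled[1:n]) - mean(shuffled[-(1:n)])
  })
  # p-value: proportion of simulated statistics at least as extreme
  mean(abs(sims) >= abs(obs))
}

set.seed(123)
perm_diff_test(rnorm(5, 2, 1), rnorm(10, 0, 1))
```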

The permutation test is effectively a bootstrap test, so it is supported by **MCHT**, though one may wonder how that’s the case when the parameters `test_stat`, `stat_gen`, and `rand_gen` only accept one parameter, `x`, representing the dataset (as opposed to, say, `t.test()`, which has an `x` and an optional `y` parameter). But `MCHTest()` makes very few assumptions about what object `x` actually is; if your object is either a vector or tabular, then the `MCHTest` object should not have a problem with it (it’s even possible a loosely structured `list` would be fine, but I have not tested this; tabular formats should cover most use cases).

In this case, putting our data in long-form format makes doing a permutation test fairly simple. One column will contain the group an observation belongs to while the other contains observation values. The `test_stat` function will split the data according to group, compute group-wise means, and finally compute the test statistic. `rand_gen` generates new datasets by permuting the labels in the data frame. `stat_gen` merely serves as the glue between the two.

The result is the following test.

```r
library(MCHT)
library(doParallel)

registerDoParallel(detectCores())

ts <- function(x) {
  grp_means <- aggregate(value ~ group, data = x, FUN = mean)
  grp_means$value[1] - grp_means$value[2]
}

rg <- function(x) {
  x$group <- sample(x$group)
  x
}

sg <- function(x) {
  test_stat(x)
}

permute.test <- MCHTest(ts, sg, rg, seed = 123, N = 1000,
                        localize_functions = TRUE)

df <- data.frame("value" = c(rnorm(5, 2, 1), rnorm(10, 0, 1)),
                 "group" = rep(c("x", "y"), times = c(5, 10)))

permute.test(df)
```

```
## 
##  Monte Carlo Test
## 
## data:  df
## S = 1.3985, p-value = 0.036
```

Suppose for each observation $i$ in our dataset there is an outcome of interest, $y_i$, and there are $p$ variables $x_{i1}, \ldots, x_{ip}$ that could together help predict the value of $y_i$ if they are known. Consider then the following linear regression model (with $i = 1, \ldots, n$):

$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i$

The first question someone should ask when considering a regression model is whether it’s worth anything at all. An alternative approach to predicting $y_i$ is simply to predict its mean value. That is, the model

$y_i = \beta_0 + \epsilon_i$

is much simpler and should be preferred to the more complicated model listed above if it’s just as good at explaining the behavior of $y_i$ for all $i$. Notice the second model is simply the first model with all the coefficients $\beta_1, \ldots, \beta_p$ identically equal to zero.

The $F$-test (described in more detail here) can help us decide between these two competing models. Under the null hypothesis, the second model is the true model:

$H_0: \beta_1 = \cdots = \beta_p = 0$

The alternative says that at least one of the regressors is helpful in predicting $y_i$:

$H_A: \beta_j \neq 0 \text{ for at least one } j$

We can use the $F$ statistic to decide between the two models:

$$F = \frac{(RSS_2 - RSS_1)/p}{RSS_1/(n - p - 1)}$$

$RSS_1$ and $RSS_2$ are the residual sums of squares of models 1 and 2, respectively.

This test is called the $F$-test because usually the F-distribution is used to compute $p$-values (as this is the distribution the statistic should follow when certain conditions hold, at least asymptotically if not exactly). What then would a bootstrap-based procedure look like?

If the null hypothesis is true then the best model for the data is this:

$y_i = \bar{y} + \epsilon_i$

$\bar{y}$ is the sample mean of $y_i$ and $\epsilon_i$ is the residual. This suggests the following procedure:

- Shuffle $y_i$ over all rows of the input dataset, with replacement, to generate $N$ new datasets.
- Compute $F$ statistics for each of the generated datasets.
- Compare the $F$ statistic of the actual dataset to the generated datasets’ statistics.

Let’s perform the test on a subset of the `iris` dataset. We will see if there is a relationship between the sepal length and sepal width among *iris setosa* flowers. Below is an initial split and visualization:

```r
library(dplyr)

setosa <- iris %>%
  filter(Species == "setosa") %>%
  select(Sepal.Length, Sepal.Width)

plot(Sepal.Width ~ Sepal.Length, data = setosa)
```

There is an obvious relationship between the variables. Thus we should expect the test to reject the null hypothesis. That is what we would conclude if we were to run the conventional test:

```r
res <- lm(Sepal.Width ~ Sepal.Length, data = setosa)
summary(res)
```

```
## 
## Call:
## lm(formula = Sepal.Width ~ Sepal.Length, data = setosa)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.72394 -0.18273 -0.00306  0.15738  0.51709 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.5694     0.5217  -1.091    0.281    
## Sepal.Length   0.7985     0.1040   7.681 6.71e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2565 on 48 degrees of freedom
## Multiple R-squared:  0.5514, Adjusted R-squared:  0.542
## F-statistic: 58.99 on 1 and 48 DF,  p-value: 6.71e-10
```

Let’s now implement the procedure I described with `MCHTest()`.

```r
ts <- function(x) {
  res <- lm(Sepal.Width ~ Sepal.Length, data = x)
  # Only way I know to automatically compute the statistic
  summary(res)$fstatistic[[1]]
}

# rand_gen's function can use both x and n, and n will be the number of rows of
# the dataset
rg <- function(x, n) {
  x$Sepal.Width <- sample(x$Sepal.Width, replace = TRUE, size = n)
  x
}

b.f.test.1 <- MCHTest(ts, ts, rg, seed = 123, N = 1000)

b.f.test.1(setosa)
```

```
## 
##  Monte Carlo Test
## 
## data:  setosa
## S = 58.994, p-value < 2.2e-16
```

Excellent! It reached the correct conclusion.

One may naturally ask whether we can write functions a bit more general than what I’ve shown here, at least in the regression context. For example, one may want parameters specifying a formula so that the regression model isn’t hard-coded into the test. In short, the answer is yes; `MCHTest` objects try to pass as many parameters to the input functions as they can.

Here is the revised example that works for basically any formula:

```r
ts <- function(x, formula) {
  res <- lm(formula = formula, data = x)
  summary(res)$fstatistic[[1]]
}

rg <- function(x, n, formula) {
  dep_var <- all.vars(formula)[1]  # Get the name of the dependent variable
  x[[dep_var]] <- sample(x[[dep_var]], replace = TRUE, size = n)
  x
}

b.f.test.2 <- MCHTest(ts, ts, rg, seed = 123, N = 1000)

b.f.test.2(setosa, formula = Sepal.Width ~ Sepal.Length)
```

```
## 
##  Monte Carlo Test
## 
## data:  setosa
## S = 58.994, p-value < 2.2e-16
```

This shows that you can have a lot of control over how `MCHTest` objects handle their inputs, giving you considerable flexibility.

Next post: time series and **MCHT**

- R. A. Fisher, *The Design of Experiments* (1935)

*Hands-On Data Analysis with NumPy and Pandas* is a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

Now that we’ve seen **MCHT** basics, how to make `MCHTest()` objects self-contained, and maximized Monte Carlo (MMC) testing with **MCHT**, let’s now talk about bootstrap testing. Not much is different when we’re doing bootstrap testing; the main difference is that the replicates used to generate test statistics depend on the data we feed to the test, and thus are not completely independent of it. You can read more about bootstrap testing in [1].

Let $T$ represent our test statistic. For bootstrap hypothesis testing, we will construct $N$ test statistics from data generated using our sample. Call these test statistics $T^*_1, \ldots, T^*_N$. These statistics are generated in such a way that we know that the null hypothesis holds for them. Suppose for the sake of demonstration that large values of $T$ constitute evidence against the null hypothesis. Then the $p$-value for the bootstrap hypothesis test is

$$\hat{p} = \frac{1}{N} \sum_{j = 1}^{N} I\left(T^*_j \geq T\right)$$

Here, $I$ is the indicator function.
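In code, this formula is just the proportion of simulated statistics at least as large as the observed one (a sketch with hypothetical placeholder names `sim_stats` and `obs_stat`, which would come from an actual bootstrap procedure):

```r
# Hypothetical placeholders: sim_stats holds T*_1, ..., T*_N and
# obs_stat holds T
sim_stats <- c(0.5, 1.2, 2.7, 0.1, 3.3)
obs_stat <- 2.0

# The indicator I(T*_j >= T) becomes a logical comparison, and the mean
# of a logical vector is the proportion of TRUEs
p_value <- mean(sim_stats >= obs_stat)
p_value  # 2 of the 5 simulated statistics exceed the observed one: 0.4
```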

There are many ways to generate the data used to compute $T^*_1, \ldots, T^*_N$. There’s the parametric bootstrap, where the data is used to estimate the parameters of a distribution, then those parameters are plugged into that distribution and the distribution is used to generate new samples. There’s also the nonparametric bootstrap that doesn’t make such strong assumptions about the data, perhaps sampling from the data itself to generate new samples. Either of these methods can be used in bootstrap testing, and `MCHTest()` supports both.
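To make the distinction concrete, here is a minimal sketch of the resampling step under each approach (assuming, purely for illustration, Gaussian data in the parametric case; the variable names are mine):

```r
x <- c(2.3, 1.1, 8.1, -0.2, -0.8, 4.7, -1.9)  # observed sample
n <- length(x)

# Parametric bootstrap: estimate the distribution's parameters from the
# data, then simulate a new sample from the fitted distribution
x_par <- rnorm(n, mean = mean(x), sd = sd(x))

# Nonparametric bootstrap: resample from the data itself, with replacement
x_nonpar <- sample(x, size = n, replace = TRUE)
```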

Unlike Monte Carlo tests, bootstrap tests cannot claim to be exact tests for any sample size; they’re better for larger sample sizes. That said, they often work well even in small sample sizes and thus are still a good alternative to inference based on asymptotic results. They also could serve as an alternative approach to the nuisance parameter problem, as MMC often has weak power.

In **MCHT**, there is little difference between bootstrap testing and Monte Carlo testing. Bootstrap tests need the original dataset to generate replicates; Monte Carlo tests do not. So the difference here is that the function passed to `rand_gen` needs to accept a parameter `x` rather than `n`, with `x` representing the original dataset, like that passed to `test_stat`.

That’s the only difference. All else is the same.

Suppose we wish to test for the location of the mean. Our nonparametric bootstrap procedure is as follows:

- Generate $N$ samples of data from the demeaned dataset.
- Suppose our mean under the null hypothesis is $\mu_0$. Add this mean to each generated dataset and compute the test statistic for each of those datasets; these will be the simulated test statistics $T^*_1, \ldots, T^*_N$.
- Compute the test statistic on the main data and use the empirical distribution function of the simulated test statistics to compute a $p$-value.

The code below implements this procedure.

```r
library(MCHT)
library(doParallel)

registerDoParallel(detectCores())

ts <- function(x, mu = 0) {
  sqrt(length(x)) * (mean(x) - mu)/sd(x)
}

rg <- function(x) {
  x_demeaned <- x - mean(x)
  sample(x_demeaned, replace = TRUE, size = length(x))
}

sg <- function(x, mu = 0) {
  x <- x + mu
  test_stat(x, mu = mu)  # Will be localizing
}

b.t.test <- MCHTest(ts, sg, rg, seed = 123, N = 1000,
                    lock_alternative = FALSE, test_params = "mu",
                    localize_functions = TRUE)

dat <- c(2.3, 1.1, 8.1, -0.2, -0.8, 4.7, -1.9)
b.t.test(dat, alternative = "two.sided", mu = 1)
```

```
## 
##  Monte Carlo Test
## 
## data:  dat
## S = 0.68164, p-value = 0.432
## alternative hypothesis: true mu is not equal to 1
```

```r
b.t.test(dat, alternative = "less", mu = 7)
```

```
## 
##  Monte Carlo Test
## 
## data:  dat
## S = -3.8626, p-value = 0.025
## alternative hypothesis: true mu is less than 7
```

The parametric bootstrap test assumes that the observed data was generated using a specific distribution, such as the Gaussian distribution. All that’s missing, in essence, is the parameters of that distribution. The procedure thus starts by estimating all nuisance parameters of the assumed distribution using the data. Then the first step of the process mentioned above (which admittedly was specific to a test for the mean but still strongly resembles the general process) is replaced with simulating data from the assumed distribution using any parameters assumed under the null hypothesis and the estimated values of any nuisance parameters. The other two steps of the above process are unchanged.

We can use the parametric bootstrap to test for goodness of fit with the Kolmogorov-Smirnov test. Without going into much detail, suppose that $F_{\theta}$ represents a distribution that is known except maybe for the values of its parameters $\theta$. Assume that $X_1, \ldots, X_n$ is an independently and identically distributed dataset, and we have observed values $x_1, \ldots, x_n$. We wish to use the dataset to decide between the hypotheses

$H_0: X_i \sim F_{\theta} \quad \text{against} \quad H_A: X_i \not\sim F_{\theta}$

That is, we want to test whether our data was drawn from the distribution $F_{\theta}$ or whether it was drawn from a different distribution. This is what the Kolmogorov-Smirnov test checks.

R implements this test in `ks.test()`, but that function does not allow for any nuisance parameters. It will only check for an exact match between distributions. This is often not what we want; we want to check whether our data was drawn from any member of the family of distributions $F_{\theta}$, not a particular member with a particular combination of parameters. It’s tempting to plug in the estimated values of these parameters, but then the $p$-value needs to be computed differently, not in the way that is prescribed by `ks.test()`. Thus we will need to approach the test differently.

Since the distribution of the data is known under the null hypothesis, this is a good situation to use a bootstrap test. We’ll use maximum likelihood estimation to estimate the values of the missing parameters, as implemented by **fitdistrplus** (see [2]). Then we generate $N$ samples from this distribution using the estimated parameter values and use those samples to generate simulated test statistic values that follow the distribution prescribed by the null hypothesis.

Suppose we wished to test whether the data was drawn from a Weibull distribution. The result is the following test.

```r
library(fitdistrplus)

ts <- function(x) {
  param <- coef(fitdist(x, "weibull"))
  shape <- param[['shape']]; scale <- param[['scale']]
  ks.test(x, pweibull, shape = shape, scale = scale,
          alternative = "two.sided")$statistic[[1]]
}

rg <- function(x) {
  n <- length(x)
  param <- coef(fitdist(x, "weibull"))
  shape <- param[['shape']]; scale <- param[['scale']]
  rweibull(n, shape = shape, scale = scale)
}

b.ks.test <- MCHTest(test_stat = ts, stat_gen = ts, rand_gen = rg,
                     seed = 123, N = 1000)

b.ks.test(rweibull(1000, 2, 2))
```

```
## 
##  Monte Carlo Test
## 
## data:  rweibull(1000, 2, 2)
## S = 0.021907, p-value = 0.275
```

```r
b.ks.test(rbeta(1000, 2, 2))
```

```
## 
##  Monte Carlo Test
## 
## data:  rbeta(1000, 2, 2)
## S = 0.047165, p-value < 2.2e-16
```

Given the choice between an MMC test and a bootstrap test, which should you prefer? If you’re concerned about speed and power, go with the bootstrap test. If you’re concerned about precision and getting an “exact” test that’s at least conservative, then go with an MMC test. I think most of the time, though, the bootstrap test will be good enough, even with small samples, but that’s mostly a hunch.

Next week we will see how we can go beyond one-sample or univariate tests to multi-sample or multivariate tests. See the next blog post.

- J. G. MacKinnon, *Bootstrap hypothesis testing*, in *Handbook of Computational Econometrics* (2009), pp. 183-213
- M. L. Delignette-Muller and C. Dutang, *fitdistrplus: an R package for fitting distributions*, J. Stat. Soft., vol. 64, no. 4 (2015)


I introduced **MCHT** two weeks ago and presented it as a package for Monte Carlo and bootstrap hypothesis testing. Last week, I delved into important technical details and showed how to make self-contained `MCHTest` objects that don’t suffer side effects from changes in the global namespace. In this article I show how to perform maximized Monte Carlo hypothesis testing using **MCHT**, as described in [1].

The usual procedure for Monte Carlo hypothesis testing is:

- Compute a test statistic for the data on which you wish to test a hypothesis.
- Generate $N$ random datasets like the one of interest but with the data generating process (DGP) being the one prescribed by the null hypothesis, and compute the test statistic on each of these datasets.
- Use the empirical distribution function of the simulated test statistics to compute the $p$-value of the test.
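As a concrete illustration of this procedure, here is a sketch independent of **MCHT**, testing $H_0: \mu = 0$ for Gaussian data with known standard deviation 1 and the sample mean as the test statistic (the variable names are mine):

```r
# Plain Monte Carlo test of H0: mu = 0 for N(mu, 1) data; an
# illustrative sketch, not MCHT code
set.seed(123)
x <- rnorm(10, mean = 0.5)            # data to test
obs <- mean(x)                        # step 1: test statistic on the data

N <- 1000
sims <- replicate(N, mean(rnorm(10))) # step 2: statistics under the null DGP

# step 3: p-value from the empirical distribution (two-sided)
p_value <- mean(abs(sims) >= abs(obs))
p_value
```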

Monte Carlo tests often make strong distributional assumptions, such as what distribution generated the dataset being tested, but when those assumptions hold, they are exact tests (see [2]). They are not as powerful as if we had the exact distribution of the test statistic under those assumptions, but the power increases with $N$ (see [3]), and given the power of modern computers getting a large $N$ is usually not a problem. Thus Monte Carlo tests are attractive in small sample situations where we do not want to rely on an asymptotic distribution for inference.

However, the procedure outlined above does not allow for nuisance parameters (parameters that are not the subject of interest but whose values are needed in order to conduct inference). In the introductory blog post, while one may view the population standard deviation as a nuisance parameter, the test statistic does not depend on it when the data follows a Gaussian distribution, so there was no need to worry about it. In the case when we switched to data following the exponential distribution, it was still not a problem since its value was specified under the null hypothesis. Thus it was no longer a nuisance parameter.

That said, nuisance parameters can still appear when we need to perform inference. Suppose, for example, that our data follows a Weibull distribution, denoted by $\text{Weibull}(k, \lambda)$, with $k$ being the shape parameter and $\lambda$ the scale parameter. We want to test the set of hypotheses:

$H_0: \lambda = \lambda_0 \quad \text{against} \quad H_A: \lambda \neq \lambda_0$

We can use the generalized likelihood ratio test to form a test statistic (which I won’t repeat here but does appear in the code below). While Wilks’ theorem tells us about the asymptotic distribution of the test statistic, it says nothing about the exact distribution of the test statistic at a particular sample size, and it’s not given that the test statistic is pivotal and thus independent of the value of nuisance parameters under the null hypothesis (the nuisance parameter, in this case, being the shape parameter $k$).

What then can we do? A bootstrap test would estimate the value of the nuisance parameter under the null hypothesis and use that estimate as the actual value when generating new, simulated test statistics. Bootstrap tests, however, are not exact tests (see [2]) and we’ve decided that we want a test with stronger guarantees.

[1] introduced the maximized Monte Carlo (MMC) test, which proceeds as described below:

- Compute the test statistic $T$ from the data.
- Generate $N$ collections of random numbers, such as uniformly distributed random numbers, and use those random numbers for generating random copies of test statistics that depend on the values of nuisance parameters (notice that the random numbers are effectively *not* discarded).
- Use an optimization procedure ([1] suggested simulated annealing) to pick values for the nuisance parameters such that the $p$-value is maximized; the maximally chosen $p$-value is the $p$-value of the test.

[1] showed that this procedure yields p-values that, while not as precise as if we knew the values of the nuisance parameters that produced the data, are at least *conservative*, in the sense that they're no smaller than they should be (thus biasing our conclusions in favor of the null hypothesis). This is the best we can hope for in this context.

MMC is intuitive and compelling, and the theoretical guarantee gives us confidence in our conclusions, but it's not a panacea. First, the optimization procedure is costly in work and time. Second (and, in my opinion, the biggest problem), the procedure may be *too* conservative. There's a strong chance that the procedure will find *some* values for the nuisance parameters that yield a large p-value, perhaps a combination not at all resembling the actual values of the nuisance parameters that produced the data. In short, MMC can be severely lacking in power. When it does reject the null hypothesis, it's compelling, but otherwise it's not convincing evidence that the alternative hypothesis is false.
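To make the procedure concrete, here is a toy MMC test in base R (a sketch of the idea, not **MCHT**'s implementation): test H₀: μ = 0 for Gaussian data with the unknown standard deviation σ as a nuisance parameter, using a deliberately non-pivotal statistic. Since there is only one nuisance parameter, I use `optimise()` in place of the simulated annealing suggested in [1]; the function name and settings are mine.

```r
# Maximized Monte Carlo (MMC) sketch: H0: mu = 0 for N(mu, sigma^2) data,
# with sigma a nuisance parameter; S = sqrt(n) * mean(x) is not pivotal
mmc_mean_test <- function(x, N = 499, lower = 0.01, upper = 10) {
  n <- length(x)
  S <- sqrt(n) * mean(x)
  # one fixed block of uniform draws, reused for every candidate sigma
  # (the random numbers are effectively *not* discarded)
  u <- matrix(runif(n * N), nrow = n)
  p_at <- function(sigma) {
    S_star <- apply(qnorm(u, mean = 0, sd = sigma), 2,
                    function(z) sqrt(n) * mean(z))
    (1 + sum(abs(S_star) >= abs(S))) / (N + 1)
  }
  # [1] suggests simulated annealing; with a single nuisance parameter a
  # one-dimensional optimizer suffices for illustration
  optimise(p_at, c(lower, upper), maximum = TRUE)$objective
}

set.seed(123)
p <- mmc_mean_test(rnorm(20, mean = 0, sd = 2))
```

Note how the conservatism shows up here: |S*| grows with the candidate σ, so the maximization drifts toward the largest σ allowed, inflating the p-value, which is exactly the loss of power described above.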

Creating an implementation of MMC in R was my original goal in developing **MCHT**, and all that needs to be done to perform MMC is pass a value to the `nuisance_params` parameter and an appropriate list to `optim_control`.

Let’s take the hypothesis test mentioned above and prepare to implement it using **MCHT**. I will be using **fitdistrplus** for maximum likelihood estimation, as required by the test statistic (see [4]).

library(MCHT)

## .------..------..------..------.
## |M.--. ||C.--. ||H.--. ||T.--. |
## | (\/) || :/\: || :/\: || :/\: |
## | :\/: || :\/: || (__) || (__) |
## | '--'M|| '--'C|| '--'H|| '--'T|
## `------'`------'`------'`------' v. 0.1.0
## Type citation("MCHT") for citing this R package in publications

library(fitdistrplus)

library(doParallel)
registerDoParallel(detectCores())

# To be passed to test_stat
ts <- function(x, scale = 1) {
  fit_null <- coef(fitdist(x, "weibull", fix.arg = list("scale" = scale)))
  kt <- fit_null[["shape"]]
  l0 <- scale
  fit_all <- coef(fitdist(x, "weibull"))
  kh <- fit_all[["shape"]]
  lh <- fit_all[["scale"]]
  n <- length(x)
  # Test statistic, based on the negative-log-likelihood ratio
  suppressWarnings(n * ((kt - 1) * log(l0) - (kh - 1) * log(lh) -
    log(kt/kh) - log(lh/l0)) - (kt - kh) * sum(log(x)) +
    l0^(-kt) * sum(x^kt) - lh^(-kh) * sum(x^kh))
}

# To be passed to stat_gen; localize_functions will be TRUE
sg <- function(x, scale = 1, shape = 1) {
  x <- qweibull(x, shape = shape, scale = scale)
  test_stat(x, scale = scale)
}

The `MCHTest()` parameter `nuisance_params` accepts a character vector giving the names of the nuisance parameters the distribution of the test statistic may depend upon, and those names must be among the arguments of the function passed to `stat_gen`; that function is expected to know how to handle those parameters. In this case, `rand_gen` will not be specified, since by default it generates uniformly distributed random variables. It's a well-known fact in probability that the inverse of the CDF of a random variable (the `q` functions in R) applied to a uniformly distributed random variable yields a random variable that follows the distribution prescribed by the CDF. Hence the use of `qweibull()` above, which is applied to datasets of uniformly distributed random variables that are effectively fixed by the time `stat_gen` is called. The test statistic will then be computed from data that follows the scale parameter prescribed by the null hypothesis but some set value of k, the shape parameter.
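The inverse-CDF fact is easy to check in a few lines of base R (my own sketch):

```r
# Probability integral transform: the quantile function applied to
# Uniform(0, 1) draws yields draws from the corresponding distribution
set.seed(123)
u <- runif(5000)
x <- qweibull(u, shape = 2, scale = 3)

# the round trip through the CDF recovers the uniform draws exactly
u_back <- pweibull(x, shape = 2, scale = 3)

# the sample mean should be near the Weibull(2, 3) mean, 3 * gamma(1 + 1/2)
m_theory <- 3 * gamma(1.5)
m_sample <- mean(x)
```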

The `MCHTest` object will then perform simulated annealing to choose the value of the nuisance parameter that maximizes the p-value under the null hypothesis for the given dataset. Simulated annealing is implemented in the `GenSA()` function provided by the **GenSA** package (see [5]). `GenSA()` needs a description of the set of parameter values over which to optimize, and there is no general method for choosing this, so `MCHTest()` requires that a list be passed to `optim_control` that effectively contains the parameters that will be passed to `GenSA()`. At minimum, this list must contain an `upper` and a `lower` element, each of which is a numeric vector with names exactly matching the character vector passed to `nuisance_params`; these vectors specify the space `GenSA()` will search to find the optimum. Other elements can be passed to control `GenSA()`, and I highly recommend reading that function's documentation for more details.

There's an additional parameter of `MCHTest()`, `threshold_pval`, that matters to the optimization. `GenSA()` will take many steps to make sure it reaches a good optimum, perhaps taking too long. The authors of **GenSA** recommend specifying an additional terminating condition to speed up the process. `threshold_pval` alters the `threshold.stop` parameter in the `control` list of `optim_control` so that the algorithm terminates when the estimated p-value crosses `threshold_pval`'s value. Effectively, this means that we know that whatever the true p-value of the test is, it's larger than that threshold, and if the threshold is chosen appropriately, we know that we should not reject the null hypothesis based on the results of this test.

While giving `threshold_pval` a number less than 1 can help terminate the algorithm in the case of not rejecting the null hypothesis, the algorithm can still run for a long time if the test will eventually return a statistically significant result. For this reason, I recommend that `optim_control` contain a list called `control`, and that this list have a `max.time` element telling the algorithm the maximum running time (in seconds) it should have.

With all this in mind, we create the `MCHTest` object below:

mc.wei.scale.test <- MCHTest(ts, sg, N = 1000, seed = 123,
                             test_params = "scale", nuisance_params = "shape",
                             optim_control = list(
                               "lower" = c("shape" = 0),
                               "upper" = c("shape" = 100),
                               "control" = list("max.time" = 10)
                             ),
                             threshold_pval = .2, localize_functions = TRUE)

mc.wei.scale.test(rweibull(10, 2, 2), scale = 4)

##
##  Monte Carlo Test
##
## data:  rweibull(10, 2, 2)
## S = 7.2983, p-value < 2.2e-16

The MMC procedure is interesting, and I don't think any package implements it to the level I have in **MCHT**. The power of the procedure itself concerns me, though. Fortunately, the package also supports bootstrap testing, which I will discuss next week.

- J-M. Dufour, *Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics*, Journal of Econometrics, vol. 133 no. 2 (2006) pp. 443-477
- J. G. MacKinnon, *Bootstrap hypothesis testing*, in *Handbook of Computational Econometrics* (2009) pp. 183-213
- A. C. A. Hope, *A simplified Monte Carlo test procedure*, JRSSB, vol. 30 (1968) pp. 582-598
- M. L. Delignette-Muller and C. Dutang, *fitdistrplus: an R package for fitting distributions*, J. Stat. Soft., vol. 64 no. 4 (2015)
- Y. Xiang et al., *Generalized simulated annealing for global optimization: the GenSA package*, R Journal, vol. 5 no. 1 (2013) pp. 13-28

I am the author of *Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

Last week I announced the first release of **MCHT**, an R package that facilitates bootstrap and Monte Carlo hypothesis testing. In this article, I will elaborate on some important technical details about making `MCHTest` objects, explaining in the process how closures and R environments work.

To recap, last week I made a basic `MCHTest`-class object. These are S3 objects; really, they are just functions with a `class` attribute. All the work is done in the initial function call creating the object. But there's more to the story.
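To make "a function with a `class` attribute" concrete, here is a toy example (my own, not MCHT code) of a closure that doubles as an S3 object:

```r
# A closure returned by another function, tagged with a class attribute --
# structurally the same kind of object as an MCHTest object
make_greeter <- function(name) {
  f <- function() paste("Hello,", name)  # closes over `name`
  class(f) <- "greeter"                  # now it's an S3 object too
  f
}

g <- make_greeter("R")
r <- g()   # still callable like an ordinary function
```

Because `g` carries a class, S3 dispatch works on it as well; one could define a `print.greeter()` method to control how it displays, just as **MCHT** defines a print method for `MCHTest` objects.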

We want these objects to be self-contained. Specifically, we don't want changes in the global namespace to change how an `MCHTest` object behaves. By default, these objects are *not* self-contained, and a programmer who isn't careful can accidentally break them. Here I explain how to prevent this from happening.

I highly recommend those who want to learn more about closures and environments read [1], but I will briefly explain these critical concepts here.

A closure is a function created by another function. `MCHTest` objects are closures: functions created by `MCHTest()` (then given a `class` attribute). An environment is an R object in which other R objects are defined. For example, there is the global environment, where most R objects created by users live.

environment()

## <environment: R_GlobalEnv>

globalenv()

## <environment: R_GlobalEnv>

Ever wonder why a variable defined inside a function doesn't affect anything outside of that function, and why it simply disappears? It's because when a function is called, a new environment is created, and all assignments within the function are done within that new environment. We can see this occurring in the examples below.

x <- 2

u <- function() {
  x
}

u()

## [1] 2

f <- function() {
  x <- 1
  function() {
    x
  }
}

g <- f()
g()

## [1] 1

environment(g)

## <environment: 0x9c45d78>

environment(u)

## <environment: R_GlobalEnv>

parent.env(environment(g))

## <environment: R_GlobalEnv>

`u()` is a function that lives in the global environment, so it looks for variables in the global environment. `g()`, however, lives in an environment created by `f()`. Normally, the environment a function creates disappears the moment the function finishes execution. A closure, however, still uses the environment created by the function that made it, so that environment doesn't disappear when the function finishes execution.

When a function looks for an object, it first looks for that object in its environment. If it doesn’t find the object there, it looks for the object in the parent environment of its environment. It will continue this process until it either finds the object or discovers that none of its environment’s ancestors has the object (prompting an error).
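This lookup chain can be demonstrated directly with explicitly constructed environments, using base R's `new.env()` and `assign()` (my own sketch):

```r
# Name lookup walks the chain: a function's environment, then that
# environment's parent, and so on up through the ancestors
e_parent <- new.env(parent = baseenv())
e_child  <- new.env(parent = e_parent)

assign("y", 10, envir = e_parent)

f <- function() y          # y is not defined inside f
environment(f) <- e_child  # f searches e_child, then e_parent, then baseenv()

r1 <- f()                  # y is found in e_parent

assign("y", 99, envir = e_child)
r2 <- f()                  # e_child's y now shadows e_parent's
```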

This means that the function is sensitive to changes in its environment or its environment’s ancestors, as we see here:

x <- 3

h <- function() {
  function() {
    x
  }
}

u()

## [1] 3

j <- h()
environment(j)

## <environment: 0xa6e7cb4>

parent.env(environment(j))

## <environment: R_GlobalEnv>

j()

## [1] 3

One of R's attractive features is that it promotes a style of programming that discourages side effects, where changes to one object don't change the behavior of another. But the examples above show how closures can suffer side effects when objects in the global namespace are changed. The closures created above depend on the global environment in ways that surprise those not familiar with how environments in R work.

By default, `MCHTest` objects can suffer from these side effects, which can creep in if the functions passed to the parameters of `MCHTest()` are carelessly defined, as we see below. (The tests being defined are effectively Monte Carlo z-tests.)

library(MCHT)

## .------..------..------..------.
## |M.--. ||C.--. ||H.--. ||T.--. |
## | (\/) || :/\: || :/\: || :/\: |
## | :\/: || :\/: || (__) || (__) |
## | '--'M|| '--'C|| '--'H|| '--'T|
## `------'`------'`------'`------' v. 0.1.0
## Type citation("MCHT") for citing this R package in publications

library(doParallel)
registerDoParallel(detectCores())

ts <- function(x, sigma = 1) {
  sqrt(length(x)) * mean(x)/sigma  # z-test for mean = 0
}

sg <- function(x, sigma = 1) {
  x <- sigma * x
  ts(x, sigma = sigma)  # unsafe
}

unsafe.test.1 <- MCHTest(ts, sg, rnorm, seed = 100, N = 100,
                         fixed_params = "sigma")

unsafe.test.1(rnorm(10))

##
##  Monte Carlo Test
##
## data:  rnorm(10)
## S = 1.1972, sigma = 1, p-value = 0.15

ts <- function(x) {
  sqrt(length(x)) * mean(x)  # Effectively makes sigma = 1
}

sg <- function(x) {
  ts(x)  # again, unsafe
}

unsafe.test.2 <- MCHTest(ts, sg, rnorm, seed = 100, N = 100)

unsafe.test.2(rnorm(10))

##
##  Monte Carlo Test
##
## data:  rnorm(10)
## S = 0.22926, p-value = 0.46

# ERROR
unsafe.test.1(rnorm(10))

## Error in {: task 1 failed - "unused argument (sigma = sigma)"

What happened? Let's pick it apart by looking at the `stat_gen` parameter of `unsafe.test.1()`.

get_MCHTest_settings(unsafe.test.1)$stat_gen

## function(x, sigma = 1) {
##   x <- sigma * x
##   ts(x, sigma = sigma) # unsafe
## }

This function depends on an object called `ts()`. When the function looks for `ts()`, it looks *in the global namespace!* This means that changes to `ts()` in that namespace will change the behavior of the function. The most recent version of `ts()` does not have a parameter called `sigma`, prompting an error. *The object is not self-contained!*

How can we prevent side effects like this? One answer is to define the functions passed to `MCHTest()` in a way that doesn't depend on objects defined in the global namespace. For example, we would not call `ts()` in `sg()` above but instead rewrite the test statistic as we defined it in `ts()`. (Using functions and objects defined in packages is okay, though, since these generally don't change in an R session.)

However, this is not always practical. The test statistic written in `ts()` could be complicated, and writing that same statistic again would not only be a lot of work but would also tempt bugs to invade. Fortunately, `MCHTest()` supports methods for making `MCHTest` objects self-contained.

The first step is to set the `localize_functions` parameter to `TRUE`. This changes the environment of the `test_stat`, `stat_gen`, `rand_gen`, and `pval_func` functions so that they belong to the environment the `MCHTest` object lives in. Not only does this help make the function self-contained, we may even be able to write our inputs in a more idiomatic way, like so:

ts <- function(x, sigma = 1) {
  sqrt(length(x)) * mean(x)/sigma
}

sg <- function(x, sigma = 1) {
  x <- sigma * x
  test_stat(x, sigma = 1)  # Would not be able to do this if
                           # localize_functions were FALSE
}

safe.test.1 <- MCHTest(ts, sg, function(n) {rnorm(n)}, seed = 100, N = 100,
                       fixed_params = "sigma", localize_functions = TRUE)

safe.test.1(rnorm(10))

##
##  Monte Carlo Test
##
## data:  rnorm(10)
## S = 2.0277, sigma = 1, p-value = 0.02

ts <- function(x) {
  sqrt(length(x)) * mean(x)  # Effectively makes sigma = 1
}

sg <- function(x) {
  ts(x)
}

safe.test.1(rnorm(10))  # Still works

##
##  Monte Carlo Test
##
## data:  rnorm(10)
## S = 1.0038, sigma = 1, p-value = 0.21

(Notice how `rand_gen` was handled; it was wrapped in a function rather than passed directly. In short, this is to prevent the function `rnorm` from being stripped of its namespace, since it needs functions from that namespace.)

This is the first step to removing side effects. (In fact, it makes our functions better written, since we can anticipate the existence of `test_stat` as a function.) However, we could still have variables or functions defined outside of our input functions. We can expose these objects to our localized input functions via the `imported_objects` parameter, a list (the doppelganger of R's environments) containing these objects.

ts <- function(x, sigma = 1) {
  sqrt(length(x)) * mean(x)/sigma
}

sg <- function(x, sigma = 1) {
  x <- sigma * x
  ts(x)  # We're going to do this safely now
}

safe.test.2 <- MCHTest(ts, sg, function(n) {rnorm(n)}, seed = 100, N = 100,
                       fixed_params = "sigma", localize_functions = TRUE,
                       imported_objects = list("ts" = ts))

safe.test.2(rnorm(10))

##
##  Monte Carlo Test
##
## data:  rnorm(10)
## S = 0.57274, sigma = 1, p-value = 0.39

ts <- function(x) {
  sqrt(length(x)) * mean(x)  # Effectively makes sigma = 1
}

sg <- function(x) {
  ts(x)
}

safe.test.2(rnorm(10))

##
##  Monte Carlo Test
##
## data:  rnorm(10)
## S = 0.24935, sigma = 1, p-value = 0.45

Both `safe.test.1()` and `safe.test.2()` are now immune to changes in the global namespace. They are self-contained and thus safe to use.
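Under the hood, localizing a function amounts to reassigning its environment so that it carries its own snapshot of the helpers it needs. Here is a conceptual base-R sketch of that mechanism (my own illustration of the idea, not **MCHT**'s actual internals):

```r
# Give a closure its own environment containing snapshots of its helpers,
# so later changes to the global namespace cannot affect it
make_self_contained <- function(f, imports) {
  env <- list2env(imports, parent = baseenv())  # a list becomes an environment
  environment(f) <- env
  f
}

helper <- function(x) x^2
g <- function(x) helper(x) + 1
g_safe <- make_self_contained(g, list(helper = helper))

helper <- function(x) x^3  # a careless redefinition in the global namespace

r_unsafe <- g(2)       # picks up the new helper: 2^3 + 1 = 9
r_safe   <- g_safe(2)  # still uses the snapshot:  2^2 + 1 = 5
```

The `parent = baseenv()` choice means the localized function can still see base R functions but nothing from the global environment, which is exactly the kind of isolation that makes the test objects safe.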

By default, `localize_functions` is `FALSE`. I thought of making it `TRUE` by default, but I feared that those not familiar with the concept of environments would be bewildered by the errors thrown whenever they tried to use a function they had defined. Setting the parameter to `TRUE` makes using `MCHTest()` more difficult.

That said, I highly recommend using the parameter in a longer script. It makes the function safer (errors are good when they’re enforcing safety), so become acquainted with it.

(Next post: maximized Monte Carlo hypothesis testing)

- H. Wickham, *Advanced R* (2015), CRC Press, Boca Raton

