Dave’s Donuts offers 14 flavors of donuts (consider the supply of each flavor as being unlimited). The “grab bag” box consists of flavors randomly selected to be in the box, each flavor equally likely for each one of the dozen donuts. What is the probability that at most three flavors are in the grab bag box of a dozen?

For this we will need the multinomial distribution, which is a discrete probability distribution. In a string of characters there are $k$ characters possible to fill one position of the string, which is $n$ characters long. The random variable $X_1$ counts the number of occurrences of character 1 in the string, $X_2$ the number of occurrences of character 2, and so on until $X_k$. Let $p_1, \ldots, p_k$ be the individual probabilities each of the $k$ characters could appear in a position of the string; each position is filled independently of the characters in other positions. Let $x_1, \ldots, x_k \geq 0$ be such that $x_1 + \cdots + x_k = n$. Then

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}.$$

Here, $k = 14$, $n = 12$, and $p_1 = \cdots = p_{14} = \frac{1}{14}$. So we can say

$$P(X_1 = x_1, \ldots, X_{14} = x_{14}) = \frac{12!}{x_1! \cdots x_{14}!} \left(\frac{1}{14}\right)^{12} \qquad (1)$$

We will say

$$P(\text{at most 3 flavors}) = P(\text{1 flavor}) + P(\text{2 flavors}) + P(\text{3 flavors}).$$

Compute each of those probabilities separately.

If $x_1 = 12$ and $x_2 = \cdots = x_{14} = 0$, there is exactly one flavor in the box. (1) shows the probability this happens is $14^{-12}$. Since we could pick an $x_i$ other than $x_1$ to equal 12, and there were 14 ways to make this decision, we can say

$$P(\text{1 flavor}) = \frac{14}{14^{12}} = \frac{1}{14^{11}}.$$

Let’s now compute $P(\text{2 flavors})$. We start by fixing $x_3 = x_4 = \cdots = x_{14} = 0$. We get

$$P(X_3 = \cdots = X_{14} = 0) = \sum_{x_1 = 0}^{12} \frac{12!}{x_1! \, (12 - x_1)!} \left(\frac{1}{14}\right)^{12} = \frac{2^{12}}{14^{12}} \qquad (2)$$

Unfortunately (2) includes cases where there’s actually only one flavor present in the box (every donut is flavor 1, or every donut is flavor 2), so compute

$$P(\text{only flavors 1 and 2, both present}) = \frac{2^{12}}{14^{12}} - \frac{2}{14^{12}} \qquad (3)$$

$$= \frac{2^{12} - 2}{14^{12}} \qquad (4)$$

Of course we could have picked different variables to fix at zero, and there were $\binom{14}{2} = 91$ ways to pick the variables to fix at zero (or equivalently, to pick the two variables to not fix at zero), finally yielding

$$P(\text{2 flavors}) = \binom{14}{2} \frac{2^{12} - 2}{14^{12}} = \frac{372554}{14^{12}} \qquad (5)$$

Now to compute $P(\text{3 flavors})$. Again we start by fixing $x_4 = x_5 = \cdots = x_{14} = 0$ and compute

$$\sum_{x_1 = 1}^{10} \sum_{x_2 = 1}^{11 - x_1} \frac{12!}{x_1! \, x_2! \, (12 - x_1 - x_2)!} \left(\frac{1}{14}\right)^{12} \qquad (6)$$

(the bounds of the sums force $x_1$, $x_2$, and $x_3 = 12 - x_1 - x_2$ to all be at least one, so all three flavors are actually present).

We could try to use tricks to compute (6), or we can acknowledge that we’re busy people and ask SymPy to do it. Check that the following Python code is correct:

```python
from sympy import init_session, binomial

init_session()

def multinomial(params):
    if len(params) == 1:
        return 1
    return binomial(sum(params), params[-1]) * \
        multinomial(params[:-1])

l1 = list()
for i in range(1, 10 + 1):
    v = sum([multinomial([i, j, (12 - i - j)])
             for j in range(1, 11 - i + 1)])
    l1.append(v)

sum(l1)/14**12    # Solution
```

The resulting probability is $\frac{519156}{14^{12}} \approx 9.16 \times 10^{-9}$. We could have picked different flavors to fix, and there were $\binom{14}{3} = 364$ ways to pick the flavors to fix, so we get

$$P(\text{3 flavors}) = \binom{14}{3} \frac{519156}{14^{12}} = \frac{188972784}{14^{12}} \qquad (7)$$

We can write the one-flavor probability and (5) over the common denominator $14^{12}$ as $\frac{14}{14^{12}}$ and $\frac{372554}{14^{12}}$, respectively. Summing these probabilities along with (7) yields

$$P(\text{at most 3 flavors}) = \frac{14 + 372554 + 188972784}{14^{12}} = \frac{189345352}{14^{12}} \approx 3.34 \times 10^{-6}.$$
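
If you’d like to double-check the arithmetic without SymPy, here is a short R snippet of my own (not part of the original solution) that recomputes the whole answer, using inclusion-exclusion to count the strings that use all of a given set of flavors:

```r
# Number of length-n strings over j given symbols that use all j symbols,
# computed by inclusion-exclusion
surj <- function(n, j) {
  sum(sapply(0:j, function(i) (-1)^i * choose(j, i) * (j - i)^n))
}

# P(at most 3 flavors): choose which flavors appear, count qualifying strings
sum(sapply(1:3, function(j) choose(14, j) * surj(12, j))) / 14^12
# about 3.34e-06, matching the sum above
```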

This is the proper way to obtain the probability that there are at most three flavors in the “grab bag” box, but how many boxes exist in which there are at most three flavors when we discount the number of ways there are to arrange the donuts in a box?

If there’s exactly one flavor, then we pick it and fill the box with that flavor; there’s 14 ways to pick one flavor. If there’s exactly two flavors in the box, we’ll call them Flavor 1 and Flavor 2. There is at least one donut of Flavor 1 and one of Flavor 2. Now pick the rest of the donuts’ flavors, where order doesn’t matter and there is replacement; there are $\binom{10 + 2 - 1}{10} = 11$ ways to do that. Then pick the two flavors: there’s $\binom{14}{2} = 91$ ways to do that, and thus $11 \times 91 = 1001$ boxes with exactly two flavors. Similarly, for exactly three flavors, there are $\binom{9 + 3 - 1}{9} \binom{14}{3} = 55 \times 364 = 20020$ ways for there to be exactly three flavors. Sum these numbers. (See `https://math.stackexchange.com/q/3230011`.) There are 21,035 such boxes.
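
A quick way to verify that count (my own snippet, not from the post or the linked answer):

```r
# Boxes (multisets of a dozen donuts) with exactly 1, 2, or 3 flavors
one   <- choose(14, 1)                           # pick the single flavor: 14
two   <- choose(14, 2) * choose(10 + 2 - 1, 10)  # 91 * 11 = 1001
three <- choose(14, 3) * choose(9 + 3 - 1, 9)    # 364 * 55 = 20020
one + two + three                                # 21035
```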

Special thanks to Math Stack Exchange user wavex for his help with this problem! He provided the following R script for simulating it:

```r
total = 0
for (y in 1:10000000) {
  x = rmultinom(1, 12, c(1/14, 1/14, 1/14, 1/14, 1/14,
                         1/14, 1/14, 1/14, 1/14, 1/14,
                         1/14, 1/14, 1/14, 1/14))
  x <- c(x)
  count = 14
  for (i in x) {
    if (i == 0) {
      count = count - 1
    }
  }
  if (count <= 3) {total = total + 1}
}
sprintf("%.20f", total / 10000000)
```

In his run of the code this event occurred only 27 out of 10,000,000 times. A rare event indeed!


Packt Publishing published a book for me entitled *Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!


Last week I analyzed player rankings of the Arkham Horror LCG classes. This week I explain what I did in the data analysis. As I mentioned, this is the first time that I attempted inference with rank data, and I discovered how rich the subject is. A lot of the tools for the analysis I had to write myself, so you now have the code I didn’t have access to when I started.

This post will *not* discuss rank data modelling. Instead, it will cover what one may consider basic statistics and inference. The primary reference for what I did here is *Analyzing and Modeling Rank Data*, by John Marden. So far I’ve enjoyed his book and I may even buy a personal copy.

Suppose we have $m$ objects we ask our study participants (also known as “judges”) to rank. For example, suppose we asked people to rank apples, oranges, and bananas. What we then get is a prioritization of these objects according to our judges. This could come in the form

$$(3, 1, 2)$$

and we interpret the number in the $i$th position as the ranking of the $i$th item. In this case, if the tuple is in the order of apples, oranges, and bananas, then oranges received the highest ranking, bananas the second-highest, and apples the last position.

An alternative view of this data may be

$$(\text{oranges}, \text{bananas}, \text{apples})$$

where the items are arranged in order of preference. This form of describing a ranking has its uses, but we will consider only the first form in this introduction.
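
If it helps to see it concretely, here’s a tiny R illustration of my own relating the two forms (using the fruit example from above):

```r
ranks <- c(apples = 3, oranges = 1, bananas = 2)  # ranking form
names(sort(ranks))  # ordering form: "oranges" "bananas" "apples"
order(ranks)        # positions in order of preference: 2 3 1
```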

Ranking data has the following characteristics distinguishing it from other data: first, the data is ordinal. All that matters is the order in which items were placed, not necessarily the numbers themselves. We could insist on writing the rank data above as $(30, 10, 20)$ and the information content would not have changed. (But of course we would never do this.) Second, every item gets a ranking. This excludes “Choose your top 3 out of 50”-type questions, since not every item would receive a ranking (this is called an incomplete ranking and requires special care; I won’t discuss this type of data in this article). Finally, every item’s ranking is distinct; no ties are allowed.

Thus ranking data is distinct even from just ordinal data since data comes from judges in the form of a tuple, not just a single ordinal value. (Thus we would not consider, say, Likert scale responses as automatically being an instance of rank data.) An ideal method for rank data would account for this unique nature and exploit its features.

From this point on I will be working with the Arkham Horror player class ranking data. I made the `Timestamp` column nonsense to anonymize the data. You can download a CSV file of the data from here, then convert it to a `.Rda` file with the script below (which is intended to be run as an executable):

```r
#!/usr/bin/Rscript
################################################################################
# ArkhamHorrorClassPreferenceSurveyDataCleaner.R
################################################################################
# 2019-02-10
# Curtis Miller
################################################################################
# This file takes a CSV file read in and cleans it for later analysis, saving
# the resulting data in a .Rda file.
################################################################################

# optparse: A package for handling command line arguments
if (!suppressPackageStartupMessages(require("optparse"))) {
  install.packages("optparse")
  require("optparse")
}

################################################################################
# MAIN FUNCTION DEFINITION
################################################################################

main <- function(input, output = "out.Rda", help = FALSE) {
  input_file <- read.csv(input)
  input_columns <- names(input_file)

  arkham_classes <- c("Survivor", "Guardian", "Rogue", "Seeker", "Mystic")
  for (cl in arkham_classes) {
    names(input_file)[grepl(cl, input_columns)] <- cl
  }
  names(input_file)[grepl("Reason", input_columns)] <- "Reason"

  input_file$Reason <- as.character(input_file$Reason)
  input_file$Timestamp <- as.POSIXct(input_file$Timestamp,
                                     format = "%m/%d/%Y %H:%M:%S",
                                     tz = "MST")
  for (cl in arkham_classes) {
    input_file[[cl]] <- substr(as.character(input_file[[cl]]), 1, 1)
    input_file[[cl]] <- as.numeric(input_file[[cl]])
  }

  survey_data <- input_file
  save(survey_data, file = output)
}

################################################################################
# INTERFACE SETUP
################################################################################

if (sys.nframe() == 0) {
  cl_args <- parse_args(OptionParser(
    description = paste("Converts a CSV file with survey data ranking",
                        "Arkham Horror classes into a .Rda file with a",
                        "well-formatted data.frame"),
    option_list = list(
      make_option(c("--input", "-i"), type = "character",
                  help = "Name of input file"),
      make_option(c("--output", "-o"), type = "character",
                  default = "out.Rda",
                  help = "Name of output file to create")
    )
  ))

  do.call(main, cl_args)
}
```

(The script with all the code for the actual analysis appears at the end of this article.)

The first statistic we will compute for this data is the marginals matrix. This matrix simply records the proportion of times an item received a particular ranking in the sample. If we want to get mathematical, if $y = (y_1, \ldots, y_m)$ is a ranking tuple, $y_i$ is the ranking of the $i$th option, and the sample is $y^{(1)}, \ldots, y^{(n)}$, then the $(i, j)$ entry of the marginals matrix is

$$\hat{M}_{ij} = \frac{1}{n} \sum_{l = 1}^{n} I_{\{y_i^{(l)} = j\}}$$

where the function $I_A$ is 1 if $A$ is true and 0 otherwise. (Thus the sum above simply counts how many times $y_i$ was equal to $j$.)
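
To make the definition concrete, here is a small R sketch of my own (the actual analysis uses the destat() function from the **pmr** package, shown in the script at the end of this article):

```r
# Marginals matrix: rows are items, columns are ranks; entry (i, j) is the
# proportion of judges who gave item i rank j
marginals <- function(mat) {  # mat: one ranking per row, items in columns
  m <- ncol(mat)
  t(sapply(1:m, function(i) sapply(1:m, function(j) mean(mat[, i] == j))))
}

mat <- rbind(c(3, 1, 2),
             c(1, 2, 3),
             c(3, 2, 1))
marginals(mat)  # e.g. entry (1, 3) = 2/3: item 1 ranked third by 2 of 3 judges
```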

The marginals matrix for the Arkham Horror data is given below

```
MARGINALS
---------
             1     2     3     4     5
Guardian 18.29 20.43 26.84 19.71 14.73
Mystic   19.71 18.29 17.81 20.90 23.28
Rogue    19.24 14.73 20.67 21.38 23.99
Seeker   28.03 25.18 17.10 18.53 11.16
Survivor 14.73 21.38 17.58 19.48 26.84
```

Below is a visual representation of the marginals matrix.

From the marginals matrix you could compute the vector representing the “mean” ranking of the data. For instance, the mean ranking of the Guardian class is the sum of the ranking numbers (column headers) times their respective proportions (in the Guardian row); here, that’s about 2.92 for Guardians. Repeat this process for every other group to get the mean ranking vector; here, the mean rank vector is approximately $(2.92, 3.10, 3.16, 2.60, 3.22)$ (keeping the ordering of the classes suggested by the rows above, which is alphabetical order; this will always be the ordering I use unless otherwise stated). Of course this is not a ranking vector; rankings are integers. The corresponding ranking vector would be to rank the means themselves; this gives a ranking vector of $(2, 3, 4, 1, 5)$.
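
For instance, reproducing the Guardian figure from the marginals row above:

```r
# Mean rank for Guardians: rank numbers weighted by their percentages
sum(1:5 * c(18.29, 20.43, 26.84, 19.71, 14.73) / 100)  # about 2.92
```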

I don’t like inference using the mean ranking vector. As mentioned above, this data is ordinal; that means the magnitude of the numbers themselves should not matter. We could replace 1, 2, 3, 4, 5 with 1, 10, 100, 1000, 10000 and the data would mean the same thing. That is *not* the case if you’re using the mean rank unless you first apply a transformation to the rankings. In short, I don’t think that the mean ranking vector appreciates the nature of the data well. And since the marginals matrix is closely tied to this notion of “mean”, I don’t think the matrix is fully informative.

Another matrix providing descriptive statistics is the pairs matrix. The matrix records the proportion of respondents who preferred one option to the other (specifically, the row option to the column option). Mathematically, the $(i, j)$ entry of the pairs matrix is

$$\hat{K}_{ij} = \frac{1}{n} \sum_{l = 1}^{n} I_{\{y_i^{(l)} < y_j^{(l)}\}}$$

(remember that a smaller rank number indicates a stronger preference).
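
And a matching sketch of mine for the pairs matrix (again, destat() from **pmr** computes this in the real script):

```r
# Pairs matrix: proportion of judges ranking the row item above the column item
pairs_matrix <- function(mat) {  # mat: one ranking per row
  m <- ncol(mat)
  outer(1:m, 1:m, Vectorize(function(i, j) {
    if (i == j) 0 else mean(mat[, i] < mat[, j])
  }))
}

pairs_matrix(rbind(c(3, 1, 2),
                   c(1, 2, 3),
                   c(3, 2, 1)))  # entry (2, 1) = 2/3: item 2 beats item 1 twice
```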

The pairs matrix for the Arkham Horror data is below:

```
PAIRS
-----
         Guardian Mystic Rogue Seeker Survivor
Guardian     0.00  54.16 55.34  42.52    55.82
Mystic      45.84   0.00 51.07  39.90    53.44
Rogue       44.66  48.93  0.00  38.72    51.54
Seeker      57.48  60.10 61.28   0.00    61.52
Survivor    44.18  46.56 48.46  38.48     0.00
```

First, notice that the diagonal entries are all zero; this will always be the case. Second, the pairs matrix is essentially completely determined by the entries above the diagonal of the matrix. Other forms of inference use these upper-diagonal entries and don’t use the lower-diagonal entries since they give no new information. The number of upper-diagonal entries is $\binom{5}{2} = 10$, which is the number of ways to pick pairs of classes.

The pairs matrix for the Arkham Horror data is visualized below.

With the pairs matrix, crossing above or below 50% of the sample being in the bin is a significant event; it indicates which classes are preferred to the other. In fact, by counting how many times this threshold was crossed, we can estimate that the overall favorite class is the Seeker class, followed by Guardians, then Mystics, then Rogues, and finally Survivors. This is another estimate of the “central”, “modal”, or “consensus” ranking. (This agrees with the “mean” ranking, but that’s not always going to be the case; the metrics can disagree with each other.)

While I did not like the marginals matrix I do like the pairs matrix; I feel as if it accounts for the features of rank data I want any measures or inference to take account of. It turns out that the pairs matrix is also related to my favorite distance metric for analyzing rank data.

A *distance metric* is a generalized notion of distance, or “how far away” two objects $x$ and $y$ are. In order for a function $d(x, y)$ to be a metric, it must have the following properties:

- $d(x, y) \geq 0$ for all $x$ and $y$.
- $d(x, y) = 0$ if and only if $x = y$.
- $d(x, y) = d(y, x)$ for all $x$ and $y$.
- $d(x, z) \leq d(x, y) + d(y, z)$ for all $x$, $y$, and $z$ (the “triangle inequality”).

The notion of distance you use in everyday life, the one taught in middle-school geometry and computed whenever you use a ruler, is known as Euclidean distance. It’s not the only notion of distance, though, and may not be the only distance function you use in real life. For instance, Manhattan or taxi cab distance is the distance from one point to another when you can only make 90-degree turns, and is the distance that makes the most sense when travelling in a city.

There are many distance metrics we could consider when working with rank data. The Spearman distance is the square of the Euclidean distance, while the footrule distance corresponds to the Manhattan distance. It turns out that the mean rank vector above minimizes the sum of Spearman distances. The distance metric I based my analysis on, though, was the Kendall distance. I like this distance metric since it is not connected to the mean and considers the distance between the rankings $(1, 2, 3, 4, 5)$ and $(5, 2, 3, 4, 1)$ to be greater than the distance between $(1, 2, 3, 4, 5)$ and $(2, 1, 3, 4, 5)$ (unlike, say, the Hamming distance, which gives the same distance in either case).

Kendall’s distance even has an interpretation. Suppose that two ranking tuples are seen as the ordering of books on a bookshelf. We want to go from one ordering of books to another ordering of books. The Kendall distance is how many times we would need to switch adjacent pairs of books (chosen well, so as not to waste time and energy) to go from one ordering to the other. Thus the Kendall distance between $(1, 2, 3, 4, 5)$ and $(2, 1, 3, 4, 5)$ is one; we only need to make one swap. The distance between $(1, 2, 3, 4, 5)$ and $(5, 2, 3, 4, 1)$, in comparison, is seven, since we need to make seven swaps.
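
For the record, here is a compact way of mine to compute the Kendall distance by counting discordant pairs (the analysis script below instead uses DistancePair() from the **rankdist** package):

```r
# Kendall distance: number of pairs of items on whose order the two rankings
# disagree (equivalently, the number of adjacent swaps needed)
kendall_dist <- function(r1, r2) {
  pairs <- combn(length(r1), 2)
  sum(sign(r1[pairs[1, ]] - r1[pairs[2, ]]) !=
      sign(r2[pairs[1, ]] - r2[pairs[2, ]]))
}

kendall_dist(1:5, c(2, 1, 3, 4, 5))  # 1
kendall_dist(1:5, c(5, 2, 3, 4, 1))  # 7
```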

It also turns out that the Kendall distance is related to the pairs matrix. The average Kendall distance of the judges from any chosen ranking $\nu$ is

$$\frac{1}{n} \sum_{l = 1}^{n} d_K(y^{(l)}, \nu) = \sum_{i < j} \left( I_{\{\nu_i < \nu_j\}} \left(1 - \hat{K}_{ij}\right) + I_{\{\nu_i > \nu_j\}} \hat{K}_{ij} \right).$$

(There is a similar expression relating the Spearman distance to the marginal matrix.)

Once we have a distance metric, we can define what the “best” estimate for the most central ranking is. The central ranking is the ranking $\nu$ that minimizes

$$\sum_{l = 1}^{n} d(y^{(l)}, \nu).$$

In other words, the most central ranking minimizes the sum of distances of all the rankings in the data to that ranking.

Sometimes this ranking has already been determined. For instance, when using the Spearman distance, the central ranking emerges from the “mean” rankings. Otherwise, though, we may need to apply some search procedure to find this optimal ranking.

Since we’re working with rank data, though, it’s very tempting to not use any fancy optimization algorithms and simply compute the sum of distances for every possible ranking. This isn’t a bad idea at all if the number of items being ranked is relatively small. Here, since there are five items being ranked, the number of possible rankings is $5! = 120$, which is not too big for a modern computer to handle. It may take some time for the exhaustive search approach to yield an answer, but the answer produced by exhaustive search comes with the reassurance that it does, in fact, minimize the sum of distances.
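
A sketch of that exhaustive search, reusing the kendall_dist() snippet from earlier (the script’s lskd_estimator() function does the same job using **rankdist** and **gtools**):

```r
library(gtools)  # for permutations()

# Try all m! rankings and keep the one minimizing total Kendall distance
central_ranking <- function(mat) {  # mat: one ranking per row
  m <- ncol(mat)
  perms <- permutations(m, m)
  totals <- apply(perms, 1, function(nu) sum(apply(mat, 1, kendall_dist, nu)))
  perms[which.min(totals), ]
}
```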

This is in fact what I did for estimating the central ranking when minimizing the sum of Kendall distances from said ranking. The resulting ranking, again, was Seeker/Guardian/Mystic/Rogue/Survivor (which agrees with what we determined just by looking at the pairs matrix; this likely is not a coincidence).

All of the above I consider falling into the category of descriptive statistics. It describes aspects of the sample without attempting to extrapolate to the rest of the population. With statistical inference we want to see what we can say about the population as a whole.

I should start by saying that the usual assumptions made in statistical inference are likely not satisfied by my sample. It was an opt-in sample; people *chose* to participate. That alone makes it a non-random sample. Additionally, only participants active on Facebook, Reddit, Twitter, Board Game Geek, and the Fantasy Flight forums were targeted by my advertising of the poll. Thus the Arkham players were likely those active on the Internet, likely at a particular time of day and day of the week (given how these websites try to push older content off the main page). They were likely young, male, and engaged enough in the game to be in the community (and unlikely to be a “casual” player). Thus the participants are likely to be more homogeneous than the population of Arkham Horror players overall.

Just as a thought experiment, what would be a better study, one where we could feel confident in the inferential ability of our sample? Well, we would grab randomly selected people from the population (perhaps from pulling random names from the phone book), have them join our study, teach them how to play the game, make them play the game for many hours until they could form an educated opinion of the game (probably at least 100 hours), then ask them to rate the classes. This would be high-quality data and we could believe the data is reliable, but *damn* would it be expensive! No one at FFG would consider data of that quality worth the price, and frankly neither would I.

Having said that, while the sample I have is certainly flawed in how it was collected, I actually believe we can get good results from it. The opinions of the participants are likely educated ones, so we probably still have a good idea how the Arkham Horror classes compare to one another.

In rank data analysis there is a probability model called the *uniform distribution* that serves as a starting point for inference. Under the uniform distribution, every ranking vector is equally likely to be observed; in short, there is no preference among the judges among the choices. The marginals matrix should have all entries be $1/5$, all off-diagonal entries of the pairs matrix should be $1/2$, and any “central” ranking is meaningless since every ranking is equally likely to be seen. According to the uniform distribution, each of the $5! = 120$ possible rankings appears with probability $1/120$. If we cannot distinguish our data from data drawn from the uniform distribution, our work is done; we basically say there is no “common” ranking scheme and go about our day.
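
A quick simulated illustration of my own: draws from the uniform distribution are just random permutations, and the statistics behave as described:

```r
# 100,000 uniformly random rankings of five items
mat <- t(replicate(100000, sample(5)))
mean(mat[, 1] < mat[, 2])  # an off-diagonal pairs matrix entry: about 0.5
mean(mat[, 1] == 1)        # a marginals matrix entry: about 0.2
```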

There are many tests for checking for the uniform distribution, and they are often based on the statistics we’ve already seen, such as the mean rank vector, the marginals matrix, and the pairs matrix. If $m!$ is small enough relative to the sample size, we could even just base a test off of how frequently each particular ranking was seen. A test based off the latter could detect any form of non-uniformity in the data, while tests based off the marginals or pairs matrices or the mean vector cannot detect all forms of non-uniformity; that said, they often require much less data to be performed.

As mentioned, I like working with the pairs matrix/Kendall distance. The statistical test, though, involves a vector $\hat{\kappa}$, which is the aforementioned upper triangle of the pairs matrix (excluding the diagonal entries, which are always zero). (More specifically, $\hat{\kappa}$ is a vector containing the upper-diagonal entries of the pairs matrix laid out in row-major form.)

The test decides between

$$H_0: \text{the rankings were drawn from the uniform distribution}$$

$$H_A: \text{the rankings were not drawn from the uniform distribution.}$$

The test statistic is

$$12 n \left( \left\| \hat{\kappa} - \tfrac{1}{2} 1_{\bar{m}} \right\|^2 - \frac{\left\| \bar{y} - \tfrac{m + 1}{2} 1_m \right\|^2}{m + 1} \right)$$

where $m$ is the number of items ranked, $\bar{m} = m(m - 1)/2$, $\bar{y}$ is the mean rank vector, and $1_k$ denotes a vector of $k$ ones.

If the null hypothesis is true, then the test statistic, for large $n$, asymptotically follows a $\chi^2$ distribution with $\bar{m}$ degrees of freedom. (For the Arkham Horror classes case, $\bar{m} = \binom{5}{2} = 10$.) Large test statistics are evidence against the null hypothesis, so $p$-values are the area underneath the $\chi^2$ curve to the right of the test statistic.

For our data set, the reported test statistic was 2309938376; not shockingly, the corresponding $p$-value is near zero. So the data was not drawn from the uniform distribution. Arkham Horror players do have class preferences.
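
For the record, the $p$-value computation is one line of R:

```r
# p-value for the reported statistic (df = choose(5, 2) = 10)
pchisq(2309938376, df = 10, lower.tail = FALSE)  # 0: reject uniformity
```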

But what are plausible preferences players could have? We can answer this using a confidence interval. Specifically, we want to know what *rankings* are plausible, and thus what we want is a confidence set of rankings.

Finding a formula for a confidence set of the central ranking is extremely hard to do, but it’s not as hard to form one for one of the statistics we can compute from the rankings, then use the possible values of that statistic to find corresponding plausible central rankings. For example, one could find a confidence set for the mean ranking vector, then translate those mean rankings into ranking vectors (this is what Marden did in his book).

As I said before, I like the pairs matrix/Kendall distance in the rank data context, so I want to form a confidence set for $\kappa$, the population equivalent of $\hat{\kappa}$, the key entries of the pairs matrix. To do this, we cannot view the rank data the same way we did before; instead of seeing the $m$-dimensional vector $y$, we need to see the equivalent $\bar{m}$-dimensional vector that consists only of ones and zeros and records the pair-wise relationships among the ranks, rather than the ranks themselves (the latter vector literally says that item one is not ranked higher than item two, item one is ranked higher than item three, same for four, same for five, then that item two is ranked higher than item three, same for four, same for five, and so on, finally saying in its last entry that item four is ranked higher than item five).
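
Here’s a sketch of that conversion (my own helper; the script below builds the same vectors inside its pairs_mat_cov() function):

```r
# Zero/one pairwise-comparison form of a ranking; m(m - 1)/2-dimensional,
# entries in row-major upper-triangle order
pair_vector <- function(r) {
  pairs <- combn(length(r), 2)
  as.integer(r[pairs[1, ]] < r[pairs[2, ]])
}

pair_vector(c(3, 1, 2))  # 0 0 1: only item 2 is ranked above item 3
```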

We first compute $\hat{\kappa}$ by taking the means of these vectors. Then we compute the sample covariance matrix of the vectors; call it $\hat{\Sigma}$. Then a $100(1 - \alpha)$% confidence set for the true $\kappa$, appropriate for large sample sizes, is:

$$\left\{ \kappa : n (\hat{\kappa} - \kappa)^T \hat{\Sigma}^{-1} (\hat{\kappa} - \kappa) \leq \chi^2_{\bar{m}, \alpha} \right\}$$

where $\chi^2_{\bar{m}, \alpha}$ is the $100(1 - \alpha)$th percentile of the $\chi^2$ distribution with $\bar{m}$ degrees of freedom.

The region I’ve just described is a $\bar{m}$-dimensional ellipsoid, a football-like shape that lives in a space with (probably) more than three dimensions. It sounds daunting, but one can still figure out what rankings are plausible once this region is computed. The trick is to work with each of the coordinates of the vector $\kappa$ and determine whether there is a $\kappa$ in the ellipsoid where that coordinate is 1/2. If the answer is no, then the value of that coordinate, for all $\kappa$ in the ellipsoid, is either always above or always below 1/2. You can then look to $\hat{\kappa}$ (which is in the dead center of the ellipsoid) to determine which is the case.

What’s the significance of this? Let’s say that you listed all possible rankings in a table. Let’s suppose you did this procedure for the coordinate of $\kappa$ corresponding to the Seeker/Rogue pair. If you determine that this coordinate is never 1/2 and that every $\kappa$ in the ellipsoid prefers Seekers to Rogues, then you would take your list of rankings and remove all rankings that place Rogues before Seekers, since these rankings are not in the confidence set.

If you do find a $\kappa$ in the ellipsoid where the selected coordinate is 1/2, then you would not eliminate any rows in your list of rankings, since you know that your confidence set must include some rankings that rank the two items one way and some rankings where the items are ranked the opposite way.

Repeat this procedure with every coordinate of $\kappa$ (that is, every possible pairing of choices) and you then have a confidence set for central rankings.

Determining whether there is a vector in the ellipsoid with a select coordinate valued at 1/2 can be done via optimization. That is, find a $\kappa$ that minimizes $(\hat{\kappa} - \kappa)^T \hat{\Sigma}^{-1} (\hat{\kappa} - \kappa)$ subject to the constraint that the selected coordinate of $\kappa$ is 1/2. You don’t even need fancy minimization algorithms for doing this; the minimum can, in principle, be computed analytically with multivariate calculus. After you have found a minimizing $\kappa$, determine what the value of the quadratic form is at that $\kappa$. If it is less than $\chi^2_{\bar{m}, \alpha}/n$, then you found a $\kappa$ in the ellipsoid; otherwise, you know there is no such $\kappa$.
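
Here is a sketch of that constrained minimization under stated assumptions (kappa_hat, an invertible Sigma_hat, and the sample size n are assumed to already be computed; the script’s hei_check() function handles this more generally):

```r
# Minimum of (kappa_hat - kappa)' Sigma^{-1} (kappa_hat - kappa) over kappa
# with coordinate i fixed at 1/2, via the partitioned-quadratic formula
min_quad_fixed <- function(kappa_hat, Sigma_hat, i, value = 1/2) {
  P <- solve(Sigma_hat)                     # precision matrix
  rest <- setdiff(seq_along(kappa_hat), i)
  d_i <- value - kappa_hat[i]               # forced displacement at coord i
  d_rest <- -solve(P[rest, rest], P[rest, i] * d_i)  # optimal free coords
  d <- numeric(length(kappa_hat))
  d[i] <- d_i
  d[rest] <- d_rest
  drop(t(d) %*% P %*% d)                    # minimized quadratic form
}

# Coordinate i can plausibly be 1/2 if and only if
# n * min_quad_fixed(kappa_hat, Sigma_hat, i) <= qchisq(0.95, df = 10)
```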

This was the procedure I used on the Arkham Horror class ranking data. The 95% confidence interval so computed determined that Seekers were ranked higher than Rogues and Survivors. That means that Seekers cannot have a ranking worse than 3 and Rogues and Survivors could not have rankings better than 2. Any ranking consistent with these constraints, though, is a plausible population central ranking. In fact, this procedure suggested that all the rankings below are plausible central population rankings:

```
   Guardian Mystic Rogue Seeker Survivor
1         1      2     4      3        5
2         1      2     5      3        4
3         1      3     4      2        5
4         1      3     5      2        4
5         1      4     3      2        5
6         1      4     5      2        3
7         1      5     3      2        4
8         1      5     4      2        3
9         2      1     4      3        5
10        2      1     5      3        4
11        2      3     4      1        5
12        2      3     5      1        4
13        2      4     3      1        5
14        2      4     5      1        3
15        2      5     3      1        4
16        2      5     4      1        3
17        3      1     4      2        5
18        3      1     5      2        4
19        3      2     4      1        5
20        3      2     5      1        4
21        3      4     2      1        5
22        3      4     5      1        2
23        3      5     2      1        4
24        3      5     4      1        2
25        4      1     3      2        5
26        4      1     5      2        3
27        4      2     3      1        5
28        4      2     5      1        3
29        4      3     2      1        5
30        4      3     5      1        2
31        4      5     2      1        3
32        4      5     3      1        2
33        5      1     3      2        4
34        5      1     4      2        3
35        5      2     3      1        4
36        5      2     4      1        3
37        5      3     2      1        4
38        5      3     4      1        2
39        5      4     2      1        3
40        5      4     3      1        2
```

The confidence interval, by design, is much less bold than just an estimate of the most central ranking. Our interval suggests that there’s a lot we don’t know about what the central ranking is; we only know that whatever it is, it ranks Seekers above Rogues and Survivors.

The confidence set here is at least conservative in that it could perhaps contain too many candidate central rankings. I don’t know for sure whether we could improve on the set and eliminate more ranks from the plausible set by querying more from the confidence set for $\kappa$. Perhaps there are certain combinations that cannot exist, like excluding rankings that give both Seekers and Guardians a high ranking at the same time. If I were a betting man, though, I’d bet that the confidence set found with this procedure could be improved, in that not every ranking vector in the resulting set corresponds with a $\kappa$ in the original ellipsoidal confidence set. Improving this set, though, would take a lot of work, as one would have to consider multiple coordinates of potential $\kappa$ simultaneously, then find a rule for eliminating ranking vectors based on the results.

Matt Newman, the lead designer of Arkham Horror: The Card Game, does not believe all players are the same. Specifically, he believes that there are player types that determine how they like to play. In statistics we might say that Matt Newman believes that there are clusters of players within any sufficiently large and well-selected sample of players. This suggests we may want to perform cluster analysis to find these sub-populations.

If you haven’t heard the term before, clustering is the practice of finding “similar” data points, grouping them together, and identifying them as belonging to some sub-population for which no label was directly observed. It’s not unreasonable to believe that these sub-populations exist and so I sought to do clustering myself.

There are many ways to cluster. Prof. Marden said that a clustering of rank data into $k$ clusters should minimize the sum of the distances of each observation from its assigned cluster’s center. However, he did not suggest a good algorithm for finding these clusters. He did suggest that for small samples, small $m$, and a small number of clusters, we could exhaustively search for optimal clusters (an impractical idea in general).

I initially attempted a k-means-type algorithm for finding good clusters, one that used the Kendall distance rather than the Euclidean distance, but unfortunately I could not get the algorithm to give good results. I don’t know whether I have errors in my code (listed below) or whether the algorithm just doesn’t work for Kendall distances, but it didn’t work; in fact, it would take a good clustering and make it worse! I eventually abandoned my home-brewed k-centers algorithm (and the hours of work that went into it) and just used spectral clustering.

Spectral clustering isn’t easily described, but the idea of spectral clustering is to find groups of data that a random walker, walking from point to point along a weighted graph, would spend a long time in before moving to another group. (That’s the best simplification I can make; the rest is linear algebra.) In order to do spectral clustering, one must have a notion of “similarity” of data points. “Similarity” roughly means the opposite of “distance”; in fact, if you have a distance metric (and we do here), you can find a similarity measure by subtracting all distances from the maximum distance between any two objects. Similarity measures are not as strictly defined as distance metrics; any function that gives two “similar” items a high score and two “dissimilar” items a low score could be considered a similarity function.
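
In code the conversion is a single line; the script at the end of this article does exactly this before calling spectralClustering() from the **anocva** package (dist_mat below is assumed to be a precomputed matrix of pairwise Kendall distances):

```r
library(anocva)  # provides spectralClustering(), as used in the script below

# Similarity from distance: subtract every distance from the maximum distance
sim_mat <- max(dist_mat) - dist_mat
clusters <- spectralClustering(sim_mat, k = 5)
```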

Spectral clustering takes a matrix of similarity measures, computed for each pair of observations, and spits out cluster assignments. But in addition to the similarity measure, we need to decide how many clusters to find.

I find determining the “best” number of clusters to find the hardest part of clustering. We could have only one cluster, containing all our data; this is what we start with. We could also assign each data point to its own cluster; our aforementioned measure of cluster quality would then be zero, which would be great if it weren’t for the fact that our clusters mean nothing!

One approach people use for determining how many clusters to pick is the so-called elbow method. You take a plot of, say, Marden’s metric compared against the number of clusters, and see if you can spot the “elbow” in the plot. The elbow corresponds to the “best” number of clusters.

Here’s the corresponding plot for the dataset here:

If you’re unsure where the “elbow” of the plot is, that’s okay; I’m not sure either. My best guess is that it’s at five clusters; hence my choice of five clusters.

Another plot that people use is the silhouette plot, explained quite well by the scikit-learn documentation. The silhouette plot for the clustering found by spectral clustering is shown below:

Is this a good silhouette plot? I’m not sure. It’s not the worst silhouette plot I saw for this data set but it’s not as good as examples shown in the **scikit-learn** documentation. There are observations that appear to be in the wrong cluster according to the silhouette analysis. So… inconclusive?

I also computed the Dunn index of the clusters. I never got a value greater than 0.125. All together, these methods lead me to suspect that there are no meaningful clusters in this data set, at least none that can be found with this approach.

But people like cluster analysis, so if you’re one of those folks, I have results for you.

```
CLUSTERING
----------
Counts:
Cluster
  1   2   3   4   5
130  83  80  66  62

Centers:
  Guardian Mystic Rogue Seeker Survivor
1        3      2     4      1        5
2        3      5     4      1        2
3        3      4     1      2        5
4        1      5     3      4        2
5        5      1     4      3        2

Score: 881

CLUSTER CONFIDENCE INTERVALS
----------------------------

Cluster 1:

With 95% confidence:
Guardian is better than Rogue
Guardian is better than Survivor
Mystic is better than Rogue
Mystic is better than Survivor
Seeker is better than Rogue
Seeker is better than Survivor

Plausible Modal Rankings:
   Guardian Mystic Rogue Seeker Survivor
1         1      2     4      3        5
2         1      2     5      3        4
3         1      3     4      2        5
4         1      3     5      2        4
5         2      1     4      3        5
6         2      1     5      3        4
7         2      3     4      1        5
8         2      3     5      1        4
9         3      1     4      2        5
10        3      1     5      2        4
11        3      2     4      1        5
12        3      2     5      1        4

Cluster 2:

With 95% confidence:
Guardian is better than Mystic
Guardian is better than Rogue
Seeker is better than Guardian
Seeker is better than Mystic
Survivor is better than Mystic
Seeker is better than Rogue
Survivor is better than Rogue
Seeker is better than Survivor

Plausible Modal Rankings:
  Guardian Mystic Rogue Seeker Survivor
1        2      4     5      1        3
2        2      5     4      1        3
3        3      4     5      1        2
4        3      5     4      1        2

Cluster 3:

With 95% confidence:
Rogue is better than Guardian
Rogue is better than Mystic
Rogue is better than Seeker
Rogue is better than Survivor
Seeker is better than Survivor

Plausible Modal Rankings:
   Guardian Mystic Rogue Seeker Survivor
1         2      3     1      4        5
2         2      4     1      3        5
3         2      5     1      3        4
4         3      2     1      4        5
5         3      4     1      2        5
6         3      5     1      2        4
7         4      2     1      3        5
8         4      3     1      2        5
9         4      5     1      2        3
10        5      2     1      3        4
11        5      3     1      2        4
12        5      4     1      2        3

Cluster 4:

With 95% confidence:
Guardian is better than Mystic
Guardian is better than Seeker
Rogue is better than Mystic
Survivor is better than Mystic
Survivor is better than Seeker

Plausible Modal Rankings:
   Guardian Mystic Rogue Seeker Survivor
1         1      4     2      5        3
2         1      4     3      5        2
3         1      5     2      4        3
4         1      5     3      4        2
5         1      5     4      3        2
6         2      4     1      5        3
7         2      4     3      5        1
8         2      5     1      4        3
9         2      5     3      4        1
10        2      5     4      3        1
11        3      4     1      5        2
12        3      4     2      5        1
13        3      5     1      4        2
14        3      5     2      4        1

Cluster 5:

With 95% confidence:
Mystic is better than Guardian
Survivor is better than Guardian
Mystic is better than Rogue
Mystic is better than Seeker
Survivor is better than Rogue
Survivor is better than Seeker

Plausible Modal Rankings:
   Guardian Mystic Rogue Seeker Survivor
1         3      1     4      5        2
2         3      1     5      4        2
3         3      2     4      5        1
4         3      2     5      4        1
5         4      1     3      5        2
6         4      1     5      3        2
7         4      2     3      5        1
8         4      2     5      3        1
9         5      1     3      4        2
10        5      1     4      3        2
11        5      2     3      4        1
12        5      2     4      3        1
```

When computing confidence sets for clusters I ran into an interesting problem: what if, say, you never see Seekers ranked below Guardians? This will cause one of the entries of $\hat{\kappa}$ to be either 0 or 1, and there is no “variance” in its value; it’s always the same. This will cause the covariance matrix to be non-invertible since it has rows/columns that are zero. The solution to this is to eliminate those rows and work only with the non-constant entries of $\hat{\kappa}$. That said, I still treat the entries removed as if they were “statistically significant” results and remove rankings from our confidence set that are inconsistent with what we saw in the data. In short, if Seekers are never ranked below Guardians, remove all rankings in the confidence set that rank Seekers below Guardians.

One usually isn’t satisfied with just a clustering; it would be nice to determine what a clustering signifies about those who are in the cluster. For instance, what type of player gets assigned to Cluster 1? I feel that inspecting the data in a more thoughtful and manual way can give a sense to what characteristic individuals assigned to a cluster share. For instance, I read the comments submitted by poll participants to hypothesize what types of players were being assigned to particular clusters. You can read these comments at the bottom of this article, after the code section.

All source code used to do the rank analysis done here is listed below, in a `.R` file intended to be run as an executable from a command line. (I created and ran it on a Linux system.)

Several packages had useful functions specific to this type of analysis, such as **pmr** (meant for modelling rank data) and **rankdist** (which had a lot of tools for working with the Kendall distance). The confidence interval, central ranking estimator, and hypothesis testing tools, though, I wrote myself, and they may not exist elsewhere.

I at least feel that the script itself is well-documented and I no longer need to explain it. But I will warn others that it was tailored to my problem, and the methods employed may not work well with larger sample sizes or when more items need to be ranked.

This is only the tip of the iceberg for rank data analysis. We have not even touched on modelling for rank data, which can provide even richer inference. If you’re interested, I’ll refer you again to Marden’s book.

I enjoyed this analysis so much I asked a Reddit question about where else I could conduct surveys (while at the same time still being statistically sound) because I’d love to do it again. I feel like there’s much to learn from rank data; it has great potential. Hopefully this article sparked your interest too.

```r
#!/usr/bin/Rscript
################################################################################
# ArkhamHorrorClassPreferenceAnalysis.R
################################################################################
# 2019-02-10
# Curtis Miller
################################################################################
# Analyze Arkham Horror LCG class preference survey data.
################################################################################

# optparse: A package for handling command line arguments
if (!suppressPackageStartupMessages(require("optparse"))) {
  install.packages("optparse")
  require("optparse")
}

################################################################################
# CONSTANTS
################################################################################

CLASS_COUNT <- 5
CLASSES <- c("Guardian", "Mystic", "Rogue", "Seeker", "Survivor")
CLASS_COLORS <- c("Guardian" = "#00628C", "Mystic" = "#44397D",
                  "Rogue" = "#17623B", "Seeker" = "#B87D37",
                  "Survivor" = "#AA242D")

################################################################################
# FUNCTIONS
################################################################################

`%s%` <- function(x, y) {paste(x, y)}
`%s0%` <- function(x, y) {paste0(x, y)}

#' Sum of Kendall Distances
#'
#' Given a ranking vector and a matrix of rankings, compute the sum of Kendall
#' distances.
#'
#' @param r The ranking vector
#' @param mat The matrix of rankings, with each row having its own ranking
#' @param weight Optional vector weighting each row of \code{mat} in the sum,
#'        perhaps representing how many times that ranking is repeated
#' @return The (weighted) sum of the Kendall distances
#' @examples
#' mat <- rbind(1:3,
#'              3:1)
#' skd(c(2, 1, 3), mat)
skd <- function(r, mat, weight = 1) {
  dr <- partial(DistancePair, r2 = r)
  sum(apply(mat, 1, dr) * weight)
}

#' Least Sum of Kendall Distances Estimator
#'
#' Estimates the "central" ranking by minimizing the sum of Kendall distances,
#' via exhaustive search.
#'
#' @param mat The matrix of rankings, with each row having its own ranking
#' @param weight Optional vector weighting each row of \code{mat} in the sum,
#'        perhaps representing how many times that ranking is repeated
#' @return Ranking vector that minimizes the (weighted) sum of rankings
#' @examples
#' mat <- rbind(1:3,
#'              3:1)
#' lskd_estimator(mat)
lskd_estimator <- function(mat, weight = NULL) {
  if (is.null(weight)) {
    reduced <- rank_vec_count(mat)
    mat <- reduced$mat
    weight <- reduced$count
  }
  skdm <- partial(skd, mat = mat, weight = weight)
  m <- max(mat)
  permutation_mat <- permutations(m, m)
  sums <- apply(permutation_mat, 1, skdm)
  permutation_mat[which.min(sums), ]
}

#' Identify Ranking With Center
#'
#' Find the index of the center closest to a ranking vector.
#'
#' @param r The ranking vector
#' @param mat The matrix of rankings, with each row having its own ranking
#' @return Index of row that is closest to \code{r}
#' @examples
#' mat <- rbind(1:3,
#'              3:1)
#' close_center(c(2, 1, 3), mat)
close_center <- function(r, mat) {
  dr <- partial(DistancePair, r2 = r)
  which.min(apply(mat, 1, dr))
}

#' Simplify Rank Matrix To Unique Rows
#'
#' Given a matrix with rows representing rankings, this function reduces the
#' matrix to rows of only unique rankings and also counts how many times a
#' ranking appeared.
#'
#' @param mat The matrix of rankings, with each row having its own ranking
#' @return A list with entries \code{"mat"} and \code{"count"}, with
#'         \code{"mat"} being a matrix now with unique rankings and
#'         \code{"count"} being a vector of times each row in the new matrix
#'         appeared in the old matrix
#' @examples
#' mat <- rbind(1:3,
#'              3:1)
#' rank_vec_count(mat)
rank_vec_count <- function(mat) {
  old_col_names <- colnames(mat)
  old_row_names <- rownames(mat)
  res_df <- aggregate(list(numdup = rep(1, times = nrow(mat))),
                      as.data.frame(mat), length)
  count <- res_df$numdup
  new_mat <- res_df[1:ncol(mat)]
  colnames(new_mat) <- old_col_names
  rownames(new_mat) <- old_row_names
  list("mat" = as.matrix(new_mat), "count" = count)
}

#' Find \eqn{k} Ranking Clusters
#'
#' Estimate \eqn{k} clusters of rankings.
#'
#' The algorithm to find the ranking clusters resembles the \eqn{k}-means++
#' algorithm except that the distance metric is the Kendall distance.
#'
#' @param mat The matrix of rankings, with each row having its own ranking
#' @param k The number of clusters to find
#' @param max_iter The maximum number of iterations for algorithm
#' @param tol The numerical tolerance at which to end the algorithm if met
#' @return A list containing the central rankings of each cluster (in
#'         \code{"centers"}) and a vector with integers representing cluster
#'         assignments
#' @examples
#' mat <- rbind(1:3,
#'              3:1,
#'              c(2, 1, 3),
#'              c(3, 1, 2))
#' rank_cluster(mat, 2)
rank_cluster <- function(mat, k, init_type = c("spectral", "kmeans++"),
                         max_iter = 100, tol = 1e-4) {
  simplified_mat <- rank_vec_count(mat)
  mat <- simplified_mat$mat
  count <- simplified_mat$count

  init_type <- init_type[1]
  if (init_type == "kmeans++") {
    centers <- rank_cluster_center_init(mat, k)
  } else if (init_type == "spectral") {
    centers <- rank_cluster_spectral(mat, k)$centers
  } else {
    stop("Don't know init_type" %s% init_type)
  }
  old_centers <- centers
  cc_centers <- partial(close_center, mat = centers)
  clusters <- apply(mat, 1, cc_centers)

  for (iter in 1:max_iter) {
    centers <- find_cluster_centers(mat, clusters, count)
    stopifnot(all(dim(centers) == dim(old_centers)))
    cc_centers <- partial(close_center, mat = centers)
    clusters <- apply(mat, 1, cc_centers)
    if (center_distance_change(centers, old_centers) < tol) {
      break
    } else {
      old_centers <- centers
    }
  }
  if (iter == max_iter) {warning("Maximum iterations reached")}

  colnames(centers) <- colnames(mat)
  list("centers" = centers, "clusters" = rep(clusters, times = count))
}

#' Find the Distance Between Two Ranking Matrices
#'
#' Find the distance between two ranking matrices by summing the distance
#' between each row of the respective matrices.
#'
#' @param mat1 First matrix of ranks
#' @param mat2 Second matrix of ranks
#' @return The sum of distances between rows of \code{mat1} and \code{mat2}
#' @examples
#' mat <- rbind(1:3,
#'              3:1)
#' center_distance_change(mat, mat)
center_distance_change <- function(mat1, mat2) {
  if (any(dim(mat1) != dim(mat2))) {stop("Dimensions of matrices don't match")}
  sum(sapply(1:nrow(mat1), function(i) {DistancePair(mat1[i, ], mat2[i, ])}))
}

#' Initialize Cluster Centers
#'
#' Find initial cluster centers as prescribed by the \eqn{k}-means++ algorithm.
#'
#' @param mat The matrix of rankings, with each row having its own ranking
#' @param k The number of clusters to find
#' @return A matrix containing cluster centers.
#' @examples
#' mat <- rbind(1:3,
#'              3:1,
#'              c(2, 1, 3),
#'              c(3, 1, 2))
#' rank_cluster_center_init(mat, 2)
rank_cluster_center_init <- function(mat, k) {
  n <- nrow(mat)
  center <- mat[sample(1:n, 1), ]
  centers_mat <- rbind(center)
  for (i in 2:k) {
    min_distances <- sapply(1:n, function(l) {
      min(sapply(1:(i - 1), function(j) {
        DistancePair(mat[l, ], centers_mat[j, ])
      }))
    })
    center <- mat[sample(1:n, 1, prob = min_distances/sum(min_distances)), ]
    centers_mat <- rbind(centers_mat, center)
  }
  rownames(centers_mat) <- NULL
  colnames(centers_mat) <- colnames(mat)
  centers_mat
}

#' Evaluation Metric for Clustering Quality
#'
#' Evaluates a clustering's quality by summing the distance of each observation
#' to its assigned cluster center.
#'
#' @param mat Matrix of rankings (in the rows); the data
#' @param centers Matrix of rankings (in the rows) representing the centers of
#'        the clusters
#' @param clusters Vector of indices corresponding to cluster assignments (the
#'        rows of the \code{clusters} matrix)
#' @return Score of the clustering
#' @examples
#' mat <- rbind(1:3,
#'              3:1,
#'              c(2, 1, 3),
#'              c(3, 1, 2))
#' centers <- rbind(1:3, 3:1)
#' clusters <- c(1, 1, 2, 2)
#' clustering_score(mat, centers, clusters)
clustering_score <- function(mat, centers, clusters) {
  sum(sapply(1:nrow(centers), function(i) {
    center <- centers[i, ]
    submat <- mat[which(clusters == i), ]
    skd(center, submat)
  }))
}

#' Clustering with Restarts
#'
#' Clusters multiple times and returns the clustering with the lowest
#' clustering score
#'
#' @param ... Parameters to pass to \code{\link{rank_cluster}}
#' @param restarts Number of restarts
#' @return A list containing the central rankings of each cluster (in
#'         \code{"centers"}) and a vector with integers representing cluster
#'         assignments
#' @examples
#' mat <- rbind(1:3,
#'              3:1,
#'              c(2, 1, 3),
#'              c(3, 1, 2))
#' rank_cluster_restarts(mat, 2, 5)
rank_cluster_restarts <- function(mat, ..., restarts = 10) {
  best_score <- Inf
  rank_cluster_args <- list(...)
  rank_cluster_args$mat <- mat
  for (i in 1:restarts) {
    new_cluster_scheme <- do.call(rank_cluster, rank_cluster_args)
    score <- clustering_score(mat, new_cluster_scheme$centers,
                              new_cluster_scheme$clusters)
    if (score < best_score) {
      best_score <- score
      best_scheme <- new_cluster_scheme
    }
  }
  return(best_scheme)
}

#' Given Clusters, Find Centers
#'
#' Given a collection of clusters, find centers for the clusters.
#'
#' @param mat Matrix of rankings (in rows)
#' @param clusters Vector containing integers identifying cluster assignments,
#'        where the integers range from one to the number of clusters
#' @param weight Optional vector weighting each row of \code{mat} in the sum,
#'        perhaps representing how many times that ranking is repeated
#' @return A matrix of ranks representing cluster centers
#' @examples
#' mat <- rbind(1:3,
#'              3:1,
#'              c(2, 1, 3),
#'              c(3, 1, 2))
#' find_cluster_centers(mat, c(1, 1, 2, 2))
find_cluster_centers <- function(mat, clusters, weight = NULL) {
  if (is.null(weight)) {
    weight <- rep(1, times = nrow(mat))
  }
  centers <- t(sapply(unique(clusters), function(i) {
    submat <- mat[which(clusters == i), ]
    subweight <- weight[which(clusters == i)]
    lskd_estimator(submat, subweight)
  }))
  colnames(centers) <- colnames(mat)
  centers
}

#' Cluster Rankings Via Spectral Clustering
#'
#' Obtain a clustering of rank data via spectral clustering.
#'
#' @param mat Matrix containing rank data
#' @param k Number of clusters to find
#' @return A list with entries: \code{"centers"}, the centers of the clusters;
#'         and \code{"clusters"}, a vector assigning rows to clusters.
#' @examples
#' mat <- rbind(1:3,
#'              3:1,
#'              c(2, 1, 3),
#'              c(3, 1, 2))
#' rank_cluster_spectral(mat, 2)
rank_cluster_spectral <- function(mat, k = 2) {
  dist_mat <- DistanceMatrix(mat)
  sim_mat <- max(dist_mat) - dist_mat
  clusters <- spectralClustering(sim_mat, k)
  centers <- find_cluster_centers(mat, clusters)
  list("centers" = centers, "clusters" = clusters)
}

#' Compute the Test Statistic for Uniformity Based on the Pairs Matrix
#'
#' Compute a test for uniformity based on the estimated pairs matrix.
#'
#' Let \eqn{m} be the number of items ranked and \eqn{n} the size of the data
#' set. Let \eqn{\bar{m} = m(m - 1)/2} and \eqn{\bar{y}} the mean rank vector.
#' Let \eqn{\hat{K}^*} be the upper-triangular part of the estimated pairs
#' matrix (excluding the diagonal), laid out as a vector in row-major order.
#' Finally, let \eqn{1_k} be a vector of \eqn{k} ones. Then the test statistic
#' is
#'
#' \deqn{12n(\|\hat{K}^* - \frac{1}{2} 1_{\bar{m}}\|^2 - \|\bar{y} - \frac{m +
#' 1}{2} 1_m\|^2 / (m + 1))}
#'
#' Under the null hypothesis this statistic asymptotically follows a
#' \eqn{\chi^2} distribution with \eqn{\bar{m}} degrees of freedom.
#'
#' @param mat The data matrix, with rankings in rows
#' @return The value of the test statistic
#' @examples
#' mat <- rbind(1:3,
#'              3:1,
#'              c(2, 1, 3),
#'              c(3, 1, 2))
#' pairs_uniform_test_stat(mat)
pairs_uniform_test_stat <- function(mat) {
  desc_stat <- suppressMessages(destat(mat))
  mean_rank <- desc_stat$mean.rank
  pair <- desc_stat$pair
  m <- ncol(mat) - 1
  n <- nrow(mat)
  mbar <- choose(m, 2)
  K <- pair[upper.tri(pair, diag = FALSE)]
  meanK <- rep(1/2, times = mbar)
  cm <- rep((m + 1)/2, times = m)
  12 * n * (sum((K - meanK)^2) - sum((mean_rank - cm)^2)/(m + 1))
}

#' Compute Covariance Matrix of Pairs Matrix Upper Triangle
#'
#' Compute the covariance matrix of the pairs matrix estimator.
#'
#' @param mat Data matrix, with each ranking having its own row
#' @return The \eqn{m(m - 1)/2}-square matrix representing the covariance
#'         matrix
#' @examples
#' mat <- rbind(1:3,
#'              3:1,
#'              c(2, 1, 3),
#'              c(3, 1, 2))
#' pairs_mat_cov(mat)
pairs_mat_cov <- function(mat) {
  n <- nrow(mat)
  m <- ncol(mat)
  pair <- kappa_est(mat)
  pair <- as.matrix(pair)

  # Transform data into a dataset of pair-wise rank comparisons
  if (m == 1) {
    return(0)
  }
  kappa_data <- sapply(2:m, function(j) {mat[, j] > mat[, 1]})
  for (i in 2:(m - 1)) {
    kappa_data <- cbind(kappa_data, sapply((i + 1):m, function(j) {
      mat[, j] > mat[, i]
    }))
  }
  kappa_data <- kappa_data + 0  # Converts to integers

  cov(kappa_data)
}

#' Estimate \eqn{\kappa} Vector
#'
#' Estimate the \eqn{\kappa} vector, which fully defines the pairs matrix.
#'
#' @param mat Data matrix, with each ranking having its own row
#' @return The \eqn{m(m - 1)/2}-dimensional vector
#' @examples
#' mat <- rbind(1:3,
#'              3:1,
#'              c(2, 1, 3),
#'              c(3, 1, 2))
#' kappa_est(mat)
kappa_est <- function(mat) {
  n <- nrow(mat)
  df <- as.data.frame(mat)
  df$n <- 1
  pair <- suppressMessages(destat(df))
  pair <- t(pair$pair)
  pair <- pair[lower.tri(pair, diag = FALSE)]/n
  pair
}

#' Get Plausible Rankings For Central Ranking Based on Kendall Distance
#'
#' Determine a set of plausible central rankings based on the Kendall distance.
#'
#' Let \eqn{\alpha} be one minus the confidence level, \eqn{m} the number of
#' options, \eqn{\bar{m} = m(m - 1)/2}, \eqn{\kappa} the vectorized
#' upper-triangle of the pairs matrix of the population, \eqn{\hat{\kappa}} the
#' sample estimate of \eqn{\kappa}, and \eqn{\hat{\Sigma}} the estimated
#' covariance matrix of \eqn{\hat{\kappa}}. Then the approximate \eqn{100(1 -
#' \alpha)}% confidence interval for \eqn{\kappa} is
#'
#' \deqn{\kappa: (\hat{\kappa} - \kappa)^T \hat{\Sigma}^{-1} (\hat{\kappa} -
#' \kappa) < \chi^2_{\bar{m}}}
#'
#' Once we have such an interval the next task is to determine which ranking
#' vectors are consistent with plausible \eqn{\kappa}. To do this, the function
#' determines which choices could plausibly be tied according to the confidence
#' interval; that is, which entries of \eqn{\kappa} could plausibly be
#' \eqn{1/2}. Whenever this is rejected, there is a statistically significant
#' difference in the preference of the two choices; looking at
#' \eqn{\hat{\kappa}} can determine which of the two choices is favored. All
#' ranking vectors that disagree with that preference are eliminated from the
#' space of plausible central ranking vectors. The ranking vectors surviving at
#' the end of this process constitute the confidence interval.
#'
#' @param mat Matrix of rank data, each observation having its own row
#' @param conf_level Desired confidence level
#' @return A list with entries \code{"ranks"} holding the matrix of plausible
#'         rankings in the confidence interval and \code{"preference_string"},
#'         a string enumerating which options are, with statistical
#'         significance, preferred over others
#' @examples
#' mat <- t(replicate(100, {sample(1:3)}))
#' kendall_rank_conf_interval(mat)
kendall_rank_conf_interval <- function(mat, conf_level = 0.95) {
  n <- nrow(mat)
  m <- max(mat)
  mbar <- choose(m, 2)
  kap <- kappa_est(mat)
  Sigma <- pairs_mat_cov(mat)
  crit_value <- qchisq(1 - conf_level, df = mbar, lower.tail = FALSE)

  # Find bad rows of Sigma, where the covariance is zero; that variable must
  # be constant
  const_vars <- which(colSums(Sigma^2) == 0)
  safe_vars <- which(colSums(Sigma^2) > 0)
  safe_kap <- kap[safe_vars]
  safe_Sigma <- Sigma[safe_vars, safe_vars]

  # Determine if hyperplanes where one coordinate is 1/2 intersect confidence
  # set
  b <- as.matrix(solve(safe_Sigma, safe_kap))
  a <- t(safe_kap) %*% b
  a <- a[1, 1]
  check_half <- partial(hei_check, x = 1/2, A = safe_Sigma, b = -2 * b,
                        d = crit_value/n - a, invert_A = TRUE)
  sig_diff_safe_vars <- !sapply(1:length(safe_vars), check_half)
  if (length(const_vars) > 0) {
    sig_diff <- rep(NA, times = mbar)
    sig_diff[safe_vars] <- sig_diff_safe_vars
    sig_diff[const_vars] <- TRUE
  } else {
    sig_diff <- sig_diff_safe_vars
  }

  idx_matrix <- matrix(0, nrow = m, ncol = m)
  idx_matrix[lower.tri(idx_matrix, diag = FALSE)] <- 1:mbar
  idx_matrix <- t(idx_matrix)
  rownames(idx_matrix) <- colnames(mat)
  colnames(idx_matrix) <- colnames(mat)

  # Remove rows of potential centers matrix to reflect confidence interval
  # results; also, record which groups seem to have significant difference in
  # ranking
  rank_string <- ""
  permutation_mat <- permutations(m, m)
  for (i in 1:(m - 1)) {
    for (j in (i + 1):m) {
      sig_diff_index <- idx_matrix[i, j]
      if (sig_diff[sig_diff_index]) {
        direction <- sign(kap[sig_diff_index] - 1/2)
        if (direction > 0) {
          # Row option (i) is preferred to column option (j)
          permutation_mat <- permutation_mat[permutation_mat[, i] <
                                             permutation_mat[, j], ]
          rank_string <- rank_string %s0% colnames(mat)[i] %s%
            "is better than" %s% colnames(mat)[j] %s0% '\n'
        } else if (direction < 0) {
          # Row option (i) is inferior to column option (j)
          permutation_mat <- permutation_mat[permutation_mat[, i] >
                                             permutation_mat[, j], ]
          rank_string <- rank_string %s0% colnames(mat)[j] %s%
            "is better than" %s% colnames(mat)[i] %s0% '\n'
        }
      }
    }
  }

  colnames(permutation_mat) <- colnames(mat)
  return(list("ranks" = permutation_mat, "preference_string" = rank_string))
}

#' Straight Hyperplane and Ellipse Intersection Test
#'
#' Test whether a hyperplane parallel to an axis intersects an ellipse.
#'
#' The ellipse is fully determined by the parameters \code{A}, \code{b}, and
#' \code{d}; in fact, the ellipse consists of all \eqn{x} such that
#'
#' \deqn{x^T A x + b^T x \leq d}
#'
#' \code{x} is the intercept of the hyperplane and \code{k} is the coordinate
#' that is fixed to the value \code{x} and thus determines along which axis the
#' hyperplane is parallel. A value of \code{TRUE} means that there is an
#' intersection, while \code{FALSE} means there is no intersection.
#'
#' @param x The fixed value of the hyperplane
#' @param k The coordinate fixed to \code{x}
#' @param A A \eqn{n \times n} matrix
#' @param b An \eqn{n}-dimensional vector
#' @param d A scalar representing the upper bound of the ellipse
#' @return \code{TRUE} or \code{FALSE} depending on whether the hyperplane
#'         intersects the ellipse or not
#' @examples
#' hei_check(1, 2, diag(3), rep(0, times = 3), 10)
hei_check <- function(x, k, A, b, d, invert_A = FALSE) {
  b <- as.matrix(b)
  n <- nrow(b)
  stopifnot(k >= 1 & k <= n)
  stopifnot(nrow(A) == ncol(A) & nrow(A) == n)
  stopifnot(all(eigen(A)$values > 0))
  all_but_k <- (1:n)[which(1:n != k)]

  s <- rep(0, times = n)
  s[k] <- x
  s <- as.matrix(s)

  if (invert_A) {
    tb <- as.matrix(solve(A, s))
  } else {
    tb <- A %*% s
  }
  td <- t(s) %*% tb + t(b) %*% s
  if (invert_A) {
    # XXX: curtis: NUMERICALLY BAD; FIX THIS -- Thu 14 Feb 2019 07:50:19 PM MST
    A <- solve(A)
  }
  tA <- A[all_but_k, all_but_k]
  tx <- -solve(tA, (b/2 + tb)[all_but_k, ])
  tx <- as.matrix(tx)

  val <- t(tx) %*% tA %*% tx + t((b + 2 * tb)[all_but_k]) %*% tx + td - d
  val <- val[1, 1]

  val <= 0
}

################################################################################
# MAIN FUNCTION DEFINITION
################################################################################

main <- function(input, prefix = "", width = 6, height = 4, clusters = 5,
                 conflevel = 95, comments = "AHLCGClusterComments.txt",
                 detailed = FALSE, help = FALSE) {
  suppressPackageStartupMessages(library(pmr))
  suppressPackageStartupMessages(library(ggplot2))
  suppressPackageStartupMessages(library(reshape2))
  suppressPackageStartupMessages(library(dplyr))
  suppressPackageStartupMessages(library(rankdist))
  suppressPackageStartupMessages(library(gtools))
  suppressPackageStartupMessages(library(purrr))
  suppressPackageStartupMessages(library(anocva))

  load(input)
  n <- nrow(survey_data)
  rank_data <- survey_data[CLASSES]
  rank_data$n <- 1
  rank_mat <- as.matrix(survey_data[CLASSES])

  # Get basic descriptive statistics: mean ranks, marginals, pairs
  desc_stat <- suppressMessages(destat(rank_data))
  mean_rank <- desc_stat$mean.rank
  marginal <- desc_stat$mar
  pair <- desc_stat$pair
  names(mean_rank) <- CLASSES
  rownames(marginal) <- CLASSES
  colnames(marginal) <- 1:CLASS_COUNT
  rownames(pair) <- CLASSES
  colnames(pair) <- CLASSES

  # Compute "typical" distance based on least sum of Kendall distances
  best_rank <- lskd_estimator(rank_mat)
  names(best_rank) <- CLASSES

  # Hypothesis Testing for Uniformity
  statistic <- pairs_uniform_test_stat(rank_data)

  # Confidence Interval
  ci <- kendall_rank_conf_interval(rank_mat, conf_level = conflevel / 100)

  # Cluster data
  rank_clustering <- rank_cluster_spectral(rank_mat, k = clusters)
  centers <- rank_clustering$centers
  Cluster <- rank_clustering$clusters
  # Naming convention broke for printing
  rownames(centers) <- 1:nrow(centers)

  # Plotting
  marginal_plot <- ggplot(
      melt(100 * marginal / n, varnames = c("Class", "Rank"),
           value.name = "Percent"),
      aes(fill = Class, x = Class, y = Percent, group = Rank)) +
    geom_bar(position = "dodge", stat = "identity") +
    scale_fill_manual(values = CLASS_COLORS) +
    labs(title = "Class Rankings") +
    theme_bw()
  ggsave(prefix %s0% "marginal_plot.png", plot = marginal_plot, width = width,
         height = height, units = "in", dpi = 300)

  pair_plot <- ggplot(
      melt(100 * pair / n, varnames = c("Class", "Opposite"),
           value.name = "Percent") %>% filter(Percent > 0),
      aes(fill = Opposite, x = Class, y = Percent)) +
    geom_bar(position = "dodge", stat = "identity") +
    geom_hline(yintercept = 50, linetype = 2, color = "red") +
    scale_fill_manual(values = CLASS_COLORS) +
    labs(title = "Class Ranking Comparison") +
    theme_bw()
  ggsave(prefix %s0% "pair_plot.png", plot = pair_plot, width = width,
         height = height, units = "in", dpi = 300)

  # Place cluster comments in file
  comment_string <- ""
  for (i in 1:clusters) {
    comment_string <- comment_string %s0% "\n\nCLUSTER" %s% i %s0%
      "\n------------\n\n" %s0%
      paste(survey_data$Reason[survey_data$Reason != "" & Cluster == i],
            collapse = "\n\n-*-\n\n")
  }
  cat(comment_string, file = comments)

  # Printing
  cat("\nMEAN RANK\n---------\n")
  print(round(mean_rank, digits = 2))
  cat("\nMARGINALS\n---------\n")
  print(round(100 * marginal / n, digits = 2))
  cat("\nPAIRS\n-----\n")
  print(round(100 * pair / n, digits = 2))
  cat("\nUNIFORMITY TEST\n---------------\n")
  cat("Test Statistic:", statistic, "\n")
  cat("P-value:", pchisq(statistic, df = choose(CLASS_COUNT, 2),
                         lower.tail = FALSE), "\n")
  cat("\nOPTIMAL RANK ESTIMATE\n---------------------\n")
  print(sort(best_rank))
  cat("\nWith", conflevel %s0% '%', "confidence:",
      '\n' %s0% ci$preference_string)
  if (detailed) {
    cat("\nPlausible Modal Rankings:\n")
    print(as.data.frame(ci$ranks))
  }
  cat("\nCLUSTERING\n----------\nCounts: ")
  print(table(Cluster))
  cat("\nCenters:\n")
  print(centers)
  cat("\nScore:", clustering_score(rank_mat, centers, Cluster), "\n")
  if (detailed) {
    cat("\nCLUSTER CONFIDENCE INTERVALS\n----------------------------\n")
    for (i in 1:clusters) {
      cat("\nCluster", i %s0% ':\n')
      ci_cluster <- kendall_rank_conf_interval(rank_mat[Cluster == i, ])
      cat("\nWith", conflevel %s0% '%', "confidence:",
          '\n' %s0% ci_cluster$preference_string)
      cat("\nPlausible Modal Rankings:\n")
      print(as.data.frame(ci_cluster$ranks))
    }
  }
}

################################################################################
# INTERFACE SETUP
################################################################################

if (sys.nframe() == 0) {
  cl_args <- parse_args(OptionParser(
    description = paste("Analyze Arkham Horror LCG class preference survey",
                        "data and print results."),
    option_list = list(
      make_option(c("--input", "-i"), type = "character",
                  help = paste("Input file containing survey data")),
      make_option(c("--prefix", "-p"), type = "character", default = "",
                  help = "Another command-line argument"),
      make_option(c("--width", "-w"), type = "double", default = 6,
                  help = "Width of plots"),
      make_option(c("--height", "-H"), type = "double", default = 4,
                  help = "Height of plots"),
      make_option(c("--clusters", "-k"), type = "integer", default = 5,
                  help =
```
"Number of clusters in spectral clustering"), make_option(c("--comments", "-c"), type = "character", default = "AHLCGClusterComments.txt", help = "File to store participant comments organized" %s% "by cluster"), make_option(c("--conflevel", "-a"), type = "double", default = 95, help = "Confidence level of confidence set"), make_option(c("--detailed", "-d"), action = "store_true", default = FALSE, help = "More detail in report") ) )) do.call(main, cl_args) }

$ ./ArkhamHorrorClassPreferenceAnalysis.R -i AHLCGClassPreferenceSurveys.Rda --detailed

MEAN RANK
---------
Guardian   Mystic    Rogue   Seeker Survivor 
    2.92     3.10     3.16     2.60     3.22 

MARGINALS
---------
             1     2     3     4     5
Guardian 18.29 20.43 26.84 19.71 14.73
Mystic   19.71 18.29 17.81 20.90 23.28
Rogue    19.24 14.73 20.67 21.38 23.99
Seeker   28.03 25.18 17.10 18.53 11.16
Survivor 14.73 21.38 17.58 19.48 26.84

PAIRS
-----
         Guardian Mystic Rogue Seeker Survivor
Guardian     0.00  54.16 55.34  42.52    55.82
Mystic      45.84   0.00 51.07  39.90    53.44
Rogue       44.66  48.93  0.00  38.72    51.54
Seeker      57.48  60.10 61.28   0.00    61.52
Survivor    44.18  46.56 48.46  38.48     0.00

UNIFORMITY TEST
---------------
Test Statistic: 2309938376 
P-value: 0 

OPTIMAL RANK ESTIMATE
---------------------
  Seeker Guardian   Mystic    Rogue Survivor 
       1        2        3        4        5 

With 95% confidence: 
Seeker is better than Rogue
Seeker is better than Survivor

Plausible Modal Rankings:
   Guardian Mystic Rogue Seeker Survivor
1         1      2     4      3        5
2         1      2     5      3        4
3         1      3     4      2        5
4         1      3     5      2        4
5         1      4     3      2        5
6         1      4     5      2        3
7         1      5     3      2        4
8         1      5     4      2        3
9         2      1     4      3        5
10        2      1     5      3        4
11        2      3     4      1        5
12        2      3     5      1        4
13        2      4     3      1        5
14        2      4     5      1        3
15        2      5     3      1        4
16        2      5     4      1        3
17        3      1     4      2        5
18        3      1     5      2        4
19        3      2     4      1        5
20        3      2     5      1        4
21        3      4     2      1        5
22        3      4     5      1        2
23        3      5     2      1        4
24        3      5     4      1        2
25        4      1     3      2        5
26        4      1     5      2        3
27        4      2     3      1        5
28        4      2     5      1        3
29        4      3     2      1        5
30        4      3     5      1        2
31        4      5     2      1        3
32        4      5     3      1        2
33        5      1     3      2        4
34        5      1     4      2        3
35        5      2     3      1        4
36        5      2     4      1        3
37        5      3     2      1        4
38        5      3     4      1        2
39        5      4     2      1        3
40        5      4     3      1        2

CLUSTERING
----------
Counts: Cluster
  1   2   3   4   5 
130  83  80  66  62 

Centers:
  Guardian Mystic Rogue Seeker Survivor
1        3      2     4      1        5
2        3      5     4      1        2
3        3      4     1      2        5
4        1      5     3      4        2
5        5      1     4      3        2

Score: 881 

CLUSTER CONFIDENCE INTERVALS
----------------------------

Cluster 1:

With 95% confidence: 
Guardian is better than Rogue
Guardian is better than Survivor
Mystic is better than Rogue
Mystic is better than Survivor
Seeker is better than Rogue
Seeker is better than Survivor

Plausible Modal Rankings:
   Guardian Mystic Rogue Seeker Survivor
1         1      2     4      3        5
2         1      2     5      3        4
3         1      3     4      2        5
4         1      3     5      2        4
5         2      1     4      3        5
6         2      1     5      3        4
7         2      3     4      1        5
8         2      3     5      1        4
9         3      1     4      2        5
10        3      1     5      2        4
11        3      2     4      1        5
12        3      2     5      1        4

Cluster 2:

With 95% confidence: 
Guardian is better than Mystic
Guardian is better than Rogue
Seeker is better than Guardian
Seeker is better than Mystic
Survivor is better than Mystic
Seeker is better than Rogue
Survivor is better than Rogue
Seeker is better than Survivor

Plausible Modal Rankings:
  Guardian Mystic Rogue Seeker Survivor
1        2      4     5      1        3
2        2      5     4      1        3
3        3      4     5      1        2
4        3      5     4      1        2

Cluster 3:

With 95% confidence: 
Rogue is better than Guardian
Rogue is better than Mystic
Rogue is better than Seeker
Rogue is better than Survivor
Seeker is better than Survivor

Plausible Modal Rankings:
   Guardian Mystic Rogue Seeker Survivor
1         2      3     1      4        5
2         2      4     1      3        5
3         2      5     1      3        4
4         3      2     1      4        5
5         3      4     1      2        5
6         3      5     1      2        4
7         4      2     1      3        5
8         4      3     1      2        5
9         4      5     1      2        3
10        5      2     1      3        4
11        5      3     1      2        4
12        5      4     1      2        3

Cluster 4:

With 95% confidence: 
Guardian is better than Mystic
Guardian is better than Seeker
Rogue is better than Mystic
Survivor is better than Mystic
Survivor is better than Seeker

Plausible Modal Rankings:
   Guardian Mystic Rogue Seeker Survivor
1         1      4     2      5        3
2         1      4     3      5        2
3         1      5     2      4        3
4         1      5     3      4        2
5         1      5     4      3        2
6         2      4     1      5        3
7         2      4     3      5        1
8         2      5     1      4        3
9         2      5     3      4        1
10        2      5     4      3        1
11        3      4     1      5        2
12        3      4     2      5        1
13        3      5     1      4        2
14        3      5     2      4        1

Cluster 5:

With 95% confidence: 
Mystic is better than Guardian
Survivor is better than Guardian
Mystic is better than Rogue
Mystic is better than Seeker
Survivor is better than Rogue
Survivor is better than Seeker

Plausible Modal Rankings:
   Guardian Mystic Rogue Seeker Survivor
1         3      1     4      5        2
2         3      1     5      4        2
3         3      2     4      5        1
4         3      2     5      4        1
5         4      1     3      5        2
6         4      1     5      3        2
7         4      2     3      5        1
8         4      2     5      3        1
9         5      1     3      4        2
10        5      1     4      3        2
11        5      2     3      4        1
12        5      2     4      3        1

CLUSTER 1
------------

Guardians have serious bling and they're awesome at what they do, so they're number 1. Seekers also have great cards that guzzle clues and generally provide solid deck building, so they're #2. Rogues have cards that look like a lot of fun (there's bling there too) and they are often good at both clue gathering and fighting, depending on which is needed. Mystic decks feel like they're all the same, so building decks with them is not as much fun. Survivor cards are extremely limited so they're my least favorite.

-*-

I love the Mystic spells, especially the versatility. Hated Rogues since Skids days, although Jenny is great and Preston is very good fun. Guardians and Seeker fall very easy into the usable archetypes of Attack and Investigate.

-*-

I love supporting guardians and seekers. Control focused mistics are also fun.

-*-

Purple is top just because of Recall the Future and Premonition. Yellow for being weird, Green for extra-actions and Finn. Red for cool, weird interactions at a bargain price. Blue is boring.

-*-

I don't like playing Rogues, alright? Please don't crucify me! Oh, this is anonymous? Excellent.

-*-

Simplicity of play and planning.

-*-

I love spells and magic items

-*-

Guardian are probably te most rounded IMO. Seekers next, but great at clue gathering.

-*-

Seeker pool has best card draw & selection; guardian has stick to the plan + stand together + weapons; survivor pool is good but good xp options are less varied (will to survive/true survivor or bust); mystics delve too deep + bag control + David Renfield are good. Rogue pool is harder to build a full game plan around—its best cards enable great turns (pocket watch, double or nothing, etc) and are valuable to have in the party, but they have a harder time spending actions 2-3 as usefully since some of their best things exhaust (lockpicks).

-*-

Mystic and Rogue tied for first. Mystic is my prefered and I like how I can stack my deck to be heavy in either investigating and/or combat. Rogue because most get a lot of recources where you can purchase more expensive cards.

-*-

I feel as though Mystic have the broadest tool kit and be specialise in almost any direction. However my experience is solely limited to two player with my wife and she plays a cloover, so we need someone with bashing power.

-*-

Matt's response

-*-

I primarily play a seeker (Daisy)

-*-

Yellow fits with my playstyle the best

-*-

I really like all of them, so there's not a ton of distance between them.

-*-

gameplay style, clear focus on purposes

-*-

Guardian and Seeker are very straightforward, and I like that. They have a clear objective, and they do it well.

-*-

While I feel that most classes have merit, the rogue is generally the worst at the core aspects of the game: fighting and clue finding. Evading does not have the punch that killing the enemy foes.

-*-

I prefer a support / team role, and play for consistency over tricks.

-*-

Most useful for the group

-*-

I just looked at options. Mystics have a lot of options in every way, shape or form, and so do Guardians. I just prefer the mystic combos better, since Guardians are pretty bland in that regard. I feel you really can now make different mystic decks, from support to tank and combat master, to main seeking investigator etc. They have everything and even playing one deck a few times is till fun because of so many exp. options. And while their decks are pretty deep, the premise is simple - boost willpower. That leaves them with a nice weakness you have to cover.

Guardians have better weapons (more fun) than mystics have combat spells, although Shattered Aeons really gave Mystics a fun new icy option. And maybe I'd like to see a Mystic that wouldn't be pure Mystic if you get me. Some hybrid guy or girl, that's not just using spells and artifacts from the same class over and over again. That's really what they're missing.

Guardians are just so great, because they are sooo well balanced imo. It's quite relaxing looking at their options. You have everything from amazing gear, weapons, allies, events that cover literally everything + your friends' asses, awesome skillcards that can also combo, fun and engaging exp. options etc. But they lack different kinds of investigators. They have options, just some other classes have more. Maybe my least favorite on investigator side. Mystics again are so simple to make in that regard.

I gave Seekers 3. because they just have some 0 exp. cards that are just too strong for any class, not just for them. Otherwise I really like Seeker cards theme, maybe even more than Guardian, maybe even my favorite I'd say, but again, Seekers just have so much random stuff and OP stuff (you know what they are). I don't care for balance in a co-op game, OP cards can be really fun, but this stuff really limits their options and sometimes even other classes' options, because not including them just hinders your deck and you know it (example is Shortcut). And that's not good. They have really fun and diverse roster of investigators though. And their experience options are quite game breaking, but in a good way imo. There's seeking, combat, running and evading so much support and combos, really fun and diverse.

Rogues have maybe some of my least favorite cards, but they have A LOT of options. They have quite a few very awesome weapons, but they also have SO MUCH cards that are meant for combos and while combo decks are fun, they, in my opinion, are niche, or at least not used in every game. Sometimes you just want a simple deck and Rouges have a limited card pool when you look at it that way (example: no useful combat ally or even asset - there is a new Guardian tarrot card for Jenny and Skids, but they need more imo). They got their quite fresh Lockpicks and the seeker gator and that was an amazing get. But more, Leo breaks their ally pool, because he's just too strong. They also have no pure combat investigators, but otherwise their investigators are really really fun and diverse. They have AMAZING experience options. Maybe the best in the game. And btw, they were my favorite to play before the last few expansions. I love Preston, but again the new cards are very niche. The new seeker agent Joe with 4 combat elevates seekers above Rogues for me in the options in card pool department though. They now have an optional pure combat investigator, while Rogues still don't.

Survivors have AWESOME cards, especially investigators are just so fun and weird, but they just lack options in the card pool. You have so many "survive" cards, but they lack anything else strong. Their weapons are quite fun, but there are no heavy hitting options. That for me may be their biggest minus. Lack of experience pure combat options. They have quite a few very strong investigate cards though like Look What I Found and Newspaper 2 exp. And their allies, while strong, are still nicely balanced and quite diverse. They have a million evade options, maybe even too much. It would sometimes be nice to get something else rather than just another evade.

These new Track Shoes are pretty cool though. Their skill cards are pretty awesome imo. But still, I feel like they have so much niche cards that only allow some very specific combos, like Rogues, and lack anything else meaningful. They are extremely fun to play though, with all their Survivor specializations like Seeker Urchin, combat Gravekeeper, being dead crazy guy, new athlete runner and evader etc. They may even be my favorite class, but they still lack options in a big way. And they even lack one investigator only available for 15 bucks along a cheaply written book.

CLUSTER 2
------------

survivors da best

-*-

Guardian just have so many cards that, when looking at them, seem useful. Mystic is my actual favourite class, but it has soo many cards where they went too far with the punishing effects that almost made them useless. Survivor on the other hand has too many events that end up feeling almost the same. Seekers I dont really know, Ive never played them, but everytime I see them looks like they can do many things. And rogue, while it has improved a bit more, I still miss a useful level 1 weapon

-*-

Difficulty wrapping my head around some classes

-*-

Mystics are incredibly dependent on their cards.

-*-

Seekers usually win the game, because the snitch is 150 points

-*-

Always cards in these classes that I have a hard time cutting. Which means they have the deepest pools marking them the most fun to me

-*-

I love deck manipulation for seekers, and the flexibility of survivors. I just can't get my head wrapped around mystics.

-*-

Guardians have a lot of great tools for not just fighting but getting clues. Seeker has the best support so splashing it is great. Rogue and survivor are ties for good splash but survivors card pool is mediocre to me. Mystic aren't bad but I haven't seen it great with others very well. Mystics are good as themselves but really expensive and not great for splash IMO.

-*-

Survivor have many nice tricks to survive and gather clues. Guardians because they have the best weapons (flamethrower) and protective tools. seeker for their upgradable cards and higher ed. mystic for canceling cards but dont like their only good stat is willpower... rogues seems interesting but never played one.

-*-

Seekers have action economy (shortcut, pathfinder), card economy, resource economy (Dr Milan + cheap cards) and they advance the game quickly (i.e. discover clues). Specialist decks are better than generalist decks (in multiplayer, which I play) as they accomplish their goals more consistently, and this favours seekers and guardians. Stick To The Plan with Ever Vigilant is the most powerful deck element I am aware of.

-*-

I tend to play builds focused around consistency of succeeding at tests and action efficiency and my rankings reflect the build consistencies in order except rogue who are consistent but just not interesting.

-*-

Love survivors

-*-

Seeker is m'y main class

-*-

Firstly let me preface this with I only own 2 cores and the Dunwich cycle and have yet to play through Dunwich. Survivor offers the most versatility and always seems to be one of the key factors when beating the odds in most cases as well as enhancing evasion and action economy (survival instinct etc). Seeker cards are my second favourite due to the amount of utility included within them (i.e. Shortcut, Barricade, Medical Texts, Old Book of Lore etc) as well as allowing you what you need to catapult out in front of the agenda deck with cluevering abilities.

Guardian and Mystic operate on a similar field marginally behind Seeker to me though mystic finds itself slightly higher because of the unique interactions with the encounter deck and rule bending. though in my limited experience they both seem to be the more combat based of the card pools so operate in that same niche for me. Rogue is unfortunately last but honestly that's just because I haven't had many interactions with them, most of their effects seem too situational to be able to use consistently.

-*-

I don't like taking the obvious solutions to a problem. I.E: Gun to the face, or Spells for everything.

-*-

Efficiency at what is needed to complete scenarios - mostly clue getting and combat.

-*-

Rogue and survivor seem to have the most cards that support each other to suggest a new way of playing. Recursion survivor is fun and different from no money survivor (though you can do both). Rogue has succeed by 2 and rich as options. Seeker has less of that but has the power of untranslated etc cards. Guardians are okay but kind of blah. I can’t see any fun decks to do with mystic. Like, messing with the bag is a cool thing to do in any deck, it isn’t a deck. Playing with doom is a couple cards that need each other but it isn’t a plan for how the game will play out.

-*-

Definitely hard to rank them, but ranked in order of which I'd most like to have as an off-class

-*-

I like the consistency in the Survivor card pool and how much individual support there is for the variety of Survivor investigators. Although I like the power level of Mystic cards, it always sucks to have your Rite of Seeking or Shriveling 15 cards down after a full mulligan for them.

-*-

More scenarios need cloovers and fighters, so all classes outside of Seeker and Guardian are more tricksy and less focused on the goal. This is a hard-enough game as it is!

-*-

Seeker cards are way too powerful. Rogues are the most fun to play. Survivor cards are super efficient at what they do. Guardian pool is decent but overpriced. Mystics have a few amazing cards, but the rest is pretty meh.

CLUSTER 3
------------

Vaguely from ‘most interactive’ to ‘most straightforward’ with a special mention for the Survivor card pool which has been largely static since roughly core with a few major exceptions.

-*-

Rogue cards are the most fun for me. More money, more actions, more fun.

-*-

I seem to like the classes that are less straight-forward than Guardian and Seeker tend to be. (In the sense that they are the archetypical fighters and cluevers.)

-*-

I like cards that cheat the system and don't depend on leveraging board state

-*-

Green and purple cards have fun and flashy effects. Blue and yellow cards have more standard effects and narrower deck building options.

-*-

I didn't play mystics a lot yet

-*-

The numbers are different depending whether we’re talking theory or practice. In theory the Mystic cards are my favorite, both for flavor and interesting mechanics. In practice I struggle with them and they’re usually the first cut.

-*-

Combos!

-*-

I like moneeeey

-*-

seekers have literally everything, and their cards usually aren't too expensive. rogues have adaptable, streetwise, and really good allies, but they're a bit low in damage output. guardians have really good cards but are limited by how expensive they are. mystic events are amazing, but they are 4th place because level 0 spells kinda suck and are expensive as hell. mystic cards are much better with exp. survivor cards are almost decent. it really sucks that many of their leveled up cards are exile cards, but survivors don't get any extra exp. but in general i find their cards to be lacking in clue-gathering capability and damage dealing. they can turn failures into successes, but that's about it.

-*-

Guardian is solid and predictable, Rogue is fun. Mystic is challenging, Seeker and Survivor are necessary.

-*-

THE JANK

-*-

I really dislike survivors as I simply dont understand how to properly build them (appart maybe Wendy). Even if I have rated mystics 4, I enjoy playing Mystic nearly as much as seeker (which I rated 1) rather than Survivor.

-*-

I think the rogue theme is portayed very well in their card pool

-*-

corelation between mechanisms and theme

-*-

I like big, flashy, ridiculous turns and risky plays, so rogue and mystic are the most fun for me. Guardian and seeker are fine and all, just a bit dry. I don’t understand survivor at all, but I’m happy other people have a thing they like.

-*-

Rogue and survivor give you interesting and powerful but situational tools that you have to figure out how to apply to the scenario. Mystic and guardian are more about powerful assets that you either draw early and use a bunch or wish you’d drawn earlier but can’t afford now and just commit for icons. Seeker pool makes me sleepy every time I look at it; the only mechanic there I really enjoy is the tome synergies and that’s only with Daisy (Rex, of course, is only played one way).

-*-

Role-play Value

-*-

I went for those that get me excited to play or provide thrills or cool combinations as I play (rather than, say, the power of the cards)

CLUSTER 4
------------

Lol moments. We’d all be survivor if we were in this game!

-*-

The top two were tricky to place; Rogues have fantastically fun combo plays available to them, while I love the 'feel' of many Survivor cards, fighting against fate as hard as they damn well can. Overall, I find the Survivor pool *just* wins out, especially with the excellent Will To Survive and semi-immortal Pete Sylvestre. Guardians and Seekers are two sides of the same coin; I'd say Guardians edge out, because while a Guardian has a few tools (including the infamous Flashlight) to find clues, Seekers have very few options to take care of enemies. As with Survivors and Rogues, though, this is close. Mystics... weeeeell... I acknowledge they are arguably the best class, once set up, and while their charges last on their spells. The ability to do everything while testing just one stat can make them very efficient. But... this is the class I enjoy the least, in part due to their over-reliance on their spells. Their solutions never feel more than stopgaps for me, so I find Mystics a hard class to play. (That won't stop me taking Mystics for a spin though, especially for goodies like Delve Too Deep )

-*-

Ability to bounce off Investigators with good resource and action economy, other card pools (including Neutral), as well as capability to start off with no experience — all the way to full campaign with as much power-card investment as possible. Seeker may have 2 of the best cards in the game (Higher Education and Dr. Milan Christopher), but the Seeker card pool as a whole does not stand up. It is both narrow and shallow. Mystic is the most detailed and the most broad, but suffers from undue delay leading to deterioration. Guardian definitely needs to be more broad as well.

Both Rogue and Survivor blend well, and provide the necessary breadth to take on challenges while melding with the high-economy Investigators. Rogue has a few 3, 4, and 5 xp cards that push it to the top spot. Even for Lola these statements hold up.

-*-

On a scale of most interesting vs. most boring. Options for rogues and survivors feel fresh and like there are multiple deck archetypes that are valid. Less so for the seeker and mystic card pools, where I feel like there are more "must include" cards which makes deck building less exciting and more rote.

-*-

survivor da bass

-*-

The card pool allows rogue/survivor decks to make specialists. Seekers are all just different flavours of clueverer

-*-

Personally, I like healing and support best, which guardian does quite well. Survivor has my second favorite card pool, though, for tricks and card recursion.

-*-

Not much between them but I like guns & ammo, survivor class is cool because it is normies vs horror

-*-

I really like the guardian cards as i enjoy fighting the monsters that appear in scenarios. Unfortunately my least favorite is mystic. Although they have powerful cards, they often take time to set up and I think that the draw backs on some of their cards are too harsh for what they do.

-*-

Just what I gravitate towards

-*-

I like killing monsters

-*-

Mystics have so much of everything with cool effects added on. Guardian cards are efficient at what they do, but really boring.

-*-

Survivors feel more unique, guardians kill stuff, seekers feel like you can't win without them (though you really can). Rogues and mystics by default. I like rogues better because of Finn and Sefina being really fun to play.

-*-

Almost always let my partner(s) play the seekers as I find the rogue and survivor cardpools allow you to fly by the seat of your pants, which I find even more exciting than just being the clue gatherer. Mystic card pool can sometimes take too long to develop. Also many marquis mystic cards flirt around with the doom mechanic which always bites me in the arse. Thirdly, mystic pool doesn't have a strong ally base. What's funny about that is I always play spellcasters in D n D. Guardian pool is pretty straightforward, one I look at as more of a necessity within the context of the game, but doesn't tug at my heartstrings. Apologize for the sloppy phrasing in my opine due to a long day. Rankings based on personal preferences only. No meta analysis

-*-

Agnes

-*-

Just prefer the in your face monster destruction that Guardian is themed with. Really enjoy that class.

-*-

Flexibility

-*-

I love killing things and then getting clues!

-*-

I like all of them but play seekers least, I also like that guardians can just take the mythos phase to the face

-*-

I like to be the tank, and with some of the new additions guardians have gotten with getting clues they just shine even more. Mystic I never really play but has so many cards I want if I am playing a dunwich survivor or anyone who can get them, same goes for survivor, very few cards from rogue or seeker makes it into my decks unless I am playing chars like Wendy or Leo who kinda needs them to make them work

-*-

Number of fun upgrade options in green, base card pool for red, upgrade options in blue, useful upgrades in seeker, purple sucks to play.

-*-

I like support / killing role, Carolyn FTW

CLUSTER 5
------------

Weapons are fun.

-*-

Leo is da alpha male.

-*-

Red's best

-*-

There’s more variety of cards that let me build decks I enjoy. As you go down the ranking, there’s less variety or fewer viable sub themes to build around.

-*-

Seeker is powerful but boring, while mystic getting to punch pack at the game is great, with good support to boot.

-*-

I enjoy the cardplay in survivor, and the mystic side of things. Seeker cards are generally very powerful. I don’t enjoy playing rogue but there is some good cardplay. Guardian I find less interesting overall as a class

-*-

Wendy is literally the best investigator in the game

-*-

I enjoy support cards and interesting, unique effects.

-*-

I tend to go for lazy game play, and usually guardians allow to smash enemies without worrying too much about strategy. Seekers I love them thematically. Mystics, I never understood how to play them efficiently

Packt Publishing published a book for me entitled *Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

This blog post was prompted by this meme posted in the Arkham Horror: The Card Game Facebook group:

This meme got 144 comments and 94 reactions, and I’d say that most of them agreed, in effect, with the point the meme makes: the Survivor class feels like the black sheep among the Arkham Horror LCG classes (not that there aren’t people who like Survivors).

One of the comments on the thread was made by me:

Seriously, we’re three cycles in, starting the fourth, and there’s no Level 4-5 Survivor cards? It makes you wonder if there will EVER be high-level Survivor cards!

I’m sure you can make one that does [sic]. Here’s one: King of the Hobos! When you have no resources, take two extra actions.

I dunno, just give us something. Late in the campaign it becomes a pain to upgrade your deck because all cards are low XP. It also makes seeing “Survivor cards level 0-5” a joke and Lola basically another survivor.

This comment, on its own, got 58 replies. So let’s just say that how the Arkham Horror designers are handling the Survivor class is a hot-button topic. It’s pretty clear what I think about how Survivors are handled. But I also wanted to see to what extent the community agreed or disagreed with me.

So I created a poll (now closed) asking people to rank the Arkham Horror LCG classes from best (1) to worst (5). I’ll first present this and other data, then present my own opinion.

The first data I saw related to this was a response to my comment, showing the popularity of investigator decks on ArkhamDB. Below is a screenshot of the interactive tool (that you should check out):

(This data set does not include the investigators released with *The Circle Undone*, but considering how new the cycle is that may be for the best.)

Survivors supposedly occupy the low tier of ArkhamDB decks, and thus are less popular. I don’t agree with this conclusion from this data set; as pointed out by others, lots of people may be playing Survivors without posting their decks, and the number of new decks being made doesn’t necessarily correlate well with how people feel the class performs. For instance, supposedly there are not many Mystic decks because all Mystic decks include the same key cards and thus look largely the same. (And that’s not a good thing, by the way.)

I also want to add that the fundamental question is how good a card pool is rather than how good Survivor investigators are. I don’t think that there’s anything wrong with the Survivor investigators in a vacuum; they’re all great investigators. (That includes Calvin, too; he’s not just a “challenge” investigator, he can pull his weight and more in a game when played right.) That said, the identification of an investigator as a “Guardian” or “Seeker” or “Survivor” serves no mechanical purpose. Nothing in the game pings off the class of the investigator; all investigators could be in the “Neutral” class (like Lola) and the game would be the same because of the deck building requirements. All that the class of an investigator does is indicate what the deck building requirements will be. And some investigators, such as Carolyn and Norman, really drive this point home with their deck-building requirements.

So to answer this question, I created a poll asking people to rank the card pools of the classes. My question was simple: “Rank which class card pool you prefer, from most favorite (1) to least favorite (5).” Not everyone agreed that this was the best question to ask; there are different aspects in which classes may be “better” or “worse.” My primary Arkham Horror LCG partner and friendly local game store owner Matt Freed (who owns Mind Games, LLC, in West Valley, Utah) refused to answer the question as phrased because his rankings would differ completely depending on whether the class in question was the primary class or an off-class of an investigator (I told him that I cared most about the card pool as the primary class of an investigator). That said, the simplicity of the question (plus attempting to publicize the poll on the Internet as well as I could) got me 421 responses, a decent sample size.

When looking at this data one must remember that this is *not* a random sample as statisticians prefer. The people who participated chose to do so and they’re all from the Internet, which means that one can raise questions about the external validity of the data. That said, I think that we can still learn a lot from the data, even if it’s not perfect.

By the way, analyzing the data I got was my first foray into analyzing rank data. I had to learn new statistical methods to meaningfully pry into the data and see what it said. I loved the methodology, and the details of what I did will be presented in a later post, along with the scripts I wrote for doing this analysis. For now, I’ll just mention that my primary reference for learning how to analyze the data was *Analyzing and Modeling Rank Data*, by John Marden.

Let’s start with some basic charts. Below is the “marginals” plot; simply put, it’s how frequently a class was assigned a particular ranking.

One reading of this plot is that Seekers are rated high, Survivors are rated low, Guardians are rated third, and it’s hard to tell how Mystics and Rogues rank, though it appears that Rogues are better liked than Mystics.

I, however, don’t like this plot, since it doesn’t take advantage of the fact that the data is rank data. A plot that better accounts for the nature of the data is the “pairs” plot, which shows the percentage of respondents who prefer the x-axis class to the class represented by the bar. If a bar is above 50%, the x-axis class is preferred to the bar’s class, while if the bar is below 50%, the bar’s class is preferred. The pairs plot (with a line marking the important 50% cutoff) is shown below:

It’s much clearer from this plot which classes people prefer. Seekers are preferred to all other classes. Guardians are preferred to all classes but Seekers. Mystics are preferred to Rogues and Survivors, and Rogues are preferred only to Survivors. The Survivors are handily in last place, not being preferred overall to any other class.
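To make the pairs plot concrete: the underlying pairs matrix records, for every ordered pair of classes, the percentage of respondents ranking the first class better (i.e., giving it a smaller rank number) than the second. The script shown earlier gets these counts from **pmr**’s `destat()`; the following stand-alone sketch (my own illustration, not the analysis code) shows the computation directly.

# Percentage of respondents ranking the row class better (smaller rank
# number) than the column class
pairs_percent <- function(rank_mat) {
  m <- ncol(rank_mat)
  classes <- colnames(rank_mat)
  out <- matrix(0, m, m, dimnames = list(classes, classes))
  for (i in 1:m) {
    for (j in 1:m) {
      if (i != j) {
        out[i, j] <- 100 * mean(rank_mat[, i] < rank_mat[, j])
      }
    }
  }
  out
}

# Example with fake data: 100 respondents ranking three classes uniformly
# at random, so every off-diagonal entry should hover around 50
set.seed(101)
fake <- t(replicate(100, sample(1:3)))
colnames(fake) <- c("Guardian", "Seeker", "Survivor")
round(pairs_percent(fake), 2)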

This suggests that the ranking of the classes, from best to worst, is: Seeker, Guardian, Mystic, Rogue, and finally Survivor. This is also the ranking produced by an estimator of the central (or “closest to consensus”) ranking, obtained by finding a ranking that minimizes the sum of Kendall distances to the observed rankings (a criterion closely tied to the pairs plot). Also, a statistical test confirmed that the respondents were not equally likely to give any ranking and thus have preferences. A 95% confidence interval could only conclude, though, that Seekers are preferred to Rogues and Survivors; any ranking consistent with those two conclusions is plausible under that confidence interval. That means Seekers’ overall rank in the community is no worse than third, while Rogues and Survivors cannot be ranked first in the consensus ranking. (All tests and confidence intervals were based on the pairs matrix/Kendall distance.)
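For readers unfamiliar with it, the Kendall distance between two rankings counts the pairs of options the rankings order differently, and the central ranking minimizes the total Kendall distance to all observed rankings. With only five classes there are just 5! = 120 candidate rankings, so a brute-force search works; below is a sketch (again my own illustration, not the code used for the analysis).

library(gtools)  # for permutations()

# Kendall distance: the number of pairs (i, j) that r1 and r2 order
# differently
kendall_distance <- function(r1, r2) {
  pairs <- combn(length(r1), 2)
  sum((r1[pairs[1, ]] - r1[pairs[2, ]]) *
      (r2[pairs[1, ]] - r2[pairs[2, ]]) < 0)
}

# Central ranking: the permutation minimizing the sum of Kendall distances
# to the observed rankings, found by brute force over all m! permutations
central_ranking <- function(rank_mat) {
  m <- ncol(rank_mat)
  perms <- permutations(m, m)
  total_dist <- apply(perms, 1, function(p) {
    sum(apply(rank_mat, 1, function(r) kendall_distance(p, r)))
  })
  perms[which.min(total_dist), ]
}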

These are the statistics for the community considered as a whole, but it’s possible that players fall into different “archetypes” and thus may have different class preferences. Matt Newman, the lead designer of Arkham Horror, seems to believe so according to this article he wrote. I never asked players what “type” they were, but I attempted to determine types via cluster analysis, based on spectral clustering.

Let me start by saying that I’m not convinced there are meaningful clusters of players in this data; all the cluster-quality metrics I looked at were bad. But if you insist that there must be clusters of player types in the data, read on.
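For the curious, the flavor of the clustering is roughly the following: turn Kendall distances between respondents into affinities, embed the respondents using the bottom eigenvectors of the resulting graph Laplacian, and run k-means in that embedding. Here is a bare-bones sketch, reusing the `kendall_distance()` helper from earlier; my actual `rank_cluster_spectral()` has more moving parts, and the bandwidth `sigma` below is an arbitrary choice.

# Bare-bones unnormalized spectral clustering of rank data (illustration)
rank_cluster_sketch <- function(rank_mat, k, sigma = 2) {
  n <- nrow(rank_mat)
  D <- matrix(0, n, n)
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      D[i, j] <- D[j, i] <- kendall_distance(rank_mat[i, ], rank_mat[j, ])
    }
  }
  W <- exp(-D^2 / (2 * sigma^2))           # Gaussian affinities
  L <- diag(rowSums(W)) - W                # unnormalized graph Laplacian
  V <- eigen(L, symmetric = TRUE)$vectors  # columns ordered by decreasing eigenvalue
  emb <- V[, (n - k + 1):n]                # eigenvectors of the k smallest eigenvalues
  kmeans(emb, centers = k, nstart = 10)$cluster
}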

If there are clusters, my best guess is that there are five clusters of players. Based on reading the (optional) responses of players in each cluster, I’ve labeled (in a very subjective manner) cluster 1 as the “power” players (this is the cluster of Matt and me), cluster 2 as the “versatility/efficiency” cluster, cluster 3 as the “bling” cluster, cluster 4 as the “kill monsters” cluster, and cluster 5 as the cluster that likes “theme” in the game (although honestly this cluster was the hardest to define and often looked like the cluster of the most confused participants; the comment rate from this cluster was the lowest).

31% of players were in the “power” cluster, 20% in the “versatility/efficiency” cluster, 19% in the “bling” cluster, 16% in the “kill monsters” cluster, and 14% in the “theme” cluster. The class preference in each cluster was, in order:

- Seeker, Mystic, Guardian, Rogue, Survivor
- Seeker, Survivor, Guardian, Mystic, Rogue
- Rogue, Seeker, Guardian, Mystic, Survivor
- Guardian, Survivor, Rogue, Seeker, Mystic
- Mystic, Survivor, Seeker, Rogue, Guardian

The confidence intervals (which are starting to lose their validity in the cluster analysis, due to sample size and some preference pairings never being seen in a cluster) suggest that the “power” players dislike Rogues and Survivors; any ranking that puts Rogues and Survivors in the two worst positions could be this group’s “central” ranking. The second group is harder to pin down; Guardians and Survivors are supposedly better than Mystics and Rogues, while Seekers are better than all other classes. The “bling” cluster loves Rogues more than any other class and prefers Seekers to Survivors. The “kill monsters” cluster prefers Guardians and Survivors to Mystics and Seekers, and Rogues to Mystics. Finally, the “theme” cluster prefers Mystics and Survivors to all other classes.

I would say that, taken together, this data suggests that the Survivor class is, indeed, problematic, and players on the whole are not fans of how it’s handled. (That said, one could make the case that this is true for the Rogue class, too.) Here’s my argument regarding what’s wrong with the Survivor card pool.

Let me start by saying there are great Survivor cards. Lucky! may be the best card in the game. Peter Sylvestre is a great ally (at both levels), and I even really like Level 3 Rabbit’s Foot. Several Survivor cards are among the top 20 most popular (faction) cards in ArkhamDB decks.

While no one can credibly argue that Survivors have bad cards, I think we can credibly argue that the Survivor card pool is weak and eventually becomes a drag to play in a campaign.

The two most recent Survivors I played were “Ashcan” Pete and William Yorick. My “Ashcan” deck heavily utilized Yaotl, Cornered, and the desperate cards. I was playing this deck in *The Forgotten Age* cycle. At first I really enjoyed the deck and how it worked. I don’t think it was the most powerful deck, but it was fun to play. Eventually, though, I fell out of love with the deck, and I don’t think I will ever try it again: I could not upgrade the deck without replacing what I saw as its “core” cards. This problem occurred mid-campaign, too.

The same problem happened with my William Yorick deck (again in *The Forgotten Age*). Granted, Yorick has access to the Guardian card pool, which provided an important outlet for spending my experience points without removing core cards. Eventually, though, I ran into the same problem: lots of experience points and nowhere to spend them (though at least it was late in the campaign when I hit this problem).

Thus I have basically one complaint with the Survivor card pool: there are no high level Survivor cards. It turns out this is by design; Matt Newman told the hosts of the Drawn to the Flame podcast in episode 22 that he liked keeping the level of Survivor cards capped at 3 since it fit with the Survivor theme of “not being ready.” Thus we are starting our fourth Arkham Horror cycle and there are no high-level Survivor cards.

What are the consequences of this? Well, in my FLGS, “Survivor cards level 0-5” is a long-running joke, since there are no high-level Survivor cards. Now, Matt (the FLGS owner) said that, as an off-class, Survivor is one of the best pools to pull from, and I agree with him. Consider the table below:

Class | Total Cards | High-Level Cards (at least 3 XP)
---|---|---
Seeker | 81 | 18
Guardian | 79 | 14
Mystic | 80 | 17
Rogue | 79 | 12
Survivor | 80 | 11

69 Survivor cards (the 80 in the pool minus the 11 high-level ones) are accessible to off-class Survivors, more than any other class offers. Furthermore, these are just about all the best Survivor cards.

Now let’s take a step back. Remember when Joe Diamond was announced to be a Seeker? That was a shock to many in the community, who were pretty sure that Joe Diamond would be a Guardian. Furthermore, making Joe Diamond a Seeker was a big deal that had major implications for his deck building. A primary-class Seeker/off-class Guardian will look very different from a primary-class Guardian/off-class Seeker, both in deck and style of play.

Let’s take William Yorick. What would change if William Yorick went from primary-class Survivor/off-class Guardian to primary-class Guardian/off-class Survivor? Well, Yorick would lose access to the 11 Survivor cards that are level 3 and gain access to the 14 Guardian cards that are levels 3, 4, and 5. And to be completely honest, I wouldn’t miss any of the Survivor cards I lost; I didn’t use them in the Yorick deck I built and honestly none of these cards feel like great cards we’d enthusiastically put in decks. I’d say that the level 3 Survivor cards are generally cards that get placed in decks that have experience points to burn; that XP has to be spent on *something*. So by making this switch, my Yorick deck would, almost unambiguously, become *better*.

Thus my first point: high-level cards help distinguish class capabilities. It is because these powerful cards are not accessible to other investigators that these classes are distinctive. Seekers, Guardians, Rogues, and Mystics all have cards that make that class memorable and help separate that class from the rest. The Survivor card pool does not have this since off-class Survivors are basically just as good as primary-class Survivors. Heck, even Lola has full access to the Survivor pool! She may as well be one!

I think that if Matt Newman were to read this, his response would be: “Survivors’ lack of high-level cards is what makes them distinctive. It’s their lack of preparedness that makes them thematically work.” First, I think we’ve seen that the lack of high-level cards makes the class worse from a gameplay perspective. Second, think about who we would call “survivalists” today, such as tribal people, prehistoric humans, or guys that go off into the woods and cut themselves off from civilization. There are actually many aspects of these people that appear almost superhuman. They’re impoverished but often skilled in everything necessary for survival. Most people from regular society would die if immediately forced into the dire situations survivalists deal with on a regular basis. Survivalism is not about being unprepared or low-skilled. It’s about being well-rounded and internalizing all your strengths, so you are not dependent on the tools available to you. Survivalists are actually well-prepared! And high-level cards can be designed to fit this ethos.

And thus my second criticism of the low-XP policy: it restricts player growth. Upgrading your deck is not only a way to get new toys; it shows how the encounters with the mythos caused the investigator to grow and improve. By preventing access to high-level cards, the investigator’s growth is restricted. There is access to a lot of low-level cards, but putting these cards into a deck pushes out other cards to such an extent that the deck at the beginning and end of the campaign look extremely different. This was the case with my aforementioned “Ashcan” deck; I could not upgrade it without drastically altering it. I could not keep the same deck archetype while at the same time staying a Survivor. The deck would have to transform in character in order to upgrade; it wasn’t really possible for the deck to just get better at what it already does.

Now, there are the Exile cards, which can help players burn XP. But I hate those cards! Not only are they very narrow cards, but I don’t like burning XP when I play a card. (That said, I like the upcoming Survivor ally Guiding Spirit and wouldn’t have a problem putting it into a deck even early in the campaign.) I think most players don’t want to use their experience points on Exile cards either, so I wish the Survivor class offered places other than the Exile cards to spend experience points.

I’ve spent this article picking on Survivor cards, but while there is a lot of evidence suggesting this class needs the most work, there’s also evidence that people don’t like the overall design of the Mystic and Rogue classes, either. I think people’s main complaint with the Mystic class is that deck building with Mystics feels stale; there are some key cards that every Mystic deck includes, and thus they all start to look the same. I think this is a valid point, and one good way to fix it would be to make a Mystic permanent granting another Arcane slot. That would make more Mystic players willing to look beyond the Shrivelling/Rite of Seeking/Mists of R’lyeh staples.

As for Rogues, I don’t see why Rogues get hate. The cluster analysis suggests there’s a class of player that *loves* Rogues. I think that Rogue-hate stems from a belief that Rogues are not good enough at investigating/fighting, or a general lack of interest in evading enemies (which Rogues should do well at).

Seekers and Guardians are great classes and don’t need much work. If anything, those classes are too good. But no complaints from me.

But I stand by my conclusion that Survivors need work, and that what they need are high-level cards. I think it is possible to give Survivors high-level cards while keeping with the ethos of the class. In keeping with the idea that Survivors’ strengths are innate and well-rounded, I think high-level Survivor cards could consist of events, skill cards, and non-item, non-ally assets. For instance, perhaps a Level 5 permanent called “Survivalism” that allows a Survivor to spend two resources to boost any skill, or a high-level Dark Horse that gives Survivors +2 to all their skills when they have no resources or assets. (I’m spitballing here, guys; I’m sure these card ideas suck.)

I bet there are die-hard Survivor fans who will bristle at my criticisms of the class. To them I have to ask: do you *like* the fact that you don’t have the option to buy high-level Survivor cards? I certainly don’t, and I hope that “Survivor cards level 0-5” will no longer be the joke it currently is.

Months ago, I asked a question to the community: how should I organize my R research projects? After writing that post, doing some reading, then putting a plan in practice, I now have my own answer.

First, some background. In the early months of 2016 I began a research project with my current Ph.D. advisor that involved extensive coding and spanned at least two years. My code was poorly organized, and this became a problem as managing the chaos and extending the code grew difficult. Meanwhile, I was reading articles by programmers and researchers about ways to organize R code so that research results are reproducible, distributable, and extensible. I identified two different approaches to organizing a project to meet these goals: one centered around makefiles, and another around package development. Given these competing approaches and their differing advantages, I was unsure what to do.

Since writing that post, I did more reading. First, I read two of Hadley Wickham’s (excellent) books: *R Packages* and *Advanced R*. (I loved *R Packages* so much I bought a physical copy.) I also read a book I picked up in a Humble Bundle book sale called *Code Craft: The Practice of Writing Excellent Code* by Pete Goodliffe to learn about good coding practices. Finally, I read a good portion of the GNU `make` manual.

I also spent *months* restructuring the project to comply with what I learned. Many, *many* hours were spent just fixing the mess I had made by not doing things right in the first place.

The result is **CPAT**, an R package implementing some change point analysis statistical tests. What **CPAT** does will be the subject of a future post (it will be published when the accompanying paper is made available online); what I want to focus on in this article is how I learned to organize an R research project, and how that culminated in **CPAT**.

In the earlier article I presented two approaches that I suggested were “competing” approaches to organizing a research project: the *project as executable* approach of Jon Zelner and the *project as package* approach of Csefalvay and Flight. Both approaches, in my view, possessed unique advantages, but seemed to be at odds.

They are not at odds. **CPAT** demonstrates that it is possible to view an R project as both an executable and as a package. That said, the package development approach becomes dominant; making the package executable (from the command line) is an additional feature that makes the project even more portable and extensible.

If one is going to adopt the package development approach, one must use the directory hierarchy R packages require (a sketch of how to create this skeleton follows the list). So that means:

- R code that defines the package (which is mostly just functions) is placed in the `R/` directory.
- Documentation is placed in the `man/` directory (if you’re using **roxygen2** and **devtools** like a sane human being, though, this is something you won’t write yourself).
- Project data goes in the `data/` directory.
- Compiled code from other languages (such as C++ when using **Rcpp**) goes in the `src/` directory.
- Code tests—*which are not optional and must be written!*—go in the `tests/` directory (but if you’re using **testthat** for your testing, then the tests you actually wrote go in `tests/testthat/`).
- Long-form documentation goes in `vignettes/`. This could be the paper itself, if written in the form of a vignette.
- Other important files should be placed in a reasonably-organized `inst/` directory, to be installed with the package, along with other files that should be installed into the base directory (such as `Makefile`). For example, I put all my plots in `inst/plots/`, and this would also be a good directory for the paper that accompanies the project.
- Executable scripts, including R scripts, go in `exec/`.
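If you want to spin up this skeleton quickly, the **usethis**/**devtools** tool chain can create most of it for you, and the nonstandard directories can be made by hand. A minimal sketch (the package and vignette names are placeholders, and this is not necessarily how **CPAT** itself was created):

library(usethis)
library(devtools)

create_package("mypackage")  # creates DESCRIPTION, NAMESPACE, and R/
use_testthat()               # sets up tests/testthat/
use_vignette("my-paper")     # sets up vignettes/ with a vignette stub

dir.create("exec")                          # executable scripts
dir.create("inst/plots", recursive = TRUE)  # plots installed with the package

document()                   # generates man/ pages from roxygen2 comments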

The approach championed by Zelner doesn’t require a particular organizational style but simply that there be a coherent organization to the project. R package development not only has a coherent structure but even *enforces* it. If that structure doesn’t quite work, then one can add other files and directories as needed and note them in the `.Rbuildignore` file, so they’re ignored when the package is built.
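(Entries in `.Rbuildignore`, by the way, are Perl-style regular expressions, one per line; if you would rather not write the escaping yourself, `usethis::use_build_ignore()` will add properly escaped entries for you.)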

When writing an R package, the relevant R tools basically enforce some essential points of style, such as documenting objects. Also, the developer-researcher starts to think of important functionality of the project in terms of reusable functions that should be added to the package to be called by the scripts that actually execute the analysis—with documentation and everything else. Having well-documented functions, even if they serve a minor purpose, helps greatly in making the project more easily understood and worked on, not only by others but by the original author as well. In my case, since I wrote **CPAT** almost exclusively with vim, I wrote an UltiSnips snippet creating a function skeleton that not only defines the function but automatically adds the framework of the documentation, as seen below.
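The snippet itself is just Vim configuration, so I won’t reproduce it here, but the skeleton it expands to looks roughly like the following (the names here are placeholders to be filled in):

#' Function Title
#'
#' A one-line description of the function.
#'
#' A more detailed description of what the function does and how to use it.
#'
#' @param x What the parameter represents
#' @return What the function returns
#' @examples
#' my_function(1)
my_function <- function(x) {
  # Function body goes here
}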

While package development does place (helpful) constraints, it does not specify everything. In other words, there is room for style. I essentially define *style* to mean any aspect of programming in which a choice is made that was not determined by the programming language or software. Examples of style include naming conventions, indentation, etc. Consistent style makes for understandable code; having a consistent style is arguably more important than the particular stylistic decisions made. So I decided to codify my own stylistic preferences in a style guide, and when I did my code rewrite I made the new code comply with my style guide, even if that added more time to editing. Whenever I encountered a new “decision point” (such as, say, dataset naming conventions), I committed my decision to the style guide.

As I mentioned above, the package development approach turns out not to be mutually exclusive with the project-as-executable approach. While it seems like documentation on R package development (including Dr. Wickham’s book) mentions the `exec/` directory of a package only in passing, I found it to be a good place to place executable R scripts. Similarly, `make` can still be used to automate analysis tasks; R packages allow for including makefiles.

So in addition to the files that essentially defined the package, I also wrote stand-alone, command-line-executable R scripts and placed them in the `exec/` directory (which causes them to be flagged as “executable” when the package is installed). I wrote a Vim template file for R scripts that provides a skeleton for making the script executable from the command line. That template is listed below:

#!/usr/bin/Rscript
################################################################################
# MyFile.R
################################################################################
# 2018-12-31 (last modified date)
# John Doe (author)
################################################################################
# This is a one-line description of the file.
################################################################################

# optparse: A package for handling command line arguments
if (!suppressPackageStartupMessages(require("optparse"))) {
  install.packages("optparse")
  require("optparse")
}

################################################################################
# MAIN FUNCTION DEFINITION
################################################################################

main <- function(foo, bar, help = FALSE) {
  # This function will be executed when the script is called from the command
  # line; the help parameter does nothing, but is needed for do.call() to work
  quit()
}

################################################################################
# INTERFACE SETUP
################################################################################

if (sys.nframe() == 0) {
  cl_args <- parse_args(OptionParser(
    description = "This is a template for executable R scripts.",
    option_list = list(
      make_option(c("--foo", "-f"), type = "integer", default = 0,
                  help = "A command-line argument"),
      make_option(c("--bar", "-b"), type = "character",
                  help = "Another command-line argument")
    )
  ))

  do.call(main, cl_args)
}
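With the interface set up this way, `parse_args()` returns a named list of option values (including the automatically added `help` flag, which is why `main()` takes a `help` parameter it ignores), and `do.call()` maps that list onto `main()`’s arguments. Running, say, `./MyFile.R --foo 2 --bar baz` from a shell thus amounts to calling `main(foo = 2, bar = "baz", help = FALSE)`; the file name and options here are placeholders, of course.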

Converting my scripts into modularized, executable programs was, not surprisingly, very time-consuming, and the transition was not perfect; some scripts simply could not be modularized well. Nevertheless, the end result was likely worth it, and I could then write a Makefile defining how the pieces fit together. This tamed the complexity of the project and made it more reproducible; someone looking to repeat my analysis should only have to type `make` in a Linux terminal^{1} to see the results themselves.

While I did make my project modular and executable, though, I did not try to make it stand-alone with, say, **packrat** or Docker. I did try to use **packrat**, even setting it up to work with my package. But I ran into severe problems when I tried to work with my package at the University of Utah Mathematics Department, since the computer system’s R installation is almost four years old as of this writing and highly temperamental due to how the system administrator set it up. **packrat** made complications working with the department computers even worse, and I disabled it in a huff one day and never looked back. As for Docker or GitLab, I did not want my project tied up with proprietary or web-based services, and I felt that the end result Zelner was seeking when using these services is overkill; once you’ve added **packrat** (which I didn’t, because of the complications, but still) and defined how the project pieces fit together with `make`, you’ve mostly conquered the reproducibility problem, in my view. So I never missed these services.

The end result of this work can be seen in the `paper` branch of CPAT, also permanently available in this tarball. The directory tree is also informative.

In some sense the end goal is to have an R package that could be distributed to others via, say, CRAN, so they can *use* the methods you employed and developed, not just reproduce your research; at least, that’s the case for me, a mathematical statistician interested in analyzing and developing statistical tests and procedures. When a package is written to contain research and not just for software distribution, it comes with a lot of files that aren’t needed for the package to function; just look at the directory tree!

The solution is to just delete the files that can be recreated—perhaps with `make clean` if you set it up right—and consider adding other files to `.Rbuildignore` when you want to distribute the package for others to use. So this isn’t actually a big problem.

Another issue that I encountered, and am still unsure about, is functions that are useful to the project but not useful outside of it. If you look through the `paper` branch manual or even the public version manual, you will find functions that were useful only for the project, perhaps for converting data structures created by scripts or making particular plots that make sense only for the paper. They’re all private functions that need to be accessed via the `:::` operator, yet they’re still in the manual. (A sketch of the kind of function I mean appears below.)
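To make this concrete, here is a minimal sketch (with hypothetical names, not actual **CPAT** functions) of such an internal helper: documented with roxygen2 but never exported, so a user has to reach it with `:::`.

#' Convert simulation results to a data frame for plotting
#'
#' An internal helper that only makes sense for the paper's plots; it is
#' documented but deliberately not exported, so callers must use
#' CPAT:::sims_to_df().
#'
#' @param sims A list of named numeric vectors, one per simulation
#' @return A data frame with one row per simulation
#' @keywords internal
sims_to_df <- function(sims) {
  do.call(rbind, lapply(sims, function(s) as.data.frame(t(s))))
}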

I’m undecided whether this is good style. On the one hand, it’s nice that when others read your code there are manual entries even for functions that are local to the project, further documenting what was done and how the code works. Even when distributing the software, having every function documented, including ones that are “private” to the package, seems to be in keeping with the spirit of open source software, making the source code easier to understand for users who need and want to know how your software works. It also could serve as a good way to modularize documentation; a statistical formula is kept with the function that computes it rather than the interface to that function (which likely links to that underlying function). Having examples for those internal functions also provides an additional layer of testing and helps when others want to extend the package.

On the other hand… most of the pages of the manual are devoted to functions the user isn’t supposed to be calling directly in their work. Of all those functions, maybe five are functions the user is expected to use. Should all that documentation space be devoted to something the user doesn’t use?

While I’m not set in my opinion, I lean to having more documentation rather than less, even if most of it is for private functions. After all, it’s useful to me when I’m developing the project and package.

I feel like spending those months to make my project logical and reproducible was time well spent. Not only did I learn a lot in the process, I had a useful end product that is now available on CRAN. Additionally, this project is not over; my advisor and I are continuing to work on extending the results that led to the creation of this package in the first place, which will call for more simulation experiments. Now that I’ve organized my work, I have a good base for continuing it.

I hope that this article has inspired others to organize their R research projects. Judging from reactions to my previous article, I think this is an unfortunately underappreciated topic. Having a plan for managing package complexity and organization goes a long way toward keeping your work under control and helps others appreciate what you’ve done. It can also lead to your work having a greater impact, since others can use it as well.

I got a lot of good feedback from my previous article. I look forward to hearing what the community has to say now. I’m always open to suggestion.

Packt Publishing published a book for me entitled *Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

- Sadly my project is tied closely to the Unix/Linux setup; I have no idea how well it would work on Windows and I’m not very interested in making the project easy for Windows use (despite having Windows 10 installed on my primary laptop). What this means is that the goal of full reproducibility isn’t met for most Windows users, a large market of users. That said… if you’re a Windows user, just download VirtualBox for free, download and install Ubuntu or some other Linux distribution you like, install R and the needed packages, and now you can reproduce my work. You may even discover for yourself why I prefer to work in a Linux environment. ↩

Now here is a blog post that has been sitting on the shelf far longer than it should have. Over a year ago I wrote an article about problems I was having when estimating the parameters of a GARCH(1,1) model in R. I documented the behavior of parameter estimates (with a focus on $\beta_1$) and perceived pathological behavior when those estimates are computed using **fGarch**. I called for help from the R community, including sending out the blog post over the R Finance mailing list.

I was not disappointed in the feedback. You can see some mailing list feedback, and there were some comments on Reddit that were helpful, but I think the best feedback I got was through my own e-mail.

Dr. Brian G. Peterson, a member of the R finance community, sent some thought-provoking e-mails. The first informed me that **fGarch** is no longer the go-to package for working with GARCH models. The RMetrics suite of packages (which includes **fGarch**) was maintained by Prof. Diethelm Würtz at ETH Zürich; he was killed in a car accident in 2016.

Dr. Peterson recommended I look into two more modern packages for GARCH modelling, **rugarch** (for univariate GARCH models) and **rmgarch** (for multivariate GARCH models). I had not heard of these packages before (the reason I was aware of **fGarch** was because it was referred to in the time series textbook *Time Series Analysis and Its Applications: With R Examples* by Shumway and Stoffer), so I’m very thankful for the suggestion. Since I’m interested in univariate time series for now, I only looked at **rugarch**. The package appears to have more features and power than **fGarch**, which may explain why it seems more difficult to use. However, the package’s vignette is helpful and worth printing out.

Dr. Peterson also had interesting comments about my proposed applications. He argued that intraday data should be preferred to daily data and that simulated data (including simulated GARCH processes) has idiosyncrasies not seen in real data. The ease of getting daily data (particularly for USD/JPY around the time of the Asian financial crisis, which was an intended application of a test statistic I’m studying) motivated my interest in daily data. His comments, though, may lead me to reconsider this application.^{1} (I might try to detect the 2010 eurozone financial crises via EUR/USD instead. I can get free intraday data from HistData.com for this.) However, if standard error estimates cannot be trusted for small sample sizes, our test statistic would still be in trouble, since it involves estimating parameters even for small sample sizes.

He also warned that simulated data exhibits behaviors not seen in real data. That may be true, but simulated data is important since it can be considered a statistician’s best-case scenario. Additionally, the properties of the process that generated simulated data are known *a priori*, including the values of the generating parameters and whether certain hypotheses (such as whether there is a structural change in the series) are true. This allows for sanity checks of estimators and tests. This is impossible for real-world data, since we don’t have the *a priori* knowledge needed.

Prof. André Portela Santos asked that I repeat the simulations but with $\alpha_1$ small and $\beta_1$ large (with $\alpha_1 + \beta_1$ close to 1), since such values are supposedly more common than my choice of $\alpha_1 = \beta_1 = 0.2$. It’s a good suggestion and I will consider parameters in this range in addition to my original choice in this post. However, my simulations seemed to suggest that even when $\alpha_1 = \beta_1 = 0.2$ is the truth, the estimation procedures nevertheless seem to push $\hat{\beta}_1$ toward large values. I’m also surprised since my advisor gave me the impression that GARCH processes with either $\alpha_1$ or $\beta_1$ large are more difficult to work with. Finally, if the estimators are strongly biased, we might expect most estimated parameters to lie in that range, though that does not mean the “correct” values lie in that range. My simulations suggest **fGarch** struggles to discover $\alpha_1 = \beta_1 = 0.2$ even when those parameters are “true.” Prof. Santos’ comment leads me to desire a metastudy of what common estimates of GARCH parameters are on real-world data. (There may or may not be one; I haven’t checked. If anyone knows of one, please share.)

My advisor contacted another expert on GARCH models and got some feedback. Supposedly the standard error for $\hat{\beta}_1$ is large, so there should be great variation in parameter estimates. Some of my simulations agreed with this behavior even for small sample sizes, but at the same time showed an uncomfortable bias towards $\hat{\alpha}_1 = 0$ and $\hat{\beta}_1 = 1$. This might be a consequence of the optimization procedures, as I hypothesized.

So given this feedback, I will be conducting more simulation experiments. I won’t be looking at **fGarch** or **tseries** anymore; I will be working exclusively with **rugarch**. I will explore different optimization procedures supported by the package. I won’t be creating plots like I did in my first post; those plots were meant only to show the existence of a problem and its severity. Instead I will be looking at properties of the resulting estimators produced by different optimization procedures.

As mentioned above, **rugarch** is a package for working with GARCH models; a major use case is estimating their parameters, obviously. Here I will demonstrate how to specify a GARCH model, simulate data from the model, and estimate parameters. After this we can dive into simulation studies.

library(rugarch)

## Loading required package: parallel
## 
## Attaching package: 'rugarch'
## The following object is masked from 'package:stats':
## 
##     sigma

To work with a GARCH model we need to specify it. The function for doing this is `ugarchspec()`. I think the parameters `variance.model` and `mean.model` are the most important.

`variance.model` is a list with named entries, perhaps the two most interesting being `model` and `garchOrder`. `model` is a string specifying which type of GARCH model is being fitted. Many major classes of GARCH models (such as EGARCH, IGARCH, etc.) are supported; for the “vanilla” GARCH model, set this to `"sGARCH"` (or just omit it; the standard model is the default). `garchOrder` is a vector giving the order of the ARCH and GARCH components of the model.
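For instance, a plain GARCH(1,1) variance specification can be written out explicitly; the following sketch is equivalent to the defaults just described (it assumes **rugarch** is loaded, as above):

# An explicit "vanilla" GARCH(1,1) variance specification; both entries
# below match the variance.model defaults, so ugarchspec() would give the
# same conditional variance dynamics
spec_vanilla <- ugarchspec(variance.model = list(model = "sGARCH",
                                                 garchOrder = c(1, 1)))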

`mean.model` allows for fitting ARMA-GARCH models, and functions like `variance.model` in that it accepts a list of named entries, the most interesting being `armaOrder` and `include.mean`. `armaOrder` is like `garchOrder`; it’s a vector specifying the order of the ARMA model. `include.mean` is a boolean that, if true, allows for the ARMA part of the model to have a non-zero mean.

When simulating a process, we need to set the values of our parameters. This is done via the `fixed.pars` parameter, which accepts a list of named elements, the elements of the list being numeric. They need to fit the conventions the function uses for parameters; for example, if we want to set the parameters of a GARCH(1,1) model, the names of our list elements should be `"alpha1"` and `"beta1"`. If the plan is to simulate a model, every parameter in the model should be set this way.

There are other parameters interesting in their own right, but I focus on these since the default specification is an ARMA-GARCH model with ARMA order $(1,1)$ with non-zero mean and a GARCH model of order $(1,1)$. This is not the vanilla GARCH(1,1) model I desire, so I almost always change this.

spec1 <- ugarchspec(mean.model = list(armaOrder = c(0,0),
                                      include.mean = FALSE),
                    fixed.pars = list("omega" = 0.2, "alpha1" = 0.2,
                                      "beta1" = 0.2))
spec2 <- ugarchspec(mean.model = list(armaOrder = c(0,0),
                                      include.mean = FALSE),
                    fixed.pars = list("omega" = 0.2, "alpha1" = 0.1,
                                      "beta1" = 0.7))

show(spec1)

## 
## *---------------------------------*
## *       GARCH Model Spec          *
## *---------------------------------*
## 
## Conditional Variance Dynamics
## ------------------------------------
## GARCH Model        : sGARCH(1,1)
## Variance Targeting : FALSE
## 
## Conditional Mean Dynamics
## ------------------------------------
## Mean Model         : ARFIMA(0,0,0)
## Include Mean       : FALSE
## GARCH-in-Mean      : FALSE
## 
## Conditional Distribution
## ------------------------------------
## Distribution    : norm
## Includes Skew   : FALSE
## Includes Shape  : FALSE
## Includes Lambda : FALSE

show(spec2)

## 
## *---------------------------------*
## *       GARCH Model Spec          *
## *---------------------------------*
## 
## Conditional Variance Dynamics
## ------------------------------------
## GARCH Model        : sGARCH(1,1)
## Variance Targeting : FALSE
## 
## Conditional Mean Dynamics
## ------------------------------------
## Mean Model         : ARFIMA(0,0,0)
## Include Mean       : FALSE
## GARCH-in-Mean      : FALSE
## 
## Conditional Distribution
## ------------------------------------
## Distribution    : norm
## Includes Skew   : FALSE
## Includes Shape  : FALSE
## Includes Lambda : FALSE

The function `ugarchpath()` simulates GARCH models specified via `ugarchspec()`. The function needs a specification object created by `ugarchspec()` first. The parameters `n.sim` and `n.start` specify the size of the process and the length of the burn-in period, respectively (with defaults 1000 and 0, respectively; I strongly recommend setting the burn-in period to at least 500, but I go for 1000). The function creates an object that contains not only the simulated series but also the residuals and the conditional standard deviations $\sigma_t$.
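As a quick sanity check of what that object holds, the `path` slot is a list containing the simulated conditional standard deviations, series, and residuals (a sketch, assuming `x_obj` is a `uGARCHpath` object like the one created below):

# Peek inside a simulated uGARCHpath object; the path slot is a list with
# sigmaSim (conditional standard deviations), seriesSim, and residSim
str(x_obj@path, max.level = 1)
head(x_obj@path$sigmaSim)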

The `rseed` parameter controls the random seed the function uses for generating data. Be warned that `set.seed()` is effectively ignored by this function, so if you want consistent results, you will need to set this parameter.
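For example, two simulations with the same `rseed` reproduce the same path, while `set.seed()` on its own would not (a sketch using `spec1` from above):

# Reproducibility comes from rseed, not set.seed()
p1 <- ugarchpath(spec1, n.sim = 100, n.start = 1000, rseed = 42)
p2 <- ugarchpath(spec1, n.sim = 100, n.start = 1000, rseed = 42)
identical(p1@path$seriesSim, p2@path$seriesSim)   # TRUE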

The `plot()` method accompanying these objects is not completely transparent; there are a few plots it could create, and when calling `plot()` on a `uGARCHpath` object in the command line, users are prompted to input a number corresponding to the desired plot. This is a pain sometimes, so don’t forget to pass the desired plot’s number to the `which` parameter to avoid the prompt; setting `which = 2` will give the plot of the series proper.

old_par <- par()
par(mfrow = c(2, 2))
x_obj <- ugarchpath(spec1, n.sim = 1000, n.start = 1000, rseed = 111217)
show(x_obj)

## 
## *------------------------------------*
## *     GARCH Model Path Simulation    *
## *------------------------------------*
## Model: sGARCH
## Horizon: 1000
## Simulations: 1
##                 Seed  Sigma2.Mean  Sigma2.Min  Sigma2.Max  Series.Mean
## sim1          111217        0.332       0.251       0.915     0.000165
## Mean(All)          0        0.332       0.251       0.915     0.000165
## Unconditional     NA        0.333          NA          NA     0.000000
##                Series.Min  Series.Max
## sim1                -1.76        1.62
## Mean(All)           -1.76        1.62
## Unconditional          NA          NA

for (i in 1:4) {
  plot(x_obj, which = i)
}

par(old_par)

## Warning in par(old_par): graphical parameter "cin" cannot be set
## Warning in par(old_par): graphical parameter "cra" cannot be set
## Warning in par(old_par): graphical parameter "csi" cannot be set
## Warning in par(old_par): graphical parameter "cxy" cannot be set
## Warning in par(old_par): graphical parameter "din" cannot be set
## Warning in par(old_par): graphical parameter "page" cannot be set

# The actual series
x1 <- x_obj@path$seriesSim
plot.ts(x1)

The `ugarchfit()` function fits GARCH models. The function needs a specification and a dataset. The `solver` parameter accepts a string stating which numerical optimizer to use to find the parameter estimates. Most of the parameters of the function manage interfacing with the numerical optimizer. In particular, `solver.control` can be given a list of arguments to pass to the optimizer. We will be looking at this in more detail later.

The specification used for generating the simulated data won’t be appropriate for `ugarchfit()`, since it contains fixed values for its parameters. In my case I will need to create a second specification object.

spec <- ugarchspec(mean.model = list(armaOrder = c(0, 0),
                                     include.mean = FALSE))
fit <- ugarchfit(spec, data = x1)
show(fit)

## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : norm
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error     t value  Pr(>|t|)
## omega   0.000713    0.001258     0.56696   0.57074
## alpha1  0.002905    0.003714     0.78206   0.43418
## beta1   0.994744    0.000357  2786.08631   0.00000
## 
## Robust Standard Errors:
##         Estimate  Std. Error     t value  Pr(>|t|)
## omega   0.000713    0.001217     0.58597   0.55789
## alpha1  0.002905    0.003661     0.79330   0.42760
## beta1   0.994744    0.000137  7250.45186   0.00000
## 
## LogLikelihood : -860.486
## 
## Information Criteria
## ------------------------------------
## 
## Akaike       1.7270
## Bayes        1.7417
## Shibata      1.7270
## Hannan-Quinn 1.7326
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                          statistic  p-value
## Lag[1]                       3.998  0.04555
## Lag[2*(p+q)+(p+q)-1][2]      4.507  0.05511
## Lag[4*(p+q)+(p+q)-1][5]      9.108  0.01555
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                          statistic    p-value
## Lag[1]                       29.12  6.786e-08
## Lag[2*(p+q)+(p+q)-1][5]      31.03  1.621e-08
## Lag[4*(p+q)+(p+q)-1][9]      32.26  1.044e-07
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##              Statistic  Shape  Scale  P-Value
## ARCH Lag[3]      1.422  0.500  2.000   0.2331
## ARCH Lag[5]      2.407  1.440  1.667   0.3882
## ARCH Lag[7]      2.627  2.315  1.543   0.5865
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.9518
## Individual Statistics:
## omega  0.3296
## alpha1 0.2880
## beta1  0.3195
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:      0.846 1.01 1.35
## Individual Statistic: 0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                     t-value       prob  sig
## Sign Bias            0.3946  6.933e-01
## Negative Sign Bias   3.2332  1.264e-03  ***
## Positive Sign Bias   4.2142  2.734e-05  ***
## Joint Effect        28.2986  3.144e-06  ***
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group  statistic  p-value(g-1)
## 1    20      20.28        0.3779
## 2    30      26.54        0.5965
## 3    40      36.56        0.5817
## 4    50      47.10        0.5505
## 
## 
## Elapsed time : 2.60606

par(mfrow = c(3, 4))
for (i in 1:12) {
  plot(fit, which = i)
}

## 
## please wait...calculating quantiles...

par(old_par)

## Warning in par(old_par): graphical parameter "cin" cannot be set
## Warning in par(old_par): graphical parameter "cra" cannot be set
## Warning in par(old_par): graphical parameter "csi" cannot be set
## Warning in par(old_par): graphical parameter "cxy" cannot be set
## Warning in par(old_par): graphical parameter "din" cannot be set
## Warning in par(old_par): graphical parameter "page" cannot be set

Notice the estimated parameters and standard errors? The estimates are nowhere near the “correct” numbers even for a sample size of 1000, and there is no way a reasonable confidence interval based on the estimated standard errors would contain the correct values. It looks like the problems I documented in my last post have not gone away.

Out of curiosity, what would happen with the other specification, one in the range Prof. Santos suggested?

x_obj <- ugarchpath(spec2, n.start = 1000, rseed = 111317)
x2 <- x_obj@path$seriesSim
fit <- ugarchfit(spec, x2)
show(fit)

## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : norm
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error     t value  Pr(>|t|)
## omega   0.001076    0.002501     0.43025   0.66701
## alpha1  0.001992    0.002948     0.67573   0.49921
## beta1   0.997008    0.000472  2112.23364   0.00000
## 
## Robust Standard Errors:
##         Estimate  Std. Error     t value  Pr(>|t|)
## omega   0.001076    0.002957     0.36389   0.71594
## alpha1  0.001992    0.003510     0.56767   0.57026
## beta1   0.997008    0.000359  2777.24390   0.00000
## 
## LogLikelihood : -1375.951
## 
## Information Criteria
## ------------------------------------
## 
## Akaike       2.7579
## Bayes        2.7726
## Shibata      2.7579
## Hannan-Quinn 2.7635
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                          statistic  p-value
## Lag[1]                      0.9901   0.3197
## Lag[2*(p+q)+(p+q)-1][2]     1.0274   0.4894
## Lag[4*(p+q)+(p+q)-1][5]     3.4159   0.3363
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                          statistic  p-value
## Lag[1]                       3.768  0.05226
## Lag[2*(p+q)+(p+q)-1][5]      4.986  0.15424
## Lag[4*(p+q)+(p+q)-1][9]      7.473  0.16272
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##              Statistic  Shape  Scale  P-Value
## ARCH Lag[3]     0.2232  0.500  2.000   0.6366
## ARCH Lag[5]     0.4793  1.440  1.667   0.8897
## ARCH Lag[7]     2.2303  2.315  1.543   0.6686
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.3868
## Individual Statistics:
## omega  0.2682
## alpha1 0.2683
## beta1  0.2669
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:      0.846 1.01 1.35
## Individual Statistic: 0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                     t-value    prob  sig
## Sign Bias            0.5793  0.5625
## Negative Sign Bias   1.3358  0.1819
## Positive Sign Bias   1.5552  0.1202
## Joint Effect         5.3837  0.1458
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group  statistic  p-value(g-1)
## 1    20      24.24        0.1871
## 2    30      30.50        0.3894
## 3    40      38.88        0.4753
## 4    50      48.40        0.4974
## 
## 
## Elapsed time : 2.841597

That’s no better. Now let’s see what happens when we use different optimization routines.

`ugarchfit()`’s default parameters did a good job of finding appropriate parameters for what I will refer to as model 2 (where $\alpha_1 = 0.1$ and $\beta_1 = 0.7$) but not for model 1 ($\alpha_1 = \beta_1 = 0.2$). What I want to know is when one solver seems to beat another.

As pointed out by Vivek Rao^{2} on the R-SIG-Finance mailing list, the “best” estimate is the estimate that maximizes the likelihood function (or, equivalently, the log-likelihood function), and I omitted inspecting the log likelihood function’s values in my last post. Here I will see which optimization procedures lead to the maximum log-likelihood.

Below is a helper function that simplifies the process of fitting a GARCH model’s parameters and extracting the log-likelihood, parameter values, and standard errors while allowing for different values to be passed to `solver` and `solver.control`.

evalSolverFit <- function(spec, data, solver = "solnp",
                          solver.control = list()) {
  # Calls ugarchfit(spec, data, solver, solver.control), and returns a vector
  # containing the log likelihood, parameters, and parameter standard errors.
  # Parameters are equivalent to those seen in ugarchfit(). If the solver fails
  # to converge, NA will be returned
  vec <- NA
  tryCatch({
    fit <- ugarchfit(spec = spec, data = data, solver = solver,
                     solver.control = solver.control)
    coef_se_names <- paste("se", names(fit@fit$coef), sep = ".")
    se <- fit@fit$se.coef
    names(se) <- coef_se_names
    robust_coef_se_names <- paste("robust.se", names(fit@fit$coef), sep = ".")
    robust.se <- fit@fit$robust.se.coef
    names(robust.se) <- robust_coef_se_names
    vec <- c(fit@fit$coef, se, robust.se)
    vec["LLH"] <- fit@fit$LLH
  }, error = function(w) {
    NA
  })
  return(vec)
}

Below I list out all the optimization schemes I will consider. I only fiddle with `solver.control`, but there may be other parameters that could help the numerical optimization routines, namely `numderiv.control`, which passes control arguments to the numerical routines responsible for standard error computation (these rely on the **numDeriv** package for numerical differentiation).

solvers <- list(  # A list of lists where each sublist contains parameters to
                  # pass to a solver
  list("solver" = "nlminb",  "solver.control" = list()),
  list("solver" = "solnp",   "solver.control" = list()),
  list("solver" = "lbfgs",   "solver.control" = list()),
  list("solver" = "gosolnp", "solver.control" = list(
    "n.restarts" = 100,
    "n.sim" = 100
  )),
  list("solver" = "hybrid",  "solver.control" = list()),
  list("solver" = "nloptr", "solver.control" = list("solver" = 1)),   # COBYLA
  list("solver" = "nloptr", "solver.control" = list("solver" = 2)),   # BOBYQA
  list("solver" = "nloptr", "solver.control" = list("solver" = 3)),   # PRAXIS
  list("solver" = "nloptr", "solver.control" = list("solver" = 4)),   # NELDERMEAD
  list("solver" = "nloptr", "solver.control" = list("solver" = 5)),   # SBPLX
  list("solver" = "nloptr", "solver.control" = list("solver" = 6)),   # AUGLAG+COBYLA
  list("solver" = "nloptr", "solver.control" = list("solver" = 7)),   # AUGLAG+BOBYQA
  list("solver" = "nloptr", "solver.control" = list("solver" = 8)),   # AUGLAG+PRAXIS
  list("solver" = "nloptr", "solver.control" = list("solver" = 9)),   # AUGLAG+NELDERMEAD
  list("solver" = "nloptr", "solver.control" = list("solver" = 10))   # AUGLAG+SBPLX
)

tags <- c(  # Names for the above list
  "nlminb", "solnp", "lbfgs", "gosolnp", "hybrid",
  "nloptr+COBYLA", "nloptr+BOBYQA", "nloptr+PRAXIS", "nloptr+NELDERMEAD",
  "nloptr+SBPLX", "nloptr+AUGLAG+COBYLA", "nloptr+AUGLAG+BOBYQA",
  "nloptr+AUGLAG+PRAXIS", "nloptr+AUGLAG+NELDERMEAD", "nloptr+AUGLAG+SBPLX"
)

names(solvers) <- tags

Now let’s run the gauntlet of optimization choices and see which produces the estimates with the largest log-likelihood for data generated by model 1. The `lbfgs` method (the low-storage version of the Broyden-Fletcher-Goldfarb-Shanno method, provided in **nloptr**) unfortunately does not converge for this series, so I omit it.

optMethodCompare <- function(data, spec, solvers) {
  # Runs all solvers in a list for a dataset
  #
  # Args:
  #   data: An object to pass to ugarchfit's data parameter containing the
  #         data to fit
  #   spec: A specification created by ugarchspec to pass to ugarchfit
  #   solvers: A list of lists containing strings of solvers and a list for
  #            solver.control
  #
  # Return:
  #   A matrix containing the result of the solvers (including parameters,
  #   se's, and LLH)
  model_solutions <- lapply(solvers, function(s) {
    args <- s
    args[["spec"]] <- spec
    args[["data"]] <- data
    res <- do.call(evalSolverFit, args = args)
    return(res)
  })
  model_solutions <- do.call(rbind, model_solutions)
  return(model_solutions)
}

round(optMethodCompare(x1, spec, solvers[c(1:2, 4:15)]), digits = 4)

##                            omega   alpha1  beta1   se.omega  se.alpha1  se.beta1  robust.se.omega  robust.se.alpha1  robust.se.beta1  LLH
## ------------------------  ------  ------  ------  --------  ---------  --------  ---------------  ----------------  ---------------  ---------
## nlminb                    0.2689  0.1774  0.0000    0.0787     0.0472    0.2447           0.0890            0.0352           0.2830  -849.6927
## solnp                     0.0007  0.0029  0.9947    0.0013     0.0037    0.0004           0.0012            0.0037           0.0001  -860.4860
## gosolnp                   0.2689  0.1774  0.0000    0.0787     0.0472    0.2446           0.0890            0.0352           0.2828  -849.6927
## hybrid                    0.0007  0.0029  0.9947    0.0013     0.0037    0.0004           0.0012            0.0037           0.0001  -860.4860
## nloptr+COBYLA             0.0006  0.0899  0.9101    0.0039     0.0306    0.0370           0.0052            0.0527           0.0677  -871.5006
## nloptr+BOBYQA             0.0003  0.0907  0.9093    0.0040     0.0298    0.0375           0.0057            0.0532           0.0718  -872.3436
## nloptr+PRAXIS             0.2689  0.1774  0.0000    0.0786     0.0472    0.2444           0.0888            0.0352           0.2823  -849.6927
## nloptr+NELDERMEAD         0.0010  0.0033  0.9935    0.0013     0.0039    0.0004           0.0013            0.0038           0.0001  -860.4845
## nloptr+SBPLX              0.0010  0.1000  0.9000    0.0042     0.0324    0.0386           0.0055            0.0536           0.0680  -872.2736
## nloptr+AUGLAG+COBYLA      0.0006  0.0899  0.9101    0.0039     0.0306    0.0370           0.0052            0.0527           0.0677  -871.5006
## nloptr+AUGLAG+BOBYQA      0.0003  0.0907  0.9093    0.0040     0.0298    0.0375           0.0057            0.0532           0.0718  -872.3412
## nloptr+AUGLAG+PRAXIS      0.1246  0.1232  0.4948    0.0620     0.0475    0.2225           0.0701            0.0439           0.2508  -851.0547
## nloptr+AUGLAG+NELDERMEAD  0.2689  0.1774  0.0000    0.0786     0.0472    0.2445           0.0889            0.0352           0.2826  -849.6927
## nloptr+AUGLAG+SBPLX       0.0010  0.1000  0.9000    0.0042     0.0324    0.0386           0.0055            0.0536           0.0680  -872.2736

According to the maximum likelihood criterion, the “best” result is achieved by `gosolnp`. The result has the unfortunate property that $\hat{\beta}_1 = 0$, which is certainly not true, but at least the standard error for $\hat{\beta}_1$ would create a confidence interval that contains $\beta_1$’s true value. Of these, my preferred estimates are produced by AUGLAG+PRAXIS, as $\hat{\beta}_1 \approx 0.49$ seems reasonable and in fact the estimates are all close to the truth (at least in the sense that the confidence intervals contain the true values), but unfortunately those estimates do *not* maximize the log-likelihood, even though they are the most reasonable.

If we look at model 2, what do we see? Again, `lbfgs` does not converge, so I omit it. Unfortunately, `nlminb` does not converge either, so it too must be omitted.

round(optMethodCompare(x2, spec, solvers[c(2, 4:15)]), digits = 4)

##                            omega   alpha1  beta1   se.omega  se.alpha1  se.beta1  robust.se.omega  robust.se.alpha1  robust.se.beta1  LLH
## ------------------------  ------  ------  ------  --------  ---------  --------  ---------------  ----------------  ---------------  ---------
## solnp                     0.0011  0.0020  0.9970    0.0025     0.0029    0.0005           0.0030            0.0035           0.0004  -1375.951
## gosolnp                   0.0011  0.0020  0.9970    0.0025     0.0029    0.0005           0.0030            0.0035           0.0004  -1375.951
## hybrid                    0.0011  0.0020  0.9970    0.0025     0.0029    0.0005           0.0030            0.0035           0.0004  -1375.951
## nloptr+COBYLA             0.0016  0.0888  0.9112    0.0175     0.0619    0.0790           0.0540            0.2167           0.2834  -1394.529
## nloptr+BOBYQA             0.0010  0.0892  0.9108    0.0194     0.0659    0.0874           0.0710            0.2631           0.3572  -1395.310
## nloptr+PRAXIS             0.5018  0.0739  0.3803    0.3178     0.0401    0.3637           0.2777            0.0341           0.3225  -1373.632
## nloptr+NELDERMEAD         0.0028  0.0026  0.9944    0.0028     0.0031    0.0004           0.0031            0.0035           0.0001  -1375.976
## nloptr+SBPLX              0.0029  0.1000  0.9000    0.0146     0.0475    0.0577           0.0275            0.1108           0.1408  -1395.807
## nloptr+AUGLAG+COBYLA      0.0016  0.0888  0.9112    0.0175     0.0619    0.0790           0.0540            0.2167           0.2834  -1394.529
## nloptr+AUGLAG+BOBYQA      0.0010  0.0892  0.9108    0.0194     0.0659    0.0874           0.0710            0.2631           0.3572  -1395.310
## nloptr+AUGLAG+PRAXIS      0.5018  0.0739  0.3803    0.3178     0.0401    0.3637           0.2777            0.0341           0.3225  -1373.632
## nloptr+AUGLAG+NELDERMEAD  0.0001  0.0000  1.0000    0.0003     0.0003    0.0000           0.0004            0.0004           0.0000  -1375.885
## nloptr+AUGLAG+SBPLX       0.0029  0.1000  0.9000    0.0146     0.0475    0.0577           0.0275            0.1108           0.1408  -1395.807

Here it was PRAXIS and AUGLAG+PRAXIS that gave the “optimal” result, and it was only those two methods that did. Other optimizers gave visibly bad results. That said, the “optimal” solution is the preferred one, with the parameters being nonzero and their confidence intervals containing the correct values.

What happens if we restrict the sample to size 100? (`lbfgs` still does not work.)

round(optMethodCompare(x1[1:100], spec, solvers[c(1:2, 4:15)]), digits = 4)

##                            omega   alpha1  beta1   se.omega  se.alpha1  se.beta1  robust.se.omega  robust.se.alpha1  robust.se.beta1  LLH
## ------------------------  ------  ------  ------  --------  ---------  --------  ---------------  ----------------  ---------------  --------
## nlminb                    0.0451  0.2742  0.5921    0.0280     0.1229    0.1296           0.0191            0.0905           0.0667  -80.6587
## solnp                     0.0451  0.2742  0.5921    0.0280     0.1229    0.1296           0.0191            0.0905           0.0667  -80.6587
## gosolnp                   0.0451  0.2742  0.5921    0.0280     0.1229    0.1296           0.0191            0.0905           0.0667  -80.6587
## hybrid                    0.0451  0.2742  0.5921    0.0280     0.1229    0.1296           0.0191            0.0905           0.0667  -80.6587
## nloptr+COBYLA             0.0007  0.1202  0.8798    0.0085     0.0999    0.0983           0.0081            0.1875           0.1778  -85.3121
## nloptr+BOBYQA             0.0005  0.1190  0.8810    0.0085     0.0994    0.0992           0.0084            0.1892           0.1831  -85.3717
## nloptr+PRAXIS             0.0451  0.2742  0.5921    0.0280     0.1229    0.1296           0.0191            0.0905           0.0667  -80.6587
## nloptr+NELDERMEAD         0.0451  0.2742  0.5920    0.0281     0.1230    0.1297           0.0191            0.0906           0.0667  -80.6587
## nloptr+SBPLX              0.0433  0.2740  0.5998    0.0269     0.1237    0.1268           0.0182            0.0916           0.0648  -80.6616
## nloptr+AUGLAG+COBYLA      0.0007  0.1202  0.8798    0.0085     0.0999    0.0983           0.0081            0.1875           0.1778  -85.3121
## nloptr+AUGLAG+BOBYQA      0.0005  0.1190  0.8810    0.0085     0.0994    0.0992           0.0084            0.1892           0.1831  -85.3717
## nloptr+AUGLAG+PRAXIS      0.0451  0.2742  0.5921    0.0280     0.1229    0.1296           0.0191            0.0905           0.0667  -80.6587
## nloptr+AUGLAG+NELDERMEAD  0.0451  0.2742  0.5921    0.0280     0.1229    0.1296           0.0191            0.0905           0.0667  -80.6587
## nloptr+AUGLAG+SBPLX       0.0450  0.2742  0.5924    0.0280     0.1230    0.1295           0.0191            0.0906           0.0666  -80.6587

round(optMethodCompare(x2[1:100], spec, solvers[c(1:2, 4:15)]), digits = 4)

##                            omega   alpha1  beta1   se.omega  se.alpha1  se.beta1  robust.se.omega  robust.se.alpha1  robust.se.beta1  LLH
## ------------------------  ------  ------  ------  --------  ---------  --------  ---------------  ----------------  ---------------  ---------
## nlminb                    0.7592  0.0850  0.0000    2.1366     0.4813    3.0945           7.5439            1.7763          11.0570  -132.4614
## solnp                     0.0008  0.0000  0.9990    0.0291     0.0417    0.0066           0.0232            0.0328           0.0034  -132.9182
## gosolnp                   0.0537  0.0000  0.9369    0.0521     0.0087    0.0713           0.0430            0.0012           0.0529  -132.9124
## hybrid                    0.0008  0.0000  0.9990    0.0291     0.0417    0.0066           0.0232            0.0328           0.0034  -132.9182
## nloptr+COBYLA             0.0014  0.0899  0.9101    0.0259     0.0330    0.1192           0.0709            0.0943           0.1344  -135.7495
## nloptr+BOBYQA             0.0008  0.0905  0.9095    0.0220     0.0051    0.1145           0.0687            0.0907           0.1261  -135.8228
## nloptr+PRAXIS             0.0602  0.0000  0.9293    0.0522     0.0088    0.0773           0.0462            0.0015           0.0565  -132.9125
## nloptr+NELDERMEAD         0.0024  0.0000  0.9971    0.0473     0.0629    0.0116           0.0499            0.0680           0.0066  -132.9186
## nloptr+SBPLX              0.0027  0.1000  0.9000    0.0238     0.0493    0.1308           0.0769            0.1049           0.1535  -135.9175
## nloptr+AUGLAG+COBYLA      0.0014  0.0899  0.9101    0.0259     0.0330    0.1192           0.0709            0.0943           0.1344  -135.7495
## nloptr+AUGLAG+BOBYQA      0.0008  0.0905  0.9095    0.0221     0.0053    0.1145           0.0687            0.0907           0.1262  -135.8226
## nloptr+AUGLAG+PRAXIS      0.0602  0.0000  0.9294    0.0523     0.0090    0.0771           0.0462            0.0014           0.0565  -132.9125
## nloptr+AUGLAG+NELDERMEAD  0.0000  0.0000  0.9999    0.0027     0.0006    0.0005           0.0013            0.0004           0.0003  -132.9180
## nloptr+AUGLAG+SBPLX       0.0027  0.1000  0.9000    0.0238     0.0493    0.1308           0.0769            0.1049           0.1535  -135.9175

The results are not thrilling. The “best” result for the series generated by model 1 was attained by multiple solvers, and the 95% confidence interval (CI) for $\omega$ would not contain $\omega$’s true value, though the CIs for the other parameters fare better. For the series generated by model 2, the best result was attained by the `nlminb` solver; the parameter values are not plausible and the standard errors are huge. At least the resulting CIs, being so wide, would contain the correct values.

From here we should no longer stick to two series but see the performance of these methods on many simulated series generated by both models. Simulations in this post will be too computationally intensive for my laptop so I will use my department’s supercomputer to perform them, taking advantage of its many cores for parallelization.

library(foreach)
library(doParallel)

logfile <- ""
# logfile <- "outfile.log"
# if (!file.exists(logfile)) {
#   file.create(logfile)
# }
cl <- makeCluster(detectCores() - 1, outfile = logfile)
registerDoParallel(cl)

optMethodSims <- function(gen_spec, n.sim = 1000, m.sim = 1000,
                          fit_spec = ugarchspec(mean.model = list(
                            armaOrder = c(0,0), include.mean = FALSE)),
                          solvers = list("solnp" = list(
                            "solver" = "solnp",
                            "solver.control" = list())),
                          rseed = NA, verbose = FALSE) {
  # Performs simulations in parallel of GARCH processes via rugarch and returns
  # a list with the results of different optimization routines
  #
  # Args:
  #   gen_spec: The specification for generating a GARCH sequence, produced by
  #             ugarchspec
  #   n.sim: An integer denoting the length of the simulated series
  #   m.sim: An integer for the number of simulated sequences to generate
  #   fit_spec: A ugarchspec specification for the model to fit
  #   solvers: A list of lists containing strings of solvers and a list for
  #            solver.control
  #   rseed: Optional seeding value(s) for the random number generator. For
  #          m.sim>1, it is possible to provide either a single seed to
  #          initialize all values, or one seed per separate simulation (i.e.
  #          m.sim seeds). However, in the latter case this may result in some
  #          slight overhead depending on how large m.sim is
  #   verbose: Boolean for whether to write data tracking the progress of the
  #            loop into an output file (the file given to makeCluster's
  #            outfile argument)
  #
  # Return:
  #   A list containing the result of calling optMethodCompare on each
  #   generated sequence
  fits <- foreach(i = 1:m.sim, .packages = c("rugarch"),
                  .export = c("optMethodCompare", "evalSolverFit")) %dopar% {
    if (is.na(rseed)) {
      newseed <- NA
    } else if (is.vector(rseed)) {
      newseed <- rseed[i]
    } else {
      newseed <- rseed + i - 1
    }
    if (verbose) {
      cat(as.character(Sys.time()), ": Now on simulation ", i, "\n")
    }
    sim <- ugarchpath(gen_spec, n.sim = n.sim, n.start = 1000, m.sim = 1,
                      rseed = newseed)
    x <- sim@path$seriesSim
    optMethodCompare(x, spec = fit_spec, solvers = solvers)
  }
  return(fits)
}

# Specification 1 first
spec1_n100 <- optMethodSims(spec1, n.sim = 100, m.sim = 1000,
                            solvers = solvers, verbose = TRUE)
spec1_n500 <- optMethodSims(spec1, n.sim = 500, m.sim = 1000,
                            solvers = solvers, verbose = TRUE)
spec1_n1000 <- optMethodSims(spec1, n.sim = 1000, m.sim = 1000,
                             solvers = solvers, verbose = TRUE)

# Specification 2 next
spec2_n100 <- optMethodSims(spec2, n.sim = 100, m.sim = 1000,
                            solvers = solvers, verbose = TRUE)
spec2_n500 <- optMethodSims(spec2, n.sim = 500, m.sim = 1000,
                            solvers = solvers, verbose = TRUE)
spec2_n1000 <- optMethodSims(spec2, n.sim = 1000, m.sim = 1000,
                             solvers = solvers, verbose = TRUE)

Below is a set of helper functions I will use for the analytics I want.

optMethodSims_getAllVals <- function(param, solver, reslist) {
  # Get all values for a parameter obtained by a certain solver after getting
  # a list of results via optMethodSims
  #
  # Args:
  #   param: A string for the parameter to get (such as "beta1")
  #   solver: A string for the solver for which to get the parameter (such as
  #           "nlminb")
  #   reslist: A list created by optMethodSims
  #
  # Return:
  #   A vector of values of the parameter for each simulation
  res <- sapply(reslist, function(l) {
    return(l[solver, param])
  })
  return(res)
}

optMethodSims_getBestVals <- function(reslist, opt_vec = TRUE,
                                      reslike = FALSE) {
  # A function that gets the optimizer that maximized the likelihood function
  # for each entry in reslist
  #
  # Args:
  #   reslist: A list created by optMethodSims
  #   opt_vec: A boolean indicating whether to return a vector with the names
  #            of the optimizers that maximized the log likelihood
  #   reslike: A boolean indicating whether the resulting list should consist
  #            of matrices of only one row labeled "best" with a structure
  #            like reslist
  #
  # Return:
  #   If opt_vec is TRUE, a list of lists, where each sublist contains a
  #   vector of strings naming the optimizers that maximized the likelihood
  #   function and a matrix of the parameters found. Otherwise, just the
  #   matrix (resembles the list generated by optMethodSims)
  res <- lapply(reslist, function(l) {
    max_llh <- max(l[, "LLH"], na.rm = TRUE)
    best_idx <- (l[, "LLH"] == max_llh) & (!is.na(l[, "LLH"]))
    best_mat <- l[best_idx, , drop = FALSE]
    if (opt_vec) {
      return(list("solvers" = rownames(best_mat), "params" = best_mat))
    } else {
      return(best_mat)
    }
  })
  if (reslike) {
    res <- lapply(res, function(l) {
      mat <- l$params[1, , drop = FALSE]
      rownames(mat) <- "best"
      return(mat)
    })
  }
  return(res)
}

optMethodSims_getCaptureRate <- function(param, solver, reslist,
                                         multiplier = 2, spec,
                                         use_robust = TRUE) {
  # Gets the rate a confidence interval for a parameter captures the true
  # value
  #
  # Args:
  #   param: A string for the parameter being worked with
  #   solver: A string for the solver used to estimate the parameter
  #   reslist: A list created by optMethodSims
  #   multiplier: A floating-point number for the multiplier to the standard
  #               error, appropriate for the desired confidence level
  #   spec: A ugarchspec specification with the fixed parameters containing
  #         the true parameter value
  #   use_robust: Use robust standard errors for computing CIs
  #
  # Return:
  #   A float for the proportion of times the confidence interval captured
  #   the true parameter value
  se_string <- ifelse(use_robust, "robust.se.", "se.")
  est <- optMethodSims_getAllVals(param, solver, reslist)
  moe_est <- multiplier * optMethodSims_getAllVals(
    paste0(se_string, param), solver, reslist)
  param_val <- spec@model$fixed.pars[[param]]
  contained <- (param_val <= est + moe_est) & (param_val >= est - moe_est)
  return(mean(contained, na.rm = TRUE))
}

optMethodSims_getMaxRate <- function(solver, maxlist) {
  # Gets how frequently a solver found a maximal log likelihood
  #
  # Args:
  #   solver: A string for the solver
  #   maxlist: A list created by optMethodSims_getBestVals with entries
  #            containing vectors naming the solvers that maximized the log
  #            likelihood
  #
  # Return:
  #   The proportion of times the solver maximized the log likelihood
  maxed <- sapply(maxlist, function(l) {
    solver %in% l$solvers
  })
  return(mean(maxed))
}

optMethodSims_getFailureRate <- function(solver, reslist) {
  # Computes the proportion of times a solver failed to converge
  #
  # Args:
  #   solver: A string for the solver
  #   reslist: A list created by optMethodSims
  #
  # Return:
  #   Numeric proportion of times a solver failed to converge
  failed <- sapply(reslist, function(l) {
    is.na(l[solver, "LLH"])
  })
  return(mean(failed))
}

# Vectorization
optMethodSims_getCaptureRate <- Vectorize(optMethodSims_getCaptureRate,
                                          vectorize.args = "solver")
optMethodSims_getMaxRate <- Vectorize(optMethodSims_getMaxRate,
                                      vectorize.args = "solver")
optMethodSims_getFailureRate <- Vectorize(optMethodSims_getFailureRate,
                                          vectorize.args = "solver")

I first create tables containing, for a fixed sample size and model:

- The rate at which a solver attains the highest log likelihood among all solvers for a series
- The rate at which a solver failed to converge
- The rate at which a roughly 95% confidence interval based on the solver’s solution managed to contain the true parameter value for each parameter (referred to as the “capture rate”, and using the robust standard errors)

solver_table <- function(reslist, tags, spec) {
  # Creates a table describing important solver statistics
  #
  # Args:
  #   reslist: A list created by optMethodSims
  #   tags: A vector with strings naming all solvers to include in the table
  #   spec: A ugarchspec specification with the fixed parameters containing
  #         the true parameter value
  #
  # Return:
  #   A matrix containing metrics describing the performance of the solvers
  params <- names(spec@model$fixed.pars)
  max_rate <- optMethodSims_getMaxRate(tags,
                                       optMethodSims_getBestVals(reslist))
  failure_rate <- optMethodSims_getFailureRate(tags, reslist)
  capture_rate <- lapply(params, function(p) {
    optMethodSims_getCaptureRate(p, tags, reslist, spec = spec)
  })

  return_mat <- cbind("Maximization Rate" = max_rate,
                      "Failure Rate" = failure_rate)
  capture_mat <- do.call(cbind, capture_rate)
  colnames(capture_mat) <- paste(params, "95% CI Capture Rate")
  return_mat <- cbind(return_mat, capture_mat)

  return(return_mat)
}

as.data.frame(round(solver_table(spec1_n100, tags, spec1) * 100, digits = 1))

##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                                 16.2          20.0                       21.8                        29.2                       24.0
## solnp                                   0.1           0.0                       13.7                        24.0                       15.4
## lbfgs                                  15.1          35.2                       56.6                        67.9                       58.0
## gosolnp                                20.3           0.0                       20.3                        32.6                       21.9
## hybrid                                  0.1           0.0                       13.7                        24.0                       15.4
## nloptr+COBYLA                           0.0           0.0                        6.3                        82.6                       19.8
## nloptr+BOBYQA                           0.0           0.0                        5.4                        82.1                       18.5
## nloptr+PRAXIS                          15.8           0.0                       42.1                        54.5                       44.1
## nloptr+NELDERMEAD                       0.4           0.0                        5.7                        19.3                        8.1
## nloptr+SBPLX                            0.1           0.0                        7.7                        85.7                       24.1
## nloptr+AUGLAG+COBYLA                    0.0           0.0                        6.1                        84.5                       19.9
## nloptr+AUGLAG+BOBYQA                    0.1           0.0                        6.5                        83.2                       19.4
## nloptr+AUGLAG+PRAXIS                   22.6           0.0                       41.2                        54.6                       44.1
## nloptr+AUGLAG+NELDERMEAD               11.1           0.0                        7.5                        18.8                        9.7
## nloptr+AUGLAG+SBPLX                     0.6           0.0                        7.9                        86.5                       23.0

as.data.frame(round(solver_table(spec1_n500, tags, spec1) * 100, digits = 1))

##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                                 21.2           0.4                       63.3                        67.2                       63.8
## solnp                                   0.1           0.2                       32.2                        35.6                       32.7
## lbfgs                                   4.5          41.3                       85.0                        87.6                       85.7
## gosolnp                                35.1           0.0                       69.0                        73.2                       69.5
## hybrid                                  0.1           0.0                       32.3                        35.7                       32.8
## nloptr+COBYLA                           0.0           0.0                        3.2                        83.3                       17.8
## nloptr+BOBYQA                           0.0           0.0                        3.5                        81.5                       18.1
## nloptr+PRAXIS                          18.0           0.0                       83.9                        87.0                       84.2
## nloptr+NELDERMEAD                       0.0           0.0                       16.4                        20.7                       16.7
## nloptr+SBPLX                            0.1           0.0                        3.7                        91.4                       15.7
## nloptr+AUGLAG+COBYLA                    0.0           0.0                        3.2                        83.3                       17.8
## nloptr+AUGLAG+BOBYQA                    0.0           0.0                        3.5                        81.5                       18.1
## nloptr+AUGLAG+PRAXIS                   21.9           0.0                       80.2                        87.4                       83.4
## nloptr+AUGLAG+NELDERMEAD                0.6           0.0                       20.0                        24.0                       20.5
## nloptr+AUGLAG+SBPLX                     0.0           0.0                        3.7                        91.4                       15.7

as.data.frame(round(solver_table(spec1_n1000, tags, spec1) * 100, digits = 1))

##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                                 21.5           0.1                       88.2                        86.1                       87.8
## solnp                                   0.4           0.2                       54.9                        53.6                       54.6
## lbfgs                                   1.1          44.8                       91.5                        88.0                       91.8
## gosolnp                                46.8           0.0                       87.2                        85.1                       87.0
## hybrid                                  0.5           0.0                       55.0                        53.6                       54.7
## nloptr+COBYLA                           0.0           0.0                        4.1                        74.5                       15.0
## nloptr+BOBYQA                           0.0           0.0                        3.6                        74.3                       15.9
## nloptr+PRAXIS                          17.7           0.0                       92.6                        90.2                       92.2
## nloptr+NELDERMEAD                       0.0           0.0                       30.5                        29.6                       30.9
## nloptr+SBPLX                            0.0           0.0                        3.0                        82.3                       11.6
## nloptr+AUGLAG+COBYLA                    0.0           0.0                        4.1                        74.5                       15.0
## nloptr+AUGLAG+BOBYQA                    0.0           0.0                        3.6                        74.3                       15.9
## nloptr+AUGLAG+PRAXIS                   13.0           0.0                       83.4                        93.9                       86.7
## nloptr+AUGLAG+NELDERMEAD                0.0           0.0                       34.6                        33.8                       35.0
## nloptr+AUGLAG+SBPLX                     0.0           0.0                        3.0                        82.3                       11.6

as.data.frame(round(solver_table(spec2_n100, tags, spec2) * 100, digits = 1))

##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                                  8.2          24.2                       22.3                        34.7                       23.9
## solnp                                   0.3           0.0                       21.1                        32.6                       21.3
## lbfgs                                  11.6          29.5                       74.9                        73.2                       70.4
## gosolnp                                19.0           0.0                       31.9                        41.2                       30.8
## hybrid                                  0.3           0.0                       21.1                        32.6                       21.3
## nloptr+COBYLA                           0.0           0.0                       20.5                        94.7                       61.7
## nloptr+BOBYQA                           0.2           0.0                       19.3                        95.8                       62.2
## nloptr+PRAXIS                          16.0           0.0                       70.2                        57.2                       52.8
## nloptr+NELDERMEAD                       0.2           0.0                        7.8                        27.8                       14.1
## nloptr+SBPLX                            0.1           0.0                       24.9                        91.0                       65.0
## nloptr+AUGLAG+COBYLA                    0.0           0.0                       21.2                        95.1                       62.5
## nloptr+AUGLAG+BOBYQA                    0.9           0.0                       20.1                        96.2                       62.5
## nloptr+AUGLAG+PRAXIS                   38.8           0.0                       70.4                        57.2                       52.7
## nloptr+AUGLAG+NELDERMEAD               14.4           0.0                       10.7                        26.0                       16.1
## nloptr+AUGLAG+SBPLX                     0.1           0.0                       25.8                        91.9                       65.5

as.data.frame(round(solver_table(spec2_n500, tags, spec2) * 100, digits = 1))

##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                                  1.7           1.6                       35.0                        37.2                       34.2
## solnp                                   0.1           0.2                       46.2                        48.6                       45.3
## lbfgs                                   2.2          38.4                       85.2                        88.1                       82.3
## gosolnp                                 5.2           0.0                       74.9                        77.8                       72.7
## hybrid                                  0.1           0.0                       46.1                        48.5                       45.2
## nloptr+COBYLA                           0.0           0.0                        8.2                       100.0                       40.5
## nloptr+BOBYQA                           0.0           0.0                        9.5                       100.0                       41.0
## nloptr+PRAXIS                          17.0           0.0                       83.8                        85.1                       81.0
## nloptr+NELDERMEAD                       0.0           0.0                       26.9                        38.2                       27.0
## nloptr+SBPLX                            0.0           0.0                        8.2                       100.0                       40.2
## nloptr+AUGLAG+COBYLA                    0.0           0.0                        8.2                       100.0                       40.5
## nloptr+AUGLAG+BOBYQA                    0.0           0.0                        9.5                       100.0                       41.0
## nloptr+AUGLAG+PRAXIS                   77.8           0.0                       84.4                        85.4                       81.3
## nloptr+AUGLAG+NELDERMEAD                1.1           0.0                       32.5                        40.3                       32.3
## nloptr+AUGLAG+SBPLX                     0.0           0.0                        8.2                       100.0                       40.2

as.data.frame(round(solver_table(spec2_n1000, tags, spec2) * 100, digits = 1))

##                           Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## ------------------------  -----------------  ------------  -------------------------  --------------------------  -------------------------
## nlminb                                  2.7           0.7                       64.1                        68.0                       63.8
## solnp                                   0.0           0.0                       70.1                        73.8                       69.8
## lbfgs                                   0.0          43.4                       90.6                        91.5                       89.9
## gosolnp                                 3.2           0.0                       87.5                        90.3                       86.9
## hybrid                                  0.0           0.0                       70.1                        73.8                       69.8
## nloptr+COBYLA                           0.0           0.0                        2.3                       100.0                       20.6
## nloptr+BOBYQA                           0.0           0.0                        2.5                       100.0                       22.6
## nloptr+PRAXIS                          14.1           0.0                       89.1                        91.3                       88.5
## nloptr+NELDERMEAD                       0.0           0.0                       46.3                        55.6                       45.4
## nloptr+SBPLX                            0.0           0.0                        2.2                       100.0                       19.5
## nloptr+AUGLAG+COBYLA                    0.0           0.0                        2.3                       100.0                       20.6
## nloptr+AUGLAG+BOBYQA                    0.0           0.0                        2.5                       100.0                       22.6
## nloptr+AUGLAG+PRAXIS                   85.5           0.0                       89.1                        91.3                       88.5
## nloptr+AUGLAG+NELDERMEAD                0.3           0.0                       51.9                        58.2                       51.3
## nloptr+AUGLAG+SBPLX                     0.0           0.0                        2.2                       100.0                       19.5

These tables already reveal a lot of information. In general it seems that the AUGLAG+PRAXIS method (the augmented Lagrangian method using the principal axis solver) provided in NLopt does best for model 2, especially for large sample sizes, while for model 1 the `gosolnp` method, which uses the `solnp` solver by Yinyu Ye but with random initializations and restarts, seems to win out for larger sample sizes.

The bigger story, though, is the failure of any single method to be the “best,” especially in the case of smaller sample sizes. While there are some optimizers that consistently fail to attain the maximum log-likelihood, no optimizer can claim to consistently obtain the best result. Additionally, different optimizers seem to perform better with different models. The implication for real-world data, where the true model parameters are never known, is to try every optimizer (or at least those that have a chance of maximizing the log-likelihood) and pick the results that yield the largest log-likelihood. No algorithm is trustworthy enough to be the go-to algorithm.
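In code, that practice is simple with the helpers above: run every solver on a series and keep the solution with the largest log-likelihood (a minimal sketch; solvers that fail to converge come back as `NA` rows, which `which.max()` skips).

# Try every solver on one series, then keep the maximum-likelihood solution
res <- optMethodCompare(x1, spec, solvers)
best <- res[which.max(res[, "LLH"]), , drop = FALSE]
round(best, digits = 4)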

Let’s now look at plots of the estimated distribution of the parameters. First comes a helper function.

library(ggplot2)

solver_density_plot <- function(param, tags, list_reslist, sample_sizes,
                                spec) {
  # Given a parameter, creates a density plot for each solver's distribution
  # at different sample sizes
  #
  # Args:
  #   param: A string for the parameter to plot
  #   tags: A character vector containing the solver names
  #   list_reslist: A list of lists created by optMethodSims, one for each
  #                 sample size
  #   sample_sizes: A numeric vector identifying the sample size corresponding
  #                 to each object in the above list
  #   spec: A ugarchspec object containing the specification that generated
  #         the datasets
  #
  # Returns:
  #   A ggplot object containing the plot generated
  p <- spec@model$fixed.pars[[param]]
  nlist <- lapply(list_reslist, function(l) {
    optlist <- lapply(tags, function(t) {
      return(na.omit(optMethodSims_getAllVals(param, t, l)))
    })
    names(optlist) <- tags
    df <- stack(optlist)
    names(df) <- c("param", "optimizer")
    return(df)
  })
  ndf <- do.call(rbind, nlist)
  ndf$n <- rep(sample_sizes, times = sapply(nlist, nrow))

  ggplot(ndf, aes(x = param)) +
    geom_density(fill = "black", alpha = 0.5) +
    geom_vline(xintercept = p, color = "blue") +
    facet_grid(optimizer ~ n, scales = "free_y")
}

Now for plots.

solver_density_plot("omega", tags, list(spec1_n100, spec1_n500, spec1_n1000), c(100, 500, 1000), spec1)

solver_density_plot("alpha1", tags, list(spec1_n100, spec1_n500, spec1_n1000), c(100, 500, 1000), spec1)

solver_density_plot("beta1", tags, list(spec1_n100, spec1_n500, spec1_n1000), c(100, 500, 1000), spec1)

Bear in mind that there are only 1,000 simulated series and the optimizers produce solutions for each series, so in principle optimizer results should not be independent, yet the only time these density plots look the same is when the optimizer performs terribly. But even when an optimizer isn’t performing terribly (as is the case for the `gosolnp`, `PRAXIS`, and `AUGLAG-PRAXIS` methods) there’s evidence of artifacts around 0 for the estimates of $\omega$ and $\alpha_1$ and around 1 for $\beta_1$. These artifacts are more pronounced for smaller sample sizes. That said, for the better optimizers the estimators look almost unbiased, especially for $\omega$ and $\alpha_1$, but their spread is large even for large sample sizes, especially for $\beta_1$’s estimator. That’s not the case for the `AUGLAG-PRAXIS` optimizer, though; it appears to produce biased estimates.

Let’s look at plots for model 2.

solver_density_plot("omega", tags, list(spec2_n100, spec2_n500, spec2_n1000), c(100, 500, 1000), spec2)

solver_density_plot("alpha1", tags, list(spec2_n100, spec2_n500, spec2_n1000), c(100, 500, 1000), spec2)

solver_density_plot("beta1", tags, list(spec2_n100, spec2_n500, spec2_n1000), c(100, 500, 1000), spec2)

The estimators don’t struggle as much for model 2, but the picture is still hardly rosy. The `PRAXIS` and `AUGLAG-PRAXIS` methods seem to perform well, but far from spectacularly for small sample sizes.

So far, my experiments suggest practitioners should not rely on any one optimizer but instead try different ones and choose the results that have the largest log-likelihood. Suppose we call this optimization routine the “best” optimizer. How does this optimizer perform?

Let’s find out.

as.data.frame(round(solver_table( optMethodSims_getBestVals(spec1_n100, reslike = TRUE), "best", spec1) * 100, digits = 1))

##        Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## -----  -----------------  ------------  -------------------------  --------------------------  -------------------------
## best                 100             0                       49.5                        63.3                       52.2

as.data.frame(round(solver_table( optMethodSims_getBestVals(spec1_n500, reslike = TRUE), "best", spec1) * 100, digits = 1))

##        Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## -----  -----------------  ------------  -------------------------  --------------------------  -------------------------
## best                 100             0                         86                        88.8                       86.2

as.data.frame(round(solver_table( optMethodSims_getBestVals(spec1_n1000, reslike = TRUE), "best", spec1) * 100, digits = 1))

##        Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## -----  -----------------  ------------  -------------------------  --------------------------  -------------------------
## best                 100             0                       92.8                        90.3                       92.4

as.data.frame(round(solver_table( optMethodSims_getBestVals(spec2_n100, reslike = TRUE), "best", spec2) * 100, digits = 1))

##        Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## -----  -----------------  ------------  -------------------------  --------------------------  -------------------------
## best   100                0             55.2                       63.2                        52.2

as.data.frame(round(solver_table( optMethodSims_getBestVals(spec2_n500, reslike = TRUE), "best", spec2) * 100, digits = 1))

##        Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## -----  -----------------  ------------  -------------------------  --------------------------  -------------------------
## best   100                0             83                         86.3                        80.5

as.data.frame(round(solver_table( optMethodSims_getBestVals(spec2_n1000, reslike = TRUE), "best", spec2) * 100, digits = 1))

##        Maximization Rate  Failure Rate  omega 95% CI Capture Rate  alpha1 95% CI Capture Rate  beta1 95% CI Capture Rate
## -----  -----------------  ------------  -------------------------  --------------------------  -------------------------
## best   100                0             88.7                       91.4                        88.1

Bear in mind that we evaluate the performance of the “best” optimizer by the CI capture rate, which should be around 95%. The “best” optimizer obviously performs well but does not outperform all of the individual optimizers. This is disappointing; I had hoped that the “best” optimizer would have the highly desirable property of a 95% capture rate. Performance is nowhere near that except for the larger sample sizes. Either the standard errors are being underestimated, or for small sample sizes the Normal distribution poorly describes the actual distribution of the estimators (which means multiplying the standard error by two does not lead to intervals with the desired confidence level).

Interestingly, there is no noticeable difference in performance between the two models for this “best” estimator. This suggests to me that the seemingly better results often seen when fitting these models to actual data might be exploiting the bias of the optimizers.

Let’s look at the distribution of the estimated parameters.

solver_density_plot("omega", "best", lapply(list(spec1_n100, spec1_n500, spec1_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec1)

solver_density_plot("alpha1", "best", lapply(list(spec1_n100, spec1_n500, spec1_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec1)

solver_density_plot("beta1", "best", lapply(list(spec1_n100, spec1_n500, spec1_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec1)

solver_density_plot("omega", "best", lapply(list(spec2_n100, spec2_n500, spec2_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec2)

solver_density_plot("alpha1", "best", lapply(list(spec2_n100, spec2_n500, spec2_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec2)

solver_density_plot("beta1", "best", lapply(list(spec2_n100, spec2_n500, spec2_n1000), function(l) {optMethodSims_getBestVals(l, reslike = TRUE)}), c(100, 500, 1000), spec2)

The plots suggest that the “best” estimator still shows some pathologies even though it behaves less poorly than the other estimators. I don’t see evidence of bias in the parameter estimates regardless of the choice of model, but I’m not convinced the “best” estimator truly maximizes the log-likelihood, especially for smaller sample sizes. The estimates for `beta1` are especially bad. Even if the standard error for `beta1` should be large, I don’t think it should show the propensity for being zero or one that these plots reveal.

I initially wrote this article over a year ago and didn’t publish it until now. The reason for the hold-up was that I wanted to do a literature review of alternative ways to estimate the parameters of a GARCH model. Unfortunately I never completed such a review, and I’ve decided to release this article regardless.

That said, I’ll share what I was reading. One article by Gilles Zumbach tried to explain why estimating GARCH parameters is hard. He noted that the quasi-likelihood equation that solvers try to maximize has bad properties, such as being non-concave and having “flat” regions in which algorithms can become stuck. He suggested an alternative procedure for finding the parameters of GARCH models, where one finds the best fit in an alternative parameter space (which supposedly has better properties than the original parameter space of GARCH models) and estimates one of the parameters using, say, the method of moments, without any optimization algorithm. Another article, by Fiorentini, Calzolari, and Panattoni, showed that analytic gradients for GARCH models can be computed explicitly, so the gradient-free methods like those used by the optimization algorithms seen here are not actually necessary. Since numerical differentiation is generally a difficult problem, this could help ensure that no additional numerical error is introduced that causes these algorithms to fail to converge. I also wanted to explore other estimation methods to see if they can somehow avoid numerical techniques altogether or have better numerical properties, such as estimation via the method of moments. I wanted to read an article by Andersen, Chung, and Sørensen to learn more about this approach to estimation.

Life happens, though, and I didn’t complete this review. The project moved on and the problem of estimating GARCH model parameters well was essentially avoided. That said, I want to revisit this point, perhaps exploring how techniques such as simulated annealing do for estimating GARCH model parameters.

So for now, if you’re a practitioner, what should you do when estimating a GARCH model? I would say don’t take for granted that the default estimation procedure your package uses will work. You should explore different procedures and different parameter choices and go with the results that lead to the largest log-likelihood value. I showed how this could be done in an automated fashion, but you should be prepared to *manually* pick the model with the best fit (as determined by the log-likelihood). If you don’t do this, the model you estimated may not actually be the one for which the theory works.

I will say it again, one last time, in the last sentence of this article for extra emphasis: *don’t take numerical techniques and results for granted!*

sessionInfo()

## R version 3.4.2 (2017-09-28)
## Platform: i686-pc-linux-gnu (32-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] ggplot2_2.2.1 rugarch_1.3-8 printr_0.1   
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.16               htmltools_0.3.6            
##  [3] SkewHyperbolic_0.3-2        expm_0.999-2               
##  [5] scales_0.5.0                DistributionUtils_0.5-1    
##  [7] Rsolnp_1.16                 rprojroot_1.2              
##  [9] grid_3.4.2                  stringr_1.3.1              
## [11] knitr_1.17                  numDeriv_2016.8-1          
## [13] GeneralizedHyperbolic_0.8-1 munsell_0.4.3              
## [15] pillar_1.3.0                tibble_1.4.2               
## [17] compiler_3.4.2              highr_0.6                  
## [19] lattice_0.20-35             labeling_0.3               
## [21] Matrix_1.2-8                KernSmooth_2.23-15         
## [23] plyr_1.8.4                  xts_0.10-0                 
## [25] spd_2.0-1                   zoo_1.8-0                  
## [27] stringi_1.2.4               magrittr_1.5               
## [29] reshape2_1.4.2              rlang_0.2.2                
## [31] rmarkdown_1.7               evaluate_0.10.1            
## [33] gtable_0.2.0                colorspace_1.3-2           
## [35] yaml_2.1.14                 tools_3.4.2                
## [37] mclust_5.4.1                mvtnorm_1.0-6              
## [39] truncnorm_1.0-7             ks_1.11.3                  
## [41] nloptr_1.0.4                lazyeval_0.2.1             
## [43] crayon_1.3.4                backports_1.1.1            
## [45] Rcpp_1.0.0

*Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

- When I wrote this article initially, my advisor and a former student of his developed a test statistic that should detect early or late change points in a time series, including a change in the parameters of a GARCH model. My contribution to the paper we were writing included demonstrating that the test statistic detects structural change sooner than other test statistics when applied to real-world data. To be convincing to reviewers, our test statistic should detect a change that another statistic won’t detect until it gets more data. This means that the change should be present but not so strong that both statistics immediately detect the change with minuscule *p*-values. ↩
- The profile on LinkedIn I linked to may or may not be the correct person; I’m guessing it is based on the listed occupations and history. If I got the wrong person, I’m sorry. ↩

The *Forgotten Age* cycle of Arkham Horror has come to a close, and Fantasy Flight Games has already announced the next cycle, *The Circle Undone*. Not only that, they’ve announced two mythos packs at a rate that… surprised me. A new cycle announcement and two mythos pack announcements in less than two months? Am I the only one who finds the new pace of announcements surprising? Perhaps that means they want to get product out at a faster pace?

Eh, enough speculation. I wrote about Arkham Horror before, analyzing Olive McBride specifically. This analysis (despite errors in the initial publication) was well received, even earning me a shoutout from my favorite Arkham-related YouTube channel.

In the announcement of the mythos pack *The Wages of Sin*, another mathematically interesting card was spoiled: Henry Wan, seen below.

Designing new allies for Arkham Horror is very hard because there can effectively be only one ally in a deck and there are many good allies already released, many of them in the core set. Henry Wan, specifically, is competing with Leo de Luca, who competes with Dr. Milan Christopher for the title of “Best Ally”. Cards like Charisma help with the problem, but only if you plan on running multiple allies and are willing to pay the experience points for it.

Can Henry Wan compete with Leo de Luca? That strongly depends on how good his ability is. Actions are a precious commodity in Arkham Horror; this is why Leo de Luca is considered such a great card. Card draw and resource gain *can* help action economy, especially in a spendthrift class such as the Rogue (green) class, but it often takes many resources to compensate for a lost action.

Consider, for instance, Father Mateo’s Elder Sign ability; gain an extra action, or a card and a resource. As a point of reference, players can draw a card *or* gain a resource for one of their actions, so a raw evaluation would say that drawing a card *and* gaining a resource is actually worth two actions and thus is better than just getting a free action. But I feel most of the time people use Father Mateo’s elder sign effect to gain the additional action rather than the card and resource (though choosing the latter effect is far from rare). In fact, I think that a single action could be valued at *three* resources, based only on the fact that when a player draws Emergency Cache they will eagerly play it. When viewed from this perspective, Leo de Luca pays for himself after about three turns, and drawing him early gives an investigator a major boost in a scenario.

Henry Wan will thus live or die based on how strong his ability is. That said, “strong” depends on how well a player can use his ability, which is not a trivial task.

Make no mistake: Henry Wan is a gambler’s card (which fits the Rogue theme very well). Not only does a player gamble the resources spent on him, they gamble the action spent to trigger his ability; heck, using a deck slot on him is a gamble! A player thus will gain value from him *only* if they use him optimally.

Optimal play is not trivially determined, but fortunately Henry Wan’s ability is easy to model mathematically if you’re familiar with Markov chains. Wait, are most people *not* familiar with Markov chains? Oh, I didn’t know that. Oh well, maybe they’ll learn something from what follows. I’ll do my best to make it simple.

From here on, I consider drawing a card or gaining a resource with Henry Wan as equivalent; I’ll simply imagine that we’re trying to gain resources using his ability. Henry Wan’s ability calls on players to institute a policy for playing him of the following form:

**After X draws, take your winnings; do not draw anymore.**

Our job is thus to choose X so that we maximize the *expected* resource gain (in the probabilistic sense of expectation).

I’m going to call utilizing Henry Wan’s ability a single “game”. Here’s how I view the game: the chaos bag is filled with tokens labeled either “S” or “F”, with every “F” being one of the icon tokens mentioned in Henry Wan’s ability. When an “S” is drawn, the game continues, while the game ends the moment an “F” is drawn. Every time we draw an additional “S”, there is one fewer “S” in the bag, and the odds of drawing an “F” increase; that said, our total winnings increase with each “S” we draw.

The game ends when either an “F” is drawn or the policy is triggered. Our winnings depend on which of these outcomes occurs. If it’s the former, our winnings are 0; if the latter, our winnings are X. Thus it’s easy to see (if you’re familiar with probability) that the expected winnings for any given policy are X times the probability of winning with that policy: X · p_X, if you prefer, with p_X being the probability of not failing when using the policy of ending after X draws. We thus want to pick the X that maximizes X · p_X.

Calculating p_X calls for the Markov chain. Below is the chain I imagine:

- The initial state is state 0, representing zero draws. There are also states numbered 1 to X, and a state F.
- If the chain is at state i (with 0 ≤ i < X), and the bag contains s “S” tokens and f “F” tokens before any draws, the chain moves to state i + 1 with probability (s − i)/(s + f − i) or to state F with probability f/(s + f − i).
- Both state X and state F are absorbing states. (Once entered, the chain does not leave the state; in other words, the "game" ends.)

The problem now is to calculate the probability the chain is absorbed into state X. The probability of ending in a particular absorbing state is well known (and given in the above link to Wikipedia).
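For reference, here is that standard result, which the script below computes directly (this is textbook absorbing Markov chain theory, nothing specific to this card). Write the transition matrix in canonical form, with Q the transient-to-transient block and R the transient-to-absorbing block; then the matrix of absorption probabilities is

```latex
P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}, \qquad B = (I - Q)^{-1} R
```

The entry of B corresponding to the initial state 0 and the absorbing state X is p_X.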

No special trick for finding a maximizing X is necessary once we know how to solve this problem for any X; just list out all possible policies (there are only finitely many we need to worry about, and the number doesn't exceed 20 most of the time), compute the expected winnings of each, and pick the X maximizing this number; a quick sketch of this idea follows.
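Since the "game" continues only while consecutive "S" tokens are drawn without replacement, p_X also has a simple closed form, which makes a handy sanity check on the Markov-chain computation. The sketch below is my own illustration and is not part of the table-generating script that follows:

```r
# p_X = probability of drawing X "S" tokens in a row, without replacement,
# from a bag of s "S" and f "F" tokens:
#   p_X = prod_{i=0}^{X-1} (s - i) / (s + f - i)
expected_winnings <- function(X, s, f) {
  i <- 0:(X - 1)
  X * prod((s - i) / (s + f - i))
}

# Evaluate every policy X = 1, ..., s and keep the best one
best_policy <- function(s, f) {
  ev <- sapply(1:s, expected_winnings, s = s, f = f)
  c(X = which.max(ev), EV = max(ev))
}

best_policy(11, 6)  # X = 2, EV of about 0.81, matching the tables below
```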

The maximizing policy depends on what's in the chaos bag. Shocking, right? That said, this is an important point; each campaign/scenario/difficulty level has its own chaos bag, and thanks to cards with the **seal** keyword, the chaos bag can be changed *during* a scenario, perhaps to either the benefit or detriment of Henry Wan. Fortunately, the "S" and "F" language makes modelling the contents of the chaos bag so simple that we can create two-dimensional tables depending only on the number of "S"s and "F"s in the bag, and those tables will cover nearly every scenario an investigator will encounter.

The script below (which can be made executable on Unix systems with R installed) can be used for generating such tables.

#!/usr/bin/Rscript
################################################################################
# ArkhamHorrorHenryWanTableGenerator.R
################################################################################
# 2018-12-02
# Curtis Miller
################################################################################
# This is a one-line description of the file.
################################################################################

# optparse: A package for handling command line arguments
if (!suppressPackageStartupMessages(require("optparse"))) {
  install.packages("optparse")
  require("optparse")
}

################################################################################
# FUNCTIONS
################################################################################

#' Henry Wan Policy Calculator
#'
#' Calculates important quantities for optimal play with Henry Wan
#'
#' @param s The number of "S" (or "success") tokens in the bag
#' @param f The number of "F" (or "failure") tokens in the bag
#' @param olive If \code{TRUE}, the first draw is done with Olive McBride
#' @param out If \code{"X"}, return the optimal stopping time (default); if
#'            \code{"EV"}, return the expected winnings of the optimal
#'            policy; if \code{"P"}, return the probability of success of
#'            the optimal policy
#' @return Numeric depending on the value of the parameter \code{out}
#' @examples
#' wan_policy_calculator(11, 5)
wan_policy_calculator <- function(s, f, olive = FALSE,
                                  out = c("X", "EV", "P")) {
  out <- out[[1]]
  policies <- (ifelse(olive, 2, 1)):s  # Candidate X values
  policy_probs <- sapply(policies, function(X) {
    # Set up transition matrix of Markov chain
    P <- 0 * diag(X + 2)
    rownames(P) <- c(0:X, "F")
    colnames(P) <- rownames(P)
    P[c(X, "F"), c(X, "F")] <- diag(2)
    transient_states <- ifelse(X > 1, list(c("0", 1:(X - 1))), "0")[[1]]
    P[transient_states, "F"] <- f/(s + f - (0:(X - 1)))
    if (olive) {
      if (s + f < 3 | X == 1) {
        stop("X or chaos bag doesn't make sense with Olive!")
      }
      # Failure with Olive is modeled with a hypergeometric RV, with drawing
      # one or fewer "S's"
      P["0", "F"] <- phyper(1, m = s, n = f, k = 3)
      # The state 1 is effectively removed when Olive is used
      transient_states <- transient_states[-2]
      P <- P[-2, -2]
      # TODO: curtis: OLIVE IMPELENTATION -- Sun 02 Dec 2018 11:05:17 PM MST
    }
    if (X > 1) {
      if (olive & X == 2) {
        P["0", "2"] <- 1 - P["0", "F"]
      } else {
        P[transient_states, as.character((ifelse(olive, 2, 1)):X)] <-
          diag(c(1 - P[transient_states, "F"]))
      }
    } else {
      P["0", "1"] <- 1 - P["0", "F"]
    }
    # Compute absorption probability
    R <- P[transient_states, c(X, "F")]
    Q <- P[transient_states, transient_states, drop = FALSE]
    N <- solve(diag(nrow(Q)) - Q)
    B <- N %*% R
    B[1, 1][[1]]
  })
  X <- which.max(policy_probs * policies)
  if (out == "X") {
    policies[[X]]
  } else if (out == "EV") {
    policies[[X]] * policy_probs[[X]]
  } else if (out == "P") {
    policy_probs[[X]]
  } else {
    stop(paste("Don't know how to handle out =", out))
  }
}

wan_policy_calculator <- Vectorize(wan_policy_calculator, c("s", "f"))

################################################################################
# MAIN FUNCTION DEFINITION
################################################################################

main <- function(olive = FALSE, value = FALSE, prob = FALSE, digits = 2,
                 lower_s = 5, upper_s = 20, lower_f = 0, upper_f = 8,
                 help = FALSE) {
  # This function will be executed when the script is called from the command
  # line; the help parameter does nothing, but is needed for do.call() to
  # work
  library(pander)

  sl <- lower_s
  su <- upper_s
  fl <- lower_f
  fu <- upper_f

  out <- "X"
  if (value) {out <- "EV"}
  if (prob) {out <- "P"}

  wan_table <- outer(sl:su, fl:fu, FUN = function(r, c) {
    wan_policy_calculator(r, c, olive = olive, out = out)
  })
  rownames(wan_table) <- sl:su
  colnames(wan_table) <- fl:fu
  wan_table <- round(wan_table, digits = digits)

  pandoc.table(wan_table, style = "rmarkdown")
}

################################################################################
# INTERFACE SETUP
################################################################################

if (sys.nframe() == 0) {
  cl_args <- parse_args(OptionParser(
    description = paste("Generates tables describing optimal policies",
                        "for playing with the card Henry Wan in",
                        "Arkham Horror: The Card Game (number of icon",
                        "tokens in bag are columns; non-icon rows)."),
    option_list = list(
      make_option(c("--olive", "-o"), action = "store_true", default = FALSE,
                  help = "The first draw is done with Olive"),
      make_option(c("--value", "-v"), action = "store_true", default = FALSE,
                  help = paste("Report expected value rather than",
                               "optimal stopping policy")),
      make_option(c("--prob", "-p"), action = "store_true", default = FALSE,
                  help = paste("Report success probability of optimal",
                               "stopping policy rather than the",
                               "optimal stopping policy itself")),
      make_option(c("--digits", "-d"), type = "integer", default = 2,
                  help = "Number of digits for rounding"),
      make_option(c("--lower-s", "-s"), type = "integer", default = 5,
                  help = "Lowest considered number of non-icon tokens"),
      make_option(c("--upper-s", "-w"), type = "integer", default = 20,
                  help = "Highest considered number of non-icon tokens"),
      make_option(c("--lower-f", "-f"), type = "integer", default = 0,
                  help = "Lowest considered number of icon tokens"),
      make_option(c("--upper-f", "-r"), type = "integer", default = 8,
                  help = "Highest number of icon tokens")
    )))

  cl_args <- cl_args[c("olive", "value", "prob", "digits", "lower-s",
                       "upper-s", "lower-f", "upper-f", "help")]
  names(cl_args) <- c("olive", "value", "prob", "digits", "lower_s",
                      "upper_s", "lower_f", "upper_f", "help")

  do.call(main, cl_args)
}

With the above script I can make the following three tables. The columns represent the number of (bad) icon tokens in the bag, while rows represent the number of other tokens in the bag. The first table is the optimal stopping policy; the second, the probability of success of the optimal stopping policy; and the third, the expected winnings of the optimal policy (which is the product of the previous two tables).

**Optimal stopping policy (X):**

|    | 0  | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----|----|----|---|---|---|---|---|---|---|
| 5  | 5  | 3  | 2 | 2 | 1 | 1 | 1 | 1 | 1 |
| 6  | 6  | 3  | 2 | 2 | 2 | 1 | 1 | 1 | 1 |
| 7  | 7  | 4  | 3 | 2 | 2 | 2 | 1 | 1 | 1 |
| 8  | 8  | 4  | 3 | 3 | 2 | 2 | 2 | 1 | 1 |
| 9  | 9  | 5  | 3 | 3 | 2 | 2 | 2 | 2 | 1 |
| 10 | 10 | 5  | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| 11 | 11 | 6  | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| 12 | 12 | 6  | 4 | 4 | 3 | 3 | 2 | 2 | 2 |
| 13 | 13 | 7  | 5 | 4 | 3 | 3 | 2 | 2 | 2 |
| 14 | 14 | 7  | 5 | 4 | 3 | 3 | 3 | 2 | 2 |
| 15 | 15 | 8  | 6 | 4 | 3 | 3 | 3 | 2 | 2 |
| 16 | 16 | 8  | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| 17 | 17 | 9  | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| 18 | 18 | 10 | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| 19 | 19 | 10 | 7 | 5 | 4 | 4 | 3 | 3 | 3 |
| 20 | 20 | 11 | 7 | 5 | 5 | 4 | 3 | 3 | 3 |

**Success probability of the optimal policy:**

|    | 0 | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    |
|----|---|------|------|------|------|------|------|------|------|
| 5  | 1 | 0.5  | 0.48 | 0.36 | 0.56 | 0.5  | 0.45 | 0.42 | 0.38 |
| 6  | 1 | 0.57 | 0.54 | 0.42 | 0.33 | 0.55 | 0.5  | 0.46 | 0.43 |
| 7  | 1 | 0.5  | 0.42 | 0.47 | 0.38 | 0.32 | 0.54 | 0.5  | 0.47 |
| 8  | 1 | 0.56 | 0.47 | 0.34 | 0.42 | 0.36 | 0.31 | 0.53 | 0.5  |
| 9  | 1 | 0.5  | 0.51 | 0.38 | 0.46 | 0.4  | 0.34 | 0.3  | 0.53 |
| 10 | 1 | 0.55 | 0.42 | 0.42 | 0.33 | 0.43 | 0.38 | 0.33 | 0.29 |
| 11 | 1 | 0.5  | 0.46 | 0.45 | 0.36 | 0.46 | 0.4  | 0.36 | 0.32 |
| 12 | 1 | 0.54 | 0.49 | 0.36 | 0.39 | 0.32 | 0.43 | 0.39 | 0.35 |
| 13 | 1 | 0.5  | 0.43 | 0.39 | 0.42 | 0.35 | 0.46 | 0.41 | 0.37 |
| 14 | 1 | 0.53 | 0.46 | 0.42 | 0.45 | 0.38 | 0.32 | 0.43 | 0.39 |
| 15 | 1 | 0.5  | 0.4  | 0.45 | 0.47 | 0.4  | 0.34 | 0.45 | 0.42 |
| 16 | 1 | 0.53 | 0.43 | 0.38 | 0.38 | 0.42 | 0.36 | 0.32 | 0.43 |
| 17 | 1 | 0.5  | 0.46 | 0.4  | 0.4  | 0.44 | 0.38 | 0.34 | 0.45 |
| 18 | 1 | 0.47 | 0.48 | 0.42 | 0.42 | 0.46 | 0.4  | 0.35 | 0.47 |
| 19 | 1 | 0.5  | 0.43 | 0.44 | 0.44 | 0.36 | 0.42 | 0.37 | 0.33 |
| 20 | 1 | 0.48 | 0.45 | 0.46 | 0.36 | 0.38 | 0.44 | 0.39 | 0.35 |

**Expected winnings of the optimal policy:**

|    | 0  | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    |
|----|----|------|------|------|------|------|------|------|------|
| 5  | 5  | 1.5  | 0.95 | 0.71 | 0.56 | 0.5  | 0.45 | 0.42 | 0.38 |
| 6  | 6  | 1.71 | 1.07 | 0.83 | 0.67 | 0.55 | 0.5  | 0.46 | 0.43 |
| 7  | 7  | 2    | 1.25 | 0.93 | 0.76 | 0.64 | 0.54 | 0.5  | 0.47 |
| 8  | 8  | 2.22 | 1.4  | 1.02 | 0.85 | 0.72 | 0.62 | 0.53 | 0.5  |
| 9  | 9  | 2.5  | 1.53 | 1.15 | 0.92 | 0.79 | 0.69 | 0.6  | 0.53 |
| 10 | 10 | 2.73 | 1.7  | 1.26 | 0.99 | 0.86 | 0.75 | 0.66 | 0.59 |
| 11 | 11 | 3    | 1.85 | 1.36 | 1.09 | 0.92 | 0.81 | 0.72 | 0.64 |
| 12 | 12 | 3.23 | 1.98 | 1.45 | 1.18 | 0.97 | 0.86 | 0.77 | 0.69 |
| 13 | 13 | 3.5  | 2.14 | 1.57 | 1.26 | 1.05 | 0.91 | 0.82 | 0.74 |
| 14 | 14 | 3.73 | 2.29 | 1.68 | 1.34 | 1.13 | 0.96 | 0.87 | 0.79 |
| 15 | 15 | 4    | 2.43 | 1.78 | 1.41 | 1.2  | 1.03 | 0.91 | 0.83 |
| 16 | 16 | 4.24 | 2.59 | 1.88 | 1.5  | 1.26 | 1.09 | 0.95 | 0.87 |
| 17 | 17 | 4.5  | 2.74 | 2    | 1.59 | 1.32 | 1.15 | 1.01 | 0.91 |
| 18 | 18 | 4.74 | 2.87 | 2.11 | 1.67 | 1.38 | 1.21 | 1.06 | 0.94 |
| 19 | 19 | 5    | 3.03 | 2.21 | 1.75 | 1.46 | 1.26 | 1.12 | 0.99 |
| 20 | 20 | 5.24 | 3.18 | 2.3  | 1.82 | 1.53 | 1.32 | 1.17 | 1.04 |

I view the cell in column 6, row 11 (six icon tokens, eleven other tokens) as the "typical" scenario, and the conclusion is this: *you'd be better off just grabbing a resource/drawing a card the usual way than by using Henry Wan!* Not only is Henry Wan worse than Leo de Luca, *he's worse than gaining resources with a regular action!*

Granted, there are cards with the **seal** keyword that can help improve the odds. But one must ask whether the opportunity cost of playing those cards is worth it. Perhaps the benefits of a favorable chaos bag for skill tests plus better Henry Wan games would give the investigators a *teeny tiny* edge… after a hell of a lot of work and lucky draws. That said, I'm sure there are much easier ways to play the game that are also more fun.

When Henry Wan was announced, people considered pairing him up with Olive McBride, whose ability works "when you would reveal a chaos token". Any investigator who can take both Mystic (purple) and Rogue (green) cards (including Sefina Rousseau and all Dunwich investigators; I don't count Lola Hayes since, while she can include both cards in her deck, using them together may not be possible) can include these two cards in the same deck.

I'll always assume that Olive's ability is utilized on the first draw. When using Olive with Henry, one can get two tokens drawn without either of them being a bad icon that ends the "game". Thus Olive boosts the success rate and the ultimate payout.

Having Olive and Henry out at the same time is extremely difficult; first, you'd have to have Charisma in play to accommodate them both, then draw them both in a game at reasonable times. The likelihood of getting the combo out is low, and it comes with significant opportunity costs.

That said, when Olive is out, she provides Henry enough of a boost to make him playable. The following tables account for Olive's effect (see the code for how) on the first draw but otherwise match up with the earlier tables.

**Optimal stopping policy (X), first draw with Olive McBride:**

|    | 0  | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----|----|----|---|---|---|---|---|---|---|
| 5  | 5  | 3  | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 6  | 6  | 3  | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 7  | 7  | 4  | 3 | 2 | 2 | 2 | 2 | 2 | 2 |
| 8  | 8  | 4  | 3 | 2 | 2 | 2 | 2 | 2 | 2 |
| 9  | 9  | 5  | 3 | 3 | 2 | 2 | 2 | 2 | 2 |
| 10 | 10 | 6  | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| 11 | 11 | 6  | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| 12 | 12 | 7  | 4 | 3 | 3 | 3 | 2 | 2 | 2 |
| 13 | 13 | 7  | 5 | 4 | 3 | 3 | 2 | 2 | 2 |
| 14 | 14 | 7  | 5 | 4 | 3 | 3 | 3 | 2 | 2 |
| 15 | 15 | 8  | 6 | 4 | 3 | 3 | 3 | 2 | 2 |
| 16 | 16 | 8  | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| 17 | 17 | 9  | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| 18 | 18 | 10 | 6 | 5 | 4 | 3 | 3 | 3 | 3 |
| 19 | 19 | 10 | 7 | 5 | 4 | 4 | 3 | 3 | 3 |
| 20 | 20 | 11 | 7 | 5 | 4 | 4 | 3 | 3 | 3 |

**Success probability of the optimal policy, first draw with Olive McBride:**

|    | 0 | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    |
|----|---|------|------|------|------|------|------|------|------|
| 5  | 1 | 0.75 | 0.86 | 0.71 | 0.6  | 0.5  | 0.42 | 0.36 | 0.31 |
| 6  | 1 | 0.8  | 0.89 | 0.77 | 0.67 | 0.58 | 0.5  | 0.44 | 0.38 |
| 7  | 1 | 0.67 | 0.65 | 0.82 | 0.72 | 0.64 | 0.56 | 0.5  | 0.45 |
| 8  | 1 | 0.71 | 0.7  | 0.85 | 0.76 | 0.69 | 0.62 | 0.55 | 0.5  |
| 9  | 1 | 0.62 | 0.74 | 0.61 | 0.8  | 0.73 | 0.66 | 0.6  | 0.55 |
| 10 | 1 | 0.56 | 0.59 | 0.65 | 0.55 | 0.76 | 0.7  | 0.64 | 0.59 |
| 11 | 1 | 0.6  | 0.63 | 0.68 | 0.59 | 0.79 | 0.73 | 0.67 | 0.62 |
| 12 | 1 | 0.55 | 0.66 | 0.71 | 0.62 | 0.54 | 0.75 | 0.7  | 0.66 |
| 13 | 1 | 0.58 | 0.56 | 0.56 | 0.64 | 0.57 | 0.78 | 0.73 | 0.68 |
| 14 | 1 | 0.62 | 0.59 | 0.59 | 0.67 | 0.6  | 0.53 | 0.75 | 0.71 |
| 15 | 1 | 0.57 | 0.51 | 0.61 | 0.69 | 0.62 | 0.56 | 0.77 | 0.73 |
| 16 | 1 | 0.6  | 0.54 | 0.51 | 0.54 | 0.64 | 0.58 | 0.53 | 0.75 |
| 17 | 1 | 0.56 | 0.56 | 0.53 | 0.57 | 0.66 | 0.6  | 0.55 | 0.77 |
| 18 | 1 | 0.53 | 0.59 | 0.55 | 0.59 | 0.68 | 0.62 | 0.57 | 0.52 |
| 19 | 1 | 0.56 | 0.52 | 0.57 | 0.6  | 0.53 | 0.64 | 0.59 | 0.54 |
| 20 | 1 | 0.53 | 0.55 | 0.59 | 0.62 | 0.55 | 0.66 | 0.61 | 0.56 |

**Expected winnings of the optimal policy, first draw with Olive McBride:**

|    | 0  | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    |
|----|----|------|------|------|------|------|------|------|------|
| 5  | 5  | 2.25 | 1.71 | 1.43 | 1.19 | 1    | 0.85 | 0.73 | 0.63 |
| 6  | 6  | 2.4  | 1.79 | 1.55 | 1.33 | 1.15 | 1    | 0.87 | 0.77 |
| 7  | 7  | 2.67 | 1.96 | 1.63 | 1.44 | 1.27 | 1.13 | 1    | 0.89 |
| 8  | 8  | 2.86 | 2.1  | 1.7  | 1.53 | 1.37 | 1.23 | 1.11 | 1    |
| 9  | 9  | 3.12 | 2.21 | 1.83 | 1.59 | 1.45 | 1.32 | 1.2  | 1.09 |
| 10 | 10 | 3.33 | 2.38 | 1.95 | 1.65 | 1.52 | 1.39 | 1.28 | 1.18 |
| 11 | 11 | 3.6  | 2.52 | 2.04 | 1.76 | 1.57 | 1.46 | 1.35 | 1.25 |
| 12 | 12 | 3.82 | 2.64 | 2.12 | 1.85 | 1.62 | 1.51 | 1.41 | 1.31 |
| 13 | 13 | 4.08 | 2.8  | 2.24 | 1.93 | 1.71 | 1.56 | 1.46 | 1.37 |
| 14 | 14 | 4.31 | 2.95 | 2.36 | 2.01 | 1.79 | 1.6  | 1.51 | 1.42 |
| 15 | 15 | 4.57 | 3.07 | 2.45 | 2.07 | 1.86 | 1.67 | 1.55 | 1.46 |
| 16 | 16 | 4.8  | 3.24 | 2.54 | 2.17 | 1.93 | 1.75 | 1.58 | 1.5  |
| 17 | 17 | 5.06 | 3.38 | 2.66 | 2.26 | 1.99 | 1.81 | 1.65 | 1.54 |
| 18 | 18 | 5.29 | 3.51 | 2.77 | 2.34 | 2.04 | 1.87 | 1.71 | 1.57 |
| 19 | 19 | 5.56 | 3.67 | 2.87 | 2.42 | 2.12 | 1.92 | 1.77 | 1.63 |
| 20 | 20 | 5.79 | 3.82 | 2.96 | 2.49 | 2.2  | 1.97 | 1.82 | 1.69 |

Notice that when Olive is being used it's optimal to use Olive to get two tokens out (that you can pick) then end the "game". It seems that Olive does make Henry's ability profitable… albeit mildly. If we're using the heuristic that a card needs to pay for its own resource cost plus three for each action involved, I'd say that the combo would need at least eight turns to be profitable in a typical game… which is terrible.

Henry Wan is an expensive way to attempt to milk a little more value from Olive. Even with Olive I don’t think he’s worth the trouble.

It's equally true in Arkham as it is in real life: gambling is better for the house than the gambler (with the house being the forces of the mythos, in this case). If you're looking to have fun gambling, Henry Wan is your card. If you're looking to win… look elsewhere.


These past few weeks I’ve been writing about a new package I created, **MCHT**. Those blog posts were basically tutorials demonstrating how to use the package. (Read the first in the series here.) I’m done for now explaining the technical details of the package. Now I’m going to use the package for the purpose I initially had: exploring the distribution of time separating U.S. economic recessions.

I wrote about this before. I suggested that the distribution of times between recessions can be modeled with a Weibull distribution, and based on this, a recession was likely to occur prior to the 2020 presidential election.

This claim raised eyebrows, and I want to respond to some of the comments made. Now, I would not be surprised to find this post the subject of an R1 on r/badeconomics, and I hope that no future potential employer finds this (or my previous) post, reads it, and then decides I’m an idiot and denies me a job. I don’t know enough to dogmatically subscribe to the idea but I do want to explore it. Blog posts are not journal articles, and I think this is a good space for me to make arguments that could be wrong and then see how others more intelligent than myself respond. The act of keeping a blog is good for me and my learning (which never ends).

My previous post on the distribution of times between recessions was… controversial. Have a look at the comments section of the original article and the comments of this reddit thread. Here is my summary of some of the responses:

- There was no statistical test for the goodness-of-fit of the Weibull distribution.
- No data generating process (DGP) was proposed, in the sense that there’s no explanation for *why* the Weibull distribution would be appropriate, or of the economic processes that produce memory in the distribution of times between recessions.
- Isn’t it strange to suggest that other economic variables are irrelevant to when a recession occurs? That seems counterintuitive.
- MAGA! (actually there were no MAGAs, thankfully)

Then there was this comment, by far the harshest one, by u/must_not_forget_pwd:

The idea that recessions are dependent on time is genuinely laughable. It is an idea that seems to be getting some traction in the chattering classes, who seem more interested in spewing forth political rantings rather than even the semblance of serious analysis. This also explains why no serious economist talks about the time and recession relationship.

The lack of substance behind this time and recession idea is revealed by asking some very basic questions and having a grasp of some basic data. If recessions were so predictable, wouldn’t recessions be easy to prevent? Monetary and fiscal policies could be easily manipulated so as to engineer a persistent boom.

Also, if investors could correctly predict the state of the economy it would be far easier for them to determine when to invest and to capture the subsequent boom. That is, invest in the recession, when goods and services are cheaper and have the project come on stream during the following boom and make a massive profit. If enough investors acted like this, there would be no recession to begin with due to the increase in investment.

Finally, have a look at the growth of other countries. Australia hasn’t had two consecutive quarters of negative growth since the 1990-91 recession. Sure there have been hiccups along the way for Australia, such as the Asian Financial Crisis, the introduction of the GST, a US recession in the early 2000s, and more recently the Global Financial Crisis. Yet, Australia has managed to persist without a recession despite the passage of time. No one in Australia would take you seriously if you said that recessions were time dependent.

If these “chattering classes” were interested in even half serious analysis of the US economy, while still wanting to paint a bleak picture, they could very easily look at what is going on right now. Most economists have the US economy growing above trend. This can be seen in the low unemployment rate and that inflation is starting to pickup. Sure wages growth is subdued, but wages growth should be looking to pickup anytime now.

However, during this period the US government is injecting a large amount of fiscal stimulus into the US economy through tax cuts. Pumping large amounts of cash into the economy during a boom isn’t exactly a good thing to do and is a great way to overheat the economy and bring about higher inflation. This higher inflation would then cause the US Federal Reserve to react by increasing interest rates. This in turn could spark a US recession.

Instead of this very simple and defensible story that requires a little bit of homework, we get subjected to this nonsense that recessions are linked to time. I think it’s time that people call out as nonsense the “analysis” that this blog post has.

TL;DR: The idea that recessions are dependent on time is dumb, and if recessions were so easy to predict would mean that recessions wouldn’t exist. This doesn’t mean that a US recession couldn’t happen within the next few years, because it is easy to see how one could occur.

I think that the tone of this message could have been… nicer. That said, I generally welcome direct, harsh criticism, as I often learn a lot from it, or at least am given a lot to think about.

So let’s discuss these comments.

First, a statistical test for the goodness of fit of the Weibull distribution. I personally was satisfied looking at the plots I made, but some people want a statistical test. The test that comes to mind is the Kolmogorov-Smirnov test, and R does support the simplest version of this test via `ks.test()`, but when you don’t know all of the parameters of the distribution assumed under the null hypothesis, you cannot use `ks.test()`. This is because the test was derived assuming there were no unknown parameters; when nuisance parameters are present and need to be estimated, the distribution used to compute *p*-values is no longer appropriate.

Good news, though; **MCHT** allows us to do the test properly! First, let’s get set up.

library(MCHT)
library(doParallel)
library(fitdistrplus)

recessions <- c( 4 +  2/12,  6 +  8/12,  3 +  1/12,  3 +  9/12,  3 +  3/12,
                 2 +  0/12,  8 + 10/12,  3 +  0/12,  4 + 10/12,  1 +  0/12,
                 7 +  8/12, 10 +  0/12,  6 +  1/12)

registerDoParallel(detectCores())

I already demonstrated how to perform a bootstrap version of the Kolmogorov-Smirnov test in one of my blog posts about **MCHT**, and the code below is basically a direct copy of that code. While the test is not exact, it should be asymptotically appropriate.

ts <- function(x) {
  param <- coef(fitdist(x, "weibull"))
  shape <- param[['shape']]; scale <- param[['scale']]
  ks.test(x, pweibull, shape = shape, scale = scale,
          alternative = "two.sided")$statistic[[1]]
}

rg <- function(x) {
  n <- length(x)
  param <- coef(fitdist(x, "weibull"))
  shape <- param[['shape']]; scale <- param[['scale']]
  rweibull(n, shape = shape, scale = scale)
}

b.wei.ks.test <- MCHTest(test_stat = ts, stat_gen = ts, rand_gen = rg,
                         seed = 123, N = 1000,
                         method = paste("Goodness-of-Fit Test for Weibull",
                                        "Distribution"))

b.wei.ks.test(recessions)

## 
##  Goodness-of-Fit Test for Weibull Distribution
## 
## data:  recessions
## S = 0.11318, p-value = 0.94

The test does not reject the null hypothesis; there isn’t evidence that the data is not following a Weibull distribution (according to that test; read on).

Compare this to the Kolmogorov-Smirnov test checking whether the data follows the exponential distribution.

ts <- function(x) {
  mu <- mean(x)
  ks.test(x, pexp, rate = 1/mu,
          alternative = "two.sided")$statistic[[1]]
}

rg <- function(x) {
  n <- length(x)
  mu <- mean(x)
  rexp(n, rate = 1/mu)
}

b.ks.exp.test <- MCHTest(ts, ts, rg, seed = 123, N = 1000,
                         method = paste("Goodness-of-Fit Test for",
                                        "Exponential Distribution"))

b.ks.exp.test(recessions)

## 
##  Goodness-of-Fit Test for Exponential Distribution
## 
## data:  recessions
## S = 0.30074, p-value = 0.023

Here, the null hypothesis is rejected; there is evidence that the data wasn’t drawn from an exponential distribution.

What do the above two results signify? If we assume that the time between recessions is independent and identically distributed, then there is not evidence against the Weibull distribution, but there is evidence against the exponential distribution. (The exponential distribution is actually a special case of the Weibull distribution, so the second test effectively rules out that special case.) The exponential distribution has the *memoryless* property; if we say that the time between events follows an exponential distribution, then knowing how long it’s been since the last event occurred tells us *nothing* about when the next event will occur. The Weibull distribution, however, has *memory* when the shape parameter is not 1. That is, knowing how long it’s been since the last event occurred does change how likely the event is to occur in the near future. (For the parameter estimates I found, a recession seems to become more likely the longer it’s been since the last one.)
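A quick numerical illustration of the distinction (my own example; the shape and scale below are illustrative values, not the fitted ones): the memoryless property says P(T > s + t | T > s) = P(T > t), which holds for the exponential distribution but fails for a Weibull with shape not equal to 1.

```r
# Conditional survival P(T > s + t | T > s) versus unconditional P(T > t)
cond_surv <- function(s, t, surv) surv(s + t) / surv(s)

s <- 5; t <- 1
exp_surv <- function(x) pexp(x, rate = 1/5, lower.tail = FALSE)
wei_surv <- function(x) pweibull(x, shape = 2.5, scale = 5.5,
                                 lower.tail = FALSE)

cond_surv(s, t, exp_surv); exp_surv(t)  # identical: no memory
cond_surv(s, t, wei_surv); wei_surv(t)  # differ: elapsed time matters
```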

We will revisit the goodness of fit later, though.

I do have some personal beliefs about what causes recessions to occur that would lead me to think that the time between recessions does exhibit some form of memory and would also address the point raised by u/must_not_forget_pwd about Australia not having had a recession in decades. This perspective is primarily shaped by two books, [1] and [2].

In short, I agree with the aforementioned reddit user; recessions are not inevitable. The stability of an economy is a characteristic of that economy and some economies are more stable than others. [1] notes that the Canadian economy had a dearth of banking crises in the 19th and 20th centuries, with the most recent one effectively due to the 2008 crisis in the United States. Often the stability of the financial sector (and probably the economy as a whole) is strongly related to the political coalition responsible for drafting the *de facto* rules that the financial system follows. In some cases the financial sector is politically weak and continuously plundered by the government. Sometimes it’s politically weak and allowed to exist unmolested by the government but is well whipped. Financiers are allowed to make money and the government repays its debts but if the financial sector steps out of line and takes on too much risk it will be punished. And then there’s the situation where the financial sector is politically powerful and able to get away with bad behavior, perhaps even being rewarded for that behavior by government bailouts. That’s the financial system the United States has.

So let’s consider the latter case, where the financial sector is politically powerful. This is where the Minsky narrative (see [2]) takes hold. He describes a boom-and-bust cycle, but critically, the cause of the bust is built into the boom. After a bust, many in the financial sector “learn their lesson” and become more conservative risk-takers. In this regime the economy recovers and some growth resumes. Over time, the financial sector “forgets” the lessons it learned from the previous bust and begins to take greater risks. Eventually these risks become so great that a larger systemic risk appears and the financial sector, as a whole, stands on shaky ground. Something goes wrong (like the bottom falls out of the housing market or the Russian government defaults), the bets taken by the financial sector go the wrong way, and a crisis ensues. The extra wrinkle in the American financial system is that the financial sector not only isn’t punished for the risks it has taken, it gets rewarded with a bailout financed by taxpayers, and the executives who made those decisions get golden parachutes (although there may be a trivial fine).

If the Minsky narrative is correct, then economic booms do die of “old age”, as eventually the boom is driven by increasingly risky behavior that eventually leads to collapse. When the government is essentially encouraging this behavior with blank-check guarantees, the risks taken grow (risky contracts become lotto tickets paid for by someone else when you lose, but you get all the winnings). Taken together, one can see why there could be some form of memory in the time between recessions. Busts are an essential feature of such an economy.

So what about the Australian economy, as u/must_not_forget_pwd brought up? In short, I think the Australian economy is prototyped by the Canadian economy as described in [1] and thus doesn’t follow the rules driving the boom/bust cycle in America. I think the Australian economy is the Australian economy and the American economy is the American economy. One is stable, the other is not. I’m studying the unstable one, not trying to explain the stability of the other.

First, does time matter to when a recession occurs? The short answer is “Yes, duh!” If you’re going to have any meaningful discussion about when a recession will occur you have to account for the time frame you’re considering. A recession within the next 30 years is much more likely than a recession in the next couple months (if only because one case covers the other, but in general a recession should be more likely to occur within a longer period of time than a shorter one).

But I think the question about “does time matter” is more a question about whether an economy essentially remembers how long it has been since the last recession or not. That’s both an economic and statistical question.

What about other variables? Am I saying that other variables don’t matter when I use only time to predict when the next recession occurs? No, that’s not what I’m saying.

Let’s consider regression equations, often of the form y = β₀ + β₁x + ε.

I think economists are used to thinking about equations like this as essentially causal statements, but that’s not what a regression equation is, and when we estimate a regression equation we are not automatically estimating a function that needs to be interpreted causally. If a regression equation tells us something about causality, that’s great, but that’s not what they do.

Granted, economics students are continuously reminded that correlation is not causation, but I think many then start to think that we should not compute a regression equation unless the relationship expressed can be interpreted causally. However, knowing that two variables are correlated, and how they are correlated, is often useful.

When we compute a regression function from data, we are computing a function that estimates *conditional expectations*. This function, when given the value of one variable, tells us what value we can expect for the other variable. That relationship may or may not be due to causality, but the fact that the two variables are not independent of each other can be, in and of itself, a useful fact.

My favorite example in the “correlation is not causation” discussion (probably mentioned first in some econometrics textbook or by my econometrics professor) is the relationship between the damage caused by a fire and the number of firefighters at the scene of the fire. Let’s just suppose that we have some data, where y is the amount of damage in a fire (in thousands of dollars) and x is the number of firefighters, and we estimated the relationship E[y | x] = β₀ + β₁x, with β₁ > 0.

There is a positive relationship between the number of firefighters at the scene of the fire and the damage done by the fire. Does this mean that firefighters make fires worse? No, it does not. But if you’re a spectator and you see ten firefighters running the scene of a fire, can you expect the fire to be more damaging than fires where there are five firefighters and not as damaging as fires with fifteen firefighters? Sure, this is reasonable. Not only that, it’s a useful fact to know.

Importantly, when we choose the variables to include in a regression equation, we are deciding what variables we want to use for conditioning. That choice could be motivated by a causal model (because we care about causality), or by model fit (making the smallest error in our predictions while being sufficiently simple), or simply by what’s available. Some models may do better than others at predicting a variable but they all do the same thing: compute conditional expectations.

My point is this: when I use time as the only variable of interest when attempting to predict when a recession occurs, I’m essentially making a prediction based on a model that conditions only on time and nothing else. That’s not the same thing as saying that excluded variables don’t matter. Rather, a variable excluded from the model is effectively treated as part of the random soup that generated the data I observe. I’m not conditioning on its values to make predictions. Could my prediction be refined by including that information? Perhaps. But that doesn’t make the prediction automatically useless. In fact, I think we should *start* with predictions that condition on little to see if conditioning on more variables adds any useful information, generally preferring the simple to the complex given equal predictive value. This is essentially what the F-tests automatically reported by statistical software do; they check whether the regression model, possibly involving multiple parameters, does any better than one that only uses the mean of the data to predict values.
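As a small illustration with synthetic data (my own example, not part of the recession analysis): the overall F-test that `summary.lm()` reports compares the fitted model against the intercept-only model, i.e., against predicting every observation with the mean.

```r
# The F statistic from this comparison is the same one reported by
# summary(lm(y ~ x))
set.seed(123)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
anova(lm(y ~ 1), lm(y ~ x))
```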

I never looked at a model that uses more information than just time, though. I wouldn’t be shocked if using more variables would lead to a better model. But I don’t have that data, and to be completely honest, I don’t want to spend the time to try and get a “great” prediction for when the next recession will occur. My numbers are essentially a back-of-the-envelope calculation. It could be improved, but just because there’s (perhaps significant) room for improvement doesn’t render the calculation useless, and I think I may have evidence that shows the calculation has some merit.

The reddit user had a long discussion about how well the economy would function if predicting the time between recessions only depended on time, that the Federal Reserve would head off every recession and investors would be adjusting their behavior in ways that render the calculation useless. My response is this: I’m not a member of the Fed. I have no investments. My opinion doesn’t matter to the economy. Thus, it’s okay for me to treat the decisions of the Fed, politicians, bank presidents, other investors, and so forth, as part of that random soup producing the economy I’m experiencing, because my opinions do not invalidate the assumptions of the calculation.

There is a sense in which statistics are produced with an audience in mind. I remember Nate Silver making this point in a podcast (don’t ask me which) when discussing former FBI director James Comey’s decision, just days before the 2016 presidential election, to announce a reopening of an investigation into Hillary Clinton’s e-mails, which was apparently at least partially driven by the belief that Clinton was very likely to win. Silver said that Comey did not account for the fact that he was a key actor in the process he was trying to predict and that his decisions could change the likelihood of Clinton winning. He invalidated the numbers with his decision based on them. He was not the target audience of the numbers Nate Silver was producing.

I think a similar argument can be made here. If my decisions and beliefs mattered to the economy, then I should account for them in predictions, conditioning on them. But they don’t matter, so I’ve invalidated nothing, and the people who do matter likely are (or should be) reaching conclusions in a much more sophisticated way.

I’m a statistician. Statistics is my hammer. Everything looks like a nail to me. You know why? Because hammering nails is fun.

When I read u/must_not_forget_pwd’s critique, I tried to formulate it in a mathematical way, because that’s what I do. Here’s my best way to describe it in mathematical terms:

- The time between recessions are all independent of one another.
- Each period of growth follows its own distribution, with its own unique parameters.
- The time separating recessions is memoryless. Knowing how long it has been since the last recession tells us nothing about how much longer we have till the next recession.

I wanted a model that one might call “maximum unpredictability”. So if T₁, …, Tₙ are the times separating recessions, then points 1, 2, and 3 together say that T₁, …, Tₙ are independent random variables with Tᵢ ~ Exp(λᵢ), and there’s no known relationship between λ₁, …, λₙ. If this is true, we have no idea when the next recession will occur because there’s no pattern we can extract.

My claim is essentially that the Tᵢ are i.i.d. Weibull(k, λ), with k ≠ 1 and only one pair (k, λ) shared by all of them. If I were to then attempt to formulate these as statistical hypotheses, those hypotheses would be:

H₀: T₁, …, Tₙ are i.i.d. Weibull(k, λ), with a single shared k and λ, versus
Hₐ: T₁, …, Tₙ are independent with Tᵢ ~ Exp(λᵢ), each with its own rate λᵢ.

Is it possible to decide between these two hypotheses? They’re not nested, and it’s not really possible to use the generalized likelihood ratio test because the parameter space that includes both H₀ and Hₐ is too big (you’d have to estimate at least as many parameters as you have data points). That said, they both suggest likelihood functions that, individually, can be maximized, and you might consider using the ratio between these two maximized functions as a test statistic. (Well, actually, the negative log likelihood ratio, which I won’t write down in math or try to explain unless asked, but you can see the end result in the code below in the definition of `ts()`.)

Could that statistic be used to decide between the two hypotheses? I tried searching through literature (in particular, see [3]) and my conclusion is… *maybe?* To be completely honest, by this point we’ve left the realm of conventional statistics and are now turning into mad scientists, because not only are the hypotheses we’re testing and the statistic we’re using to decide between them just *wacky*, how the hell are we supposed to know the distribution of this test statistic under the null hypothesis when there are *two* nuisance parameters that likely aren’t going anywhere? Oh, and while we’re at it, the sample size of the data set of interest is really small, so don’t even *think* about using asymptotic reasoning!

I think you can see how this descent into madness would end up with me discovering the maximized Monte Carlo test (see [4]) and then writing **MCHT** to implement it. I’ll try anything once, so the product of all that sweat and labor is below.

ts <- function(x) {
  n <- length(x)
  params <- coef(fitdist(x, "weibull"))
  k <- params[["shape"]]
  l <- params[["scale"]]
  (n * k - n + 1) * log(l) - log(k) + sum(l * (-k) * x^k - k * log(x)) - n
}

mcsg <- function(x, shape = 2, scale = 1) {
  x <- qweibull(x, shape = shape, scale = scale)
  test_stat(x)
}

brg <- function(x) {
  n <- length(x)
  params <- coef(fitdist(x, "weibull"))
  k <- params[["shape"]]
  l <- params[["scale"]]
  rweibull(n, shape = k, scale = l)
}

mc.mem.test <- MCHTest(ts, mcsg, seed = 123,
                       nuisance_params = c("shape", "scale"), N = 1000,
                       optim_control = list(
                         "lower" = c("shape" = 0, "scale" = 0),
                         "upper" = c("shape" = 100, "scale" = 100),
                         "control" = list("max.time" = 60)),
                       threshold_pval = 0.2, localize_functions = TRUE,
                       method = "MMC Test for IID With Memory")

b.mem.test <- MCHTest(ts, ts, brg, seed = 123, N = 1000,
                      method = "Bootstrap Test for IID With Memory")

b.mem.test(recessions)

## 
##  Bootstrap Test for IID With Memory
## 
## data:  recessions
## S = -4601.9, p-value = 0.391

mc.mem.test(recessions)

## Warning in mc.mem.test(recessions): Computed p-value is greater than
## threshold value (0.2); the optimization algorithm may have terminated early

## 
##  MMC Test for IID With Memory
## 
## data:  recessions
## S = -4601.9, p-value = 0.962

Both tests failed to reject the null hypothesis. Unfortunately, that doesn’t seem to say much. First, failing to reject doesn’t show the null hypothesis is correct; it’s just not *obviously* incorrect. This is always the case, but the bizarre test I’m implementing here is severely underpowered, perhaps to the point of being useless. The alternative hypothesis (which I assigned to my “opponent”) is severely disadvantaged.

The conclusion of the above results isn’t in fact that I’m right. Given the severe lack of power of the test, I would say that the results of the test above are essentially inconclusive.

I’m going to be straight with you: if you read this whole article, I probably wasted your time, and for that I am truly sorry.

I suppose you got to enjoy some stream-of-consciousness thoughts about a controversial blog post I wrote, in which I made a defense that may or may not be convincing, then watched as I developed a strange statistical test that probably didn’t even work in order to settle a debate with some random guy on reddit over a claim he would likely deny having made, with that imaginary argument ending inconclusively.

But hey, at least I satisfied my curiosity. And I’m pretty proud of **MCHT**, which I created to help me write this blog post. Maybe if I hadn’t spent three straight days writing nothing but blog posts, this one would have been better, but the others seemed pretty good. So something good came out of this trip… right?

Maybe I can end like this: do I still think that a recession before the 2020 election is likely? Yes. Do I think that a Weibull describes the time between recessions decently? Conditioning on nothing else, I think so. I still think that my previous work has some merit as a decent back-of-the-envelope calculation. Do I think that the time between recessions has a memory? In short, yes. And while we’re on the topic, I’m not the Fed, so my opinions don’t matter.

All that said, though, smarter people than me may have different opinions and their contributions to this discussion are probably more valuable than mine. For instance, the people at Goldman Sachs believe a recession soon is unlikely; but the people at J.P. Morgan Chase believe a recession could strike in 2020. I’m certainly persuadable on the above points, and as I’ve said before, I think the simple analysis could enhance the narrative advanced by better predictions.

Now that I’ve written this post, we will return to our regularly scheduled programming. Thanks for reading! (Please don’t judge me.)

- C. Calomiris and S. Haber, *Fragile by design: the political origins of banking crises and scarce credit* (2014), Princeton University Press, Princeton
- H. P. Minsky, *Stabilizing an unstable economy* (1986), Yale University Press, New Haven
- D. R. Cox, *Tests of separate families of hypotheses*, Proc. Fourth Berkeley Symp. on Math. Stat. and Prob., vol. 1 (1961) pp. 105-123
- J-M. Dufour, *Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics*, Journal of Econometrics, vol. 133 no. 2 (2006) pp. 443-477


Over the past few weeks I’ve published articles about my new package, **MCHT**, starting with an introduction, followed by a further technical discussion, a demonstration of maximized Monte Carlo (MMC) hypothesis testing, another of bootstrap hypothesis testing, and, last week, a post showing how to handle multi-sample and multivariate data. This is the final article where I explain the capabilities of the package. I show how **MCHT** can handle time series data.

I should mention that I’m not focused on the merits of the procedures I use as examples in these posts, and that’s going to be the case here. It’s possible (perhaps even likely) that there’s a better way to decide between the hypotheses than what I show here. In these articles, I’m more interested in showing what *can* be done rather than what *should* be done. In particular, I like simple examples that many can understand, even if they may not be the best tool for the task at hand.

So far I don’t think this has been a serious issue; that is, I don’t think the procedures I’ve shown so far could be considered controversial (I think the most controversial would be the permutation test example). But the example I want to use here could be argued with; I personally would not use it. That said, I’m still willing to demonstrate it because it doesn’t take much to understand what’s going on and it does demonstrate how time series data can be handled.

Suppose we want to perform a test for the location of the mean, and thus decide between the hypotheses

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_A: \mu \neq \mu_0$$

There is the usual $t$-statistic, which is $t = \sqrt{n}(\bar{x} - \mu_0)/s$, and as mentioned before the statistic assumes that the data came from a Normal distribution. That’s not all the test assumes, though. It also assumes that the data is independent and identically distributed.

In cross-sectional contexts this is fine, but it’s not okay when the data could depend on time and thus is not independent and identically distributed. Suppose instead that our data was generated according to a first-order autoregressive process (AR(1)), described below:

$$X_t = \mu + \rho (X_{t-1} - \mu) + \epsilon_t$$

In this context, assume $|\rho| < 1$ and that $\epsilon_t \sim N(0, \sigma^2)$ is independent and identically distributed. It’s no longer given that the conventional $t$-test will work as marketed since the data is no longer independent or identically distributed. Additionally, we have two nuisance parameters, $\rho$ and $\sigma$, that need to be accounted for.

We will view $\rho$ and $\sigma$ as nuisance parameters and use MMC testing to handle them. That leaves the question of how to simulate an AR(1) process. With **MCHT**, if you can simulate a process, you can test with it.

The time series model above has a stationary solution when $|\rho| < 1$, that is, when $\rho$ ranges between $-1$ and $1$. It's not possible to simulate a series of infinite length, but one can get close by simulating a series that is very long. In particular, one can simulate, say, 500 extra terms of the series starting at a fixed number, followed by the actual number of terms of the series wanted, then throw away the first 500 terms. This is known as burn-in and it's very common practice in time series simulation.

Fortunately `MCHTest()` allows for burn-in. Suppose that the sample size of the actual dataset is $n$ and we've decided that we want a burn-in period of 500 terms. Then we can do the following:

- Generate $n + 500$ random numbers to represent $\epsilon_t$ (except possibly for the scaling factor $\sigma$, as we're treating that as a nuisance parameter).
- Apply the recursive formula described above to the series after scaling the series by $\sigma$ and using a chosen $\rho$, and add $\mu$ to it.
- Keep only the last $n$ terms of the series; throw away the rest. This is your simulated dataset.
- After having obtained the simulated dataset, proceed with the Monte Carlo test as usual. (A stand-alone sketch of steps 1 through 3 follows below.)
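To make the simulation scheme concrete, here is a minimal stand-alone sketch of those first three steps; the values of `n`, `mu`, `rho`, and `sigma` below are illustrative choices of mine, not quantities taken from any dataset:

n <- 50; burn <- 500
mu <- 1; rho <- 0.5; sigma <- 2
z <- rnorm(n + burn)                                      # Step 1: raw innovations
e <- stats::filter(sigma * z, rho, method = "recursive")  # Step 2: apply the AR(1) recursion
x <- mu + e[-(1:burn)]                                    # Step 3: add mu, keep the last n terms
length(x)  # n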

With MMC, the unscaled series is fixed after we generate it, and we use optimization to adversarially choose $\rho$ and $\sigma$ so that we maximize the $p$-value of the test.

When using `MCHTest()`, the `rand_gen` function does not need to produce a dataset of the same length as the original dataset; this allows for burn-in. However, if you're going to do this, then the `stat_gen` function needs to know what the sample size of the dataset is. All you need to do is give the `stat_gen` function the parameter `n`; this will be given the sample size of the original dataset. And of course the `test_stat` function won't care whether the data came from a time series or not.

Putting this all together, we create the following test.

library(MCHT)
library(doParallel)

registerDoParallel(detectCores())

ts <- function(x, mu = 0) {
  sqrt(length(x)) * (mean(x) - mu) / sd(x)
}

rg <- function(n) {
  rnorm(n + 500)  # Extra terms for a burn-in period
}

sg <- function(x, n, mu = 0, rho = 0, sigma = 1) {
  x <- sigma * x
  if (abs(rho) >= 1) {stop("Bad rho given!")}
  eps <- filter(x, rho, "recursive")  # Apply the recursion
  eps <- eps[-(1:500)]  # Throw away first 500 observations; they're burn-in
  dat <- eps + mu
  test_stat(dat, mu = mu)  # Will be localizing
}

mc.ar1.t.test <- MCHTest(ts, sg, rg, N = 1000, seed = 123,
                         test_params = "mu",
                         nuisance_params = c("rho", "sigma"),
                         optim_control = list(lower = c("rho" = -0.999, "sigma" = 0),
                                              upper = c("rho" = 0.999, "sigma" = 100),
                                              control = list("max.time" = 10)),
                         threshold_pval = 0.2, localize_functions = TRUE,
                         lock_alternative = FALSE)

dat <- c(-1.02, -1.13, 0.53, 0.21, 1.76, 1.79, 1.42, -0.31, -0.28, -0.44)
mc.ar1.t.test(dat, mu = 0, alternative = "two.sided")

## Warning in mc.ar1.t.test(dat, mu = 0, alternative = "two.sided"): Computed
## p-value is greater than threshold value (0.2); the optimization algorithm
## may have terminated early

## 
## 	Monte Carlo Test
## 
## data: dat
## S = 0.73415, p-value = 0.264
## alternative hypothesis: true mu is not equal to 0

mc.ar1.t.test(dat, mu = 3, alternative = "two.sided")

## Warning in mc.ar1.t.test(dat, mu = 3, alternative = "two.sided"): Computed
## p-value is greater than threshold value (0.2); the optimization algorithm
## may have terminated early

## 
## 	Monte Carlo Test
## 
## data: dat
## S = -7.9712, p-value = 0.504
## alternative hypothesis: true mu is not equal to 3

t.test(dat, mu = 3, alternative = "two.sided") # For reference

## 
## 	One Sample t-test
## 
## data: dat
## t = -7.9712, df = 9, p-value = 2.278e-05
## alternative hypothesis: true mean is not equal to 3
## 95 percent confidence interval:
##  -0.5265753  1.0325753
## sample estimates:
## mean of x 
##     0.253

I have now covered what I consider the essential technical functionality of **MCHT**. All of the functionality I described in these posts is functionality that I want this package to have. Thus I personally am quite happy this package exists, which is good; I'm the package's primary audience, after all. All I can hope is that others find the package useful too.

I wrote this article more than a month before it was published, so perhaps I have made an update that isn't being accounted for here, but as of this version (0.1.0), I'd say the package is in a beta stage of stability; it's usable, but features could be added or removed and there could be unknown bugs.

The following is a list of possible areas of expansion. This list exists mostly because I think it needs to exist; it gives me something to aim for before making a 1.0 release. That said, they could be useful features.

- A function for making diagnostic-type plots for tests, such as a function creating a plot of the rejection probability function (RPF) as described in [1].
- A function that accepts a `MCHTest`-class object and returns a function that, rather than returning an `htest`-class object, gives the test statistic, the simulated test statistics, and a $p$-value in a list; this could be useful for diagnostic work.
- Real-world datasets that can be used for examples.
- Functions with a simpler interface than `MCHTest()`, perhaps with more restrictions on inputs.
- Pre-made `MCHTest` objects, perhaps implementing common Monte Carlo or bootstrap tests.

I also welcome community requests and collaboration. If you want a feature, consider issuing a pull request on GitHub.

Do you want more documentation? More examples? More background? Let me know! I'd be willing to write more on this subject. Perhaps if I amass enough content I could write a book documenting **MCHT** and Monte Carlo/bootstrap testing.

These blog posts together extend beyond 10,000 words, so I'm thinking I have enough material to submit an article to, say, *J. Stat. Soft.* or the *R Journal* and thus get my first publication where I'm the sole author. But this is something I'm still considering; I'm an insecure person at heart.

Next week I will still be using this package in a blog post, but I won't be writing about how to use it anymore; instead, I'll be using it to revisit a proposition I made many months ago. (It was because of that article I created this package.) Stay tuned, and thanks for reading!

- R. Davidson and J. G. MacKinnon, *The size distortion of bootstrap tests*, Econometric Theory, vol. 15 (1999) pp. 361-376


My grandfather (to me, “Grandpa”) died on Friday, November 2nd, 2018, a few days before this post was written. As I write this I am with my family in Blackfoot, Idaho, staying in his and my grandmother’s (or “Grandma’s”) house; she has survived him. I will be staying with them until the funeral, on Saturday, November 10th, 2018 at the Hawker Funeral Home, is over. (The funeral starts at 2:00 PM.)

My Grandpa was 90 years old, born on April 16th, 1928 (sharing his birthday with my brother, who was born in 1993). He died in a car accident while driving from Idaho Falls, Idaho to his home in Blackfoot, after having called my family and told them “I’m loaded up and I’m coming home.” He rear-ended a semi truck stopped to turn into Love’s truck stop on the highway. We believe that he must not have seen the truck in time, because there were long skid marks indicating that he slammed the brakes of his truck, which was towing a heavy trailer. (So we also know that he did not fall asleep at the wheel.) There was little damage to either the trailer or his vehicle, but the airbags in his car went off. He was awake for a moment after the accident since he unbuckled himself, but soon lost consciousness. He had a pacemaker, and we think that it failed and he effectively had a heart attack since his heart had stopped. (It’s possible that the airbag dealt trauma to his chest and head, thus prompting the failure of the pacemaker.) He died in the ambulance. No member of our family was with him.

That Friday I was planning to have dinner with a graduate student friend of mine. I had canceled the week earlier due to a surprise visit from my sister, but I told him “Nothing could possibly interfere with our dinner tonight except for possibly my brother, and I’ll just tell him that I canceled on you before and I’m not going to do it again.” That was some time around noon. At 4:30, my friend and I started reading the rules for a game we were going to play before having dinner, then got into a long discussion with another student about issues relating to privilege, the state of minorities, microagressions, and so on, which lasted till about 6:00. I checked my phone just to make sure that no one had tried to contact me and I discovered there were at least eight calls and a bunch of messages. So I called my brother and he told me to cancel my plans because something happened. After some prodding he told me that Grandpa died. So I agreed to meet him at my apartment. I told my friend, “I have to go; my Grandpa’s dead.” He said “Okay,” and I left. (So I canceled on him again; I’ll have to remember not to tell someone “Nothing could possibly cause me to cancel” because I might just kill someone else for calling down the karma.)

That’s all I want to say about my Grandpa’s death for now. The rest of this article is my memorial of him and the impact he had on my life.

My Grandpa Douglas was born and raised in Blackfoot, Idaho. His father was George Wareing, his mother was Amelia Hansen, and his brother was LaVere. He was born in 1928, so he was able to remember the depression. He remembered hobos coming to his parents’ door and his mother making them sandwiches.

I don’t know much about his relationships except for his relationship with his brother. LaVere was the eldest and I remember my Grandpa not particularly caring for games when he was older since LaVere would get very angry with Grandpa when Grandpa won. (But Grandpa definitely knew how to play Go Fish; I remember him occasionally cleaning someone out in games since he better remembered who had what in their hand. He enjoyed lightly competitive family games as well.) My understanding was that for most of his life my Grandpa had a poor relationship with his brother. That said, I heard that before LaVere died after the passing of his wife, their relationship improved.

I think the most distinct story I remember of Grandpa’s childhood was him riding trains. He would go to the train yard and the workers would bring him into the locomotive and he’d ride around with them. Thus, for all his life, Grandpa loved trains.

Grandpa had dyslexia. He was an intelligent person but he struggled with reading and writing. He was told repeatedly that he was stupid and he took those words to heart; throughout his life he would belittle himself, which his loved ones did not like hearing. The military would disagree with this idea that he was “stupid”; Grandpa said he was told he scored high on a military IQ test and the military wanted to have him join an intelligence division. The thought of a sea of documents Grandpa would have to read led him to turn the offer down; while in the Air Force in the Korean War, he worked on a boat tasked with rescuing downed pilots. (Grandpa enlisted because he didn’t want to get drafted.)

My Grandpa met my Grandma when a friend of his said he met a girl from Chicago with a sister and invited my Grandpa to take the sister on a date with him. Grandpa accepted, and while he was at the house waiting, he saw my Grandma, Barbara Marshall, come into the room. She was a very beautiful girl, and Grandpa enjoyed the chat he had with her. He said to himself “I want that one,” and he courted and eventually married her. (She was not the sister he was supposed to meet.) Next summer would have been their 70th wedding anniversary. Grandpa never ceased to adore Grandma. To me, the best evidence of his love for her were the love notes he would stick to her mirror. When I would visit my Grandparents I made sure to see what new notes were on the mirror.

Grandpa taught me what true love is and that while marriage may take work it’s worth it. When I envision the ideal marriage I picture Grandma and Grandpa.

Grandma and Grandpa had seven children in total, four boys and three girls. David, Nancy, and Michael (“Mike”) were the first three and the eldest (in that order); Grandpa called them “the first family.” Grant, Paul, Amy (my mother) and Karen (in birth order again) were “the second family”.

While Grandpa did a variety of jobs and sometimes the money was very tight, he was a teacher by profession, and he loved his work. After his period in the military, he went to college at Idaho State University. He majored in political science and minored in economics. He learned to teach music. While a teacher he taught government classes, history classes, and music. He loved working with kids and many of his students remember him fondly.

Grandpa was once a Republican, even running for state office as a Republican and leading the local Republican party. I think he turned against the party because he opposed their foreign policy (Grandpa hated wars), and I don’t think he could ever be considered a conservative.

Grandpa lived on a farm that, to my understanding, had been in the family. The farm was divided when the I-15 was built, and two branches of the family now live on two separate parts of that original family farm. Grandpa was not a farmer, but he knew how to run farm equipment and took good care of the property for his whole life. In recent years he leased the property out to others, allowing their herds of cattle to graze before being sold to slaughter.

Grandpa himself hated killing anything. He once worked on a feed lot, which of course fed cattle that were intended to be slaughtered. (Grandpa was a real cowboy.) He hated that idea. He would rescue the spiders in the house before Grandma found and squashed them. Recently his property became infested with marmots, which were destroying his equipment. He shot them to death, but with quivering hands; he didn’t want to kill them.

My Grandpa didn’t have a mean bone in his body. In fact, the word I remember him most saying was “love”. He meant not only love for his wife but love for everyone. He would tell the nurse in a hospital or a stranger in a shopping line that they were wonderful and special and loved. He twisted a common poem with Christian overtones to read

One life to live

t’will soon be passed;

only what’s done

in kindness will last

Grandpa was one of the kindest people I knew.

My Mom and Dad initially lived in a single-wide trailer on my Grandparents’ property, when my Dad was first editor of the local newspaper, the *Morning News*. At that age my Grandpa got to know me. He gave me my first nickname, “destructo”, since I often left a big mess.

I obviously don’t remember much of this early time of my life, but my Grandpa did. He remembered driving me around in his pickup while playing jazz on the radio (Grandpa always loved jazz, and he gave me my taste for it). This was one of his favorite memories. He would carry me around while he did his chores, even when operating tractors on the property.

My Dad lost his job as editor of the *Morning News* and went to school for two years to become a computer programmer. When he graduated, he moved the family to Salt Lake City, and I grew up in a suburb of the city, West Jordan. Yet I still managed to grow close to my Grandpa. I think this is primarily because when I was in elementary school, we were on the track schedule: there were four tracks (A, B, C, and D), which rotated through a three-week vacation throughout the school year so that we had a schedule of three weeks off, nine weeks on. Frequently during those breaks my Mom would take my brother and me (and later my sister, when she was born in 1999) to spend a week with my Grandparents while my Dad stayed home to work.

Grandpa was known for giving nicknames to people. After “destructo”, I was “Colonel”, then “decum” after I turned ten. I think I’ve gotten other names too; he recently would call me “the professor”. But of the names he gave me, I think “Colonel” is my favorite. It’s also the first name I remember.

Grandpa liked trains and perhaps it was from him that I learned to like trains, especially the old steam trains. My Grandpa’s children bought for him a fancy HO-scale electric train, the locomotive a 4-6-6-4 *Challenger*. Around that time one of the remaining *Challengers* traveled through our area and we were able to see the real locomotive. But I was enamored with the model; I loved watching it drive around the track. Apparently I was the person who broke the model; it was never fixed. Nevertheless, my Grandpa got me liking trains. (To this day, my favorite locomotive is the *Challenger*.)

One Christmas my parents bought me a Life-Like HO-scale electric train set intended not just for children but for model railroad enthusiasts as well. This spawned an ill-advised hobby in my childhood around model trains; no one knew what they were doing, but we were going to try to build a layout complete with scenics and landscaping. My Grandpa encouraged this. One of my favorite memories of him was driving from Blackfoot to Pocatello to a hobby shop to buy model trains and accessories. He bought me a wonderful little locomotive which could even puff smoke as it drove. We set up the tracks in the basement area and he and I would drive the trains.

I can’t remember when I stopped my pursuit of the model train hobby; I had a big wooden board in my bedroom with tracks on it that just turned into a giant table with train parts strewn about, lacking any sense of direction. In the end, I gave all of my trains, tracks, scenics, etc., to my Grandpa. We promised to one year build a layout at his home together. He had more space, including extra buildings to store the train, so it could be a great layout without inconveniencing anyone.

We never built that layout.

Another hobby that Grandpa tried to support me in was model airplanes. He helped my parents buy a plane for me. We tried to fly it, but no matter how many times we tried we could not get the plane to stay in the air, whether we were at an elementary school or at Grandpa’s expansive property. I can only remember one successful flight, and Grandpa was there to see it.

I feel that I can attribute my interest in politics to my Grandpa. My Dad, being a newspaper man, was interested in politics too, but I think the initial political conversations I had were with my Grandpa. The first major political event I recall was the 2000 presidential election; it was the first year I discovered my family was a political minority—Democrats—in the states where we lived (Utah and Idaho). But my Grandpa and I would talk for a long time about politics. There was once a time when I would say “I like politics”; that was largely my Grandpa’s doing (even though that’s not how I would say it today).

Grandpa strongly opposed the war in Iraq. I remember mornings with Meet the Press on TV (back then Tim Russert was in charge, and the show has not been the same since he died in 2008), and the case for weapons of mass destruction (WMDs, which is a dumb word if only because it’s so poorly defined) being in Iraq was pushed. My Grandpa said there were none there unless we gave them to Saddam Hussein, but Iraq did not have the ability to acquire such weapons. Grandpa was right.

He hated Republican economic policy and feared they would try to gut Social Security and Medicaid. He disliked the loss of manufacturing jobs and feared automation putting people out of work. He wanted church and state separated and thus wasn’t too sympathetic to anti-gay and anti-abortion laws. He wanted stronger gun control. He was skeptical of capitalism, saying we needed a little socialism for the country to run. He hated the Idaho government’s approach to education (cut the budget) and a general unwillingness among conservatives to pay taxes, especially the rich. He was concerned about wealth and income inequality. And so on.

Grandpa’s views were powerful, and at family reunions it seems that the family’s political opinions are very homogeneous. Few at those reunions, which drew perhaps around 50 people, were sympathetic to Republicans. There was once a time when I was in community college that I thought I might be a Libertarian or a Republican, but that view did not survive the University of Utah. Grandpa was skeptical of this potential change in attitude but he loved me regardless of what I believed.

I believe that Grandpa inspiring me to care about politics set me on the track that led me to where I am today. When I was a kid I didn’t care for math; I could understand it but I had no love for it. I cared about politics, government, and social studies. When I was in high school, while I was taking math classes, I cared about debate (more on that later) and the school literary magazine. I *hated* physics. To this day I care only for broad descriptions of physics concepts, not for the details. (Grandpa didn’t understand physics but he was fascinated by it, as well as how people can discover things using just mathematics.) But I felt that with my mathematics background and my interest in politics I should try and get a degree in economics. That led me to take more math classes and a statistics class (a subject I once thought was likely the driest mathematical subject, one not extending beyond using means and proportions for baseball statistics). I fell in love with these subjects and now I’m pursuing a Ph.D. in mathematics, studying mathematical statistics.

You can now see the line of thought that led me to where I am, and I thank my Grandpa for planting that seed. I still am very interested in current affairs and politics and likely will be for the rest of my life no matter what I do.

I have hayfever and my grandparents lived on a farm, so often when I visited I couldn’t breathe through my nose and my eyes would become itchy and inflamed. Sleeping at night was hard since I couldn’t breathe. One night was particularly bad and I think I got up to try and find some nasal spray. Grandpa was awake too (Grandpa struggled to sleep; more on that later), and when he saw me we got into the car together and drove into town. It was very early in the morning so most of the town’s stores were closed, but we managed to find a convenience store that was open. He bought a nasal spray for me, we rode back home, and the spray helped my nose clear up.

I played piano (and one year tried the clarinet) as a kid. I was never a great piano player, but I did develop some skill. My Grandpa loved music and wanted me to study it as well. I remember painful sessions of Grandpa sitting me down and giving me his version of a piano lesson. He was highly critical of me and mistakes I would make. These lessons would always turn into a lecture about how valuable a skill like playing piano would be (not from a financial perspective but more from a civic one). He’d berate me for spending time playing with toys or computer games and not spending more time practicing piano.

For all his talk of my possibly enjoying piano, I don’t think I ever loved it as much as I did other things. When I started college I dropped piano, and my Grandpa always reminded me of that decision. He wished I kept it up. I may return to practicing piano some day when I have more time, but I wish I could have played for him one more time. As painful as his lessons were, I liked being with my Grandpa and I put up with them.

I enjoyed Grandpa’s music, though. He led a jazz band all his life, whether it was a high school band or a volunteer community band. I remember as a kid going to his room at the Eastern Idaho Technical College to listen to his bands rehearse, then attending his concerts in Idaho Falls parks. My favorite Fourth of July was when I was very young, when the day started with one of his jazz concerts. The day ended with fireworks over the Snake River while we sat on the banks. Days like that were beautiful.

He and his band were featured by a local television station; you can see them play here.

Grandpa was not one to mince words. He never physically hit anyone (except once when he whopped my Aunt Karen on the butt when she and my Mom were teenagers after she made a rude comment to my Mom while lying on the bed; I don’t know what she said but I bet she deserved what she got). But Grandpa’s lectures were legendary. I think every child and grandchild got at least one lecture. I got my fair share. And he would tell you what he thought, and nothing less.

Grandpa was not always right, but he was a wise man and I always listened to what he said. I never got upset when I got a lecture. I knew he loved me and wanted to tell me something he thought I needed to hear in order to be the best and happiest person I could be.

Grandpa exhibited many virtues, but I don’t see “patience” as one of them, at least from my experience. There were the aforementioned piano lessons. I also remember when Grandpa was teaching me to drive. He was the first person to put me behind the wheel of a vehicle and tell me to drive. He would get after me for many things while driving. I did learn, but not until after a good verbal whipping for my mistakes. (I know very well *never* to cross my arms when turning the steering wheel.)

I remember going to his property so many times to “build fences”. I never once remember building a fence. When we would go to his place for “building fences” we often did something else, perhaps having nothing to do with fences or even work. We would clean up grass, tear down old buildings and fencing, dig holes, and many other non-fence-building things. I remember one year *after* I learned how to drive we were towing old vehicles. I drove the towing truck while Grandpa steered the vehicle being towed. This went well until we tried to tow a very old, rusty yellow car. My brother was in the back of the pickup truck I was driving, directing me. I pressed the gas and was having a hard time getting the truck going, so I hit the gas too hard and pulled the car’s bumper off. Grandpa got out and kicked the car and gave me a verbal tongue lashing. I felt terrible, but Grandpa forgave me and gave me a hug. My pulling off the bumper prompted him to decide that the car was beyond refurbishing anyway.

Grandpa cared a great deal for his property. I remember him trudging off in his irrigation boots to start irrigating the property. I loved when he irrigated; I would run through the watery half-acre lawn and swim in a particularly deep divot in the lawn, deep enough to reach my neck when I was little. He changed irrigation technique later, and flooding the lawn no longer occurred. (I missed this.) Even though he was in his late 70s or even early 80s he would carry several large metal pipes on his shoulders with sprinklers on them. In the evening the sprinklers would be running. He mowed his massive lawns by hand for years but in his later years he learned to appreciate the riding lawnmower.

The lawns of his house are beautiful. My Dad wanted Grandpa to show him how to take care of the property and run his machines after my parents moved back in a couple months ago. Dad got to run the lawnmower, but Grandpa died before he could show Dad what else needs to be done to keep the place in good condition. Without Grandpa’s guidance, it will be a heavy lift for Dad, learning on his own, to keep the place in the same condition Grandpa did.

Grandpa was a hyper person; he had ADHD and could not stand sitting around. Whenever he caught a child or grandchild sitting around he would give them something to do. He would sometimes ask “Are you bored?” I learned to answer “no” when he asked this question, because otherwise he would give me some chore to do. Grandpa valued hard work.

I think that Grandpa telling me to come stay with him to help build fences was just an excuse to have me around. I was fine with this. This was more time to spend with Grandpa. We often did some work, but we also did fun things. I don’t think Grandpa would say that he spoiled his grandchildren (in fact I think I once mentioned that most Grandparents spoil their grandchildren and he said “too bad for you”). I remember Grandpa buying us ice cream, soda, and candy bars, even as recently as a few months ago.

Grandpa lived on a farm in a very rural area. He took advantage of this. He would go on a walk every morning. When a big truck on the freeway would drive by, he would motion for the truck to blow its horn, and often the truck drivers obliged. Grandpa became known among the trucker community, always being spotted on his walks in the morning.

I remember night walks with my Grandpa, too. My family would put on their jackets and walk through the night in the area. The stars were bright and truck lights passed by on crisp evenings, sometimes in the winter, sometimes with a distant thunderstorm lighting up the sky. I remember watching dogs walk with us while Grandpa led us in fun walking and marching songs. I still remember some of those songs.

I had such good times with my Grandparents as a child that one year, when we had to end our vacation and return home, I was completely beside myself. I didn’t want to leave them. I may have cried the whole way home. I loved being with my Grandparents. I was very close to my Grandpa.

I end this section with a story: Grandpa and I were driving to Pocatello to visit a hobby shop when I was interested in model railroading. The drive is about 30 minutes. He bought me a PayDay bar, the first time I remember having one of those bars. He asked me what I was thinking about. I said “nothing.”

“Nothing?” he replied. “You mean your mind is a void? Nothing going on?”

“I guess so,” I said.

“But there’s so much to think about. You should always be thinking about something.”

I’ve always been thinking since. I’m almost never bored.

I think Grandpa was on the debate team in high school; he recalled competing in extemp. Grandpa and my Mom encouraged me to join the debate team, and I did so. This was important to how I developed as a person. Prior to debate I was an incredibly shy person; giving a presentation in front of a class was an act of great courage. Debate helped pull me out of my bubble. I was a debater during all of high school, and I did well, placing and winning in several events, one of which was extemp. Today, while I struggle to develop non-professional relationships with people (especially women), I can confidently teach a class of any size and give a presentation with basically no notes to a crowded theater without breaking a sweat.

One year I wanted my Grandpa to be a judge in a debate tournament. He agreed but somehow he got the impression that he would be watching me compete. At the time a debate only included the debaters and the judge, with some exceptions when there were multiple people competing in the same room, but I did not want to challenge the norm. I felt pressured not by Grandpa but by my family to allow him to watch, and I was upset; eventually they relented.

However, my Grandpa was involved in my debate career. I remember demonstrating speeches for him that I had rehearsed extensively. One day my Grandpa even taught my debate class. I remember it was in 2008, since he was about to turn 80 years old.

I got my first girlfriend, Andrea, in December 2009 and we were together until January 2011, with a one-month break. Grandpa liked Andrea when he met her, and he invited her to the 2010 family reunion. I appreciated that.

He did meet my second girlfriend, Jasmin, years later in December 2014, but he didn’t get to see her for long. Jasmin broke up with me in May 2015 and I was greatly hurt by this. With all respect, Jasmin was my favorite girlfriend, even though I was with her only for nine months. I was very happy with her and basically saw her as the girlfriend I always wanted, ever since I was a teenager praying to God for a particular girl. Losing her hurt me deeply and I think that break-up changed me. As an undergrad I was largely confident and even becoming more friendly, but as a grad student (post-Jasmin) I’ve become less confident, more pessimistic, and more withdrawn.

2016 was a harder year for me and at the family reunion I was still struggling with my grief. I had moved out of my parents’ house and I was feeling lonely; I missed Jasmin a lot. I was studying out of a real analysis textbook at the time; I saw my mathematical abilities as one of the few things that gave me value.

I was alone at the reunion when Grandpa came up to me. I was working problems in the analysis book and he asked me if I was happy. I broke down in tears and said “No.” He put his hand on my arm to comfort me. I told him that I missed Jasmin a lot. He wanted to help me. He wanted me to move back in with my parents and he wanted to arrange for me to use one of his cars (I rely on transit, which makes it very difficult for me to get out and meet people). I refused to move back no matter how much he protested, and I never got that car even when he recommitted to trying to get me one when my parents moved back into his home in Idaho. That said, his caring meant a lot to me and it helped me to seek out help from a professional.

As a kid, one thing that I wanted was for Grandpa’s jazz band to play at my wedding, with Grandpa conducting. That was going to be his wedding gift to me. I don’t think I ever told him this. Within the last couple of years, as my romantic life turned into an even greater failure, I lost hope that this would happen. Grandpa tried to help me, giving me tips on how to talk to girls and where to meet them. I did try for a little while, but I couldn’t overcome myself. Now Grandpa is dead, and with him my dream.

Grandpa went on a trip to Peru with my Aunt Karen and my Aunt Dalena. He didn’t know any Spanish but he wanted to help the people there in any way he could. I heard that the people in the villages he visited were amazed by him; they had never seen anyone as old as he was. Yet he was still a capable individual. The story I remember the most was him telling the children about steam trains like he knew from when he was a kid; they were very poor with a weak education so they were not familiar with trains. As he got onto the plane to leave, he gave a “Toot, toot!” for the kids, with tears in his eyes. I heard it was a touching moment.

When I was in college Grandpa grew more frail. He hated it. He didn’t want to be weak. He sometimes would say he’s “not a man anymore”, when nothing could be further from the truth. He couldn’t stand up straight like he used to. He had problems with his knees, his feet, his heart. He once reached underneath his lawnmower and lost his fingertips since the machine was still running. He was very angry and distraught with this mistake and his now maimed hands. (In my opinion the wound wasn’t noticeable.) My aunts and uncles wanted to do more for him in order to prevent him from straining himself or getting into a dangerous situation. He resisted help. He would even refuse my offers to do the dishes for him; that was his job and I wasn’t taking it from him.

When my brother lived with Grandma and Grandpa he had to help them in a few emergencies. They were resilient but Grandpa was growing weaker. Doctor’s visits became more common. He was feeling less well. More surgeries were needed. His heart grew weaker, and a pacemaker had to be implanted. Grandpa was starting to get very old.

Grandpa feared old age and resisted it. He didn’t want to be deprived of his independence. He told me in car rides with just him and me that he wanted to die in his house, not in a nursing home. His home would be his only home, and nowhere else. And he was uneasy about the prospect of death. As much as he said that he wanted to see important life events for his grandchildren (even great grandchildren; he pointed to my baby nephew Ayven and said “I want to see *him* get his Ph.D”), he spoke of his own life as if he believed he would not be around for much longer.

He said he never doubted there was a creator; life looked very created to him. He did question Christianity in general, and the Seventh-Day Adventist flavor of it in particular. But while he questioned the details he embraced the idea of loving everyone and living what one could call a Christian life. He attended church regularly with Grandma until his death, and my brother tells me that in Sabbath school class he would end a discussion by saying that he loved everyone in the room and they all were special.

Grandpa also decided it was highly unlikely that this life was the end and death was eternal oblivion. I agreed with him.

Grandpa wanted to attend my college graduation ceremony, but he had a shingles outbreak and could not go. He and Grandma were heartbroken, and I wished they were able to see me walk. (I’m glad that my Uncle Mike and Aunt Donna managed to come, though; it meant a lot to me that they did.) After that I decided that I really wanted to have Grandpa see me walk to get my Ph.D. He was proud of my studies and always enjoyed talking to me and seeing how I thought. (Sometimes I would start to feel like a freak when so much attention was paid to my mathematical ability, but I knew that any attention was out of love and pride in his grandson.) I started to picture a dinner the day before my Ph.D. graduation where my family, including Grandma and Grandpa, would meet my adviser—Lajos Horváth—for the first time over dinner at a nice restaurant. Then the next day my family (including Grandpa) would see my adviser put the sash over my neck that made me Dr. Curtis Miller. I felt as if this vision could be attained.

Grandpa is dead now, and I don’t have my Ph.D. Another vision I really wanted that will never come true.

Last year Grandma spent several days in the hospital in Salt Lake City for a heart surgery. I spent a lot of time with Grandma and Grandpa and the aunts and uncles who came with them. In addition to enjoying a pancake breakfast every morning in the hospital cafeteria, I spent many hours just about every day I could with them, talking with them. This was the first time I saw Grandpa with a cane. But he seemed to take to it well.

There were other times throughout last year that I intermittently saw my Grandparents. Sometimes they came to Salt Lake City for medical reasons, sometimes it was for good things like the birth of my nephew, Ayven, or my sister’s graduation. I missed the last family reunion because it was planned to be around my Grandparents’ 69th wedding anniversary and I was already arranged to travel to San Francisco on a grant to attend an MSRI workshop. (I would have gladly missed the workshop if I was aware there would be a conflict, but by the time the date was announced I felt that I could not cancel. But I was there in spirit since half of the reunion’s attendees caught the stomach flu that I caught from the wild and then spread to the Utah branch of the family.) I saw him shortly before the semester started, though, along with the weekend I helped my family move to Idaho back into Grandma and Grandpa’s house, and also the week of fall break this semester.

I was actually debating whether to visit during fall break this year, but I decided in favor of visiting, and I’m so glad that I did. That visit was the last time I would see my Grandpa, and he always loved to see me. He was hoping that next summer I would spend a length of time in Idaho with them. I think Grandpa was becoming increasingly doubtful of his longevity and wanted to see me as much as he could before he was gone.

The night I arrived after taking a Salt Lake Express bus to Pocatello (my sister Alicia picked me up from there) I went into Grandma and Grandpa’s bedroom, where he was sitting. He was recovering from another knee surgery so he didn’t want to leave his room. I sat beside him on the bed and he and I talked for a very long time about current events, how science works, the world, and many other things. These were the conversations he loved to have with me, the kind of conversations he and I would have over periods ranging from my childhood to my teenage years to my college years. He didn’t understand everything I said, because despite my best efforts I sometimes struggle to make myself understood, no matter how much I fear talking over anyone’s head. But he loved it. As he always loved talking to me.

Sometimes I wonder if anyone enjoyed talking to me as much as my Grandpa did.

Grandpa looked miserable the last time I saw him. He couldn’t sleep at night. He got only a few hours of sleep, then he was awake. I remember one night while I was trying to fall asleep on the couch seeing him walk to the chair behind me and just sit down and stare over me, effectively alone (because I was going to just pretend I was asleep to try and go to sleep; perhaps I should have talked to him). These sleepless nights were increasingly common for him; I heard from my Mom that one night he went to his car and turned on jazz music so he didn’t disturb anyone while he dealt with being awake. He felt very lonely in these times.

To be completely honest, during my last trip, Grandpa did not seem happy anymore. He seemed miserable. He looked more haggard and frail. He couldn’t do anything because he was tired all day since he didn’t sleep at night. This was the reason he (and thus Grandma, too; she was not leaving him) missed my nephew’s first birthday party.

He would still try to work. He wanted to get the tin building on his property ready for winter and cleaned out so that my family could store their stuff in there and their belongings would be safe from mice. We went to a local lumber yard to buy wood, then to C.A.L. Ranch for rat traps. While there, he bought me a candy bar. Then we returned to his home and tried to put the boards he bought into the doorway, only to find that the lumber yard had cut them to our *exact* specifications; this apparently was not what we wanted because the boards were too snug to slide in. Grandpa found a buzz saw in the shed and turned it on. I held the boards while he cut. He was too weak to hold the saw up so it ended up hanging right next to his leg while it was still running. I saw this enormous safety hazard and wanted to say something to him, perhaps offering to take control of the saw instead, but as before I could never bring myself to question my Grandpa, even when I *really* should have.

My Grandpa’s last advice to me was about regret. He questioned whether I was living a healthy lifestyle. I don’t work out all that much these days; that seems like time better spent studying or at least doing something I personally find fun. He said that regret is a hard thing to deal with later in life as you deal with the consequences of bad decisions. One should try to minimize regret as much as one can.

You can see in this article a number of regrets I have.

Before I left, Grandpa asked me to promise him that I would go to the gym. He was adamant about me making that promise, so I did. (I still haven’t gone.)

Grant took me home and while I did say goodbye to Grandma and Grandpa I didn’t have that final goodbye hug I’m used to getting from them. We had already pulled out and Grant wanted to get home soon; he was already late according to his schedule. So I called my Mom and I told Grandma and Grandpa that I forgot that hug but I loved them and I would see them soon at this year’s Thanksgiving dinner.

I never saw my Grandpa again. I should have told Grant to turn around so I could get that last goodbye. There’s another regret.

I have yet to see Grandpa’s body, but I will be a pall bearer at his funeral. I’m planning on wearing one of the outfits my sister helped me pick, which I wore regularly to teaching: a black coat and a black sweater vest over a black-and-white plaid shirt, with black jeans and sneakers. When Grandpa saw me wearing this outfit he would call me “the professor”. It’s an outfit I like and I think it looks sharp, so it seems fitting.

Many people remember Grandpa and were touched by him. Just a couple of months ago one of his students came to visit him, spending an hour at his house with her family; she remembered him fondly. When he died and the news was released, an unusual number of people called to ask when the funeral was scheduled to take place. Our family thinks there could be many people at his funeral. We’re pleased he touched so many lives.

I want to place a copy of this article in his casket. I won’t print it on any fancy paper, but after posting it I’ll print it out with all the metadata associated with web pages that browsers print out. I thought about this and like it; it shows where I made my memories of the only Grandpa I knew and who I loved dearly public, on my personal website, along with the time and date.

Since Grandpa’s death, there’s been talk about whether we would bring him back if we could, or whether he died at a good time. I’m entitled to whatever opinion I want because my opinion doesn’t change anything.

I’m happy that Grandpa avoided the worst of aging. In some ways his death was a mercy. He did not lose his independence. He did not lose his home. He did not see his health decline even further and he was doing what he loved.

But I feel that there was a lot of unfinished business, things we wanted him to see. My brother wanted Grandpa to see him become an electrician. And I wish so badly that he could have at least seen me get my Ph.D.

I wish he could have seen Ayven get *his* Ph.D.

I will never have another Grandpa in my life. I had a damn good Grandpa though, one of the finest men who’s lived. I will never stop missing him.

I love you Grandpa.

While Grandpa loved jazz, he also loved classical music. He said that if he had to pick one composer to listen to for the rest of his life, it would be Beethoven. So below is the last piano piece I played for Grandpa, “Moonlight Sonata,” by Beethoven.

I’ve spent the past few weeks writing about **MCHT**, my new package for Monte Carlo and bootstrap hypothesis testing. After discussing how to use **MCHT** safely, I discussed how to use it for maximized Monte Carlo (MMC) testing, then bootstrap testing. One may think I’ve said all I want to say about the package, but in truth, I’ve only barely passed the halfway point!

Today I’m demonstrating how general **MCHT** is, allowing one to use it for multiple samples and on non-univariate data. I’ll be doing so with two examples: a permutation test and the test for significance of a regression model.

The idea of the permutation test dates back to Fisher (see [1]) and it forms the basis of computational testing for difference in mean. Let’s suppose that we have two samples with respective means $\mu_X$ and $\mu_Y$. Suppose we wish to test

$$H_0: \mu_X = \mu_Y$$

against

$$H_A: \mu_X > \mu_Y$$

using samples $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$, respectively.

If the null hypothesis is true and we also make the stronger assumption that the two samples were drawn from distributions that could differ only in their means, then the labelling of the two samples is artificial, and if it were removed the two samples would be indistinguishable. Relabelling the data and artificially calling one sample the $x$ sample and the other the $y$ sample would produce highly similar statistics to the one we actually observed. This observation suggests the following procedure:

- Generate $N$ new datasets by randomly assigning labels to the combined sample of $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$.
- Compute $N$ copies of the test statistic on each of the new samples; suppose that the test statistic used is the difference in means, $\bar{x} - \bar{y}$.
- Compute the test statistic on the actual sample and compare it to the simulated statistics. If the actual statistic is relatively large compared to the simulated statistics, then reject the null hypothesis in favor of the alternative; otherwise, don’t reject.

In practice step 3 is done by computing a $p$-value representing the proportion of simulated statistics larger than the one actually computed.
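For intuition, here’s a hand-rolled sketch of the whole procedure in base R, without **MCHT**; the sample sizes and distributions are illustrative choices that simply mirror the example further below:

set.seed(123)
x <- rnorm(5, 2, 1)                # m = 5
y <- rnorm(10, 0, 1)               # n = 10
obs <- mean(x) - mean(y)           # Observed test statistic
combined <- c(x, y)
sims <- replicate(1000, {
  relabeled <- sample(combined)    # Step 1: randomly reassign the labels
  mean(relabeled[1:5]) - mean(relabeled[-(1:5)])  # Step 2: statistic on the new sample
})
mean(sims >= obs)                  # Step 3: the p-value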

The permutation test is effectively a bootstrap test, so it is supported by **MCHT**, though one may wonder how that’s the case when the parameters `test_stat`, `stat_gen`, and `rand_gen` only accept one parameter, `x`, representing the dataset (as opposed to, say, `t.test()`, which has an `x` and an optional `y` parameter). But `MCHTest()` makes very few assumptions about what object `x` actually is; if your object is either a vector or tabular, then the `MCHTest` object should not have a problem with it (it’s even possible a loosely structured `list` would be fine, but I have not tested this; tabular formats should cover most use cases).

In this case, putting our data in long-form format makes doing a permutation test fairly simple. One column will contain the group an observation belongs to while the other contains observation values. The `test_stat` function will split the data according to group, compute group-wise means, and finally compute the test statistic. `rand_gen` generates new datasets by permuting the labels in the data frame. `stat_gen` merely serves as the glue between the two.

The result is the following test.

library(MCHT)
library(doParallel)

registerDoParallel(detectCores())

ts <- function(x) {
  grp_means <- aggregate(value ~ group, data = x, FUN = mean)
  grp_means$value[1] - grp_means$value[2]
}

rg <- function(x) {
  x$group <- sample(x$group)
  x
}

sg <- function(x) {
  test_stat(x)
}

permute.test <- MCHTest(ts, sg, rg, seed = 123, N = 1000,
                        localize_functions = TRUE)

df <- data.frame("value" = c(rnorm(5, 2, 1), rnorm(10, 0, 1)),
                 "group" = rep(c("x", "y"), times = c(5, 10)))
permute.test(df)

## 
## 	Monte Carlo Test
## 
## data: df
## S = 1.3985, p-value = 0.036

Suppose for each observation in our dataset there is an outcome of interest, $y_i$, and there are $K$ variables $x_{i1}, \ldots, x_{iK}$ that could together help predict the value of $y_i$ if they are known. Consider then the following linear regression model (with $\epsilon_i$ representing the error term):

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} + \epsilon_i$$

The first question someone should ask when considering a regression model is whether it’s worth anything at all. An alternative approach to predicting $y_i$ is simply to predict its mean value. That is, the model

$$y_i = \beta_0 + \epsilon_i$$

is much simpler and should be preferred to the more complicated model listed above if it’s just as good at explaining the behavior of $y_i$ for all $i$. Notice the second model is simply the first model with all the coefficients $\beta_1, \ldots, \beta_K$ identically equal to zero.

The $F$-test (described in more detail here) can help us decide between these two competing models. Under the null hypothesis, the second model is the true model:

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_K = 0$$

The alternative says that at least one of the regressors is helpful in predicting $y_i$:

$$H_A: \beta_j \neq 0 \text{ for at least one } j \in \{1, \ldots, K\}$$

We can use the $F$ statistic to decide between the two models:

$$F = \frac{(\mathrm{RSS}_2 - \mathrm{RSS}_1)/K}{\mathrm{RSS}_1/(n - K - 1)}$$

$\mathrm{RSS}_1$ and $\mathrm{RSS}_2$ are the residual sums of squares of models 1 and 2, respectively.
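As an aside, base R can perform this comparison of nested models directly via `anova()`; the sketch below uses the built-in `mtcars` dataset purely for illustration, not data from this post:

m0 <- lm(mpg ~ 1, data = mtcars)        # Model 2: intercept only
m1 <- lm(mpg ~ wt + hp, data = mtcars)  # Model 1: the full model
anova(m0, m1)                           # F statistic for H0: all slopes equal zero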

This test is called the $F$-test because usually the $F$-distribution is used to compute $p$-values (as this is the distribution the statistic should follow when certain conditions hold, at least asymptotically if not exactly). What then would a bootstrap-based procedure look like?

If the null hypothesis is true then the best model for the data is this:

$$y_i = \bar{y} + \hat{\epsilon}_i$$

$\bar{y}$ is the sample mean of the $y_i$ and $\hat{\epsilon}_i$ is the residual. This suggests the following procedure:

- Shuffle $\hat{\epsilon}_i$ over all rows of the input dataset, with replacement, to generate new datasets.
- Compute $F$ statistics for each of the generated datasets.
- Compare the $F$ statistic of the actual dataset to the generated datasets’ statistics.

Let’s perform the test on a subset of the `iris` dataset. We will see if there is a relationship between the sepal length and sepal width among *iris setosa* flowers. Below is an initial split and visualization:

library(dplyr)

setosa <- iris %>%
  filter(Species == "setosa") %>%
  select(Sepal.Length, Sepal.Width)

plot(Sepal.Width ~ Sepal.Length, data = setosa)

There is an obvious relationship between the variables. Thus we should expect the test to reject the null hypothesis. That is what we would conclude if we were to run the conventional test:

res <- lm(Sepal.Width ~ Sepal.Length, data = setosa)
summary(res)

## 
## Call:
## lm(formula = Sepal.Width ~ Sepal.Length, data = setosa)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.72394 -0.18273 -0.00306  0.15738  0.51709 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.5694     0.5217  -1.091    0.281    
## Sepal.Length   0.7985     0.1040   7.681 6.71e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2565 on 48 degrees of freedom
## Multiple R-squared:  0.5514, Adjusted R-squared:  0.542 
## F-statistic: 58.99 on 1 and 48 DF,  p-value: 6.71e-10
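As a quick sanity check of the formula given earlier (not part of the test we’re building), the $F$ statistic can be recomputed by hand from the two models’ residual sums of squares; here $K = 1$:

rss1 <- sum(residuals(lm(Sepal.Width ~ Sepal.Length, data = setosa))^2)  # Model 1
rss2 <- sum(residuals(lm(Sepal.Width ~ 1, data = setosa))^2)             # Model 2
n <- nrow(setosa); K <- 1
((rss2 - rss1) / K) / (rss1 / (n - K - 1))  # About 58.99, matching summary()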

Let’s now implement the procedure I described with `MCHTest()`.

ts <- function(x) {
  res <- lm(Sepal.Width ~ Sepal.Length, data = x)
  summary(res)$fstatistic[[1]]  # Only way I know to automatically compute the statistic
}

# rand_gen's function can use both x and n, and n will be the number of rows of
# the dataset
rg <- function(x, n) {
  x$Sepal.Width <- sample(x$Sepal.Width, replace = TRUE, size = n)
  x
}

b.f.test.1 <- MCHTest(ts, ts, rg, seed = 123, N = 1000)

b.f.test.1(setosa)

## 
## 	Monte Carlo Test
## 
## data: setosa
## S = 58.994, p-value < 2.2e-16

Excellent! It reached the correct conclusion.

One may naturally ask whether we can write functions a bit more general than what I’ve shown here, at least in the regression context. For example, one may want parameters specifying a formula so that the regression model isn’t hard-coded into the test. In short, the answer is yes; `MCHTest` objects try to pass as many parameters to the input functions as they can.

Here is the revised example that works for basically any formula:

ts <- function(x, formula) {
  res <- lm(formula = formula, data = x)
  summary(res)$fstatistic[[1]]
}

rg <- function(x, n, formula) {
  dep_var <- all.vars(formula)[1]  # Get the name of the dependent variable
  x[[dep_var]] <- sample(x[[dep_var]], replace = TRUE, size = n)
  x
}

b.f.test.2 <- MCHTest(ts, ts, rg, seed = 123, N = 1000)

b.f.test.2(setosa, formula = Sepal.Width ~ Sepal.Length)

## 
## 	Monte Carlo Test
## 
## data: setosa
## S = 58.994, p-value < 2.2e-16

This shows that you can have a lot of control over how `MCHTest` objects handle their inputs, giving you considerable flexibility.

Next post: time series and **MCHT**

- R. A. Fisher, *The design of experiments* (1935)

