The *Forgotten Age* cycle of Arkham Horror has come to a close, and Fantasy Flight Games has already announced the next cycle, *The Circle Undone*. Not only that, they’ve announced two mythos packs at a rate that… surprised me. A new cycle announcement and two mythos pack announcements in less than two months? Am I the only one who finds the new pace of announcements surprising? Perhaps it means they want to get product out at a faster pace?

Eh, enough speculation. I wrote about Arkham Horror before, analyzing Olive McBride specifically. This analysis (despite errors in the initial publication) was well received, even earning me a shoutout from my favorite Arkham-related YouTube channel.

In the announcement of the mythos pack *The Wages of Sin*, another mathematically interesting card was spoiled: Henry Wan, seen below.

Designing new allies for Arkham Horror is very hard because there can effectively be only one ally in a deck and there are many good allies already released, many of them in the core set. Henry Wan, specifically, is competing with Leo de Luca, who competes with Dr. Milan Christopher for the title of “Best Ally”. Cards like Charisma help with the problem, but only if you plan on running multiple allies and are willing to pay the experience points for it.

Can Henry Wan compete with Leo de Luca? That strongly depends on how good his ability is. Actions are a precious commodity in Arkham Horror; this is why Leo de Luca is considered such a great card. Card draw and resource gain *can* help action economy, especially in a spendthrift class such as the Rogue (green) class, but it often takes many resources to compensate for a lost action.

Consider, for instance, Father Mateo’s Elder Sign ability; gain an extra action, or a card and a resource. As a point of reference, players can draw a card *or* gain a resource for one of their actions, so a raw evaluation would say that drawing a card *and* gaining a resource is actually worth two actions and thus is better than just getting a free action. But I feel most of the time people use Father Mateo’s elder sign effect to gain the additional action rather than the card and resource (though choosing the latter effect is far from rare). In fact, I think that a single action could be valued at *three* resources, based only on the fact that when a player draws Emergency Cache they will eagerly play it. When viewed from this perspective, Leo de Luca pays for himself after about three turns, and drawing him early gives an investigator a major boost in a scenario.
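To put numbers on that valuation, here is the arithmetic as a quick R sketch. The three-resources-per-action rate comes from the Emergency Cache reasoning above; Leo de Luca’s six-resource cost (plus the action spent playing him) is an assumption of the sketch.

```r
# A sketch of the action-economy arithmetic: one action ~ 3 resources (the
# Emergency Cache rate), and an ally's true price is its resource cost plus
# 3 resources for the action spent playing it.
action_value <- 3
leo_total_cost <- 6 + action_value        # 6 resources + the play action
turns_to_break_even <- leo_total_cost / action_value
turns_to_break_even                       # 3: Leo pays for himself in ~3 turns
```

At one extra action per turn, each worth roughly three resources, the numbers land on the “about three turns” claim above.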

Henry Wan will thus live or die based on how strong his ability is. That said, “strong” depends on how well a player can use his ability, which is not a trivial task.

Make no mistake: Henry Wan is a gambler’s card (which fits the Rogue theme very well). Not only does a player gamble the resources spent on him, they gamble the action spent to trigger his ability; heck, using a deck slot on him is a gamble! A player thus will gain value from him *only* if they use him optimally.

Optimal play is not trivially determined, but fortunately Henry Wan’s ability is easy to model mathematically if you’re familiar with Markov chains. Wait, are most people *not* familiar with Markov chains? Oh, I didn’t know that. Oh well, maybe they’ll learn something from what follows. I’ll do my best to make it simple.

From here on, I consider drawing a card or gaining a resource with Henry Wan as equivalent; I’ll simply imagine that we’re trying to gain resources using his ability. Henry Wan’s ability calls on players to institute a policy for playing him of the following form:

**After X draws, take your winnings; do not draw anymore.**

Our job is thus to choose X so that we maximize the *expected* resource gain (in the probabilistic sense of expectation).

I’m going to call utilizing Henry Wan’s ability a single “game”. Here’s how I view the game: the chaos bag is filled with tokens labeled either “S” or “F”, with every “F” being one of the icon tokens mentioned in Henry Wan’s ability. When an “S” is drawn, the game continues, while the game ends the moment an “F” is drawn. Every time we draw an additional “S”, there is one fewer “S” in the bag, and the odds of drawing an “F” increase; that said, our total winnings increase with each “S” we draw.

The game ends when either an “F” is drawn or the policy is triggered. Our winnings depend on which of these outcomes we find ourselves in. If it’s the former, our winnings are 0, while if the latter, our winnings are X. Thus it’s easy to see (if you’re familiar with probability) that the expected winnings for any given policy is X times the probability of winning with the chosen policy: $E[W_X] = X p_X$, if you prefer (with $p_X$ being the probability of not failing when using the policy of ending after X draws). We thus want to pick the X that maximizes $X p_X$.

Calculating $p_X$ calls for the Markov chain. Below is the chain I imagine:

- The initial state is state 0, representing zero draws. There are also states numbered 1 to X, and a state F.
- If the chain is at state $i$ (for $0 \le i < X$), the chain moves to state $i + 1$ with probability $\frac{s - i}{s + f - i}$ or to state F with probability $\frac{f}{s + f - i}$, where $s$ and $f$ are the numbers of “S” and “F” tokens initially in the bag.
- Both state X and state F are absorbing states. (Once entered, the chain does not leave the state; in other words, the "game" ends.)
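As a minimal sketch, the chain above can be written down and solved directly with the standard absorbing-chain formula $B = (I - Q)^{-1} R$. The bag here is hypothetical: $s = 11$ non-icon tokens, $f = 6$ icon tokens, and the policy $X = 2$.

```r
# Transition matrix for the chain with s = 11, f = 6, X = 2;
# states 0 and 1 are transient, states 2 and F are absorbing
s <- 11; f <- 6; X <- 2
P <- matrix(0, X + 2, X + 2, dimnames = list(c(0:X, "F"), c(0:X, "F")))
for (i in 0:(X - 1)) {
  P[as.character(i), as.character(i + 1)] <- (s - i) / (s + f - i)
  P[as.character(i), "F"] <- f / (s + f - i)
}
P["2", "2"] <- 1; P["F", "F"] <- 1    # absorbing states stay put

# Standard absorbing Markov chain computation: B = (I - Q)^{-1} R
trans <- as.character(0:(X - 1))
Q <- P[trans, trans]                   # transient-to-transient
R <- P[trans, c(as.character(X), "F")] # transient-to-absorbing
B <- solve(diag(length(trans)) - Q) %*% R
B["0", "2"]   # probability of absorption at X = 2: (11/17) * (10/16)
```

The entry `B["0", "2"]` is the probability of surviving both draws, which is just the product of the per-draw survival probabilities; the matrix machinery pays off once the script below has to handle Olive McBride’s modification of the first step.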

The problem now is to calculate the probability that the chain is absorbed into state X. The probability of ending in a particular absorbing state is a well-known computation (and given in the above link to Wikipedia).

No special trick is needed to find the maximizing X once we know how to solve this problem for any given X; just list out all possible policies (there are only finitely many we need to worry about, and the number doesn’t exceed 20 most of the time), compute the expected winnings for each, and pick the X maximizing that number.
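A back-of-the-envelope version of this enumeration looks like the sketch below, which uses the closed-form survival product (draw $i+1$ survives with probability $(s - i)/(s + f - i)$) rather than the full Markov machinery; the bag values are hypothetical.

```r
# p_X: probability of drawing X "S" tokens in a row, without replacement,
# from a bag of s "S" tokens and f "F" tokens
p_win <- function(X, s, f) prod((s - 0:(X - 1)) / (s + f - 0:(X - 1)))

# Enumerate every policy X = 1, ..., s and keep the best expected winnings
best_policy <- function(s, f) {
  ev <- sapply(1:s, function(X) X * p_win(X, s, f))
  c(X = which.max(ev), EV = max(ev))
}

best_policy(11, 6)   # X = 2, EV ~ 0.81
```

For a bag of 11 non-icon and 6 icon tokens this picks X = 2, with expected winnings of about 0.81 resources per activation.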

The maximizing policy depends on what’s in the chaos bag. Shocking, right? That said, this is an important point; each campaign/scenario/difficulty level has its own chaos bag, and thanks to cards with the **seal** keyword, the chaos bag can be changed *during* a scenario, perhaps to the benefit or detriment of Henry Wan. Fortunately, the “S” and “F” language makes modelling the contents of the chaos bag so simple that we can create two-dimensional tables depending only on the number of “S’s” and “F’s” in the bag, and those tables will cover nearly every situation an investigator will encounter.

The script below (which can be made executable on Unix systems with R installed) can be used for generating such tables.

```r
#!/usr/bin/Rscript
################################################################################
# ArkhamHorrorHenryWanTableGenerator.R
################################################################################
# 2018-12-02
# Curtis Miller
################################################################################
# Generates tables describing optimal policies for playing with the card Henry
# Wan in Arkham Horror: The Card Game.
################################################################################

# optparse: A package for handling command line arguments
if (!suppressPackageStartupMessages(require("optparse"))) {
  install.packages("optparse")
  require("optparse")
}

################################################################################
# FUNCTIONS
################################################################################

#' Henry Wan Policy Calculator
#'
#' Calculates important quantities for optimal play with Henry Wan
#'
#' @param s The number of "S" (or "success") tokens in the bag
#' @param f The number of "F" (or "failure") tokens in the bag
#' @param olive If \code{TRUE}, the first draw is done with Olive McBride
#' @param out If \code{"X"}, return the optimal stopping time (default); if
#'            \code{"EV"}, return the expected winnings of the optimal policy;
#'            if \code{"P"}, return the probability of success of the optimal
#'            policy
#' @return Numeric depending on the value of the parameter \code{out}
#' @examples
#' wan_policy_calculator(11, 5)
wan_policy_calculator <- function(s, f, olive = FALSE,
                                  out = c("X", "EV", "P")) {
  out <- out[[1]]
  policies <- (ifelse(olive, 2, 1)):s  # Candidate X values
  policy_probs <- sapply(policies, function(X) {
    # Set up transition matrix of Markov chain
    P <- 0 * diag(X + 2)
    rownames(P) <- c(0:X, "F")
    colnames(P) <- rownames(P)
    P[c(X, "F"), c(X, "F")] <- diag(2)
    transient_states <- ifelse(X > 1, list(c("0", 1:(X - 1))), "0")[[1]]
    P[transient_states, "F"] <- f/(s + f - (0:(X - 1)))
    if (olive) {
      if (s + f < 3 | X == 1) {
        stop("X or chaos bag doesn't make sense with Olive!")
      }
      # Failure with Olive is modeled with a hypergeometric RV, with drawing
      # one or fewer "S's"
      P["0", "F"] <- phyper(1, m = s, n = f, k = 3)
      # The state 1 is effectively removed when Olive is used
      transient_states <- transient_states[-2]
      P <- P[-2, -2]
    }
    if (X > 1) {
      if (olive & X == 2) {
        P["0", "2"] <- 1 - P["0", "F"]
      } else {
        P[transient_states, as.character((ifelse(olive, 2, 1)):X)] <- diag(
          c(1 - P[transient_states, "F"]))
      }
    } else {
      P["0", "1"] <- 1 - P["0", "F"]
    }
    # Compute absorption probability
    R <- P[transient_states, c(X, "F")]
    Q <- P[transient_states, transient_states, drop = FALSE]
    N <- solve(diag(nrow(Q)) - Q)
    B <- N %*% R
    B[1, 1][[1]]
  })
  X <- which.max(policy_probs * policies)
  if (out == "X") {
    policies[[X]]
  } else if (out == "EV") {
    policies[[X]] * policy_probs[[X]]
  } else if (out == "P") {
    policy_probs[[X]]
  } else {
    stop(paste("Don't know how to handle out =", out))
  }
}
wan_policy_calculator <- Vectorize(wan_policy_calculator, c("s", "f"))

################################################################################
# MAIN FUNCTION DEFINITION
################################################################################

main <- function(olive = FALSE, value = FALSE, prob = FALSE, digits = 2,
                 lower_s = 5, upper_s = 20, lower_f = 0, upper_f = 8,
                 help = FALSE) {
  # This function will be executed when the script is called from the command
  # line; the help parameter does nothing, but is needed for do.call() to work
  library(pander)

  sl <- lower_s
  su <- upper_s
  fl <- lower_f
  fu <- upper_f

  out <- "X"
  if (value) {out <- "EV"}
  if (prob) {out <- "P"}

  wan_table <- outer(sl:su, fl:fu, FUN = function(r, c) {
    wan_policy_calculator(r, c, olive = olive, out = out)
  })
  rownames(wan_table) <- sl:su
  colnames(wan_table) <- fl:fu

  wan_table <- round(wan_table, digits = digits)
  pandoc.table(wan_table, style = "rmarkdown")
}

################################################################################
# INTERFACE SETUP
################################################################################

if (sys.nframe() == 0) {
  cl_args <- parse_args(OptionParser(
    description = paste("Generates tables describing optimal policies",
                        "for playing with the card Henry Wan in",
                        "Arkham Horror: The Card Game (number of icon",
                        "tokens in bag are columns; non-icon rows)."),
    option_list = list(
      make_option(c("--olive", "-o"), action = "store_true", default = FALSE,
                  help = "The first draw is done with Olive"),
      make_option(c("--value", "-v"), action = "store_true", default = FALSE,
                  help = paste("Report expected value rather than",
                               "optimal stopping policy")),
      make_option(c("--prob", "-p"), action = "store_true", default = FALSE,
                  help = paste("Report success probability of optimal",
                               "stopping policy rather than the",
                               "optimal stopping policy itself")),
      make_option(c("--digits", "-d"), type = "integer", default = 2,
                  help = "Number of digits for rounding"),
      make_option(c("--lower-s", "-s"), type = "integer", default = 5,
                  help = "Lowest considered number of non-icon tokens"),
      make_option(c("--upper-s", "-w"), type = "integer", default = 20,
                  help = "Highest considered number of non-icon tokens"),
      make_option(c("--lower-f", "-f"), type = "integer", default = 0,
                  help = "Lowest considered number of icon tokens"),
      make_option(c("--upper-f", "-r"), type = "integer", default = 8,
                  help = "Highest considered number of icon tokens")
    )))

  cl_args <- cl_args[c("olive", "value", "prob", "digits", "lower-s",
                       "upper-s", "lower-f", "upper-f", "help")]
  names(cl_args) <- c("olive", "value", "prob", "digits", "lower_s",
                      "upper_s", "lower_f", "upper_f", "help")

  do.call(main, cl_args)
}
```

With the above script I can make the following three tables. The columns represent the number of (bad) icon tokens in the bag, while rows represent the number of other tokens in the bag. The first table is the optimal stopping policy; the second, the probability of success of the optimal stopping policy; and the third, the expected winnings of the optimal policy (which is the product of the previous two tables).

**Optimal stopping policy:**

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| **5** | 5 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 |
| **6** | 6 | 3 | 2 | 2 | 2 | 1 | 1 | 1 | 1 |
| **7** | 7 | 4 | 3 | 2 | 2 | 2 | 1 | 1 | 1 |
| **8** | 8 | 4 | 3 | 3 | 2 | 2 | 2 | 1 | 1 |
| **9** | 9 | 5 | 3 | 3 | 2 | 2 | 2 | 2 | 1 |
| **10** | 10 | 5 | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| **11** | 11 | 6 | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| **12** | 12 | 6 | 4 | 4 | 3 | 3 | 2 | 2 | 2 |
| **13** | 13 | 7 | 5 | 4 | 3 | 3 | 2 | 2 | 2 |
| **14** | 14 | 7 | 5 | 4 | 3 | 3 | 3 | 2 | 2 |
| **15** | 15 | 8 | 6 | 4 | 3 | 3 | 3 | 2 | 2 |
| **16** | 16 | 8 | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| **17** | 17 | 9 | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| **18** | 18 | 10 | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| **19** | 19 | 10 | 7 | 5 | 4 | 4 | 3 | 3 | 3 |
| **20** | 20 | 11 | 7 | 5 | 5 | 4 | 3 | 3 | 3 |

**Probability of success of the optimal policy:**

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| **5** | 1 | 0.5 | 0.48 | 0.36 | 0.56 | 0.5 | 0.45 | 0.42 | 0.38 |
| **6** | 1 | 0.57 | 0.54 | 0.42 | 0.33 | 0.55 | 0.5 | 0.46 | 0.43 |
| **7** | 1 | 0.5 | 0.42 | 0.47 | 0.38 | 0.32 | 0.54 | 0.5 | 0.47 |
| **8** | 1 | 0.56 | 0.47 | 0.34 | 0.42 | 0.36 | 0.31 | 0.53 | 0.5 |
| **9** | 1 | 0.5 | 0.51 | 0.38 | 0.46 | 0.4 | 0.34 | 0.3 | 0.53 |
| **10** | 1 | 0.55 | 0.42 | 0.42 | 0.33 | 0.43 | 0.38 | 0.33 | 0.29 |
| **11** | 1 | 0.5 | 0.46 | 0.45 | 0.36 | 0.46 | 0.4 | 0.36 | 0.32 |
| **12** | 1 | 0.54 | 0.49 | 0.36 | 0.39 | 0.32 | 0.43 | 0.39 | 0.35 |
| **13** | 1 | 0.5 | 0.43 | 0.39 | 0.42 | 0.35 | 0.46 | 0.41 | 0.37 |
| **14** | 1 | 0.53 | 0.46 | 0.42 | 0.45 | 0.38 | 0.32 | 0.43 | 0.39 |
| **15** | 1 | 0.5 | 0.4 | 0.45 | 0.47 | 0.4 | 0.34 | 0.45 | 0.42 |
| **16** | 1 | 0.53 | 0.43 | 0.38 | 0.38 | 0.42 | 0.36 | 0.32 | 0.43 |
| **17** | 1 | 0.5 | 0.46 | 0.4 | 0.4 | 0.44 | 0.38 | 0.34 | 0.45 |
| **18** | 1 | 0.47 | 0.48 | 0.42 | 0.42 | 0.46 | 0.4 | 0.35 | 0.47 |
| **19** | 1 | 0.5 | 0.43 | 0.44 | 0.44 | 0.36 | 0.42 | 0.37 | 0.33 |
| **20** | 1 | 0.48 | 0.45 | 0.46 | 0.36 | 0.38 | 0.44 | 0.39 | 0.35 |

**Expected winnings of the optimal policy:**

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| **5** | 5 | 1.5 | 0.95 | 0.71 | 0.56 | 0.5 | 0.45 | 0.42 | 0.38 |
| **6** | 6 | 1.71 | 1.07 | 0.83 | 0.67 | 0.55 | 0.5 | 0.46 | 0.43 |
| **7** | 7 | 2 | 1.25 | 0.93 | 0.76 | 0.64 | 0.54 | 0.5 | 0.47 |
| **8** | 8 | 2.22 | 1.4 | 1.02 | 0.85 | 0.72 | 0.62 | 0.53 | 0.5 |
| **9** | 9 | 2.5 | 1.53 | 1.15 | 0.92 | 0.79 | 0.69 | 0.6 | 0.53 |
| **10** | 10 | 2.73 | 1.7 | 1.26 | 0.99 | 0.86 | 0.75 | 0.66 | 0.59 |
| **11** | 11 | 3 | 1.85 | 1.36 | 1.09 | 0.92 | 0.81 | 0.72 | 0.64 |
| **12** | 12 | 3.23 | 1.98 | 1.45 | 1.18 | 0.97 | 0.86 | 0.77 | 0.69 |
| **13** | 13 | 3.5 | 2.14 | 1.57 | 1.26 | 1.05 | 0.91 | 0.82 | 0.74 |
| **14** | 14 | 3.73 | 2.29 | 1.68 | 1.34 | 1.13 | 0.96 | 0.87 | 0.79 |
| **15** | 15 | 4 | 2.43 | 1.78 | 1.41 | 1.2 | 1.03 | 0.91 | 0.83 |
| **16** | 16 | 4.24 | 2.59 | 1.88 | 1.5 | 1.26 | 1.09 | 0.95 | 0.87 |
| **17** | 17 | 4.5 | 2.74 | 2 | 1.59 | 1.32 | 1.15 | 1.01 | 0.91 |
| **18** | 18 | 4.74 | 2.87 | 2.11 | 1.67 | 1.38 | 1.21 | 1.06 | 0.94 |
| **19** | 19 | 5 | 3.03 | 2.21 | 1.75 | 1.46 | 1.26 | 1.12 | 0.99 |
| **20** | 20 | 5.24 | 3.18 | 2.3 | 1.82 | 1.53 | 1.32 | 1.17 | 1.04 |

I view column 6, row 11 as the "typical" scenario, and the conclusion is this: *you'd be better off just grabbing a resource/drawing a card the usual way than by using Henry Wan!* Not only is Henry Wan worse than Leo de Luca, *he's worse than gaining resources with a regular action!*

Granted, there are cards with the *seal* keyword that can help improve the odds. But one must ask whether the opportunity cost of playing those cards is worth it. Perhaps the benefits of a favorable chaos bag for skill tests plus better Henry Wan games would give the investigators a *teeny tiny* edge… after a hell of a lot of work and lucky draws. That said, I'm sure there are much easier ways to play the game that are also more fun.

When Henry Wan was announced, people considered pairing him up with Olive McBride, whose ability triggers "when you would reveal a chaos token". Any investigator who can take both Mystic (purple) and Rogue (green) cards (including Sefina Rousseau and all Dunwich investigators; I don't count Lola Hayes since, while she can include both cards in her deck, using them together may not be possible) can include these two cards in the same deck.

I'll always assume that Olive's ability is utilized on the first draw. When using Olive with Henry, one can get two tokens drawn without either of them being a bad icon that ends the "game". Thus Olive boosts the success rate and the ultimate payout.

Having Olive and Henry out at the same time is extremely difficult; first, you'd need Charisma to accommodate them both, and then you'd have to draw them both at reasonable times in a game. The likelihood of getting the combo out is low, and it comes with significant opportunity costs.

That said, when Olive is out, she provides Henry enough of a boost to make him playable. The following tables account for Olive's effect (see the code for how) on the first draw but otherwise match up with the earlier tables.
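The only modeling change for Olive is on the first step: three tokens are revealed and two are resolved, so the first step fails only if two or more of the three revealed tokens are icons. That makes the first-step failure probability hypergeometric (this is the `phyper()` adjustment in the script above). As a quick sanity check against one cell of the tables below, here is that probability for a hypothetical bag of 11 non-icon and 6 icon tokens with the stop-at-two policy:

```r
# With Olive, stopping after 2 tokens succeeds iff at least 2 of the 3
# revealed tokens are "S"; failure (1 or fewer "S") is hypergeometric
p_olive_x2 <- 1 - phyper(1, m = 11, n = 6, k = 3)
round(p_olive_x2, 2)   # 0.73
```

That 0.73 matches the corresponding cell of the Olive probability table.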

**Optimal stopping policy (with Olive):**

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| **5** | 5 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| **6** | 6 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| **7** | 7 | 4 | 3 | 2 | 2 | 2 | 2 | 2 | 2 |
| **8** | 8 | 4 | 3 | 2 | 2 | 2 | 2 | 2 | 2 |
| **9** | 9 | 5 | 3 | 3 | 2 | 2 | 2 | 2 | 2 |
| **10** | 10 | 6 | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| **11** | 11 | 6 | 4 | 3 | 3 | 2 | 2 | 2 | 2 |
| **12** | 12 | 7 | 4 | 3 | 3 | 3 | 2 | 2 | 2 |
| **13** | 13 | 7 | 5 | 4 | 3 | 3 | 2 | 2 | 2 |
| **14** | 14 | 7 | 5 | 4 | 3 | 3 | 3 | 2 | 2 |
| **15** | 15 | 8 | 6 | 4 | 3 | 3 | 3 | 2 | 2 |
| **16** | 16 | 8 | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| **17** | 17 | 9 | 6 | 5 | 4 | 3 | 3 | 3 | 2 |
| **18** | 18 | 10 | 6 | 5 | 4 | 3 | 3 | 3 | 3 |
| **19** | 19 | 10 | 7 | 5 | 4 | 4 | 3 | 3 | 3 |
| **20** | 20 | 11 | 7 | 5 | 4 | 4 | 3 | 3 | 3 |

**Probability of success of the optimal policy (with Olive):**

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| **5** | 1 | 0.75 | 0.86 | 0.71 | 0.6 | 0.5 | 0.42 | 0.36 | 0.31 |
| **6** | 1 | 0.8 | 0.89 | 0.77 | 0.67 | 0.58 | 0.5 | 0.44 | 0.38 |
| **7** | 1 | 0.67 | 0.65 | 0.82 | 0.72 | 0.64 | 0.56 | 0.5 | 0.45 |
| **8** | 1 | 0.71 | 0.7 | 0.85 | 0.76 | 0.69 | 0.62 | 0.55 | 0.5 |
| **9** | 1 | 0.62 | 0.74 | 0.61 | 0.8 | 0.73 | 0.66 | 0.6 | 0.55 |
| **10** | 1 | 0.56 | 0.59 | 0.65 | 0.55 | 0.76 | 0.7 | 0.64 | 0.59 |
| **11** | 1 | 0.6 | 0.63 | 0.68 | 0.59 | 0.79 | 0.73 | 0.67 | 0.62 |
| **12** | 1 | 0.55 | 0.66 | 0.71 | 0.62 | 0.54 | 0.75 | 0.7 | 0.66 |
| **13** | 1 | 0.58 | 0.56 | 0.56 | 0.64 | 0.57 | 0.78 | 0.73 | 0.68 |
| **14** | 1 | 0.62 | 0.59 | 0.59 | 0.67 | 0.6 | 0.53 | 0.75 | 0.71 |
| **15** | 1 | 0.57 | 0.51 | 0.61 | 0.69 | 0.62 | 0.56 | 0.77 | 0.73 |
| **16** | 1 | 0.6 | 0.54 | 0.51 | 0.54 | 0.64 | 0.58 | 0.53 | 0.75 |
| **17** | 1 | 0.56 | 0.56 | 0.53 | 0.57 | 0.66 | 0.6 | 0.55 | 0.77 |
| **18** | 1 | 0.53 | 0.59 | 0.55 | 0.59 | 0.68 | 0.62 | 0.57 | 0.52 |
| **19** | 1 | 0.56 | 0.52 | 0.57 | 0.6 | 0.53 | 0.64 | 0.59 | 0.54 |
| **20** | 1 | 0.53 | 0.55 | 0.59 | 0.62 | 0.55 | 0.66 | 0.61 | 0.56 |

**Expected winnings of the optimal policy (with Olive):**

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| **5** | 5 | 2.25 | 1.71 | 1.43 | 1.19 | 1 | 0.85 | 0.73 | 0.63 |
| **6** | 6 | 2.4 | 1.79 | 1.55 | 1.33 | 1.15 | 1 | 0.87 | 0.77 |
| **7** | 7 | 2.67 | 1.96 | 1.63 | 1.44 | 1.27 | 1.13 | 1 | 0.89 |
| **8** | 8 | 2.86 | 2.1 | 1.7 | 1.53 | 1.37 | 1.23 | 1.11 | 1 |
| **9** | 9 | 3.12 | 2.21 | 1.83 | 1.59 | 1.45 | 1.32 | 1.2 | 1.09 |
| **10** | 10 | 3.33 | 2.38 | 1.95 | 1.65 | 1.52 | 1.39 | 1.28 | 1.18 |
| **11** | 11 | 3.6 | 2.52 | 2.04 | 1.76 | 1.57 | 1.46 | 1.35 | 1.25 |
| **12** | 12 | 3.82 | 2.64 | 2.12 | 1.85 | 1.62 | 1.51 | 1.41 | 1.31 |
| **13** | 13 | 4.08 | 2.8 | 2.24 | 1.93 | 1.71 | 1.56 | 1.46 | 1.37 |
| **14** | 14 | 4.31 | 2.95 | 2.36 | 2.01 | 1.79 | 1.6 | 1.51 | 1.42 |
| **15** | 15 | 4.57 | 3.07 | 2.45 | 2.07 | 1.86 | 1.67 | 1.55 | 1.46 |
| **16** | 16 | 4.8 | 3.24 | 2.54 | 2.17 | 1.93 | 1.75 | 1.58 | 1.5 |
| **17** | 17 | 5.06 | 3.38 | 2.66 | 2.26 | 1.99 | 1.81 | 1.65 | 1.54 |
| **18** | 18 | 5.29 | 3.51 | 2.77 | 2.34 | 2.04 | 1.87 | 1.71 | 1.57 |
| **19** | 19 | 5.56 | 3.67 | 2.87 | 2.42 | 2.12 | 1.92 | 1.77 | 1.63 |
| **20** | 20 | 5.79 | 3.82 | 2.96 | 2.49 | 2.2 | 1.97 | 1.82 | 1.69 |

Notice that when Olive is being used, it's usually optimal to use Olive to get two tokens out (that you can pick) and then end the "game". It seems that Olive does make Henry's ability profitable… albeit mildly. If we're using the heuristic that a card needs to pay for its own resource cost plus three resources for each action involved, I'd say that the combo would need at least eight turns to be profitable in a typical game… which is terrible.

Henry Wan is an expensive way to attempt to milk a little more value from Olive. Even with Olive I don’t think he’s worth the trouble.

It's equally true in Arkham as it is in real life: gambling is better for the house than the gambler (with the house being the forces of the mythos, in this case). If you're looking to have fun gambling, Henry Wan is your card. If you're looking to win… look elsewhere.

Packt Publishing published a book for me entitled *Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

These past few weeks I’ve been writing about a new package I created, **MCHT**. Those blog posts were basically tutorials demonstrating how to use the package. (Read the first in the series here.) I’m done for now explaining the technical details of the package. Now I’m going to use the package for the purpose I initially had in mind: exploring the distribution of time separating U.S. economic recessions.

I wrote about this before. I suggested that the distribution of times between recessions can be modeled with a Weibull distribution, and based on this, a recession was likely to occur prior to the 2020 presidential election.

This claim raised eyebrows, and I want to respond to some of the comments made. Now, I would not be surprised to find this post the subject of an R1 on r/badeconomics, and I hope that no future potential employer finds this (or my previous) post, reads it, and then decides I’m an idiot and denies me a job. I don’t know enough to dogmatically subscribe to the idea but I do want to explore it. Blog posts are not journal articles, and I think this is a good space for me to make arguments that could be wrong and then see how others more intelligent than myself respond. The act of keeping a blog is good for me and my learning (which never ends).

My previous post on the distribution of times between recessions was… controversial. Have a look at the comments section of the original article and the comments of this reddit thread. Here is my summary of some of the responses:

- There was no statistical test for the goodness-of-fit of the Weibull distribution.
- No data generating process (DGP) was proposed, in the sense that there’s no explanation for *why* the Weibull distribution would be appropriate, or the economic processes that produce memory in the distribution of times between recessions.
- Isn’t it strange to suggest that other economic variables are irrelevant to when a recession occurs? That seems counterintuitive.
- MAGA! (actually there were no MAGAs, thankfully)

Then there was this comment, by far the harshest one, by u/must_not_forget_pwd:

> The idea that recessions are dependent on time is genuinely laughable. It is an idea that seems to be getting some traction in the chattering classes, who seem more interested in spewing forth political rantings rather than even the semblance of serious analysis. This also explains why no serious economist talks about the time and recession relationship.
>
> The lack of substance behind this time and recession idea is revealed by asking some very basic questions and having a grasp of some basic data. If recessions were so predictable, wouldn’t recessions be easy to prevent? Monetary and fiscal policies could be easily manipulated so as to engineer a persistent boom.
>
> Also, if investors could correctly predict the state of the economy it would be far easier for them to determine when to invest and to capture the subsequent boom. That is, invest in the recession, when goods and services are cheaper and have the project come on stream during the following boom and make a massive profit. If enough investors acted like this, there would be no recession to begin with due to the increase in investment.
>
> Finally, have a look at the growth of other countries. Australia hasn’t had two consecutive quarters of negative growth since the 1990-91 recession. Sure there have been hiccups along the way for Australia, such as the Asian Financial Crisis, the introduction of the GST, a US recession in the early 2000s, and more recently the Global Financial Crisis. Yet, Australia has managed to persist without a recession despite the passage of time. No one in Australia would take you seriously if you said that recessions were time dependent.
>
> If these “chattering classes” were interested in even half serious analysis of the US economy, while still wanting to paint a bleak picture, they could very easily look at what is going on right now. Most economists have the US economy growing above trend. This can be seen in the low unemployment rate and that inflation is starting to pickup. Sure wages growth is subdued, but wages growth should be looking to pickup anytime now.
>
> However, during this period the US government is injecting a large amount of fiscal stimulus into the US economy through tax cuts. Pumping large amounts of cash into the economy during a boom isn’t exactly a good thing to do and is a great way to overheat the economy and bring about higher inflation. This higher inflation would then cause the US Federal Reserve to react by increasing interest rates. This in turn could spark a US recession.
>
> Instead of this very simple and defensible story that requires a little bit of homework, we get subjected to this nonsense that recessions are linked to time. I think it’s time that people call out as nonsense the “analysis” that this blog post has.
>
> TL;DR: The idea that recessions are dependent on time is dumb, and if recessions were so easy to predict would mean that recessions wouldn’t exist. This doesn’t mean that a US recession couldn’t happen within the next few years, because it is easy to see how one could occur.

I think that the tone of this message could have been… nicer. That said, I generally welcome direct, harsh criticism, as I often learn a lot from it, or at least am given a lot to think about.

So let’s discuss these comments.

First, a statistical test for the goodness of fit of the Weibull distribution. I personally was satisfied looking at the plots I made, but some people want a statistical test. The test that comes to mind is the Kolmogorov-Smirnov test, and R supports the simplest version of it via `ks.test()`. But when you don’t know all of the parameters of the distribution assumed under the null hypothesis, you cannot use `ks.test()`; the test was derived assuming there were no unknown parameters, and when nuisance parameters are present and need to be estimated, the distribution used to compute p-values is no longer appropriate.

Good news, though; **MCHT** allows us to do the test properly! First, let’s get set up.

```r
library(MCHT)
library(doParallel)
library(fitdistrplus)

recessions <- c( 4 + 2/12,  6 + 8/12,  3 + 1/12,  3 + 9/12,  3 + 3/12,
                 2 + 0/12,  8 + 10/12, 3 + 0/12,  4 + 10/12, 1 + 0/12,
                 7 + 8/12, 10 + 0/12,  6 + 1/12)

registerDoParallel(detectCores())
```

I already demonstrated how to perform a bootstrap version of the Kolmogorov-Smirnov test in one of my blog posts about **MCHT**, and the code below is basically a direct copy of that code. While the test is not exact, it should be asymptotically appropriate.

```r
ts <- function(x) {
  param <- coef(fitdist(x, "weibull"))
  shape <- param[['shape']]; scale <- param[['scale']]
  ks.test(x, pweibull, shape = shape, scale = scale,
          alternative = "two.sided")$statistic[[1]]
}

rg <- function(x) {
  n <- length(x)
  param <- coef(fitdist(x, "weibull"))
  shape <- param[['shape']]; scale <- param[['scale']]
  rweibull(n, shape = shape, scale = scale)
}

b.wei.ks.test <- MCHTest(test_stat = ts, stat_gen = ts, rand_gen = rg,
                         seed = 123, N = 1000,
                         method = paste("Goodness-of-Fit Test for Weibull",
                                        "Distribution"))

b.wei.ks.test(recessions)
```

```
## 
##  Goodness-of-Fit Test for Weibull Distribution
## 
## data:  recessions
## S = 0.11318, p-value = 0.94
```

The test does not reject the null hypothesis; there isn’t evidence that the data is not following a Weibull distribution (according to that test; read on).

Compare this to the Kolmogorov-Smirnov test checking whether the data follows the exponential distribution.

```r
ts <- function(x) {
  mu <- mean(x)
  ks.test(x, pexp, rate = 1/mu,
          alternative = "two.sided")$statistic[[1]]
}

rg <- function(x) {
  n <- length(x)
  mu <- mean(x)
  rexp(n, rate = 1/mu)
}

b.ks.exp.test <- MCHTest(ts, ts, rg, seed = 123, N = 1000,
                         method = paste("Goodness-of-Fit Test for Exponential",
                                        "Distribution"))

b.ks.exp.test(recessions)
```

```
## 
##  Goodness-of-Fit Test for Exponential Distribution
## 
## data:  recessions
## S = 0.30074, p-value = 0.023
```

Here, the null hypothesis is rejected; there is evidence that the data wasn’t drawn from an exponential distribution.

What do the above two results signify? If we assume that the times between recessions are independent and identically distributed, then there is no evidence against the Weibull distribution, but there is evidence against the exponential distribution. (The exponential distribution is actually a special case of the Weibull distribution, so the second test effectively rules out that special case.) The exponential distribution has the *memoryless* property; if we say that the time between events follows an exponential distribution, then knowing that it’s been $t$ minutes since the last event occurred tells us *nothing* about when the next event will occur. The Weibull distribution, however, has *memory* when the shape parameter is not 1. That is, knowing how long it’s been since the last event occurred does change how likely the event is to occur in the near future. (For the parameter estimates I found, a recession seems to become more likely the longer it’s been since the last one.)
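The contrast in memory is easy to see numerically. The sketch below uses hypothetical parameters (rate 1/5 for the exponential; shape 2, scale 5 for the Weibull); any rate and any Weibull shape other than 1 would tell the same story.

```r
# Conditional survival: P(T > t + h | T > t) via the survival function
cond_surv <- function(surv, t, h) surv(t + h) / surv(t)

S_exp <- function(t) pexp(t, rate = 1/5, lower.tail = FALSE)
S_wei <- function(t) pweibull(t, shape = 2, scale = 5, lower.tail = FALSE)

# Exponential: the elapsed time tells us nothing
cond_surv(S_exp, t = 4, h = 1)   # identical to S_exp(1)
# Weibull with shape > 1: having survived to t = 4 makes the event more imminent
cond_surv(S_wei, t = 4, h = 1)   # smaller than S_wei(1)
```

For the exponential, the conditional probability of surviving one more unit equals the unconditional one; for the shape-2 Weibull it is strictly smaller, which is exactly the increasing-hazard behavior described above.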

We will revisit the goodness of fit later, though.

I do have some personal beliefs about what causes recessions to occur that would lead me to think that the time between recessions does exhibit some form of memory and would also address the point raised by u/must_not_forget_pwd about Australia not having had a recession in decades. This perspective is primarily shaped by two books, [1] and [2].

In short, I agree with the aforementioned reddit user; recessions are not inevitable. The stability of an economy is a characteristic of that economy and some economies are more stable than others. [1] notes that the Canadian economy had a dearth of banking crises in the 19th and 20th centuries, with the most recent one effectively due to the 2008 crisis in the United States. Often the stability of the financial sector (and probably the economy as a whole) is strongly related to the political coalition responsible for drafting the *de facto* rules that the financial system follows. In some cases the financial sector is politically weak and continuously plundered by the government. Sometimes it’s politically weak and allowed to exist unmolested by the government but is well whipped. Financiers are allowed to make money and the government repays its debts but if the financial sector steps out of line and takes on too much risk it will be punished. And then there’s the situation where the financial sector is politically powerful and able to get away with bad behavior, perhaps even being rewarded for that behavior by government bailouts. That’s the financial system the United States has.

So let’s consider the latter case, where the financial sector is politically powerful. This is where the Minsky narrative (see [2]) takes hold. Minsky describes a boom-and-bust cycle in which, critically, the cause of the bust is built into the boom. After a bust, many in the financial sector “learn their lesson” and become more conservative risk-takers. In this regime the economy recovers and some growth resumes. Over time, the financial sector “forgets” the lessons it learned from the previous bust and begins to take greater risks. Eventually these risks become so great that systemic risk emerges and the financial sector, as a whole, stands on shaky ground. Something goes wrong (the bottom falls out of the housing market, say, or the Russian government defaults), the bets taken by the financial sector go the wrong way, and a crisis ensues. The extra wrinkle in the American financial system is that the financial sector not only isn’t punished for the risks it has taken; it gets rewarded with a bailout financed by taxpayers, and the executives who made those decisions get golden parachutes (although there may be a trivial fine).

If the Minsky narrative is correct, then economic booms do die of “old age”, as eventually the boom is driven by increasingly risky behavior that eventually leads to collapse. When the government is essentially encouraging this behavior with blank-check guarantees, the risks taken grow (risky contracts become lotto tickets paid for by someone else when you lose, but you get all the winnings). Taken together, one can see why there could be some form of memory in the time between recessions. Busts are an essential feature of such an economy.

So what about the Australian economy, as u/must_not_forget_pwd brought up? In short, I think the Australian economy follows the Canadian prototype as described in [1] and thus doesn’t follow the rules driving the boom/bust cycle in America. The Australian economy is the Australian economy and the American economy is the American economy. One is stable, the other is not. I’m studying the unstable one, not trying to explain the stability of the other.

First, does time matter to when a recession occurs? The short answer is “Yes, duh!” If you’re going to have any meaningful discussion about when a recession will occur you have to account for the time frame you’re considering. A recession within the next 30 years is much more likely than a recession in the next couple months (if only because one case covers the other, but in general a recession should be more likely to occur within a longer period of time than a shorter one).
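That monotonicity holds for any distribution of the time to the next recession, since it is just a property of the distribution function. A tiny illustration, with made-up Weibull parameters rather than my fitted estimates:

```r
# P(recession within t years) is nondecreasing in t for any distribution;
# illustrated here with hypothetical Weibull parameters
p_within <- function(t) pweibull(t, shape = 2, scale = 8)

p_within(c(0.25, 5, 30))  # probabilities grow with the window length
```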

But I think the question about “does time matter” is more a question about whether an economy essentially remembers how long it has been since the last recession or not. That’s both an economic and statistical question.

What about other variables? Am I saying that other variables don’t matter when I use only time to predict when the next recession occurs? No, that’s not what I’m saying.

Let’s consider regression equations, often of the form

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon$$

I think economists are used to thinking about equations like this as essentially causal statements, but that’s not what a regression equation is, and when we estimate a regression equation we are not automatically estimating a function that needs to be interpreted causally. If a regression equation tells us something about causality, that’s great, but that’s not what they do.

Granted, economics students are continuously reminded that correlation is not causation, but I think many then conclude that we should not compute a regression equation unless the relationship expressed can be interpreted causally. However, knowing that two variables are correlated, and how they are correlated, is often useful.

When we compute a regression function from data, we are computing a function that estimates *conditional expectations*. This function, when given the value of one variable, tells us what value we can expect for the other variable. That relationship may or may not be due to causality, but the fact that the two variables are not independent of each other can be, in and of itself, a useful fact.

My favorite example in the “correlation is not causation” discussion (probably first mentioned in some econometrics textbook or by my econometrics professor) is the relationship between the damage caused by a fire and the number of firefighters at the scene of the fire. Let’s just suppose that we have some data, where $y$ is the amount of damage in a fire (in thousands of dollars) and $x$ is the number of firefighters, and we estimated a relationship of the form

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x, \quad \text{with } \hat{\beta}_1 > 0$$

There is a positive relationship between the number of firefighters at the scene of the fire and the damage done by the fire. Does this mean that firefighters make fires worse? No, it does not. But if you’re a spectator and you see ten firefighters running the scene of a fire, can you expect the fire to be more damaging than fires where there are five firefighters and not as damaging as fires with fifteen firefighters? Sure, this is reasonable. Not only that, it’s a useful fact to know.
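A quick simulation makes the point. Here an unobserved “fire severity” variable drives both quantities, so the regression picks up a positive association even though there is no causation from firefighters to damage; every number below is invented for illustration:

```r
set.seed(42)
severity <- runif(200, 1, 10)                  # unobserved size of the fire
firefighters <- rpois(200, 2 * severity)       # bigger fires draw more crews
damage <- 10 * severity + rnorm(200, sd = 5)   # bigger fires do more damage

# Regressing damage on firefighters alone yields a positive coefficient,
# even though the firefighters aren't causing the damage
fit <- lm(damage ~ firefighters)
coef(fit)[["firefighters"]]
```

The fitted slope is still useful for prediction: conditioning on the number of firefighters genuinely changes what damage we should expect.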

Importantly, when we choose the variables to include in a regression equation, we are deciding what variables we want to use for conditioning. That choice could be motivated by a causal model (because we care about causality), or by model fit (making the smallest error in our predictions while being sufficiently simple), or simply by what’s available. Some models may do better than others at predicting a variable but they all do the same thing: compute conditional expectations.

My point is this: when I use time as the only variable of interest when attempting to predict when a recession occurs, I’m essentially making a prediction based on a model that conditions only on time and nothing else. That’s not the same thing as saying that excluded variables don’t matter. Rather, a variable excluded from the model is effectively treated as part of the random soup that generated the data I observe. I’m not conditioning on its values to make predictions. Could my prediction be refined by including that information? Perhaps. But that doesn’t make the prediction automatically useless. In fact, I think we should *start* with predictions that condition on little, to see if conditioning on more variables adds any useful information, generally preferring the simple to the complex given equal predictive value. This is essentially what the $F$-tests automatically reported by statistical software do; they check whether the regression model, involving possibly multiple parameters, does any better than one that only uses the mean of the data to predict values.
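For instance, the overall $F$-statistic that `summary.lm()` reports is exactly a comparison against the intercept-only model, which predicts every observation with the sample mean (simulated data below):

```r
set.seed(7)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)

full <- lm(y ~ x)
null <- lm(y ~ 1)  # conditions on nothing; predicts with the mean

# The explicit model comparison and the reported overall F statistic agree
anova(null, full)$F[2]
summary(full)$fstatistic[["value"]]
```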

I never looked at a model that uses more information than just time, though. I wouldn’t be shocked if using more variables would lead to a better model. But I don’t have that data, and to be completely honest, I don’t want to spend the time to try and get a “great” prediction for when the next recession will occur. My numbers are essentially a back-of-the-envelope calculation. It could be improved, but just because there’s (perhaps significant) room for improvement doesn’t render the calculation useless, and I think I may have evidence that shows the calculation has some merit.

The reddit user had a long discussion about how the economy would function if the timing of recessions depended only on time: the Federal Reserve would head off every recession, and investors would adjust their behavior in ways that render the calculation useless. My response is this: I’m not a member of the Fed. I have no investments. My opinion doesn’t matter to the economy. Thus, it’s okay for me to treat the decisions of the Fed, politicians, bank presidents, other investors, and so forth, as part of that random soup producing the economy I’m experiencing, because my opinions do not invalidate the assumptions of the calculation.

There is a sense in which statistics are produced with an audience in mind. I remember Nate Silver making this point in a podcast (don’t ask me which) when discussing former FBI director James Comey’s decision, just days before the 2016 presidential election, to announce a reopening of an investigation into Hillary Clinton’s e-mails, which was apparently at least partially driven by the belief that Clinton was very likely to win. Silver said that Comey did not account for the fact that he was a key actor in the process being predicted and that his decisions could change the likelihood of Clinton winning. He invalidated the numbers with his decision based on them. He was not the target audience of the numbers Nate Silver was producing.

I think a similar argument can be made here. If my decisions and beliefs mattered to the economy, then I should account for them in predictions, conditioning on them. But they don’t matter, so I’ve invalidated nothing, and the people who do matter likely are (or should be) reaching conclusions in a much more sophisticated way.

I’m a statistician. Statistics is my hammer. Everything looks like a nail to me. You know why? Because hammering nails is fun.

When I read u/must_not_forget_pwd’s critique, I tried to formulate it in a mathematical way, because that’s what I do. Here’s my best way to describe it in mathematical terms:

1. The times between recessions are all independent of one another.
2. Each period of growth follows its own distribution, with its own unique parameters.
3. The time separating recessions is memoryless. Knowing how long it has been since the last recession tells us nothing about how much longer we have till the next recession.

I wanted a model that one might call “maximum unpredictability”. So if $T_1, T_2, \ldots, T_n$ are the times separating recessions, then points 1, 2, and 3 together say that $T_1, \ldots, T_n$ are independent random variables with $T_i \sim \mathrm{Exp}(\mu_i)$, and there’s no known relationship between $\mu_1, \ldots, \mu_n$. If this is true, we have no idea when the next recession will occur because there’s no pattern we can extract.

My claim is essentially that $T_i \sim \mathrm{Weibull}(k, \lambda)$, with $k \neq 1$ and a single pair $(k, \lambda)$ shared by every $T_i$. If I were to then attempt to formulate these as statistical hypotheses, those hypotheses would be:

$$H_0: T_i \overset{\text{iid}}{\sim} \mathrm{Weibull}(k, \lambda),\ k \neq 1 \qquad \text{vs.} \qquad H_A: T_i \sim \mathrm{Exp}(\mu_i) \text{ independently}$$
Is it possible to decide between these two hypotheses? They’re not nested, and it’s not really possible to use the generalized likelihood ratio test because the parameter space that includes both hypotheses is too big (you’d have to estimate more parameters than you have data points). That said, both hypotheses suggest likelihood functions that, individually, can be maximized, and you might consider using the ratio between these two maximized functions as a test statistic. (Well, actually, the negative log likelihood ratio, which I won’t write down in math or try to explain unless asked, but you can see the end result in the code below in the definition of `ts()`.)

Could that statistic be used to decide between the two hypotheses? I tried searching through the literature (in particular, see [3]) and my conclusion is… *maybe?* To be completely honest, by this point we’ve left the realm of conventional statistics and turned into mad scientists. The hypotheses we’re testing and the statistic we’re using to decide between them are just *wacky*, and how the hell are we supposed to know the distribution of this test statistic under the null hypothesis when there are *two* nuisance parameters that likely aren’t going anywhere? Oh, and while we’re at it, the sample size of the data set of interest is really small, so don’t even *think* about using asymptotic reasoning!

I think you can see how this descent into madness would end with me discovering the maximized Monte Carlo test (see [4]) and then writing **MCHT** to implement it. I’ll try anything once, so the product of all that sweat and labor is below.

```
library(MCHT)
library(fitdistrplus)  # provides fitdist()

ts <- function(x) {
  n <- length(x)
  params <- coef(fitdist(x, "weibull"))
  k <- params[["shape"]]
  l <- params[["scale"]]
  (n * k - n + 1) * log(l) - log(k) + sum(l^(-k) * x^k - k * log(x)) - n
}

mcsg <- function(x, shape = 2, scale = 1) {
  x <- qweibull(x, shape = shape, scale = scale)
  test_stat(x)
}

brg <- function(x) {
  n <- length(x)
  params <- coef(fitdist(x, "weibull"))
  k <- params[["shape"]]
  l <- params[["scale"]]
  rweibull(n, shape = k, scale = l)
}

mc.mem.test <- MCHTest(ts, mcsg, seed = 123,
                       nuisance_params = c("shape", "scale"), N = 1000,
                       optim_control = list("lower" = c("shape" = 0, "scale" = 0),
                                            "upper" = c("shape" = 100, "scale" = 100),
                                            "control" = list("max.time" = 60)),
                       threshold_pval = 0.2, localize_functions = TRUE,
                       method = "MMC Test for IID With Memory")

b.mem.test <- MCHTest(ts, ts, brg, seed = 123, N = 1000,
                      method = "Bootstrap Test for IID With Memory")

b.mem.test(recessions)
```

```
## 
##  Bootstrap Test for IID With Memory
## 
## data:  recessions
## S = -4601.9, p-value = 0.391
```

```
mc.mem.test(recessions)
```

```
## Warning in mc.mem.test(recessions): Computed p-value is greater than
## threshold value (0.2); the optimization algorithm may have terminated early
```

```
## 
##  MMC Test for IID With Memory
## 
## data:  recessions
## S = -4601.9, p-value = 0.962
```

Both tests failed to reject the null hypothesis. Unfortunately that doesn’t seem to say much. First, it doesn’t show the null hypothesis is correct; it’s just not *obviously* incorrect. That is always the case with a failure to reject, but the bizarre test I’m implementing here is severely underpowered, perhaps to the point of being useless. The alternative hypothesis (which I assigned to my “opponent”) is severely disadvantaged.

The conclusion of the above results isn’t in fact that I’m right. Given the severe lack of power of the test, I would say that the results of the test above are essentially inconclusive.

I’m going to be straight with you: if you read this whole article, I probably wasted your time, and for that I am truly sorry.

I suppose you got to enjoy some stream-of-consciousness thoughts about a controversial blog post I wrote, where I made a defense that may or may not be convincing. You then watched as I developed a strange statistical test that probably didn’t even work to settle a debate with some random guy on reddit, attributed to him a claim that honestly he would likely deny, and ended that imaginary argument inconclusively.

But hey, at least I satisfied my curiosity. And I’m pretty proud of **MCHT**, which I created to help me write this blog post. Maybe if I hadn’t spent three straight days writing nothing but blog posts, this one would have been better, but the others seemed pretty good. So something good came out of this trip… right?

Maybe I can end like this: do I still think that a recession before the 2020 election is likely? Yes. Do I think that a Weibull describes the time between recessions decently? Conditioning on nothing else, I think so. I still think that my previous work has some merit as a decent back-of-the-envelope calculation. Do I think that the time between recessions has a memory? In short, yes. And while we’re on the topic, I’m not the Fed, so my opinions don’t matter.

All that said, though, smarter people than me may have different opinions and their contributions to this discussion are probably more valuable than mine. For instance, the people at Goldman Sachs believe a recession soon is unlikely; but the people at J.P. Morgan Chase believe a recession could strike in 2020. I’m certainly persuadable on the above points, and as I’ve said before, I think the simple analysis could enhance the narrative advanced by better predictions.

Now that I’ve written this post, we will return to our regular scheduled programming. Thanks for reading! (Please don’t judge me.)

- C. Calomiris and S. Haber, *Fragile by design: the political origins of banking crises and scarce credit* (2014), Princeton University Press, Princeton
- H. P. Minsky, *Stabilizing an unstable economy* (1986), Yale University Press, New Haven
- D. R. Cox, *Tests of separate families of hypotheses*, Proc. Fourth Berkeley Symp. on Math. Stat. and Prob., vol. 1 (1961) pp. 105-123
- J-M Dufour, *Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics*, Journal of Econometrics, vol. 133 no. 2 (2006) pp. 443-477

Packt Publishing published a book for me entitled *Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

Over the past few weeks I’ve published articles about my new package, **MCHT**, starting with an introduction, a further technical discussion, demonstrating maximized Monte Carlo (MMC) hypothesis testing, bootstrap hypothesis testing, and last week I showed how to handle multi-sample and multivariate data. This is the final article where I explain the capabilities of the package. I show how **MCHT** can handle time series data.

I should mention that I’m not focused on the merits of the procedures I use as examples in these posts, and that’s going to be the case here. It’s possible (perhaps even likely) that there’s a better way to decide between the hypotheses than what I show here. In these articles, I’m more interested in showing what *can* be done rather than what *should* be done. In particular, I like simple examples that many can understand, even if they may not be the best tool for the task at hand.

So far I don’t think this has been a serious issue; that is, I don’t think the procedures I’ve shown so far could be considered controversial (I think the most controversial would be the permutation test example). But the example I want to use here could be argued with; I personally would not use it. That said, I’m still willing to demonstrate it because it doesn’t take much to understand what’s going on and it does demonstrate how time series data can be handled.

Suppose we want to perform a test for the location of the mean, and thus decide between the hypotheses

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_A: \mu \neq \mu_0$$

There is the usual $t$-statistic, which is $t = \sqrt{n}(\bar{x} - \mu_0)/s$, and as mentioned before, the statistic assumes that the data came from a Normal distribution. That’s not all the test assumes, though. It also assumes that the data is independent and identically distributed.
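As a sanity check, computing this statistic by hand on made-up numbers matches what `t.test()` reports:

```r
x <- c(1.2, 0.7, -0.3, 2.1, 0.9)  # made-up data
mu0 <- 0

t_manual <- sqrt(length(x)) * (mean(x) - mu0) / sd(x)
t_manual
unname(t.test(x, mu = mu0)$statistic)  # the same value
```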

In cross-sectional contexts this is fine, but it’s not okay when the data could depend on time and thus is not independent and identically distributed. Suppose instead that our data was generated according to a first-order autoregressive process (AR(1)), described below:

$$x_t = \mu + \varepsilon_t, \qquad \varepsilon_t = \rho \varepsilon_{t-1} + \sigma u_t$$

In this context, assume $u_t \sim N(0, 1)$, independent and identically distributed. It’s no longer given that the conventional $t$-test will work as marketed, since the data is no longer independent or identically distributed. Additionally, we have two nuisance parameters, $\rho$ and $\sigma$, that need to be accounted for.

We will view $\rho$ and $\sigma$ as nuisance parameters and use MMC testing to handle them. That leaves the question of how to simulate an AR(1) process. With **MCHT**, if you can simulate a process, you can test with it.

The time series model above has a stationary solution when $|\rho| < 1$ and when $t$ ranges between $-\infty$ and $\infty$. It's not possible to simulate a series of infinite length, but one can get close by simulating a series that is very long. In particular, one can simulate, say, 500 terms of the series starting at a fixed number, then the actual number of terms of the series wanted, then throw away the first 500 terms. This is known as burn-in, and it's very common practice in time series simulation.

Fortunately `MCHTest()` allows for burn-in. Suppose that the sample size of the actual dataset is $n$ and we've decided that we want a burn-in period of 500 terms. Then we can do the following:

- Generate $n + 500$ random numbers to represent $u_t$ (except possibly for the scaling factor, as we're treating that as a nuisance parameter).
- Apply the recursive formula described above to the series after scaling it by $\sigma$ and using a chosen $\rho$, and add $\mu$ to the result.
- Keep only the last $n$ terms of the series; throw away the rest. This is your simulated dataset.
- After having obtained the simulated dataset, proceed with the Monte Carlo test as usual.
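The steps above can be sketched directly in R; the parameter values below are hypothetical:

```r
n <- 100; burn <- 500            # desired length and burn-in period
mu <- 1; rho <- 0.5; sigma <- 2  # hypothetical parameter values

u <- rnorm(n + burn)                                        # step 1: raw innovations
eps <- stats::filter(sigma * u, rho, method = "recursive")  # step 2: apply the recursion
x <- mu + eps[-(1:burn)]                                    # step 3: keep the last n terms
length(x)  # n observations remain
```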

With MMC, the unscaled series is fixed after we generate it, and we use optimization to adversarially choose $\rho$ and $\sigma$ so that we maximize the $p$-value of the test.

When using `MCHTest()`, the `rand_gen` function does not need to produce a dataset of the same length as the original dataset; this allows for burn-in. However, if you're going to do this, then the `stat_gen` function needs to know the sample size of the dataset. All you need to do is give the `stat_gen` function a parameter `n`; it will be assigned the sample size of the original dataset. And of course the `test_stat` function won't care whether the data came from a time series or not.

Putting this all together, we create the following test.

```
library(MCHT)
library(doParallel)

registerDoParallel(detectCores())

ts <- function(x, mu = 0) {
  sqrt(length(x)) * (mean(x) - mu) / sd(x)
}

rg <- function(n) {
  rnorm(n + 500)  # Extra terms for a burn-in period
}

sg <- function(x, n, mu = 0, rho = 0, sigma = 1) {
  x <- sigma * x
  if (abs(rho) >= 1) {stop("Bad rho given!")}
  eps <- filter(x, rho, "recursive")  # Apply the recursion
  eps <- eps[-(1:500)]  # Throw away first 500 observations; they're burn-in
  dat <- eps + mu
  test_stat(dat, mu = mu)  # Will be localizing
}

mc.ar1.t.test <- MCHTest(ts, sg, rg, N = 1000, seed = 123,
                         test_params = "mu",
                         nuisance_params = c("rho", "sigma"),
                         optim_control = list(
                           lower = c("rho" = -0.999, "sigma" = 0),
                           upper = c("rho" = 0.999, "sigma" = 100),
                           control = list("max.time" = 10)),
                         threshold_pval = 0.2, localize_functions = TRUE,
                         lock_alternative = FALSE)

dat <- c(-1.02, -1.13, 0.53, 0.21, 1.76, 1.79, 1.42, -0.31, -0.28, -0.44)

mc.ar1.t.test(dat, mu = 0, alternative = "two.sided")
```

```
## Warning in mc.ar1.t.test(dat, mu = 0, alternative = "two.sided"): Computed
## p-value is greater than threshold value (0.2); the optimization algorithm
## may have terminated early
```

```
## 
##  Monte Carlo Test
## 
## data:  dat
## S = 0.73415, p-value = 0.264
## alternative hypothesis: true mu is not equal to 0
```

```
mc.ar1.t.test(dat, mu = 3, alternative = "two.sided")
```

```
## Warning in mc.ar1.t.test(dat, mu = 3, alternative = "two.sided"): Computed
## p-value is greater than threshold value (0.2); the optimization algorithm
## may have terminated early
```

```
## 
##  Monte Carlo Test
## 
## data:  dat
## S = -7.9712, p-value = 0.504
## alternative hypothesis: true mu is not equal to 3
```

```
t.test(dat, mu = 3, alternative = "two.sided")  # For reference
```

```
## 
##  One Sample t-test
## 
## data:  dat
## t = -7.9712, df = 9, p-value = 2.278e-05
## alternative hypothesis: true mean is not equal to 3
## 95 percent confidence interval:
##  -0.5265753  1.0325753
## sample estimates:
## mean of x 
##     0.253
```

I have now covered what I consider the essential technical functionality of **MCHT**. All of the functionality I described in these posts is functionality that I want this package to have. Thus I personally am quite happy this package exists, which is good; I'm the package's primary audience, after all. All I can hope is that others find the package useful too.

I wrote this article more than a month before it was published, so perhaps I have made an update that isn't being accounted for here, but as of this version (0.1.0), I'd call the package in a beta stage of stability; it's usable, but features could be added or removed and there could be unknown bugs.

The following is a list of possible areas of expansion. This list exists mostly because I think it needs to exist; it gives me something to aim for before making a 1.0 release. That said, they could be useful features.

- A function for making diagnostic-type plots for tests, such as a function creating a plot for the rejection probability function (RPF) as described in [1].
- A function that accepts a `MCHTest`-class object and returns a function that, rather than returning a `htest`-class object, gives the test statistic, simulated test statistics, and a $p$-value in a list; this could be useful for diagnostic work.
- Real-world datasets that can be used for examples.
- Functions with a simpler interface than `MCHTest()`, perhaps with more restrictions on inputs.
- Pre-made `MCHTest` objects, perhaps implementing common Monte Carlo or bootstrap tests.

I also welcome community requests and collaboration. If you want a feature, consider opening an issue or submitting a pull request on GitHub.

Do you want more documentation? More examples? More background? Let me know! I'd be willing to write more on this subject. Perhaps if I amass enough content I could write a book documenting **MCHT** and Monte Carlo/bootstrap testing.

These blog posts together extend beyond 10,000 words, so I'm thinking I have enough material to submit an article to, say, *J. Stat. Soft.* or the *R Journal* and thus get my first publication where I'm the sole author. But this is something I'm still considering; I'm an insecure person at heart.

Next week I will still be using this package in a blog post, but I won't be writing about how to use it anymore; instead, I'll be using it to revisit a proposition I made many months ago. (It was because of that article I created this package.) Stay tuned, and thanks for reading!

- R. Davidson and J. G. MacKinnon, *The size distortion of bootstrap tests*, Econometric Theory, vol. 15 (1999) pp. 361-376


My grandfather (to me, “Grandpa”) died on Friday, November 2nd, 2018, a few days before this post was written. As I write this I am with my family in Blackfoot, Idaho, staying in his and my grandmother’s (or “Grandma’s”) house; she has survived him. I will be staying with them until the funeral, at the Hawker Funeral Home on Saturday, November 10th, 2018, is over. (The funeral starts at 2:00 PM.)

My Grandpa was 90 years old, born on April 16th, 1928 (sharing his birthday with my brother, who was born in 1993). He died in a car accident while driving from Idaho Falls, Idaho to his home in Blackfoot, after having called my family and told them “I’m loaded up and I’m coming home.” He rear-ended a semi truck stopped to turn into Love’s truck stop on the highway. We believe that he must not have seen the truck in time, because there were long skid marks indicating that he slammed on the brakes of his truck, which was towing a heavy trailer. (So we also know that he did not fall asleep at the wheel.) There was little damage to either the trailer or his vehicle, but the airbags in his car went off. He was awake for a moment after the accident, since he unbuckled himself, but soon lost consciousness. He had a pacemaker, and we think that it failed and he effectively had a heart attack, since his heart had stopped. (It’s possible that the airbag dealt trauma to his chest and head, thus prompting the failure of the pacemaker.) He died in the ambulance. No member of our family was with him.

That Friday I was planning to have dinner with a graduate student friend of mine. I had canceled the week earlier due to a surprise visit from my sister, but I told him “Nothing could possibly interfere with our dinner tonight except for possibly my brother, and I’ll just tell him that I canceled on you before and I’m not going to do it again.” That was some time around noon. At 4:30, my friend and I started reading the rules for a game we were going to play before having dinner, then got into a long discussion with another student about issues relating to privilege, the state of minorities, microaggressions, and so on, which lasted till about 6:00. I checked my phone just to make sure that no one had tried to contact me and I discovered there were at least eight calls and a bunch of messages. So I called my brother and he told me to cancel my plans because something happened. After some prodding he told me that Grandpa died. So I agreed to meet him at my apartment. I told my friend, “I have to go; my Grandpa’s dead.” He said “Okay,” and I left. (So I canceled on him again; I’ll have to remember not to tell someone “Nothing could possibly cause me to cancel,” because I might just get someone else killed by calling down the karma.)

That’s all I want to say about my Grandpa’s death for now. The rest of this article is my memorial of him and the impact he had on my life.

My Grandpa Douglas was born and raised in Blackfoot, Idaho. His father was George Wareing, his mother was Amelia Hansen, and his brother was LaVere. He was born in 1928, so he was able to remember the Depression. He remembered hobos coming to his parents’ door and his mother making them sandwiches.

I don’t know much about his relationships except for his relationship with his brother. LaVere was the eldest and I remember my Grandpa not particularly caring for games when he was older since LaVere would get very angry with Grandpa when Grandpa won. (But Grandpa definitely knew how to play Go Fish; I remember him occasionally cleaning someone out in games since he better remembered who had what in their hand. He enjoyed lightly competitive family games as well.) My understanding was that for most of his life my Grandpa had a poor relationship with his brother. That said, I heard that before LaVere died after the passing of his wife, their relationship improved.

I think the most distinct story I remember of Grandpa’s childhood was him riding trains, which he loved all through his life. He would go to the train yard and the workers would bring him into the locomotive and he’d ride around with them.

Grandpa had dyslexia. He was an intelligent person but he struggled with reading and writing. He was told repeatedly that he was stupid and he took those words to heart; throughout his life he would belittle himself, which his loved ones did not like hearing. The military would disagree with this idea that he was “stupid”; Grandpa said he was told he scored high on a military IQ test and the military wanted to have him join an intelligence division. The thought of a sea of documents Grandpa would have to read led him to turn the offer down; while in the Air Force in the Korean War, he worked on a boat tasked with rescuing downed pilots. (Grandpa enlisted because he didn’t want to get drafted.)

My Grandpa met my Grandma when a friend of his said he met a girl from Chicago with a sister and invited my Grandpa to take the sister on a date with him. Grandpa accepted, and while he was at the house waiting, he saw my Grandma, Barbara Marshall, come into the room. She was a very beautiful girl, and Grandpa enjoyed the chat he had with her. He said to himself “I want that one,” and he courted and eventually married her. (She was not the sister he was supposed to meet.) Next summer would have been their 70th wedding anniversary. Grandpa never ceased to adore Grandma. To me, the best evidence of his love for her were the love notes he would stick to her mirror. When I would visit my Grandparents I made sure to see what new notes were on the mirror.

Grandpa taught me what true love is and that while marriage may take work, it’s worth it. When I envision the ideal marriage I picture Grandma and Grandpa.

Grandma and Grandpa had seven children in total, four boys and three girls. David, Nancy, and Michael (“Mike”) were the first three and the eldest (in that order); Grandpa called them “the first family.” Grant, Paul, Amy (my mother) and Karen (in birth order again) were “the second family”.

While Grandpa did a variety of jobs and sometimes the money was very tight, he was a teacher by profession, and he loved his work. After his period in the military, he went to college at Idaho State University. He majored in political science and minored in economics. He learned to teach music. While a teacher he taught government classes, history classes, and music. He loved working with kids and many of his students remember him fondly.

Grandpa was once a Republican, even running for state office as a Republican and leading the local Republican party. I think he turned against the party because he opposed their foreign policy (Grandpa hated wars), and I don’t think he could ever be considered a conservative.

Grandpa lived on a farm that, to my understanding, had been in the family. The farm was divided when the I-15 was built, and two branches of the family now live on two separate parts of that original family farm. Grandpa was not a farmer, but he knew how to run farm equipment and took good care of the property for his whole life. In recent years he leased the property out to others, allowing their herds of cattle to graze before being sold to slaughter.

Grandpa himself hated killing anything. He once worked on a feed lot, which of course fed cattle that were intended to be slaughtered. (Grandpa was a real cowboy.) He hated that idea. He would rescue the spiders in the house before Grandma found and squashed them. Recently his property became infested with marmots, which were destroying his equipment. He shot them, but with quivering hands; he didn’t want to kill them.

My Grandpa didn’t have a mean bone in his body. In fact, the word I remember him most saying was “love”. He meant not only love for his wife but love for everyone. He would tell the nurse in a hospital or a stranger in a shopping line that they were wonderful and special and loved. He twisted a common poem with Christian overtones to read

One life to live

’twill soon be passed;

only what’s done

in kindness will last

Grandpa was one of the kindest people I knew.

My Mom and Dad initially lived in a single-wide trailer on my Grandparents’ property, back when my Dad was editor of the local newspaper, the *Morning News*. At that age my Grandpa got to know me. He gave me my first nickname, “destructo”, since I often left a big mess.

I obviously don’t remember much of this early time of my life, but my Grandpa did. He remembered driving me around in his pickup while playing jazz on the radio (Grandpa always loved jazz, and he gave me my taste for it). This was one of his favorite memories. He would carry me around while he did his chores, even when operating tractors on the property.

My Dad lost his job as editor of the *Morning News* and went to school for two years to become a computer programmer. When he graduated, he moved the family to Salt Lake City, and I grew up in a suburb of the city, West Jordan. Yet I still managed to grow close to my Grandpa. I think this is primarily because when I was in elementary school, we were on the track schedule: there were four tracks (A, B, C, and D), which rotated through a three-week vacation throughout the school year so that we had a schedule of three weeks off, nine weeks on. Frequently during those breaks my Mom would take my brother and me (and later my sister, when she was born in 1999) to spend a week with my Grandparents while my Dad stayed home to work.

Grandpa was known for giving nicknames to people. After “destructo”, I was “Colonel”, then “decum” after I turned ten. I think I’ve gotten other names too; he recently would call me “the professor”. But of the names he gave me, I think “Colonel” is my favorite. It’s also the first name I remember.

Grandpa liked trains and perhaps it was from him that I learned to like trains, especially the old steam trains. My Grandpa’s children bought for him a fancy HO-scale electric train, the locomotive a 4-6-6-4 *Challenger*. Around that time one of the remaining *Challengers* traveled through our area and we were able to see the real locomotive. But I was enamored with the model; I loved watching it drive around the track. Apparently I was the person who broke the model; it was never fixed. Nevertheless, my Grandpa got me liking trains. (To this day, my favorite locomotive is the *Challenger*.)

One Christmas my parents bought me a Life-Like HO-scale electric train set, one intended not just for children but for model railroad enthusiasts as well. This spawned an ill-advised hobby in my childhood around model trains; no one knew what they were doing but we were going to try to build a layout complete with scenics and landscaping. My Grandpa encouraged this. One of my favorite memories of him was driving from Blackfoot to Pocatello to a hobby shop to buy model trains and accessories. He bought me a wonderful little locomotive which could even puff smoke as it drove. We set up the tracks in the basement area and he and I would drive the trains.

I can’t remember when I stopped my pursuit of the model train hobby; I had a big wooden board in my bedroom with tracks on it that just turned into a giant table with train parts strewn about, lacking any sense of direction. In the end, I gave all of my trains, tracks, scenics, etc., to my Grandpa. We promised to one year build a layout at his home together. He had more space, including extra buildings to store the train, so it could be a great layout without inconveniencing anyone.

We never built that layout.

Another hobby that Grandpa tried to support me in was model airplanes. He helped my parents buy a plane for me. We tried to fly it, but no matter how many times we tried we could not get the plane to stay in the air, whether we were at an elementary school or at Grandpa’s expansive property. I can only remember one successful flight, and Grandpa was there to see it.

I feel that I can attribute my interest in politics to my Grandpa. My Dad, being a newspaper man, was interested in politics too, but I think the initial political conversations I had were with my Grandpa. The first major political event I recall was the 2000 presidential election; it was the first year I discovered my family was a political minority (Democrats) in the states where we lived (Utah and Idaho). But my Grandpa and I would talk for a long time about politics. There was once a time when I would say “I like politics”; that was largely my Grandpa’s doing (even though that’s not how I would say it today).

Grandpa strongly opposed the war in Iraq. I remember mornings with Meet the Press on TV (back then Tim Russert was in charge, and the show has not been the same since he died in 2008), and the case for weapons of mass destruction (WMDs, a dumb term if only because it’s so poorly defined) being in Iraq was pushed. My Grandpa said there were none there unless we gave them to Saddam Hussein, since Iraq did not have the ability to acquire such weapons on its own. Grandpa was right.

He hated Republican economic policy and feared they would try to gut Social Security and Medicaid. He disliked the loss of manufacturing jobs and feared automation putting people out of work. He wanted church and state separated and thus wasn’t too sympathetic to anti-gay and anti-abortion laws. He wanted stronger gun control. He was skeptical of capitalism, saying we needed a little socialism for the country to run. He hated the Idaho government’s approach to education (cut the budget) and a general unwillingness among conservatives to pay taxes, especially the rich. He was concerned about wealth and income inequality. And so on.

Grandpa’s views were powerful, and at family reunions the family’s political opinions seemed very homogeneous. Few at those reunions of perhaps 50 people were sympathetic to Republicans. There was a time, when I was in community college, that I thought I might be a Libertarian or a Republican, but that view did not survive the University of Utah. Grandpa was skeptical of this potential change in attitude but he loved me regardless of what I believed.

I believe that Grandpa inspiring me to care about politics set me on the track that led me to where I am today. When I was a kid I didn’t care for math; I could understand it but I had no love for it. I cared about politics, government, and social studies. When I was in high school, while I was taking math classes, I cared about debate (more on that later) and the school literary magazine. I *hated* physics. To this day I care only for broad descriptions of physics concepts, not for the details. (Grandpa didn’t understand physics but he was fascinated by it, as well as by how people can discover things using just mathematics.) But I felt that with my mathematics background and my interest in politics I should try to get a degree in economics. That led me to take more math classes and a statistics class (a subject I once thought was the driest in mathematics, never extending beyond using means and proportions for baseball statistics). I fell in love with these subjects and now I’m pursuing a Ph.D. in mathematics, studying mathematical statistics.

You can now see the line of thought that led me to where I am, and I thank my Grandpa for planting that seed. I still am very interested in current affairs and politics and likely will be for the rest of my life no matter what I do.

I have hay fever and my grandparents lived on a farm, so often when I visited I couldn’t breathe through my nose and my eyes would become itchy and inflamed. Sleeping at night was hard since I couldn’t breathe. One night was particularly bad and I think I got up to try and find some nasal spray. Grandpa was awake too (Grandpa struggled to sleep; more on that later), and when he saw me we got into the car together and drove into town. It was very early in the morning so most of the town’s stores were closed, but we managed to find a convenience store that was open. He bought a nasal spray for me, we rode back home, and the spray helped my nose clear up.

I played piano (and one year tried the clarinet) as a kid. I was never a great piano player, but I did develop some skill. My Grandpa loved music and wanted me to study it as well. I remember painful sessions of Grandpa sitting me down and giving me his version of a piano lesson. He was highly critical of me and the mistakes I would make. These lessons would always turn into a lecture about how valuable a skill like playing piano would be (not from a financial perspective but more from a civic one). He’d berate me for spending time playing with toys or computer games and not spending more time practicing piano.

For all his talk, I don’t think I ever loved piano as much as I did other things. When I started college I dropped piano, and my Grandpa always reminded me of that decision. He wished I had kept it up. I may return to practicing piano some day when I have more time, but I wish I could have played for him one more time. As painful as his lessons were, I liked being with my Grandpa and I put up with them.

I enjoyed Grandpa’s music, though. He led a jazz band all his life, whether it was a high school band or a volunteer community band. I remember as a kid going to his room at the Eastern Idaho Technical College to listen to his bands rehearse, then attending his concerts in Idaho Falls parks. My favorite Fourth of July was when I was very young, when the day started with one of his jazz concerts. The day ended with fireworks over the Snake River while we sat on the banks. Days like that were beautiful.

He and his band were featured by a local television station; you can see them play here.

Grandpa was not one to mince words. He never physically hit anyone (except once when he whopped my Aunt Karen on the butt when she and my Mom were teenagers, after she made a rude comment to my Mom while lying on the bed; I don’t know what she said but I bet she deserved what she got). But Grandpa’s lectures were legendary. I think every child and grandchild got at least one lecture. I got my fair share. And he would tell you what he thought, and nothing less.

Grandpa was not always right, but he was a wise man and I always listened to what he said. I never got upset when I got a lecture. I knew he loved me and wanted to tell me something he thought I needed to hear in order to be the best and happiest person I could be.

Grandpa exhibited many virtues, but I don’t see “patience” as one of them, at least from my experience. There were the aforementioned piano lessons. I also remember when Grandpa was teaching me to drive. He was the first person to put me behind the wheel of a vehicle and tell me to drive. He would get after me for many things while driving. I did learn, but not until after a good verbal whipping for my mistakes. (I know very well *never* to cross my arms when turning the steering wheel.)

I remember going to his property so many times to “build fences”. I never once remember building a fence. When we would go to his place for “building fences” we often did something else, perhaps having nothing to do with fences or even work. We would clean up grass, tear down old buildings and fencing, dig holes, and many other non-fence-building things. I remember one year *after* I learned how to drive we were towing old vehicles. I drove the towing truck while Grandpa steered the vehicle being towed. This went well until we tried to tow a very old, rusty yellow car. My brother was in the back of the pickup truck I was driving, directing me. I pressed the gas and was having a hard time getting the truck going, so I hit the gas too hard and pulled the car’s bumper off. Grandpa got out and kicked the car and gave me a verbal tongue lashing. I felt terrible, but Grandpa forgave me and gave me a hug. My pulling off the bumper prompted him to decide that the car was beyond refurbishing anyway.

Grandpa cared a great deal for his property. I remember him trudging off in his irrigation boots to start irrigating the property. I loved when he irrigated; I would run through the watery half-acre lawn and swim in a particularly deep divot in the lawn, deep enough to reach my neck when I was little. He changed his irrigation technique later, and flooding the lawn no longer occurred. (I missed this.) Even though he was in his late 70s or even early 80s he would carry several large metal pipes on his shoulders with sprinklers on them. In the evening the sprinklers would be running. He mowed his massive lawns by hand for years but in his later years he learned to appreciate the riding lawnmower.

The lawns of his house are beautiful. My Dad wanted Grandpa to show him how to take care of the property and run his machines after my parents moved back in a couple months ago. Dad got to run the lawnmower, but Grandpa died before he could show Dad what else needs to be done to keep the place in good condition. If Dad learns on his own, it will be a heavy lift to keep the place in the condition Grandpa did without his guidance.

Grandpa was a hyper person; he had ADHD and could not stand sitting around. Whenever he caught a child or grandchild sitting around he would give them something to do. He would sometimes ask “Are you bored?” I learned to answer “no” when he asked this question, because otherwise he would give me some chore to do. Grandpa valued hard work.

I think that Grandpa telling me to come stay with him to help build fences was just an excuse to have me around. I was fine with this. This was more time to spend with Grandpa. We often did some work, but we also did fun things. I don’t think Grandpa would say that he spoiled his grandchildren (in fact I think I once mentioned that most Grandparents spoil their grandchildren and he said “too bad for you”). I remember Grandpa buying us ice cream, soda, and candy bars, even as recently as a few months ago.

Grandpa lived on a farm in a very rural area. He took advantage of this. He would go on a walk every morning. When a big truck on the freeway would drive by, he would motion for the truck to blow its horn, and often the truck drivers obliged. Grandpa became known among the trucker community, always being spotted on his walks in the morning.

I remember night walks with my Grandpa, too. My family would put on their jackets and walk through the night in the area. The stars were bright and truck lights passed by on crisp evenings, sometimes in the winter, sometimes with a distant thunderstorm lighting up the sky. I remember the dogs walking with us while Grandpa led us in fun walking and marching songs. I still remember some of those songs.

I had such good times with my Grandparents as a child that one year, when we had to end our vacation and return home, I was completely beside myself. I didn’t want to leave them. I may have cried the whole way home. I loved being with my Grandparents. I was very close to my Grandpa.

I end this section with a story: Grandpa and I were driving to Pocatello to visit a hobby shop when I was interested in model railroading. The drive is about 30 minutes. He bought me a PayDay bar, the first time I remember having one of those bars. He asked me what I was thinking about. I said “nothing.”

“Nothing?” he replied. “You mean your mind is a void? Nothing going on?”

“I guess so,” I said.

“But there’s so much to think about. You should always be thinking about something.”

I’ve always been thinking since. I’m almost never bored.

I think Grandpa was on the debate team in high school; he recalled competing in extemp. Grandpa and my Mom encouraged me to join the debate team, and I did so. This was important to how I developed as a person. Prior to debate I was an incredibly shy person; giving a presentation in front of a class was an act of great courage. Debate helped pull me out of my bubble. I was a debater during all of high school, and I did well, placing and winning in several events, one of which was extemp. Today, while I struggle to develop non-professional relationships with people (especially women), I can confidently teach a class of any size and give a presentation with basically no notes to a crowded theater without breaking a sweat.

One year I wanted my Grandpa to be a judge in a debate tournament. He agreed but somehow he got the impression that he would be watching me compete. At the time a debate only included the debaters and the judge, with some exceptions when there were multiple people competing in the same room, but I did not want to challenge the norm. I felt pressured not by Grandpa but by my family to allow him to watch, and I was upset; eventually they relented.

However, my Grandpa was still involved in my debate career. I remember demonstrating speeches for him that I had rehearsed extensively. One day my Grandpa even taught my debate class. I remember it was in 2008, since he was about to turn 80 years old.

I got my first girlfriend, Andrea, in December 2009 and we were together until January 2011, with a one-month break. Grandpa liked Andrea when he met her, and he invited her to the 2010 family reunion. I appreciated that.

He did meet my second girlfriend, Jasmin, years later in December 2014, but he didn’t get to see her for long. Jasmin broke up with me in May 2015 and I was greatly hurt by this. With all respect, Jasmin was my favorite girlfriend, even though I was with her only for nine months. I was very happy with her and basically saw her as the girlfriend I always wanted, ever since I was a teenager praying to God for a particular girl. Losing her hurt me deeply and I think that break-up changed me. As an undergrad I was largely confident and even becoming more friendly, but as a grad student (post-Jasmin) I’ve become less confident, more pessimistic, and more withdrawn.

2016 was a harder year for me and at the family reunion I was still struggling with my grief. I had moved out of my parents’ house and I was feeling lonely; I missed Jasmin a lot. I was studying out of a real analysis textbook at the time; I saw my mathematical abilities as one of the few things that gave me value.

I was alone at the reunion when Grandpa came up to me. I was working problems in the analysis book and he asked me if I was happy. I broke down in tears and said “No.” He put his hand on my arm to comfort me. I told him that I missed Jasmin a lot. He wanted to help me. He wanted me to move back in with my parents and he wanted to arrange for me to use one of his cars (I rely on transit, which makes it very difficult for me to get out and meet people). I refused to move back no matter how much he protested, and I never got that car even when he recommitted to trying to get me one when my parents moved back into his home in Idaho. That said, his caring meant a lot to me and it helped me to seek out help from a professional.

As a kid, one thing that I wanted was for Grandpa’s jazz band to play at my wedding, with Grandpa conducting. That was going to be his wedding gift to me. I don’t think I ever told him this. Within the last couple of years, as my romantic life turned into an even greater failure, I lost hope that this would happen. Grandpa tried to help me, giving me tips on how to talk to girls and where to meet them. I did try for a little while, but I couldn’t overcome myself. Now Grandpa is dead, and with him my dream.

Grandpa went on a trip to Peru with my Aunt Karen and my Aunt Dalena. He didn’t know any Spanish but he wanted to help the people there in any way he could. I heard that the people in the villages he visited were amazed by him; they had never seen anyone as old as he was, yet he was still a capable individual. The story I remember the most was him telling the children about the steam trains he knew from when he was a kid; they were very poor with a weak education, so they were not familiar with trains. As he got onto the plane to leave, he gave a “Toot, toot!” for the kids, with tears in his eyes. I heard it was a touching moment.

When I was in college Grandpa grew more frail. He hated it. He didn’t want to be weak. He sometimes would say he’s “not a man anymore”, when nothing could be further from the truth. He couldn’t stand up straight like he used to. He had problems with his knees, his feet, his heart. He once reached underneath his lawnmower and lost his fingertips since the machine was still running. He was very angry and distraught with this mistake and his now maimed hands. (In my opinion the wound wasn’t noticeable.) My aunts and uncles wanted to do more for him in order to prevent him from straining himself or getting into a dangerous situation. He resisted help. He would even refuse my offers to do the dishes for him; that was his job and I wasn’t taking it from him.

When my brother lived with Grandma and Grandpa he had to help them in a few emergencies. They were resilient but Grandpa was growing weaker. Doctor visits became more common. He was feeling less well. More surgeries were needed. His heart grew weaker, and a pacemaker had to be implanted. Grandpa was starting to get very old.

Grandpa feared old age and resisted it. He didn’t want to be deprived of his independence. He told me in car rides with just him and me that he wanted to die in his house, not in a nursing home. His home would be his only home, and nowhere else. And he was uneasy about the prospect of death. As much as he said that he wanted to see important life events for his grandchildren (even great-grandchildren; he pointed to my baby nephew Ayven and said “I want to see *him* get his Ph.D.”), he spoke of his own life as if he believed he would not be around for much longer.

He said he never doubted there was a creator; life looked very created to him. He did question Christianity in general, and the Seventh-day Adventist flavor in particular. But while he questioned the details he embraced the idea of loving everyone and living what one could call a Christian life. He attended church regularly with Grandma until his death, and my brother tells me that in Sabbath school class he would end a discussion by saying that he loved everyone in the room and they all were special.

Grandpa also decided it was highly unlikely that this life was the end and death was eternal oblivion. I agreed with him.

Grandpa wanted to attend my college graduation ceremony, but he had a shingles outbreak and could not go. He and Grandma were heartbroken, and I wished they were able to see me walk. (I’m glad that my Uncle Mike and Aunt Donna managed to come, though; it meant a lot to me that they did.) After that I decided that I really wanted to have Grandpa see me walk to get my Ph.D. He was proud of my studies and always enjoyed talking to me and seeing how I thought. (Sometimes I would start to feel like a freak when so much attention was paid to my mathematical ability, but I knew that any attention was out of love and pride in his grandson.) I started to picture a dinner the day before my Ph.D. graduation where my family, including Grandma and Grandpa, would meet my adviser, Lajos Horváth, for the first time at a nice restaurant. Then the next day my family (including Grandpa) would see my adviser put the sash over my neck that made me Dr. Curtis Miller. I felt as if this vision could be attained.

Grandpa is dead now, and I don’t have my Ph.D. Another vision I really wanted that will never come true.

Last year Grandma spent several days in the hospital in Salt Lake City for a heart surgery. I spent a lot of time with Grandma and Grandpa and the aunts and uncles who came with them. In addition to enjoying a pancake breakfast every morning in the hospital cafeteria, I spent many hours just about every day I could with them, talking with them. This was the first time I saw Grandpa with a cane. But he seemed to take to it well.

There were other times throughout last year that I intermittently saw my Grandparents. Sometimes they came to Salt Lake City for medical reasons, sometimes it was for good things like the birth of my nephew, Ayven, or my sister’s graduation. I missed the last family reunion because it was planned around my Grandparents’ 69th wedding anniversary and I had already arranged to travel to San Francisco on a grant to attend an MSRI workshop. (I would have gladly missed the workshop had I known there would be a conflict, but by the time the date was announced I felt that I could not cancel. But I was there in spirit, since half of the reunion’s attendees caught the stomach flu that I caught from the wild and then spread to the Utah branch of the family.) I saw him shortly before the semester started, though, along with the weekend I helped my family move back into Grandma and Grandpa’s house in Idaho, and also the week of fall break this semester.

I was actually debating whether to visit during fall break this year, but I decided in favor of visiting, and I’m so glad that I did. That visit was the last time I would see my Grandpa, and he always loved to see me. He was hoping that next summer I would spend a length of time in Idaho with them. I think Grandpa was becoming increasingly doubtful of his longevity and wanted to see me as much as he could before he was gone.

The night I arrived after taking a Salt Lake Express bus to Pocatello (my sister Alicia picked me up from there) I went into Grandma and Grandpa’s bedroom, where he was sitting. He was recovering from another knee surgery so he didn’t want to leave his room. I sat beside him on the bed and he and I talked for a very long time about current events, how science works, the world, and many other things. These were the conversations he loved to have with me, the kind of conversations he and I would have over periods ranging from my childhood to my teenage years to my college years. He didn’t understand everything I said, because despite my best efforts I sometimes struggle to make myself understood, no matter how much I fear talking over anyone’s head. But he loved it. As he always loved talking to me.

Sometimes I wonder if anyone enjoyed talking to me as much as my Grandpa did.

Grandpa looked miserable the last time I saw him. He couldn’t sleep at night. He got only a few hours of sleep, then he was awake. I remember one night, while I was trying to fall asleep on the couch, seeing him walk to the chair behind me and just sit down and stare over me, effectively alone (I was pretending to be asleep; perhaps I should have talked to him). These sleepless nights were increasingly common for him; I heard from my Mom that one night he went to his car and turned on jazz music so he didn’t disturb anyone while he dealt with being awake. He felt very lonely in these times.

To be completely honest, during my last trip, Grandpa did not seem happy anymore. He seemed miserable. He looked more haggard and frail. He couldn’t do anything because he was tired all day since he didn’t sleep at night. This was the reason he (and thus Grandma, too; she was not leaving him) missed my nephew’s first birthday party.

He would still try to work. He wanted to get the tin building on his property ready for winter and cleaned out so that my family could store their stuff in there and their belongings would be safe from mice. We went to a local lumber yard to buy wood, then to C.A.L. Ranch for rat traps. While there, he bought me a candy bar. Then we returned to his home and tried to put the boards he bought into the doorway, only to find that the lumber yard had cut them to our *exact* specifications; this apparently was not what we wanted because the boards were too snug to slide in. Grandpa found a buzz saw in the shed and turned it on. I held the boards while he cut. He was too weak to hold the saw up so it ended up hanging right next to his leg while it was still running. I saw this enormous safety hazard and wanted to say something to him, perhaps offering to take control of the saw instead, but as before I could never bring myself to question my Grandpa, even when I *really* should have.

My Grandpa’s last advice to me was about regret. He questioned whether I was living a healthy lifestyle. I don’t work out all that much these days; that seems like time better spent studying or at least doing something I personally find fun. He said that regret is a hard thing to deal with later in life as you deal with the consequences of bad decisions. One should try to minimize regret as much as one can.

You can see in this article a number of regrets I have.

Before I left, Grandpa asked me to promise him that I would go to the gym. He was adamant about me making that promise, so I did. (I still haven’t gone.)

Grant took me home and while I did say goodbye to Grandma and Grandpa I didn’t have that final goodbye hug I’m used to getting from them. We had already pulled out and Grant wanted to get home soon; he was already late according to his schedule. So I called my Mom and I told Grandma and Grandpa that I forgot that hug but I loved them and I would see them soon at this year’s Thanksgiving dinner.

I never saw my Grandpa again. I should have told Grant to turn around so I could get that last goodbye. There’s another regret.

I have yet to see Grandpa’s body, but I will be a pallbearer at his funeral. I’m planning on wearing one of the outfits my sister helped me pick, which I wore regularly when teaching: a black coat and a black sweater vest over a black-and-white plaid shirt, with black jeans and sneakers. When Grandpa saw me wearing this outfit he would call me “the professor”. It’s an outfit I like and I think it looks sharp, so it seems fitting.

Many people remember Grandpa and were touched by him. Just a couple of months ago, one of his students came to visit him, spending an hour at his house with her family; she remembered him fondly. When he died and the news got out, an unusual number of people called to ask when the funeral was scheduled to take place. Our family thinks there could be many people at his funeral. We’re pleased he touched so many lives.

I want to place a copy of this article in his casket. I won’t print it on any fancy paper; after posting it, I’ll print it out with all the metadata that browsers add when printing web pages. I thought about this and I like it: it shows where I made my memories of the only Grandpa I knew, and whom I loved dearly, public, on my personal website, along with the time and date.

Since Grandpa’s death, there’s been talk about whether we would bring him back if we could, or whether he died at a good time. I’m entitled to whatever opinion I want because my opinion doesn’t change anything.

I’m happy that Grandpa avoided the worst of aging. In some ways his death was a mercy. He did not lose his independence. He did not lose his home. He did not see his health decline even further and he was doing what he loved.

But I feel that there was a lot of unfinished business, things we wanted him to see. My brother wanted Grandpa to see him become an electrician. And I wish so badly that he could have at least seen me get my Ph.D.

I wish he could have seen Ayven get *his* Ph.D.

I will never have another Grandpa in my life. I had a damn good Grandpa, though, one of the finest men who ever lived. I will never stop missing him.

I love you, Grandpa.

While Grandpa loved jazz, he also loved classical music. He said that if he had to pick one composer to listen to for the rest of his life, it would be Beethoven. So below is the last piano piece I played for Grandpa, “Moonlight Sonata,” by Beethoven.

I’ve spent the past few weeks writing about **MCHT**, my new package for Monte Carlo and bootstrap hypothesis testing. After discussing how to use **MCHT** safely, I discussed how to use it for maximized Monte Carlo (MMC) testing, then bootstrap testing. One may think I’ve said all I want to say about the package, but in truth, I’ve only barely passed the halfway point!

Today I’m demonstrating how general **MCHT** is, allowing one to use it for multiple samples and on non-univariate data. I’ll be doing so with two examples: a permutation test and the test for significance of a regression model.

The idea of the permutation test dates back to Fisher (see [1]), and it forms the basis of computational testing for a difference in means. Let’s suppose that we have two samples with respective means μ_X and μ_Y. Suppose we wish to test

H_0: μ_X = μ_Y

against

H_1: μ_X > μ_Y

using samples x_1, …, x_n and y_1, …, y_m, respectively.

If the null hypothesis is true and we also make the stronger assumption that the two samples were drawn from distributions that could differ only in their means, then the labelling of the two samples is artificial, and if it were removed the two samples would be indistinguishable. Relabelling the data, artificially calling one sample the x sample and the other the y sample, would produce statistics highly similar to the one we actually observed. This observation suggests the following procedure:

- Generate N new datasets by randomly assigning the x and y labels to the combined sample of x_1, …, x_n and y_1, …, y_m.
- Compute the test statistic on each of the N new datasets, yielding simulated copies of the statistic; suppose that the statistic used is the difference in means, x̄ − ȳ.
- Compute the test statistic on the actual sample and compare it to the simulated statistics. If the actual statistic is relatively large compared to the simulated statistics, then reject the null hypothesis in favor of the alternative; otherwise, don’t reject.

In practice step 3 is done by computing a p-value representing the proportion of simulated statistics larger than the one actually computed.
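Stripped of any framework, the whole procedure fits in a few lines of base R. This is only a toy illustration (the sample sizes and `N` are mine), not how **MCHT** implements it:

```r
set.seed(123)

x <- rnorm(5, 2, 1)    # sample with mean 2
y <- rnorm(10, 0, 1)   # sample with mean 0

obs <- mean(x) - mean(y)   # observed difference in means
combined <- c(x, y)

N <- 1000
sims <- replicate(N, {
  shuffled <- sample(combined)                 # randomly relabel the pooled data
  mean(shuffled[1:5]) - mean(shuffled[6:15])   # recompute the statistic
})

mean(sims >= obs)   # proportion of simulated statistics at least as large
```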

The permutation test is effectively a bootstrap test, so it is supported by **MCHT**, though one may wonder how that’s the case when the parameters `test_stat`, `stat_gen`, and `rand_gen` only accept one parameter, `x`, representing the dataset (as opposed to, say, `t.test()`, which has an `x` and an optional `y` parameter). But `MCHTest()` makes very few assumptions about what object `x` actually is; if your object is either a vector or tabular, then the `MCHTest` object should not have a problem with it. (It’s even possible that a loosely structured `list` would be fine, but I have not tested this; tabular formats should cover most use cases.)

In this case, putting our data in long-form format makes doing a permutation test fairly simple. One column will contain the group an observation belongs to, while the other contains observation values. The `test_stat` function will split the data according to group, compute group-wise means, and finally compute the test statistic. `rand_gen` generates new datasets by permuting the labels in the data frame. `stat_gen` merely serves as the glue between the two.

The result is the following test.

    library(MCHT)
    library(doParallel)

    registerDoParallel(detectCores())

    ts <- function(x) {
      grp_means <- aggregate(value ~ group, data = x, FUN = mean)
      grp_means$value[1] - grp_means$value[2]
    }

    rg <- function(x) {
      x$group <- sample(x$group)
      x
    }

    sg <- function(x) {
      test_stat(x)
    }

    permute.test <- MCHTest(ts, sg, rg, seed = 123, N = 1000,
                            localize_functions = TRUE)

    df <- data.frame("value" = c(rnorm(5, 2, 1), rnorm(10, 0, 1)),
                     "group" = rep(c("x", "y"), times = c(5, 10)))

    permute.test(df)

    ## 
    ##  Monte Carlo Test
    ## 
    ## data:  df
    ## S = 1.3985, p-value = 0.036

Suppose that for each observation in our dataset there is an outcome of interest, y, and there are k variables x_1, …, x_k that together could help predict the value of y if they are known. Consider then the following linear regression model (with ε representing the error term):

y = β_0 + β_1 x_1 + ⋯ + β_k x_k + ε

The first question someone should ask when considering a regression model is whether it’s worth anything at all. An alternative approach to predicting y is simply to predict its mean value. That is, the model

y = β_0 + ε

is much simpler and should be preferred to the more complicated model listed above if it’s just as good at explaining the behavior of y. Notice the second model is simply the first model with all the coefficients β_1, …, β_k identically equal to zero.

The F-test (described in more detail here) can help us decide between these two competing models. Under the null hypothesis, the second model is the true model:

H_0: β_1 = ⋯ = β_k = 0

The alternative says that at least one of the regressors is helpful in predicting y:

H_1: β_j ≠ 0 for at least one j in {1, …, k}

We can use the F statistic to decide between the two models:

F = ((RSS_2 − RSS_1)/k) / (RSS_1/(n − k − 1))

RSS_1 and RSS_2 are the residual sums of squares of models 1 and 2, respectively, and n is the number of observations.
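The formula can be checked against R’s own computation; below is a quick sketch on simulated data (the data and variable names are mine):

```r
set.seed(1)
n <- 40; k <- 2
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + rnorm(n)

fit1 <- lm(y ~ x1 + x2, data = d)   # model 1: the full model
fit2 <- lm(y ~ 1, data = d)         # model 2: intercept only
rss1 <- sum(residuals(fit1)^2)
rss2 <- sum(residuals(fit2)^2)

F_manual <- ((rss2 - rss1) / k) / (rss1 / (n - k - 1))
F_manual - summary(fit1)$fstatistic[[1]]   # essentially zero
```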

This test is called the F-test because the F-distribution is usually used to compute p-values (as this is the distribution the statistic should follow when certain conditions hold, at least asymptotically if not exactly). What then would a bootstrap-based procedure look like?

If the null hypothesis is true, then the best model for the data is this:

y = ȳ + ε̂

ȳ is the sample mean of y and ε̂ is the residual. This suggests the following procedure:

- Shuffle y over all rows of the input dataset, with replacement, to generate N new datasets.
- Compute F statistics for each of the generated datasets.
- Compare the F statistic of the actual dataset to the generated datasets’ statistics.
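Before wiring this into **MCHT**, the three steps can be sketched directly in base R (a toy example on simulated data; `B` and the helper names are mine):

```r
set.seed(42)
d <- data.frame(x = runif(30))
d$y <- 1 + 2 * d$x + rnorm(30, sd = 0.3)

# F statistic of a simple regression, pulled from summary()
f_stat <- function(dat) summary(lm(y ~ x, data = dat))$fstatistic[[1]]

obs <- f_stat(d)
B <- 500
sims <- replicate(B, {
  d2 <- d
  d2$y <- sample(d2$y, replace = TRUE)   # resample the response: the null's DGP
  f_stat(d2)
})

mean(sims >= obs)   # bootstrap p-value
```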

Let’s perform the test on a subset of the `iris` dataset. We will see if there is a relationship between the sepal length and sepal width among *iris setosa* flowers. Below is an initial split and visualization:

    library(dplyr)

    setosa <- iris %>%
      filter(Species == "setosa") %>%
      select(Sepal.Length, Sepal.Width)

    plot(Sepal.Width ~ Sepal.Length, data = setosa)

There is an obvious relationship between the variables. Thus we should expect the test to reject the null hypothesis. That is what we would conclude if we were to run the conventional test:

    res <- lm(Sepal.Width ~ Sepal.Length, data = setosa)
    summary(res)

    ## 
    ## Call:
    ## lm(formula = Sepal.Width ~ Sepal.Length, data = setosa)
    ## 
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max 
    ## -0.72394 -0.18273 -0.00306  0.15738  0.51709 
    ## 
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)   -0.5694     0.5217  -1.091    0.281    
    ## Sepal.Length   0.7985     0.1040   7.681 6.71e-10 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 0.2565 on 48 degrees of freedom
    ## Multiple R-squared:  0.5514, Adjusted R-squared:  0.542 
    ## F-statistic: 58.99 on 1 and 48 DF,  p-value: 6.71e-10

Let’s now implement the procedure I described with `MCHTest()`.

    ts <- function(x) {
      res <- lm(Sepal.Width ~ Sepal.Length, data = x)
      summary(res)$fstatistic[[1]]  # Only way I know to automatically compute the
                                    # statistic
    }

    # rand_gen's function can use both x and n, and n will be the number of rows of
    # the dataset
    rg <- function(x, n) {
      x$Sepal.Width <- sample(x$Sepal.Width, replace = TRUE, size = n)
      x
    }

    b.f.test.1 <- MCHTest(ts, ts, rg, seed = 123, N = 1000)

    b.f.test.1(setosa)

    ## 
    ##  Monte Carlo Test
    ## 
    ## data:  setosa
    ## S = 58.994, p-value < 2.2e-16

Excellent! It reached the correct conclusion.

One may naturally ask whether we can write functions a bit more general than what I’ve shown here, at least in the regression context. For example, one may want parameters specifying a formula so that the regression model isn’t hard-coded into the test. In short, the answer is yes; `MCHTest` objects try to pass as many parameters to the input functions as they can.

Here is the revised example that works for basically any formula:

    ts <- function(x, formula) {
      res <- lm(formula = formula, data = x)
      summary(res)$fstatistic[[1]]
    }

    rg <- function(x, n, formula) {
      dep_var <- all.vars(formula)[1]  # Get the name of the dependent variable
      x[[dep_var]] <- sample(x[[dep_var]], replace = TRUE, size = n)
      x
    }

    b.f.test.2 <- MCHTest(ts, ts, rg, seed = 123, N = 1000)

    b.f.test.2(setosa, formula = Sepal.Width ~ Sepal.Length)

    ## 
    ##  Monte Carlo Test
    ## 
    ## data:  setosa
    ## S = 58.994, p-value < 2.2e-16

This shows that you can have a lot of control over how `MCHTest` objects handle their inputs, giving you considerable flexibility.

Next post: time series and **MCHT**

- R. A. Fisher, *The Design of Experiments* (1935)

*Hands-On Data Analysis with NumPy and Pandas* is a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

Now that we’ve seen **MCHT** basics, how to make `MCHTest()` objects self-contained, and maximized Monte Carlo (MMC) testing with **MCHT**, let’s now talk about bootstrap testing. Not much is different when we’re doing bootstrap testing; the main difference is that the replicates used to generate test statistics depend on the data we feed to the test, and thus are not completely independent of it. You can read more about bootstrap testing in [1].

Let S represent our test statistic. For bootstrap hypothesis testing, we will construct N test statistics from data generated using our sample. Call these test statistics S*_1, …, S*_N. These statistics are generated in such a way that we know that the null hypothesis holds for them. Suppose for the sake of demonstration that large values of S constitute evidence against the null hypothesis. Then the p-value for the bootstrap hypothesis test is

p̂ = (1/N) ∑ I(S*_j ≥ S), with the sum running over j = 1, …, N.

Here, I(·) is the indicator function.
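As a tiny numeric illustration of this computation (toy numbers of my own, not from any real test):

```r
S <- 2.1                                # observed test statistic
S_star <- c(0.4, 2.5, 1.1, 3.0, 0.9)    # N = 5 simulated statistics
mean(S_star >= S)                       # the indicator sum divided by N: 0.4
```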

There are many ways to generate the data used to compute S*_1, …, S*_N. There’s the parametric bootstrap, where the data is used to estimate the parameters of a distribution, then those parameters are plugged into that distribution and the distribution is used to generate new samples. There’s also the nonparametric bootstrap, which doesn’t make such strong assumptions about the data, perhaps sampling from the data itself to generate new samples. Either of these methods can be used in bootstrap testing, and `MCHTest()` supports both.

Unlike Monte Carlo tests, bootstrap tests cannot claim to be exact tests for any sample size; they’re better for larger sample sizes. That said, they often work well even in small sample sizes and thus are still a good alternative to inference based on asymptotic results. They also could serve as an alternative approach to the nuisance parameter problem, as MMC often has weak power.

In **MCHT**, there is little difference between bootstrap testing and Monte Carlo testing. Bootstrap tests need the original dataset to generate replicates; Monte Carlo tests do not. So the difference here is that the function passed to `rand_gen` needs to accept a parameter `x` rather than `n`, with `x` representing the original dataset, like that passed to `test_stat`.

That’s the only difference. All else is the same.

Suppose we wish to test for the location of the mean. Our nonparametric bootstrap procedure is as follows:

- Generate N samples of data from the demeaned dataset.
- Suppose our mean under the null hypothesis is μ_0. Add this mean to each generated dataset and compute the test statistic for each of those datasets; these will be the simulated test statistics S*_1, …, S*_N.
- Compute the test statistic on the main data and use the empirical distribution function of the simulated test statistics to compute a p-value.
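In base R, the three steps amount to the following (a toy sketch using the same `dat` as the **MCHT** implementation below; `N` and the helper names are mine):

```r
set.seed(123)

dat <- c(2.3, 1.1, 8.1, -0.2, -0.8, 4.7, -1.9)
mu0 <- 1

# Studentized mean, the usual t-type statistic
t_stat <- function(x, mu) sqrt(length(x)) * (mean(x) - mu) / sd(x)

obs <- t_stat(dat, mu0)
demeaned <- dat - mean(dat)

N <- 1000
sims <- replicate(N, {
  x_star <- sample(demeaned, replace = TRUE) + mu0   # resample, then impose the null mean
  t_stat(x_star, mu0)
})

mean(abs(sims) >= abs(obs))   # two-sided p-value
```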

The code below implements this procedure.

    library(MCHT)
    library(doParallel)

    registerDoParallel(detectCores())

    ts <- function(x, mu = 0) {
      sqrt(length(x)) * (mean(x) - mu)/sd(x)
    }

    rg <- function(x) {
      x_demeaned <- x - mean(x)
      sample(x_demeaned, replace = TRUE, size = length(x))
    }

    sg <- function(x, mu = 0) {
      x <- x + mu
      test_stat(x, mu = mu)  # Will be localizing
    }

    b.t.test <- MCHTest(ts, sg, rg, seed = 123, N = 1000,
                        lock_alternative = FALSE, test_params = "mu",
                        localize_functions = TRUE)

    dat <- c(2.3, 1.1, 8.1, -0.2, -0.8, 4.7, -1.9)

    b.t.test(dat, alternative = "two.sided", mu = 1)

    ## 
    ##  Monte Carlo Test
    ## 
    ## data:  dat
    ## S = 0.68164, p-value = 0.432
    ## alternative hypothesis: true mu is not equal to 1

    b.t.test(dat, alternative = "less", mu = 7)

    ## 
    ##  Monte Carlo Test
    ## 
    ## data:  dat
    ## S = -3.8626, p-value = 0.025
    ## alternative hypothesis: true mu is less than 7

The parametric bootstrap test assumes that the observed data was generated using a specific distribution, such as the Gaussian distribution. All that’s missing, in essence, is the parameters of that distribution. The procedure thus starts by estimating all nuisance parameters of the assumed distribution using the data. Then the first step of the process mentioned above (which admittedly was specific to a test for the mean but still strongly resembles the general process) is replaced with simulating data from the assumed distribution, using any parameters assumed under the null hypothesis and the estimated values of any nuisance parameters. The other two steps of the process are unchanged.
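For the mean test from earlier, a parametric variant under a Gaussian assumption would look like this (a toy sketch of my own; the Gaussian choice and the names are assumptions for illustration):

```r
set.seed(123)

dat <- c(2.3, 1.1, 8.1, -0.2, -0.8, 4.7, -1.9)
mu0 <- 1

t_stat <- function(x, mu) sqrt(length(x)) * (mean(x) - mu) / sd(x)

obs <- t_stat(dat, mu0)
s_hat <- sd(dat)   # the nuisance parameter, estimated from the data

# Simulate from the assumed distribution with the null's mean and the estimated sd
sims <- replicate(1000, t_stat(rnorm(length(dat), mean = mu0, sd = s_hat), mu0))

mean(abs(sims) >= abs(obs))   # two-sided p-value
```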

We can use the parametric bootstrap to test for goodness of fit with the Kolmogorov-Smirnov test. Without going into much detail, suppose that F_θ represents a distribution that is known except maybe for the values of its parameters θ. Assume that X_1, …, X_n is an independently and identically distributed dataset, and we have observed values x_1, …, x_n. We wish to use the dataset to decide between the hypotheses

H_0: the data was drawn from F_θ

H_1: the data was not drawn from F_θ

That is, we want to test whether our data was drawn from the distribution F_θ or whether it was drawn from a different distribution. This is what the Kolmogorov-Smirnov test checks.

R implements this test in `ks.test()`, but that function does not allow for any nuisance parameters; it will only check for an exact match between distributions. This is often not what we want; we want to check whether our data was drawn from *any* member of the family of distributions F_θ, not a particular member with a particular combination of parameters. It’s tempting to plug in the estimated values of these parameters, but then the p-value needs to be computed differently, not in the way prescribed by `ks.test()`. Thus we will need to approach the test differently.

Since the distribution of the data is known under the null hypothesis, this is a good situation to use a bootstrap test. We’ll use maximum likelihood estimation to estimate the values of the missing parameters, as implemented by **fitdistrplus** (see [2]). Then we generate samples from this distribution using the estimated parameter values and use those samples to generate simulated test statistic values that follow the distribution prescribed by the null hypothesis.

Suppose we wish to test whether the data was drawn from a Weibull distribution. The result is the following test.

    library(fitdistrplus)

    ts <- function(x) {
      param <- coef(fitdist(x, "weibull"))
      shape <- param[['shape']]; scale <- param[['scale']]
      ks.test(x, pweibull, shape = shape, scale = scale,
              alternative = "two.sided")$statistic[[1]]
    }

    rg <- function(x) {
      n <- length(x)
      param <- coef(fitdist(x, "weibull"))
      shape <- param[['shape']]; scale <- param[['scale']]
      rweibull(n, shape = shape, scale = scale)
    }

    b.ks.test <- MCHTest(test_stat = ts, stat_gen = ts, rand_gen = rg,
                         seed = 123, N = 1000)

    b.ks.test(rweibull(1000, 2, 2))

    ## 
    ##  Monte Carlo Test
    ## 
    ## data:  rweibull(1000, 2, 2)
    ## S = 0.021907, p-value = 0.275

    b.ks.test(rbeta(1000, 2, 2))

    ## 
    ##  Monte Carlo Test
    ## 
    ## data:  rbeta(1000, 2, 2)
    ## S = 0.047165, p-value < 2.2e-16

Given the choice between an MMC test and a bootstrap test, which should you prefer? If you’re concerned about speed and power, go with the bootstrap test. If you’re concerned about precision and getting an “exact” test that’s at least conservative, then go with an MMC test. I think most of the time, though, the bootstrap test will be good enough, even with small samples, but that’s mostly a hunch.

Next week we will see how we can go beyond one-sample or univariate tests to multi-sample or multivariate tests. See the next blog post.

- J. G. MacKinnon, *Bootstrap hypothesis testing*, in *Handbook of Computational Econometrics* (2009), pp. 183-213
- M. L. Delignette-Muller and C. Dutang, *fitdistrplus: an R package for fitting distributions*, J. Stat. Soft., vol. 64 no. 4 (2015)


I introduced **MCHT** two weeks ago and presented it as a package for Monte Carlo and bootstrap hypothesis testing. Last week, I delved into important technical details and showed how to make self-contained `MCHTest` objects that don’t suffer side effects from changes in the global namespace. In this article I show how to perform maximized Monte Carlo (MMC) hypothesis testing using **MCHT**, as described in [1].

The usual procedure for Monte Carlo hypothesis testing is:

- Compute a test statistic S for the data on which you wish to test a hypothesis
- Generate N random datasets like the one of interest but with the data generating process (DGP) being the one prescribed by the null hypothesis, and compute the test statistic on each of these datasets
- Use the empirical distribution function of the simulated test statistics to compute the p-value of the test

Monte Carlo tests often make strong distributional assumptions, such as what distribution generated the dataset being tested, but when those assumptions hold, they are exact tests (see [2]). They are not as powerful as if we had the exact distribution of the test statistic under those assumptions, but the power increases with N (see [3]), and given the power of modern computers, getting a large N is usually not a problem. Thus Monte Carlo tests are attractive in small-sample situations where we do not want to rely on an asymptotic distribution for inference.
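As a miniature example of the three steps, here is a Monte Carlo test of H_0: μ = 0 for Gaussian data with known unit variance, in plain R (a toy sketch of my own, not **MCHT** code):

```r
set.seed(123)

x <- rnorm(20, mean = 0.5)         # data whose mean we test
obs <- sqrt(length(x)) * mean(x)   # test statistic; sigma is known to be 1

N <- 1000
sims <- replicate(N, sqrt(20) * mean(rnorm(20)))   # the DGP prescribed by the null

mean(abs(sims) >= abs(obs))   # two-sided Monte Carlo p-value
```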

However, the procedure outlined above does not allow for nuisance parameters (parameters that are not the subject of interest but whose values are needed in order to conduct inference). In the introductory blog post, while one may view the population standard deviation as a nuisance parameter, the test statistic does not depend on it when the data follows a Gaussian distribution, so there was no need to worry about it. When we switched to data following the exponential distribution, it was still not a problem, since its value was specified under the null hypothesis; thus it was no longer a nuisance parameter.

That said, nuisance parameters can still appear when we need to perform inference. Suppose, for example, that our data follows a Weibull distribution, denoted by Weibull(k, λ), with k being the shape parameter and λ the scale parameter. We want to test the set of hypotheses

H_0: λ = λ_0 against H_1: λ ≠ λ_0.

We can use the generalized likelihood ratio test to form a test statistic (which I won’t repeat here but which does appear in the code below). While Wilks’ theorem tells us about the asymptotic distribution of the test statistic, it says nothing about the exact distribution of the test statistic at a particular sample size, and it’s not given that the test statistic is pivotal and thus independent of the value of nuisance parameters under the null hypothesis (the nuisance parameter, in this case, being the shape parameter k).

What then can we do? A bootstrap test would estimate the value of the nuisance parameter under the null hypothesis and use that estimate as the actual value when generating new, simulated test statistics. Bootstrap tests, however, are not exact tests (see [2]) and we’ve decided that we want a test with stronger guarantees.

[1] introduced the maximized Monte Carlo (MMC) test, which proceeds as described below:

- Compute the test statistic S from the data.
- Generate N collections of random numbers, such as uniformly distributed random numbers, and use those random numbers for generating random copies of test statistics that depend on the values of nuisance parameters (notice that the random numbers are effectively *not* discarded).
- Use an optimization procedure ([1] suggested simulated annealing) to pick values for the nuisance parameters such that the p-value is maximized; the maximally chosen p-value is the p-value of the test.

[1] showed that this procedure yields p-values that, while not as precise as if we knew the values of the nuisance parameters that produced the data, are at least *conservative*, in the sense that they’re no smaller than they should be (thus biasing our conclusions in favor of the null hypothesis). This is the best we can hope for in this context.

MMC is intuitive and compelling, and the theoretical guarantee gives us confidence in our conclusions, but it’s not a panacea. First, the optimization procedure is costly in work and time. Second (and, in my opinion, the biggest problem), the procedure may be *too* conservative. There’s a strong chance that the procedure will find *some* values for nuisance parameters that yield a large -value, perhaps a combination not at all resembling the actual values of the nuisance parameters that produced the data. In short, MMC can be severely lacking in power. When it does reject the null hypothesis, it’s compelling, but otherwise it’s not convincing that the alternative hypothesis is false.

Creating an implementation of MMC in R was my original goal in developing **MCHT**, and all that needs to be done to perform MMC is to pass a value to the `nuisance_params` parameter and an appropriate list to `optim_control`.

Let’s take the hypothesis test mentioned above and prepare to implement it using **MCHT**. I will be using **fitdistrplus** for maximum likelihood estimation, as required by the test statistic (see [4]).

    library(MCHT)

    ## .------..------..------..------.
    ## |M.--. ||C.--. ||H.--. ||T.--. |
    ## | (\/) || :/\: || :/\: || :/\: |
    ## | :\/: || :\/: || (__) || (__) |
    ## | '--'M|| '--'C|| '--'H|| '--'T|
    ## `------'`------'`------'`------' v. 0.1.0
    ## Type citation("MCHT") for citing this R package in publications

    library(fitdistrplus)
    library(doParallel)

    registerDoParallel(detectCores())

    # To be passed to test_stat
    ts <- function(x, scale = 1) {
      fit_null <- coef(fitdist(x, "weibull", fix.arg = list("scale" = scale)))
      kt <- fit_null[["shape"]]
      l0 <- scale
      fit_all <- coef(fitdist(x, "weibull"))
      kh <- fit_all[["shape"]]
      lh <- fit_all[["scale"]]
      n <- length(x)

      # Test statistic, based on the negative-log-likelihood ratio
      suppressWarnings(n * ((kt - 1) * log(l0) - (kh - 1) * log(lh) - log(kt/kh) -
                              log(lh/l0)) - (kt - kh) * sum(log(x)) +
                         l0^(-kt) * sum(x^kt) - lh^(-kh) * sum(x^kh))
    }

    # To be passed to stat_gen; localize_functions will be TRUE
    sg <- function(x, scale = 1, shape = 1) {
      x <- qweibull(x, shape = shape, scale = scale)
      test_stat(x, scale = scale)
    }

The `MCHTest()` parameter `nuisance_params` accepts a character vector giving the names of nuisance parameters the distribution of the test statistic may depend upon, and those names must be among the arguments of the function passed to `stat_gen`; that function is expected to know how to handle those parameters. In this case, `rand_gen` will not be specified, since by default it generates uniformly distributed random variables. It’s a well-known fact in probability that the inverse of the CDF of a random variable (the `q` functions in R) applied to a uniformly distributed random variable yields a random variable that follows the distribution prescribed by the CDF. Hence the use of `qweibull()` above, which is applied to datasets of uniformly distributed random variables that are effectively fixed when `stat_gen` is called. The test statistic will then be computed from data that follows the scale parameter prescribed by the null hypothesis but some set value of the shape parameter k.
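The inverse-CDF trick is easy to verify: uniform draws pushed through `qweibull()` behave like Weibull draws. A quick sanity check (my own illustration, not from the post’s code):

```r
set.seed(123)

u <- runif(10000)                        # the "fixed" uniform variates
w <- qweibull(u, shape = 2, scale = 2)   # transformed: now Weibull(2, 2) distributed

# Compare the sample mean to the theoretical mean, 2 * gamma(1 + 1/2)
c(mean(w), 2 * gamma(1.5))
```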

The `MCHTest` object will then perform simulated annealing to choose the value of the nuisance parameter that maximizes the p-value under the null hypothesis for the given dataset. Simulated annealing is implemented in the `GenSA()` function provided by the **GenSA** package (see [5]). `GenSA()` needs a description of the set of parameter values over which to optimize, and there is no general method for choosing this, so `MCHTest()` requires that a list be passed to `optim_control` that effectively contains the parameters that will be passed to `GenSA()`. At minimum, this list must contain an `upper` and a `lower` element, each of which is a numeric vector with names exactly matching the character vector passed to `nuisance_params`; these vectors specify the space `GenSA()` will search to find the optima. Other elements can be passed to control `GenSA()`, and I highly recommend reading the function’s documentation for more details.

There’s an additional parameter of `MCHTest()`, `threshold_pval`, that matters to the optimization. `GenSA()` will take many steps to make sure it reaches a good optimal value, perhaps taking too long. The authors of **GenSA** recommend specifying an additional terminating condition to speed up the process. `threshold_pval` alters the `threshold.stop` parameter in the `control` list of `optim_control` so that the algorithm terminates when the estimated p-value crosses `threshold_pval`’s value. Effectively, this means that whatever the true p-value of the test is, we know it’s larger than that threshold, and if the threshold is chosen appropriately, we know that we should not reject the null hypothesis based on the results of this test.

While giving `threshold_pval` a number less than 1 can help terminate the algorithm in the case of not rejecting the null hypothesis, the algorithm can still run for a long time if the test will eventually return a statistically significant result. For this reason, I recommend that `optim_control` contain a list called `control`, and that this list have a `max.time` element telling the algorithm the maximum running time (in seconds) it should have.

With all this in mind, we create the `MCHTest` object below:

    mc.wei.scale.test <- MCHTest(ts, sg, N = 1000, seed = 123,
                                 test_params = "scale", nuisance_params = "shape",
                                 optim_control = list(
                                   "lower" = c("shape" = 0),
                                   "upper" = c("shape" = 100),
                                   "control" = list("max.time" = 10)
                                 ),
                                 threshold_pval = .2, localize_functions = TRUE)

    mc.wei.scale.test(rweibull(10, 2, 2), scale = 4)

    ## 
    ##  Monte Carlo Test
    ## 
    ## data:  rweibull(10, 2, 2)
    ## S = 7.2983, p-value < 2.2e-16

The MMC procedure is interesting, and I don’t think any package implements it to the level I have in **MCHT**. The power of the procedure itself concerns me, though. Fortunately, the package also supports bootstrap testing, which I will discuss next week.

- J-M. Dufour, *Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics*, Journal of Econometrics, vol. 133 no. 2 (2006), pp. 443-477
- J. G. MacKinnon, *Bootstrap hypothesis testing*, in *Handbook of Computational Econometrics* (2009), pp. 183-213
- A. C. A. Hope, *A simplified Monte Carlo test procedure*, JRSSB, vol. 30 (1968), pp. 582-598
- M. L. Delignette-Muller and C. Dutang, *fitdistrplus: an R package for fitting distributions*, J. Stat. Soft., vol. 64 no. 4 (2015)
- Y. Xiang et al., *Generalized simulated annealing for global optimization: the GenSA package*, R Journal, vol. 5 no. 1 (2013), pp. 13-28


Last week I announced the first release of **MCHT**, an R package that facilitates bootstrap and Monte Carlo hypothesis testing. In this article, I will elaborate on some important technical details about making `MCHTest` objects, explaining in the process how closures and R environments work.

To recap, last week I made a basic `MCHTest`-class object. These are S3-class objects; really they are just functions with a `class` attribute. All the work is done in the initial function call creating the object. But there’s more to the story.

We want these objects to be self-contained. Specifically, we don’t want changes in the global namespace to change how an `MCHTest` object behaves. By default, these objects are *not* self-contained, and a programmer who isn’t careful can accidentally break them. Here I explain how to prevent this from happening.

I highly recommend those who want to learn more about closures and environments read [1], but I will briefly explain these critical concepts here.

A closure is a function created by another function. `MCHTest` objects are closures, functions created by `MCHTest()` (and then given a `class` attribute). An environment is an R object in which other R objects are effectively defined. For example, there is the global environment, where most R objects created by users live.

    environment()

    ## <environment: R_GlobalEnv>

    globalenv()

    ## <environment: R_GlobalEnv>

Ever wonder why a variable defined inside a function doesn't affect anything outside of that function and why it simply disappears? It's because when a function is called, a new environment is created, and all assignments within the function are done within that new environment. We can see this occurring with some clever use of `print()`.

x <- 2
u <- function() { x }
u()

## [1] 2

f <- function() {
  x <- 1
  function() { x }
}

g <- f()
g()

## [1] 1

environment(g)

## <environment: 0x9c45d78>

environment(u)

## <environment: R_GlobalEnv>

parent.env(environment(g))

## <environment: R_GlobalEnv>

`u()` is a function and lives in the global environment, so it looks for variables in the global environment. `g()`, however, lives in an environment created by `f()`. Normally, when a function creates an environment, that environment disappears the moment the function finishes execution. Closures, however, still use the environment created by the function, so it doesn't disappear when the function finishes execution.

When a function looks for an object, it first looks for that object in its environment. If it doesn’t find the object there, it looks for the object in the parent environment of its environment. It will continue this process until it either finds the object or discovers that none of its environment’s ancestors has the object (prompting an error).

This means that the function is sensitive to changes in its environment or its environment’s ancestors, as we see here:

x <- 3
h <- function() {
  function() { x }
}
u()

## [1] 3

j <- h()
environment(j)

## <environment: 0xa6e7cb4>

parent.env(environment(j))

## <environment: R_GlobalEnv>

j()

## [1] 3
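A classic illustration of this machinery (my own sketch, not code from MCHT) is a stateful counter: the closure's state lives in the environment created by its maker, survives between calls, and is invisible from the global environment.

```r
make_counter <- function() {
  count <- 0             # lives in the environment created by this call
  function() {
    count <<- count + 1  # assigns into the enclosing environment, not globally
    count
  }
}

counter <- make_counter()
counter()          # 1
counter()          # 2
exists("count")    # FALSE: the state is invisible from the global environment
```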

One of R's attractive features is that it promotes a style of programming that discourages side effects, where changes to one object don't change the behavior of another. But the examples above show how closures can suffer side effects when objects in the global namespace change. The closures created above depend on the global environment in ways that may surprise those not familiar with how R's environments work.

By default, `MCHTest` objects can suffer from these side effects, which can creep in if the functions passed to the parameters of `MCHTest()` are carelessly defined, as we see below. (The tests being defined are effectively Monte Carlo $z$-tests; learn about the $z$-test here.)

library(MCHT)

## .------..------..------..------.
## |M.--. ||C.--. ||H.--. ||T.--. |
## | (\/) || :/\: || :/\: || :/\: |
## | :\/: || :\/: || (__) || (__) |
## | '--'M|| '--'C|| '--'H|| '--'T|
## `------'`------'`------'`------' v. 0.1.0
## Type citation("MCHT") for citing this R package in publications

library(doParallel)
registerDoParallel(detectCores())

ts <- function(x, sigma = 1) {
  sqrt(length(x)) * mean(x)/sigma  # z-test for mean = 0
}
sg <- function(x, sigma = 1) {
  x <- sigma * x
  ts(x, sigma = sigma)  # unsafe
}

unsafe.test.1 <- MCHTest(ts, sg, rnorm, seed = 100, N = 100,
                         fixed_params = "sigma")
unsafe.test.1(rnorm(10))

##
## Monte Carlo Test
##
## data: rnorm(10)
## S = 1.1972, sigma = 1, p-value = 0.15

ts <- function(x) {
  sqrt(length(x)) * mean(x)  # Effectively makes sigma = 1
}
sg <- function(x) {
  ts(x)  # again, unsafe
}

unsafe.test.2 <- MCHTest(ts, sg, rnorm, seed = 100, N = 100)
unsafe.test.2(rnorm(10))

##
## Monte Carlo Test
##
## data: rnorm(10)
## S = 0.22926, p-value = 0.46

# ERROR
unsafe.test.1(rnorm(10))

## Error in {: task 1 failed - "unused argument (sigma = sigma)"

What happened? Let's pick it apart by looking at the `stat_gen` parameter of `unsafe.test.1()`.

get_MCHTest_settings(unsafe.test.1)$stat_gen

## function(x, sigma = 1) {
##   x <- sigma * x
##   ts(x, sigma = sigma) # unsafe
## }

This function depends on an object called `ts()`. When the function looks for `ts()`, it looks *in the global namespace!* This means that changes to `ts()` in that namespace will change the behavior of the function. The most recent version of `ts()` does not have a parameter called `sigma`, prompting an error. *The object is not self-contained!*
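One way to catch dependencies like this (a diagnostic sketch of mine, not an MCHT feature) is to inspect a function's environment and list its free symbols with the recommended package **codetools**, which ships with standard R installations:

```r
library(codetools)

sg <- function(x, sigma = 1) {
  x <- sigma * x
  ts(x, sigma = sigma)
}

environment(sg)  # where free variables like ts are resolved
findGlobals(sg)  # lists sg's global symbols; "ts" appears among them
```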

How can we prevent side effects like this? One answer is to define the functions passed to `MCHTest()` in a way that doesn't depend on objects defined in the global namespace. For example, we would not call `ts()` in `sg()` above but instead rewrite the test statistic as we defined it in `ts()`. (Using functions and objects defined in packages is okay, though, since these generally don't change in an R session.)

However, this is not always practical. The test statistic written in `ts()` could be complicated, and writing that same statistic again would not only be a lot of work but would tempt bugs to invade. Fortunately, `MCHTest()` supports methods for making `MCHTest` objects self-contained.

The first step is to set the `localize_functions` parameter to `TRUE`. This changes the environments of the `test_stat`, `stat_gen`, `rand_gen`, and `pval_func` functions so that they belong to the environment the `MCHTest` object lives in. Not only does this help make the function self-contained, we may even be able to write our inputs in a more idiomatic way, like so:

ts <- function(x, sigma = 1) {
  sqrt(length(x)) * mean(x)/sigma
}
sg <- function(x, sigma = 1) {
  x <- sigma * x
  test_stat(x, sigma = 1)  # Would not be able to do this if
                           # localize_functions were FALSE
}

safe.test.1 <- MCHTest(ts, sg, function(n) {rnorm(n)}, seed = 100, N = 100,
                       fixed_params = "sigma", localize_functions = TRUE)
safe.test.1(rnorm(10))

##
## Monte Carlo Test
##
## data: rnorm(10)
## S = 2.0277, sigma = 1, p-value = 0.02

ts <- function(x) {
  sqrt(length(x)) * mean(x)  # Effectively makes sigma = 1
}
sg <- function(x) {
  ts(x)
}

safe.test.1(rnorm(10))  # Still works

##
## Monte Carlo Test
##
## data: rnorm(10)
## S = 1.0038, sigma = 1, p-value = 0.21

(Notice how `rand_gen` was handled; it was wrapped in a function rather than passed directly. In short, this is to prevent the function `rnorm` from being stripped of its namespace, since it needs functions from that namespace.)
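As I understand the design, the issue is that package functions carry their namespace as their environment; reassigning that environment would cut them off from it, while a wrapper leaves `rnorm` itself untouched:

```r
environment(stats::rnorm)  # namespace:stats, which rnorm needs to work

# Only this wrapper's environment would ever be localized; rnorm inside it
# still resolves through the stats namespace
rg <- function(n) { rnorm(n) }
```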

This is the first step to removing side effects. (In fact, it makes our functions better written, since we can anticipate the existence of `test_stat` as a function.) However, we could still have variables or functions defined outside of our input functions. We can expose these to our localized input functions via the `imported_objects` parameter, a list (the doppelgänger of R's environments) containing these objects.
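The kinship between lists and environments is easy to see with base R's converters, which is presumably why a plain named list is a natural vehicle for the imported objects (the names below are made up for illustration):

```r
objs <- list(ts = function(x) mean(x), cutoff = 3)
e <- list2env(objs)  # a named list becomes an environment...
ls(e)                # "cutoff" "ts"
as.list(e)$cutoff    # ...and converts back: 3
```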

ts <- function(x, sigma = 1) {
  sqrt(length(x)) * mean(x)/sigma
}
sg <- function(x, sigma = 1) {
  x <- sigma * x
  ts(x)  # We're going to do this safely now
}

safe.test.2 <- MCHTest(ts, sg, function(n) {rnorm(n)}, seed = 100, N = 100,
                       fixed_params = "sigma", localize_functions = TRUE,
                       imported_objects = list("ts" = ts))
safe.test.2(rnorm(10))

##
## Monte Carlo Test
##
## data: rnorm(10)
## S = 0.57274, sigma = 1, p-value = 0.39

ts <- function(x) {
  sqrt(length(x)) * mean(x)  # Effectively makes sigma = 1
}
sg <- function(x) {
  ts(x)
}

safe.test.2(rnorm(10))

##
## Monte Carlo Test
##
## data: rnorm(10)
## S = 0.24935, sigma = 1, p-value = 0.45

Both `safe.test.1()` and `safe.test.2()` are now immune to changes in the global namespace. They are self-contained and thus safe to use.

By default, `localize_functions` is `FALSE`. I thought about making it `TRUE` by default, but I feared that those not familiar with the concept of environments would be bewildered by all the errors thrown whenever they tried to use a function they defined. Setting the parameter to `TRUE` makes using `MCHTest()` more difficult.

That said, I highly recommend using the parameter in a longer script. It makes the function safer (errors are good when they’re enforcing safety), so become acquainted with it.

(Next post: maximized Monte Carlo hypothesis testing)

- H. Wickham, *Advanced R* (2015), CRC Press, Boca Raton


]]>

This will be the first of a series of blog posts introducing the package. Most of the examples in the blog posts are already present in the manual, but I plan to go into more depth here, including some background and more detailed explanations.

**MCHT** is a package implementing an interface for creating and using Monte Carlo tests. The primary function of the package is `MCHTest()`, which creates functions with S3 class `MCHTest` that perform a Monte Carlo test.

**MCHT** is not presently available on CRAN. You can download and install **MCHT** from GitHub using **devtools** via the R command `devtools::install_github("ntguardian/MCHT")`.

Monte Carlo testing is a form of hypothesis testing where the $p$-values are computed using the empirical distribution of the test statistic, computed from data simulated under the null hypothesis. These tests are used when the distribution of the test statistic under the null hypothesis is intractable or difficult to compute, or as an exact test (that is, a test where the distribution used to compute $p$-values is appropriate for any sample size, not just large sample sizes).

Suppose that $s$ is the observed value of the test statistic and large values of $s$ are evidence against the null hypothesis; normally, $p$-values would be computed as $p = 1 - F(s)$, where $F$ is the cumulative distribution function of $S$, the random variable version of $s$. We cannot use $F$ for some reason; it's intractable, or the provided $F$ is only appropriate for large sample sizes.

Instead of using $F$ we will use $\hat{F}_N$, which is the empirical CDF of the same test statistic computed from simulated data following the distribution prescribed by the null hypothesis of the test. For the sake of simplicity in this presentation, assume that $S$ is a continuous random variable. Now our $p$-value is

$$\hat{p} = \frac{1}{N} \sum_{j = 1}^{N} I\left(S_j \geq s\right)$$

where $I$ is the indicator function and $S_j$ is an independent random copy of $S$ computed from simulated data with a sample size of $n$.
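The computation above can be hand-rolled in a few lines. The statistic and null distribution below are illustrative choices of mine (a scaled sample mean under a standard normal null), not MCHT's defaults:

```r
set.seed(123)
n <- 10; N <- 1000
x <- rnorm(n)                                    # observed sample
s <- sqrt(n) * mean(x)                           # observed test statistic
S_sim <- replicate(N, sqrt(n) * mean(rnorm(n)))  # copies under the null
p_hat <- mean(S_sim >= s)                        # proportion at least as large
```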

The power of these tests increases with $N$ (see [1]), but modern computers can simulate large $N$ quickly, so this is rarely an issue. The procedure above also assumes that there are no nuisance parameters and that the distribution of $S$ can effectively be known precisely when the null hypothesis is true (and all other conditions of the test are met, such as distributional assumptions). A different procedure needs to be applied when nuisance parameters are not explicitly stated under the null hypothesis. [2] suggests a procedure using optimization techniques (recommending simulated annealing specifically) to adversarially select values for nuisance parameters, valid under the null hypothesis, that maximize the $p$-value computed from the simulated data. This procedure is often called *maximized Monte Carlo* (MMC) testing. That is the procedure employed here. (In fact, the tests created by `MCHTest()` are the tests described in [2].) Unfortunately, MMC, while conservative and exact, has much less power than if the unknown parameters were known, perhaps due to the behavior of samples under distributions with parameter values distant from the true parameter values (see [3]).

Bootstrap statistical testing is very similar to Monte Carlo testing; the key difference is that bootstrap testing uses information from the sample. For example, a parametric bootstrap test would estimate the parameters of the distribution the data is assumed to follow and generate datasets from that distribution using those estimates as the actual parameter values. A permutation test (like Fisher's permutation test; see [4]) would use the original dataset values but randomly shuffle the labels (stating which sample an observation belongs to) to generate new datasets and thus new simulated test statistics. $p$-values are essentially computed the same way.
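A label-shuffling test of that kind can be sketched directly; the two samples and effect size here are invented for illustration:

```r
set.seed(123)
g1 <- rnorm(15)                # hypothetical sample 1
g2 <- rnorm(15, mean = 0.5)    # hypothetical sample 2
obs <- mean(g1) - mean(g2)     # observed difference in means

pooled <- c(g1, g2)
perm_stats <- replicate(2000, {
  idx <- sample(length(pooled), length(g1))   # reshuffle the group labels
  mean(pooled[idx]) - mean(pooled[-idx])
})
p_val <- mean(abs(perm_stats) >= abs(obs))    # two-sided permutation p-value
```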

Unlike Monte Carlo tests and MMC, these tests are not exact tests. That said, they often have good finite sample properties. (See [3].) See the documentation mentioned above for more details and references.

Why write a package for these types of tests? This is not the only package that facilitates bootstrapping or Monte Carlo testing. The website RDocumentation includes documentation for the package **MChtest** by Michael Fay, which exists for Monte Carlo testing, too. The package **MaxMC** by Julien Neves is devoted to MMC specifically, as described by [2]. Then there's the package **boot**, which is intended to facilitate bootstrapping. (If I'm missing anything, please let me know in the comments.)

**MChtest** is no longer on CRAN and implements a particular form of Monte Carlo testing, and thus does not work for MMC. **MaxMC** appears to be in a very raw state. **boot** seems general enough that it could be used for bootstrap testing but still seems more geared towards constructing bootstrap confidence intervals and standard errors rather than hypothesis testing. All of these have a very different architecture from **MCHT**, which is primarily for creating a function like `t.test()` that performs a hypothesis test that was described when the function was created.

Additionally, this was good practice in package development and more advanced R programming. This is the first time I have made serious use of closures, S3 classes, R's flavor of object-oriented programming, and environments. So far the result seems to be a flexible and robust tool for performing tests based on randomization.

Let's start with a "Hello, world!"-esque example for a Monte Carlo test: a Monte Carlo version of the $t$-test.

The one-sample $t$-test, one of the oldest statistical tests still used today, is used to test for the location of the population mean $\mu$. It decides between the set of hypotheses

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_A: \mu \neq \mu_0.$$

(The alternative could also be one-sided, perhaps instead stating $H_A: \mu > \mu_0$.) The $t$-test is an exact, most-powerful test for any sample size if the data generating process (DGP) that produced the sample is a Gaussian distribution. If we believe this assumption, then the Monte Carlo version of the test is a contrived example, as we could not do better than to use `t.test()`, but the moment we drop this assumption there is an opening for Monte Carlo testing to be useful.

Let’s load up the package.

library(MCHT)

## .------..------..------..------.
## |M.--. ||C.--. ||H.--. ||T.--. |
## | (\/) || :/\: || :/\: || :/\: |
## | :\/: || :\/: || (__) || (__) |
## | '--'M|| '--'C|| '--'H|| '--'T|
## `------'`------'`------'`------' v. 0.1.0
## Type citation("MCHT") for citing this R package in publications

(Yes, I've got a cute little `.onAttach()` package start-up message. I first saw a message like this implemented by **mclust**, and of course there's Stata's start-up message, and I thought they're so adorable that I will likely add such messages to all my packages. You can use `suppressPackageStartupMessages()` to make this quiet if you want. Thanks to the Python package **art** for the cool ASCII art.)

The star function of the package is the `MCHTest()` function.

args(MCHTest)

## function (test_stat, stat_gen, rand_gen = function(n) {
##     stats::runif(n)
## }, N = 10000, seed = NULL, memoise_sample = TRUE, pval_func = MCHT::pval,
##     method = "Monte Carlo Test", test_params = NULL, fixed_params = NULL,
##     nuisance_params = NULL, optim_control = NULL, tiebreaking = FALSE,
##     lock_alternative = TRUE, threshold_pval = 1, suppress_threshold_warning = FALSE,
##     localize_functions = FALSE, imported_objects = NULL)
## NULL

The documentation for this function is the majority of the manual, and I've written multiple examples demonstrating its use. In short, a single call to `MCHTest()` will create an `MCHTest`-S3-class object (which is just a function) that can be used for hypothesis testing. Three arguments (all of which are functions) passed to the call will characterize the resulting test:

- `test_stat`: a function with an argument `x` that computes the test statistic, with `x` being the argument that accepts the dataset from which to compute the test statistic.
- `rand_gen`: a function generating random datasets; it must have either an argument `x` that would accept the original dataset or an argument `n` that represents the size of the dataset.
- `stat_gen`: a function with an argument `x` that will take the random numbers generated by `rand_gen` and turn them into a simulated test statistic. Sometimes `stat_gen` is the same as `test_stat`, but it is better to write separate functions, as will be seen later.

The functions passed to these arguments can accept other parameters, particularly test parameters (that is, the parameter values we are testing, such as the population mean $\mu$), fixed parameters (parameter values the test assumes, like the population standard deviation $\sigma$, whose value is assumed by the $z$-test often taught in introductory statistics courses; see this link), and nuisance parameters (parameter values we don't know, are not directly investigating, and that may be needed to know the distribution of the test statistic). For the cases mentioned above, there are `MCHTest()` parameters that can be used for recognizing them: `test_params`, `fixed_params`, and `nuisance_params`, respectively. While one could in principle ignore these parameters and pass functions to `test_stat`, `stat_gen`, and `rand_gen` that use them anyway, I would recommend not doing so. First, there's no guarantee that `MCHTest`-class objects would handle the extra parameters correctly. Second, when `MCHTest()` is made aware of these special cases, it can check that the functions passed to `test_stat`, `stat_gen`, and `rand_gen` handle these types of parameters correctly and will throw an error when it appears they do not. This safety measure helps you use **MCHT** correctly.

Carrying on, let's create our first `MCHTest` object for a $t$-test.

ts <- function(x) {
  sqrt(length(x)) * mean(x)/sd(x)
}

mc.t.test.1 <- MCHTest(ts, ts, rnorm, N = 10000, seed = 123)

Above, both `test_stat` and `stat_gen` are `ts()` (they're the first and second arguments, respectively), and the random number generator `rand_gen` is `rnorm()`. Two other parameters are:

- `N`: the number of simulated test statistics to generate.
- `seed`: the seed of the random number generator, which makes test results consistent and reproducible.

`MCHTest`-class objects have a `print()` method that summarizes how the object was defined. We see it in action here:

mc.t.test.1

##
## Details for Monte Carlo Test
##
## Seed: 123
## Replications: 10000
##
## Memoisation enabled
## Argument "alternative" is locked

This will tell us the seed being used and the number of replicates used for hypothesis testing, along with other messages. I want to draw attention to the message `Argument "alternative" is locked`. This means that the test we just created will ignore anything passed to the parameter `alternative` (similar to the parameter of the same name that `t.test()` has). We can enable that parameter by setting the `MCHTest()` parameter `lock_alternative` to `FALSE`.

(mc.t.test.1 <- MCHTest(ts, ts, rnorm, N = 10000, seed = 123,
                        lock_alternative = FALSE))

##
## Details for Monte Carlo Test
##
## Seed: 123
## Replications: 10000
##
## Memoisation enabled

Let's now try this function out on data.

dat <- c(0.27, 0.04, 1.37, 0.23, 0.34, 1.44, 0.34, 4.05, 1.59, 1.54)
mc.t.test.1(dat)

##
## Monte Carlo Test
##
## data: dat
## S = 2.9445, p-value = 0.0072

If you run the above code, you may see a complaint about `%dopar%` being run sequentially. This complaint appears when we don't register CPU cores for parallelization. **MCHT** uses **foreach**, **doParallel**, and **doRNG** to parallelize simulations and thus hopefully speed them up. Simulations can take a long time, and parallelization can help make the process faster. If we were to continue, we would not see the complaint again; R accepts that there's only one core visible and thus doesn't parallelize. But we can register the other cores on our system with the following:

library(doParallel)
registerDoParallel(detectCores())

Not only do we have parallelization enabled, `MCHTest()` automatically enables memoization so that it doesn't redo simulations if the data (or at least the data's sample size) hasn't changed. (This can be turned off by setting the `MCHTest()` parameter `memoise_sample` to `FALSE`.) Again, this is so that we save time and don't have to fear repeat usage of our `MCHTest`-class function.

The above test effectively checked whether the population mean was zero against the alternative that the population mean is greater than zero (due to the default behaviour when `alternative` is not specified). By changing the `alternative` parameter, we can test against other alternative hypotheses.

mc.t.test.1(dat, alternative = "less")

##
## Monte Carlo Test
##
## data: dat
## S = 2.9445, p-value = 0.9928
## alternative hypothesis: less

mc.t.test.1(dat, alternative = "two.sided")

##
## Monte Carlo Test
##
## data: dat
## S = 2.9445, p-value = 0.0144
## alternative hypothesis: two.sided

Compare this to `t.test()`.

t.test(dat, alternative = "two.sided")

##
## One Sample t-test
##
## data: dat
## t = 2.9445, df = 9, p-value = 0.01637
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.2597649 1.9822351
## sample estimates:
## mean of x
## 1.121

The two tests reach similar conclusions.

However, `t.test()` is an exact and most-powerful test at any sample size under the assumptions we made. But all we need to do is not assume the data was drawn from a Gaussian distribution to throw the $t$-test for a loop. The $t$-test will often do well even when the Gaussian assumption is violated, but those statements hold for large sample sizes; at no sample size will the test be an exact test. Monte Carlo tests, though, can be exact tests for any sample size under different (often strong) distributional assumptions, without having to compute the distribution of the test statistic under the null hypothesis.

I know for a fact that `dat` was generated using an exponential distribution, so let's write a new version of the $t$-test that uses this information. While we're at it, let's add a parameter so that we know we're testing for the mean of the data and that mean can be specified by the user.

ts <- function(x, mu = 1) {
  # Throw an error if mu is not positive; exponential random variables have
  # only positive mu
  if (mu <= 0) stop("mu must be positive")
  sqrt(length(x)) * (mean(x) - mu)/sd(x)
}
sg <- function(x, mu = 1) {
  x <- mu * x
  sqrt(length(x)) * (mean(x) - mu)/sd(x)
}

(mc.t.test.2 <- MCHTest(ts, sg, rexp, seed = 123,
                        method = "One-Sample Monte Carlo Exponential t-Test",
                        test_params = "mu", lock_alternative = FALSE))

##
## Details for One-Sample Monte Carlo Exponential t-Test
##
## Seed: 123
## Replications: 10000
## Tested Parameters: mu
## Default mu: 1
##
## Memoisation enabled

Using this new function works the same, only now we can specify the $\mu$ we want to test.

mc.t.test.2(dat, mu = 2, alternative = "two.sided")

##
## One-Sample Monte Carlo Exponential t-Test
##
## data: dat
## S = -2.3088, p-value = 0.181
## alternative hypothesis: true mu is not equal to 2

mc.t.test.2(dat, mu = 1, alternative = "two.sided")

##
## One-Sample Monte Carlo Exponential t-Test
##
## data: dat
## S = 0.31782, p-value = 0.6888
## alternative hypothesis: true mu is not equal to 1

t.test(dat, mu = 1, alternative = "two.sided")

##
## One Sample t-test
##
## data: dat
## t = 0.31782, df = 9, p-value = 0.7579
## alternative hypothesis: true mean is not equal to 1
## 95 percent confidence interval:
## 0.2597649 1.9822351
## sample estimates:
## mean of x
## 1.121

Now the $t$-test and the Monte Carlo test produce $p$-values that are not similar, and the Monte Carlo $t$-test will in general be more accurate. (It appears that the regular $t$-test is more conservative than the Monte Carlo test and thus less powerful.)

I would consider the current release of **MCHT** to be early beta; it is usable, but it cannot yet be considered "stable". Keep that in mind if you plan to use it.

I'm very excited about this package and look forward to writing more about it. Stay tuned for future blog posts explaining its functionality. It likely has strange and mysterious behaviors, so I hope that anyone who encounters them reports them and helps push **MCHT** closer to a "stable" state.

I'm early in my academic career (in that I'm a Ph.D. student without any of my own publications yet), and I'm unsure if this package is worth a paper in, say, *J. Stat. Soft.* or the *R Journal* (heck, I'd even write a book about the package if it deserved it). I'd love to hear comments on any future publications that others would want to see.

Thanks for reading and stay tuned!

Next post: making `MCHTest` objects self-contained.

- A. C. A. Hope, *A simplified Monte Carlo test procedure*, JRSS B, vol. 30 (1968), pp. 582-598
- J-M. Dufour, *Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics*, Journal of Econometrics, vol. 133, no. 2 (2006), pp. 443-477
- J. G. MacKinnon, *Bootstrap hypothesis testing*, in *Handbook of Computational Econometrics* (2009), pp. 183-213
- R. A. Fisher, *The Design of Experiments* (1935)
- R. Davidson and J. G. MacKinnon, *The size distortion of bootstrap tests*, Econometric Theory, vol. 15 (1999), pp. 361-376


]]>

*Hands-On Data Analysis with NumPy and Pandas* is now available for purchase from Packt Publishing’s website and from Amazon. This book was created by a team at Packt Publishing who took my video course and turned it into book form. If you’re like me and love books that you can hold in your hand, touch, thumb through, etc., and you’re looking to learn about basic tools for data analysis with Python, give my book a look.

As with the video course, the book covers how to set up an environment for data analysis with Python and how to use two important tools: NumPy and pandas.

I discuss how to set up Anaconda, a popular data analysis environment, along with how to use Jupyter Notebooks. I show how to connect Python with a MySQL database, along with how to set up such a database.

Then I show how to use NumPy. This includes creating NumPy arrays, indexing arrays, using arrays in arithmetic, NumPy linear algebra, and vectorization. These are essential skills anyone using Python for data analysis should know.

Finally, I show how to use pandas. This includes creating a pandas `DataFrame`, subsetting the data frame, indexing, plotting, and even how to handle missing data. `DataFrame`s are a great way to manage data and I highly recommend their use.

The book consists of numerous tutorials demonstrating these concepts. I think this book would be great for an introductory course on data science for programming novices who just learned Python basics (perhaps from the book I learned from, Allen Downey's *Think Python*) and are starting to learn the basics of data analysis. The basics of using NumPy arrays and pandas `DataFrame`s are challenging for beginners, and my book helps get them going.

I list the book’s chapters below:

- Setting Up a Python Data Analysis Environment
- Diving Into NumPy
- Operations on NumPy Arrays
- pandas are Fun! What is pandas?
- Arithmetic, Function Application, and Mapping with pandas
- Managing, Indexing and Plotting

I would like to thank the staff at Packt Publishing for their work on this book, particularly Tushar Gupta and Nikita Shetty. I was so pleased when I received my copies in the mail and I thank them for their hard work to make this possible.

The MSRP for the book is $23.99, but it is currently on sale for $10 as part of Packt's AI Now campaign, so pick it up while it's cheap! If you're not interested in buying this particular book, perhaps consider getting a Mapt subscription. You'll have access to thousands of books and video courses (including all of my content), and can even get one book to keep for free (without DRM) every month! Perhaps that book will be mine! It's a great deal you should consider.

]]>