Winning the Battle for Riddler Nation: An Agent-Based Modelling Approach to the Solution

Introduction

Oliver Roeder runs a column on FiveThirtyEight called “The Riddler,” where he proposes logical and mathematical puzzles for readers to solve. On February 3rd of this year, he posted the Riddler Classic problem “Can You Rule Riddler Nation?” Here is the description:

In a distant, war-torn land, there are 10 castles. There are two warlords: you and your archenemy. Each castle has its own strategic value for a would-be conqueror. Specifically, the castles are worth 1, 2, 3, …, 9, and 10 victory points. You and your enemy each have 100 soldiers to distribute, any way you like, to fight at any of the 10 castles. Whoever sends more soldiers to a given castle conquers that castle and wins its victory points. If you each send the same number of troops, you split the points. You don’t know what distribution of forces your enemy has chosen until the battles begin. Whoever wins the most points wins the war.

Submit a plan distributing your 100 soldiers among the 10 castles. Once I receive all your battle plans, I’ll adjudicate all the possible one-on-one matchups. Whoever wins the most wars wins the battle royale and is crowned king or queen of Riddler Nation!

My solution was (1, 1, 1, 1, 1, 1, 23, 23, 24, 24), with this rationale:

There are 55 possible points, so one must win 28 to win. I will send 1 soldier to each castle in case my opponent does not send any to a base. Then I will divide the remaining 94 among castles 7, 8, 9, and 10, which are all that is necessary to reach 28. 94/4 is 23 with remainder 2. The remainder will be put on castles 9 and 10 to go after the most important targets.
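
The arithmetic behind this rationale can be sanity-checked in a few lines (a quick sketch of my own, not part of the original submission):

```python
castle_values = range(1, 11)
total_points = sum(castle_values)        # 1 + 2 + ... + 10 = 55
points_to_win = total_points // 2 + 1    # a majority of the points: 28

my_strat = (1, 1, 1, 1, 1, 1, 23, 23, 24, 24)
assert sum(my_strat) == 100              # a legal allocation

# Winning castles 7-10 alone suffices: 7 + 8 + 9 + 10 = 34 >= 28
print(total_points, points_to_win, 7 + 8 + 9 + 10)    # → 55 28 34
```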

The winning solution, by Cyrus Hettle, was (3, 5, 8, 10, 13, 1, 26, 30, 2, 2), and he won 84% of his battles, an excellent showing. In general, the winning strategies focused more on capturing lots of low-value castles than a few high-value ones, as my strategy did.

My strategy lost to the winning strategy, as the code below shows:

import pandas as pd
from pandas import Series, DataFrame
import numpy as np
def gen_strat(obj):
    """Generating a Series consisting of castle troop assignments

    args:
        obj: An iterable (NumPy array, Series, list, tuple, etc.) containing troop assignment

    return:
        Series; The generated strategy
    """

    srs = Series(obj).apply(int)
    if (len(srs) != 10) or (srs.sum() != 100) or ((srs < 0).values.any()):
        raise ValueError("A valid strategy consists of ten integers between 0 and 100 that sum to 100.")
    srs.index = pd.Index(np.arange(10) + 1)

    return srs
def check_strat(strat):
    """Checks that strat is a valid strategy, like that created by gen_strat

    args:
        strat: Series; An object to check

    return:
        bool; True if valid
    """

    if isinstance(strat, Series):
        # All the other conditions that must hold
        if len(strat) == 10 and strat.sum() == 100 and (strat >= 0).values.all() and \
           (strat == strat.apply(int)).values.all() and (strat.index.values == (np.arange(10) + 1)).all():
            return True

    return False
def battle_strat(strat1, strat2):
    """Determine which strategy between two wins

    args:
        strat1: Series; generated by gen_strat (or obeying its rules)
        strat2: Series; generated by gen_strat (or obeying its rules)

    return:
        dict; with the following fields:
            * "winner": int; 1 for strat1, -1 for strat2, 0 for tie
            * "pts": list; Item 0 is strat1's points, Item1 strat2's points
    """
    if not check_strat(strat1) or not check_strat(strat2):
        raise ValueError("Both strat1 and strat2 must be valid strategies")

    castle_margins = strat1 - strat2
    win1 = (castle_margins > 0).apply(int)
    win2 = (castle_margins < 0).apply(int)
    tie = (castle_margins == 0).apply(int)
    castle_points = Series(strat1.index, index=strat1.index)
    tie_pts = (tie * castle_points).sum() / 2
    pts1 = (win1 * castle_points).sum() + tie_pts
    pts2 = (win2 * castle_points).sum() + tie_pts
    pts_list = [pts1, pts2]

    if pts1 > pts2:
        return {"winner": 1, "pts": pts_list}
    elif pts1 < pts2:
        return {"winner": -1, "pts": pts_list}
    else:
        return {"winner": 0, "pts": pts_list}
myStrat  = gen_strat((1, 1, 1,  1,  1, 1, 23, 23, 24, 24))
winStrat = gen_strat((3, 5, 8, 10, 13, 1, 26, 30,  2,  2))
battle_strat(myStrat, winStrat)
{'pts': [22.0, 33.0], 'winner': -1}

You can see that my strategy went top-heavy, while the strategies that did best were bottom-heavy; they went for the castles worth little while taking just enough point-heavy castles to win each war.

(Troops sent to castles, with winning distribution shown; credit to Oliver Roeder at FiveThirtyEight.)

This was my favorite of the Riddler challenges (I rarely submit to them but sometimes try to think of a solution), and I’m thrilled to see Mr. Roeder offer another one. This time, though, participants have the data on how others played, and Mr. Roeder encourages the players to use it.

Challenge accepted, good sir.

Colonel Blotto Game

I read the first instantiation of the game while riding the train home from the U. of U., and I realized that, from a game-theoretic perspective, the game must have an optimal strategy. Most likely it is a mixed strategy, in which players randomize over their moves; the solution to rock-paper-scissors, for example, is a mixed strategy in which a player picks rock, paper, or scissors uniformly at random, one third of the time each. I started thinking about how to find it, only to realize that the space of possible strategies is huge; in fact, there are 4,263,421,511,271 possible troop allocations. No way a computer is going to brute-force that. I tried to think of approximation methods that might solve the problem, but my research turned up nothing. So I went off intuition (which didn’t work out well).
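
The size of the strategy space follows from a stars-and-bars argument: distributing 100 identical soldiers among 10 castles has C(100 + 10 - 1, 10 - 1) = C(109, 9) outcomes. A quick sanity check (assuming Python 3.8+ for `math.comb`):

```python
from math import comb

# Stars and bars: place 100 indistinguishable soldiers into 10 distinct castles
n_allocations = comb(100 + 10 - 1, 10 - 1)    # C(109, 9)
print(n_allocations)    # → 4263421511271, roughly 4.3 trillion strategies
```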

Sure enough, these problems are far from new, as Mr. Roeder discussed in the follow-up. They’re known as Colonel Blotto games.

It’s easy to show (TM) that these games have a Nash equilibrium and thus a solution, an optimal strategy. In fact, this pops straight out of Nash’s theorem; the only conditions required to apply it are that there be finitely many players (here, two) and that each player have finitely many potential strategies (they do, although that finite number is in the trillions).

The hard part is finding the optimal strategy. In fact, that part is really hard. You can imagine why.

In some versions of the game, solutions can be found with analytic techniques. This was done for the version proposed in a 1950 RAND Corporation paper, the version responsible for the name “Colonel Blotto”; Brian Roberson found the solution in 2006, after the problem had stood open for 56 years. I doubt that the Battle for Riddler Nation is a game with a clean solution. It will likely require computational techniques that find approximate solutions rather than truly optimal ones.

Michael Wittman, a PhD candidate at MIT, wrote a paper applying agent-based models to the Colonel Blotto game, even providing some of the Python code he used. It’s an interesting read, and I like the idea. His paper focuses on emulating round-robin tournaments like the Battle for Riddler Nation, where people create strategies and the researcher pits them against one another to see which comes out on top. He also wanted to explore the characteristics of winning strategies.

I think that the agent-based approach could be used for finding quasi-optimal strategies, and I would like to try it here.

Agent-Based Modelling

Agent-based modelling is a computational approach for exploring complex systems by creating the agents that make up the system, putting them together, simulating their individual behaviors, allowing them to interact with one another, and seeing what characteristics emerge in the larger system. For example, people apply agent-based models to traffic by creating a road network, creating simulated drivers, giving them rules of behavior and desired destinations, and setting them loose. It’s used in economics, sociology, biology, and other fields.

Agent-based modelling appeals for a number of reasons. First, it feels realistic to study a forest by simulating the trees and seeing what emerges. Second, the assumptions applied to agents seem more verifiable (and thus realistic) than assumptions made about the whole system. Third, agent-based models create a history, allowing not only the end result but the dynamics leading up to it to be studied. Fourth, they provide hope for solving problems for which an analytic solution is simply not possible, or for confirming an analytic solution’s sufficiency. Fifth, they allow for investigating how sensitive a result is to initial conditions or assumptions. In short, they can help make sense of chaotic or fluid systems.

I’ve been interested in agent-based modelling for years. I would have loved to have written a thesis that applied agent-based models (ABMs) in an economic context. The appeal might stem from my lifelong love of video games and simulated systems, or just from how cool I think it is to create petri-dish economies and societies on my computer. And I do believe that as computing power increases, ABMs will be applied more often for making forecasts and predictions, and that they are a more realistic and general approach to problems. Their results may be more trustworthy than those of analytic approaches, in my mind.

I had been planning to create and apply ABMs, and to write articles about them, since before the Battle for Riddler Nation. I want to use them to explore the self-segregation of communities (see the seminal Schelling segregation model, which you can try without a computer), the emergence of two-party systems in winner-take-all elections (you will never convince me voting for Jill Stein or any Green Party nominee instead of the Democrat is a good idea), and maybe the development of technical analysis in financial markets. (So far I’ve found one book that might serve as a good starting point for studying ABMs, Agent-Based Models of the Economy: From Theories to Applications, by Boero et al., which uses Python.)

Here, the idea is to create agents that always use some strategy for playing the Battle for Riddler Nation. Agents that lose their battles are dropped, while those that win spread. This approach emulates natural selection: the “fit” reproduce while the “unfit” do not, steering the population toward designs optimal for its environment. Mr. Wittman applies such a technique in his paper, though I will make my own variations.

Unfortunately, the technique I just described does not find mixed strategies (that is, strategies that randomly choose a pure strategy to play). This is particularly bad in this context, because one of the few things known with absolute certainty about Colonel Blotto games is that their optimal solutions are mixed strategies, not pure ones. The hope is that whatever “stable” dynamics emerge will constitute a mixed strategy.

Here is an example. Suppose we want to use agent-based modelling to solve rock-paper-scissors. We know that the optimal solution to rock-paper-scissors is to play rock, paper, and scissors randomly, choosing each one-third of the time in the long run. But the agents don’t play this way; each agent plays the same move, rock, paper, or scissors, every game. So no individual plays optimally, but hopefully one-third of all agents play pure-rock, one-third pure-paper, and one-third pure-scissors. Then the system, as a whole, represents an optimal mixed strategy.

I have no proof this is the case; it’s just a hunch, or a hope. (I guess I have another project: see whether this rock-paper-scissors dynamic plays out as described.)
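
That hunch can be tested cheaply. Below is a minimal toy sketch of my own (not from Wittman’s paper): a population of pure-strategy rock-paper-scissors agents plays random pairwise matches, each loser is replaced by a copy of the winner, and we can watch whether the population hovers near the one-third/one-third/one-third mix.

```python
import random
from collections import Counter

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def play_round(population):
    """Pair agents at random; each loser is replaced by a copy of the winner."""
    random.shuffle(population)
    for i in range(0, len(population) - 1, 2):
        a, b = population[i], population[i + 1]
        if BEATS[a] == b:        # a's move beats b's move
            population[i + 1] = a
        elif BEATS[b] == a:      # b's move beats a's move
            population[i] = b
        # ties leave both agents unchanged
    return population

random.seed(0)
pop = ["rock"] * 300 + ["paper"] * 300 + ["scissors"] * 300
for _ in range(200):
    pop = play_round(pop)
print(Counter(pop))    # population size is conserved at 900
```

In finite populations these dynamics can cycle or even fixate on a single strategy, so whether the shares settle near one-third each is an empirical question, exactly the kind an ABM can probe.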

Because the space of possible strategies in Colonel Blotto-type games is so large, I would like the agents to explore the space. My idea is to allow mutations in the agents that get to reproduce. When an agent reproduces, the offspring has a chance to mutate, in which case the strategy it uses is changed slightly. But as an agent lives longer, the chance of producing a mutant offspring decreases, and those offspring that do mutate don’t mutate as much when they’re born from older agents. The rationale is that we should explore the sample space, but bias the search towards successful agents.

Again, I don’t know if this will work, but it’s worth a shot.

Mesa

While Wittman’s paper and Boero et al.’s book use Python and come with code examples, I want to use the Python library Mesa for creating an ABM. Mesa, by David Masad and Jacqueline Kazil, is a framework for creating ABMs, running them, and collecting analytics from them. It provides objects for creating agents, models, and visualizations. You can read Mr. Masad’s and Ms. Kazil’s paper, and watch their video presentations.

Mesa is still in development and there are models that it currently does not support, but the model that I will be using isn’t too fancy. It also is intended to integrate well with the rest of the scientific Python universe; pandas DataFrames and NumPy are supposedly supported by the package.

What Strategies to Use?

Mr. Wittman was interested in how to generate strategies for a round-robin tournament. Here, I’ve already been given a number of strategies, those used in the last Battle for Riddler Nation. My plan is to populate my world with agents that use the strategies players have already submitted.

Now, Mr. Roeder challenged players in the second round to use the results of the first round. I’m curious how people will process this. On the one hand, people may lean heavily towards emulating the top-performing strategies, in hopes of obtaining similar results. But then someone will come along who does just slightly better than the top performers, or perhaps finds a flaw in that strategy and develops one that beats the best strategy and those similar to it. But then someone else could anticipate this and develop a strategy that beats the strategy that beats what used to be the best strategy. And on and on and on…

Even with that in mind, I would rather just start with what I know, then hope that the mutations will account for these dynamics.

Here I load in the data file with the strategies submitted in the last Battle for Riddler Nation.

riddler_strats = pd.read_csv("castle-solutions.csv", encoding='latin-1').iloc[:, 0:10]    # Get only castle data
riddler_strats.head()
Castle 1 Castle 2 Castle 3 Castle 4 Castle 5 Castle 6 Castle 7 Castle 8 Castle 9 Castle 10
0 100 0 0 0 0 0 0 0 0 0
1 52 2 2 2 2 2 2 12 12 12
2 26 26 26 16 1 1 1 1 1 1
3 26 5 5 5 6 7 26 0 0 0
4 25 0 0 0 0 0 0 25 25 25
riddler_strats.columns = pd.Index(np.arange(10) + 1)
riddler_strats.head()
1 2 3 4 5 6 7 8 9 10
0 100 0 0 0 0 0 0 0 0 0
1 52 2 2 2 2 2 2 12 12 12
2 26 26 26 16 1 1 1 1 1 1
3 26 5 5 5 6 7 26 0 0 0
4 25 0 0 0 0 0 0 25 25 25
riddler_strats.shape
(1387, 10)

Some of the strategies people submitted were not valid; you can see that this is the case for the strategy with index 3 (its troop counts do not add up to 100). Here is a count of how many strategies are invalid, followed by filtering the DataFrame for valid strategies.

riddler_strats.apply(lambda x: not check_strat(x), axis=1).sum()
38
riddler_strats = riddler_strats.loc[riddler_strats.apply(check_strat, axis=1), :]
riddler_strats.head()
1 2 3 4 5 6 7 8 9 10
0 100 0 0 0 0 0 0 0 0 0
1 52 2 2 2 2 2 2 12 12 12
2 26 26 26 16 1 1 1 1 1 1
4 25 0 0 0 0 0 0 25 25 25
5 25 0 0 0 0 0 0 25 25 25

And just for kicks, let’s see how my strategy performed against these.

myStrat_res = riddler_strats.apply(lambda x: battle_strat(myStrat, x)["winner"], axis=1)
myStrat_res.head()
0    1
1    1
2    1
4   -1
5   -1
dtype: int64
# Number of wins
(myStrat_res == 1).sum()
647
# Number of losses
(myStrat_res == -1).sum()
692
# Number of ties
(myStrat_res == 0).sum() - 1    # There is one auto-tie, against myself
9
647 / (692 + 647 + 9)
0.47997032640949555

With a win rate of 48%, my strategy didn’t do particularly well. But whatever. Water under the bridge now.

Anyway, just for the fun of it, I’m going to describe the ABM in a narrative.

The Great Wars for Riddler Nation

Riddler Nation has been split between 1,349 warlords. There are 4,047 provinces, each containing ten castles worth one through ten victory points. Each warlord starts with three provinces. Wars take place and winners are determined as before.

Every year, every warlord wars with another warlord, chosen at random (Riddler Nation has teleportation magic, so every province is adjacent to every other province). Each warlord has his or her own war doctrine, and the warlord clings to it until death. When a warlord loses all the provinces under his control, the warlord dies, a pauper. But warlords only know war; they cannot manage their lands very well. The moment a warlord finds himself in charge of six provinces, he splits his domain, giving three provinces to his eldest son. The eldest son may have his own war doctrine, or may share his father’s, but the sons of the most successful warlords tend to stick with their fathers’ ways.

Let’s see how the lands of Riddler Nation evolve.

import mesa, mesa.time, mesa.datacollection
import random
import numpy as np
from numpy.random import poisson, uniform
class Warlord(mesa.Agent):
    """A warlord of Riddler Nation, who starts with a war doctrine, provinces, and fights other warlords"""
    def __init__(self, model, unique_id, doctrine, provinces, max_provinces, mutate_prob, mutate_amount):
        """Create a warlord

        args:
            model: mesa.Model; The model to which the agent belongs
            unique_id: int; Identifies the agent
            doctrine: Series; Describes the doctrine, and must be like one generated by gen_strat()
            provinces: int; The number of provinces the warlord starts with
            max_provinces: int; The maximum number of provinces a warlord can have before a split
            mutate_prob: float; A number between 0 and 1 that represents the probability of a mutation in offspring
            mutate_amount: float; A number greater than 0 representing the average number of mutations in the
                                  doctrine of offspring

        return:
            Warlord
        """

        self.model = model

        if not check_strat(doctrine):
            raise ValueError("doctrine must be a valid strategy")

        self.doctrine = doctrine
        self.unique_id = int(unique_id)
        self.provinces = max(0, int(provinces))            # provinces at least 0
        self.max_provinces = max(2, int(max_provinces))    # max_provinces at least 2
        self.mutate_prob = max(min(mutate_prob, 1), 0)     # between 0 and 1
        self.mutate_amount = max(mutate_amount, 0)

        self.opponent = None    # In future, will be the id of this warlord's opponent

    def step(self):
        """In a step, find a new opponent to battle if not assigned"""

        if self.opponent is None:
            other = self.model.get_random_warlord()
            self.opponent = other
            other.opponent = self

    def advance(self):
        """Fight, and handle death or split of provinces"""

        if self.opponent is not None:
            # Resolve battle
            other = self.opponent
            res = battle_strat(self.doctrine, other.doctrine)["winner"]
            self.provinces += res
            other.provinces -= res
            # Clear opponents
            self.opponent = None
            other.opponent = None

        # Resolve possible death
        if self.provinces <= 0:
            self.model.schedule.remove(self)

        # If too many provinces, split, create new warlord
        if self.provinces >= self.max_provinces:
            son_doctrine = self.doctrine.copy()    # Copy, so mutations don't alter the father's doctrine
            son_mutate_prob = self.mutate_prob
            son_mutate_amount = self.mutate_amount
            son_provinces = int(self.provinces / 2)

            # Is there a mutation?
            if uniform() < self.mutate_prob:
                # Yes! Apply the mutation
                son_mutate_prob = (self.mutate_prob + 1) / 2
                son_mutate_amount = self.mutate_amount * 2
                # Change doctrine
                mutations = min(poisson(self.mutate_amount), 100)
                for _ in range(mutations):
                    son_doctrine[random.choice(son_doctrine.index)] += 1                      # Add 1 to a random castle
                    son_doctrine[random.choice(son_doctrine[son_doctrine > 0].index)] -= 1    # Subtract 1 from a random
                                                                                              # castle that has at least
                                                                                              # one soldier
            # Create the son
            son = Warlord(model = self.model,
                          unique_id = self.model.next_id(),
                          doctrine = son_doctrine,
                          provinces = son_provinces,
                          max_provinces = self.max_provinces,
                          mutate_prob = son_mutate_prob,
                          mutate_amount = son_mutate_amount)
            self.model.schedule.add(son)

            # Changes to the warlord
            self.mutate_prob /= 2
            self.mutate_amount /= 2
            self.provinces = self.max_provinces - son_provinces
class RiddlerBattle(mesa.Model):
    """The Model that handles the simulation of the Battle for Riddler Nation"""
    def __init__(self, doctrines, N=None, max_provinces=6, mutate_prob=0.9, mutate_amount=10, debug=False):
        """Create an instance of the Battle for Riddler Nation

        args:
            doctrines: DataFrame; Contains all the doctrines for the warlords to create
            N: int; The number of warlords to create; if None, the number of rows of doctrines
            max_provinces: int; The maximum number of provinces each warlord can have
            mutate_prob: float; Each warlord's initial mutation probability
            mutate_amount: float; Each warlord's initial rate of number of mutations
            debug: bool; Debug mode

        return:
            RiddlerBattle
        """

        if N is None:
            N = doctrines.shape[0]

        if debug:
            strat = {"Doctrine": lambda w: w.doctrine.values, "Provinces": lambda w: w.provinces}
        else:
            strat = {"Doctrine": lambda w: w.doctrine.values}
        self.schedule = mesa.time.SimultaneousActivation(self)    # Battles are determined simultaneously
        self._max_id = 0    # The highest id currently used by the model
        self.dc = mesa.datacollection.DataCollector(agent_reporters=strat)    # Data collection

        self.create_warlords(doctrines, N, max_provinces, mutate_prob, mutate_amount)
        self._gen_warlord_list()

    def create_warlords(self, doctrines, N, max_provinces, mutate_prob, mutate_amount):
        """Populate the model with warlords

        args:
            doctrines: DataFrame; Contains all the doctrines for the warlords to create
            N: int; The number of warlords to create
            max_provinces: int; The maximum number of provinces each warlord can have
            mutate_prob: float; Each warlord's initial mutation probability
            mutate_amount: float; Each warlord's initial rate of number of mutations
        """

        i = 0
        init_provinces = int(max_provinces / 2)
        for _ in range(N):
            w = Warlord(model = self,
                        unique_id = self.next_id(),
                        doctrine = doctrines.iloc[i, :],
                        provinces = init_provinces,
                        max_provinces = max_provinces,
                        mutate_prob = mutate_prob,
                        mutate_amount = mutate_amount)
            self.schedule.add(w)

            i += 1
            if (i >= doctrines.shape[0]):
                i = 0    # We've run out of available doctrines, so start at the beginning of the dataframe

    def _gen_warlord_list(self):
        """Generate a list of warlords used for assigning opponents"""
        self._warlord_list = [w for w in self.schedule.agents]

    def get_random_warlord(self):
        """Pull a random warlord without an opponent"""
        i = random.randrange(len(self._warlord_list))
        return self._warlord_list.pop(i)

    def next_id(self):
        """Get the next valid id for the model, and update the max id of the model

        return:
            int; Can be used as an id
        """

        self._max_id += 1
        return self._max_id

    def step(self):
        """Take a step"""
        self.dc.collect(self)
        self.schedule.step()
        self._gen_warlord_list()   # Reset the list of available warlords

    def run_model(self, steps):
        """Run the model

        args:
            steps: int; The number of steps to run the model
        """

        steps = int(steps)
        for _ in range(steps):
            self.step()

Having written the code defining the land of Riddler Nation, let’s run the model!

… first on a baby version, using only the first few strategies.

riddler_strats.head()    # The strategies being used
1 2 3 4 5 6 7 8 9 10
0 100 0 0 0 0 0 0 0 0 0
1 52 2 2 2 2 2 2 12 12 12
2 26 26 26 16 1 1 1 1 1 1
4 25 0 0 0 0 0 0 25 25 25
5 25 0 0 0 0 0 0 25 25 25
rb = RiddlerBattle(riddler_strats.head())
rb.run_model(100)    # 100 battles
strat_hist = rb.dc.get_agent_vars_dataframe()    # Get the doctrines used by agents
strat_hist.sort_index(inplace=True)
strat_hist.loc[(slice(0, 10)), :]    # See the first ten rounds
Doctrine
Step AgentID
0 1 [100, 0, 0, 0, 0, 0, 0, 0, 0, 0]
2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
3 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
5 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
1 1 [100, 0, 0, 0, 0, 0, 0, 0, 0, 0]
2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
3 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
5 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
2 1 [100, 0, 0, 0, 0, 0, 0, 0, 0, 0]
2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
3 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
5 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
3 2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
3 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
5 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
6 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
4 2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
3 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
5 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
6 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
5 2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
3 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
5 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
6 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
7 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
6 2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
5 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
6 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
7 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
7 2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
5 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
6 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
7 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
8 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
8 2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
5 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
6 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
7 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
8 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
9 2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
6 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
7 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
8 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
10 2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
4 [25, 0, 0, 0, 0, 0, 0, 25, 25, 25]
6 [24, 25, 25, 17, 3, 1, 1, 0, 3, 1]
7 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
8 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
strat_hist.tail()
Doctrine
Step AgentID
98 9 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
99 2 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
7 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
8 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]
9 [53, 3, 2, 3, 4, 1, 1, 11, 11, 11]

Looks like 100 steps was enough to reach stability. How does the new strategy perform?

strat_hist.head().applymap(lambda x: battle_strat(gen_strat(strat_hist.head().iloc[4, 0]), gen_strat(x))["winner"])
Doctrine
Step AgentID
0 1 1
2 -1
3 1
4 0
5 0

Not bad. We would expect an optimal strategy to tie or win at least half of the time. It will not win all the time, just like no one wins rock-paper-scissors all the time when both players play optimally.

Now let’s go to the full simulation.

%time rb_full = RiddlerBattle(riddler_strats)
CPU times: user 3.43 s, sys: 16 ms, total: 3.45 s
Wall time: 3.53 s
%time rb_full.run_model(100)    # A hundred years of warfare
CPU times: user 18min 29s, sys: 2.36 s, total: 18min 31s
Wall time: 18min 53s
warlords = rb_full.dc.get_agent_vars_dataframe()
warlords.loc[(99), :]
Doctrine
AgentID
1517 [4, 0, 2, 2, 20, 23, 19, 24, 0, 6]
2399 [4, 0, 2, 2, 20, 23, 19, 24, 0, 6]
2680 [4, 0, 2, 2, 20, 23, 19, 24, 0, 6]
3727 [2, 11, 8, 1, 2, 22, 0, 13, 39, 2]
3765 [0, 1, 0, 1, 11, 15, 13, 20, 36, 3]
4389 [2, 0, 2, 4, 3, 16, 27, 36, 4, 6]
4666 [8, 5, 12, 7, 4, 31, 3, 0, 26, 4]
5191 [3, 7, 6, 2, 24, 11, 9, 34, 2, 2]
5326 [0, 1, 0, 11, 0, 24, 2, 26, 3, 33]
5356 [9, 4, 10, 10, 14, 22, 15, 14, 1, 1]
5634 [9, 7, 0, 6, 17, 10, 1, 1, 39, 10]
5759 [2, 0, 7, 5, 3, 13, 18, 24, 16, 12]
5867 [1, 14, 9, 6, 15, 6, 10, 1, 6, 32]
5895 [0, 0, 3, 7, 11, 20, 24, 29, 2, 4]
5922 [5, 4, 4, 10, 6, 4, 13, 18, 35, 1]
6013 [0, 1, 0, 1, 11, 15, 13, 20, 36, 3]
6091 [5, 14, 2, 22, 6, 1, 10, 24, 1, 15]
6113 [0, 0, 3, 7, 11, 20, 24, 29, 2, 4]
6133 [12, 2, 12, 6, 12, 7, 12, 10, 14, 13]
6183 [1, 14, 9, 6, 15, 6, 10, 1, 6, 32]
6190 [12, 2, 12, 6, 12, 7, 12, 10, 14, 13]
6248 [8, 0, 1, 8, 20, 14, 26, 11, 7, 5]
6295 [9, 7, 0, 6, 17, 10, 1, 1, 39, 10]
6347 [0, 1, 0, 11, 0, 24, 2, 26, 3, 33]
6362 [0, 1, 0, 1, 11, 15, 13, 20, 36, 3]
6434 [0, 1, 0, 1, 11, 15, 13, 20, 36, 3]
6460 [2, 2, 0, 3, 20, 13, 30, 22, 4, 4]
6564 [8, 5, 12, 7, 4, 31, 3, 0, 26, 4]
6606 [0, 1, 0, 11, 19, 23, 21, 19, 5, 1]
6620 [5, 6, 2, 13, 9, 17, 7, 16, 24, 1]
11170 [3, 3, 13, 2, 4, 1, 27, 30, 10, 7]
11171 [2, 0, 2, 4, 3, 16, 27, 36, 4, 6]
11172 [8, 5, 12, 7, 4, 31, 3, 0, 26, 4]
11173 [0, 1, 0, 1, 11, 15, 13, 20, 36, 3]
11174 [8, 7, 4, 6, 11, 0, 0, 24, 38, 2]
11175 [0, 0, 3, 7, 11, 20, 24, 29, 2, 4]
11176 [8, 0, 1, 8, 20, 14, 26, 11, 7, 5]
11177 [2, 0, 7, 5, 3, 13, 18, 24, 16, 12]
11178 [0, 1, 0, 1, 11, 15, 13, 20, 36, 3]
11179 [2, 0, 7, 5, 3, 13, 18, 24, 16, 12]
11180 [0, 0, 3, 7, 11, 20, 24, 29, 2, 4]
11181 [2, 0, 7, 5, 3, 13, 18, 24, 16, 12]
11182 [12, 2, 12, 6, 12, 7, 12, 10, 14, 13]
11183 [3, 3, 13, 2, 4, 1, 27, 30, 10, 7]
11184 [12, 2, 12, 6, 12, 7, 12, 10, 14, 13]
11185 [2, 0, 2, 4, 3, 16, 27, 36, 4, 6]
11186 [0, 0, 3, 7, 11, 20, 24, 29, 2, 4]
11187 [0, 1, 0, 1, 11, 15, 13, 20, 36, 3]
11188 [4, 0, 2, 2, 20, 23, 19, 24, 0, 6]
11189 [0, 0, 3, 7, 11, 20, 24, 29, 2, 4]
11190 [3, 7, 6, 2, 24, 11, 9, 34, 2, 2]
11191 [3, 1, 0, 0, 2, 1, 1, 28, 29, 35]
11192 [3, 3, 13, 2, 4, 1, 27, 30, 10, 7]
11193 [4, 0, 2, 2, 20, 23, 19, 24, 0, 6]
11194 [4, 0, 2, 2, 20, 23, 19, 24, 0, 6]
11195 [3, 3, 13, 2, 4, 1, 27, 30, 10, 7]
11196 [12, 2, 12, 6, 12, 7, 12, 10, 14, 13]
11197 [4, 0, 2, 2, 20, 23, 19, 24, 0, 6]
11198 [0, 1, 0, 11, 0, 24, 2, 26, 3, 33]
11199 [0, 0, 3, 7, 11, 20, 24, 29, 2, 4]

1659 rows × 1 columns

warlords.loc[(99), :].shape
(1659, 1)
battle_winners = DataFrame([w[1]["Doctrine"] for w in warlords.loc[(99), :].iterrows()], columns=np.arange(10) + 1,
                           index=warlords.loc[(99), :].index)
battle_winners.head()
1 2 3 4 5 6 7 8 9 10
AgentID
1517 4 0 2 2 20 23 19 24 0 6
2399 4 0 2 2 20 23 19 24 0 6
2680 4 0 2 2 20 23 19 24 0 6
3727 2 11 8 1 2 22 0 13 39 2
3765 0 1 0 1 11 15 13 20 36 3
def battle_strat_df(df1, df2):
    """Adjudicate all battles to see who wins

    args:
        df1: DataFrame; A data frame of warlords' strategies
        df2: DataFrame; A data frame of warlords' strategies

    return:
        DataFrame; Contains columns for Win, Tie, and Losses: the performance of the warlords of df1
                   against those of df2
    """
    df1_mat = df1.values
    df2_mat = df2.values

    res = list()
    for i in range(df1_mat.shape[0]):
        # Soldier margin at each castle for warlord i against every opponent at once
        vec = df1_mat[i, np.newaxis, :]
        margin = vec - df2_mat
        # Castle point values (1..10) won outright, lost outright, or tied
        df1_castles = (margin > 0) * (np.arange(10) + 1)
        df2_castles = (margin < 0) * (np.arange(10) + 1)
        tie_castles = (margin == 0) * (np.arange(10) + 1)

        # Tied castles split their points between the two warlords
        tie_scores = tie_castles.sum(axis=1) / 2
        df1_scores = df1_castles.sum(axis=1) + tie_scores
        df2_scores = df2_castles.sum(axis=1) + tie_scores
        res.append({"Win": (df1_scores > df2_scores).sum(),
                    "Tie": (df1_scores == df2_scores).sum(),
                    "Losses": (df1_scores < df2_scores).sum()})

    return DataFrame(res, index=df1.index)
results = battle_strat_df(battle_winners, riddler_strats)
results.sort_values("Win", ascending=False).head().join(battle_winners)
Losses Tie Win 1 2 3 4 5 6 7 8 9 10
AgentID
10260 205 20 1124 0 0 3 7 11 20 24 29 2 4
10139 205 20 1124 0 0 3 7 11 20 24 29 2 4
10402 205 20 1124 0 0 3 7 11 20 24 29 2 4
10400 205 20 1124 0 0 3 7 11 20 24 29 2 4
10390 205 20 1124 0 0 3 7 11 20 24 29 2 4
results.describe()
Losses Tie Win
count 1659.000000 1659.000000 1659.000000
mean 507.974684 23.877034 817.148282
std 179.140316 14.029499 182.099181
min 205.000000 5.000000 433.000000
25% 396.000000 16.000000 702.000000
50% 498.000000 20.000000 813.000000
75% 621.000000 28.000000 911.000000
max 840.000000 95.000000 1124.000000
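As a sanity check on this scoring logic, the same margin-and-tie computation can be applied by hand to the single matchup from the introduction: my strategy against Cyrus Hettle's winning one. (A standalone sketch, not part of the notebook above.)

```python
import numpy as np

# Score one matchup with the same rules battle_strat_df uses.
mine = np.array([1, 1, 1, 1, 1, 1, 23, 23, 24, 24])    # my strategy
winner = np.array([3, 5, 8, 10, 13, 1, 26, 30, 2, 2])  # Cyrus Hettle's

values = np.arange(10) + 1    # castle point values 1..10
margin = mine - winner
my_score = values[margin > 0].sum() + values[margin == 0].sum() / 2
their_score = values[margin < 0].sum() + values[margin == 0].sum() / 2
print(my_score, their_score)  # 22.0 33.0
```

I win only castles 9 and 10 outright and split castle 6, for 22 of the 55 points, confirming the loss reported in the introduction.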

As a frame of reference, let’s see what the results for the original game were.

riddler_strat_results = battle_strat_df(riddler_strats, riddler_strats)
riddler_strat_results.sort_values("Win", ascending=False).head()
Losses Tie Win
1109 208 10 1131
615 205 20 1124
1313 216 18 1115
1046 234 10 1105
630 225 22 1102
riddler_strat_results.describe()
Losses Tie Win
count 1349.000000 1349.000000 1349.000000
mean 660.661231 27.677539 660.661231
std 196.169993 20.333728 196.490469
min 205.000000 1.000000 0.000000
25% 513.000000 16.000000 508.000000
50% 651.000000 22.000000 665.000000
75% 814.000000 33.000000 808.000000
max 1348.000000 195.000000 1131.000000
riddler_strats.loc[1109, :]    # This is not the same as the winner according to official results
1      0
2      3
3      9
4      9
5      2
6     13
7     29
8     28
9      5
10     2
Name: 1109, dtype: int64

How do the warlords do against the winners of the simulated tournament?

warlord_results = battle_strat_df(battle_winners, battle_winners)
warlord_results.sort_values("Win", ascending=False).head().join(battle_winners)
Losses Tie Win 1 2 3 4 5 6 7 8 9 10
AgentID
10756 283 67 1309 3 3 13 2 4 1 27 30 10 7
10623 283 67 1309 3 3 13 2 4 1 27 30 10 7
10952 283 67 1309 3 3 13 2 4 1 27 30 10 7
10951 283 67 1309 3 3 13 2 4 1 27 30 10 7
10192 283 67 1309 3 3 13 2 4 1 27 30 10 7
warlord_results.describe()
Losses Tie Win
count 1659.000000 1659.000000 1659.000000
mean 770.500301 117.999397 770.500301
std 267.187321 76.097015 244.110661
min 283.000000 1.000000 339.000000
25% 516.000000 54.000000 653.000000
50% 698.000000 118.000000 742.000000
75% 956.000000 224.000000 911.000000
max 1273.000000 232.000000 1309.000000

Strangely, I got different results from Oliver Roeder and found a different winner, in row 1110 (or 1109 if you count from zero). I’m curious why that is.

That said, the best performer among the simulated warlords did not do as well as a human player, using the strategy (0, 0, 3, 7, 11, 20, 24, 29, 2, 4).

Among the “optimally” playing warlords themselves, the best had 1,309 victories, using the strategy (3, 3, 13, 2, 4, 1, 27, 30, 10, 7). Now, if I were actually playing optimally, I would randomize: pick one of the remaining warlords at random and use his strategy. But… I don’t want to do that. It’s a one-off game, so I’ll play the strategy that did best.

Conclusion

Do I expect to win the Battle for Riddler Nation? Am I using the best technique possible? Am I playing “optimally?”

By now, some readers may be screaming at me for begging the question: “What does it mean to play optimally?” Throughout this post, I’ve been referring to the game-theoretic notion of optimality. Here, this game is zero-sum, and it is fair; both players have access to the same set of strategies and determine their moves simultaneously, so neither player has an advantage. In this situation, optimal play means playing a strategy that guarantees a player, in the long run, will win at least as many games as her opponent no matter what strategy her opponent uses. In fact, this strategy does its job so well, she could announce her strategy to her opponent, and this guarantee will not be broken; she will still win at least as many games as she loses in the long run. (Not counting draws.) No strategy exists to break this guarantee. So game-theoretic optimal play is what’s known as minimax: it minimizes the maximum loss a player can suffer (in this case, all the way to zero).

How can this be the case? Often such strategies involve a player randomly picking her move (a mixed strategy), as opposed to always playing a certain move. Rock-paper-scissors is a perfect toy example. A pure strategy in this game would be “always play rock”, whereas the optimal mixed strategy is “play rock, paper, or scissors, picking one randomly with equal chance.” You can easily imagine that the optimal strategy can guarantee the player will win at least as many games as she loses, in this case. Who wins ultimately amounts to a coin flip.
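The break-even guarantee is easy to check empirically. Here is a minimal sketch (my own illustration, separate from the analysis above) pitting the uniform mixed strategy against the pure strategy “always play rock”:

```python
import random

# Simulate the uniform mixed strategy against "always rock".
# In expectation, wins and losses balance out; the rest are ties.
random.seed(0)
moves = ["rock", "paper", "scissors"]
beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

wins = losses = 0
for _ in range(100_000):
    mixed = random.choice(moves)
    pure = "rock"
    if beats[mixed] == pure:    # mixed player's move beats rock
        wins += 1
    elif beats[pure] == mixed:  # rock beats the mixed player's move
        losses += 1

print(wins, losses)  # roughly equal counts
```

The same near-equality holds against any pure strategy, which is exactly the minimax guarantee: the mixed player cannot be exploited, no matter what her opponent announces or plays.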

But there’s a catch. True, rock-paper-scissors has an optimal strategy, and if everyone played it there would be no need for organizations such as the World RPS Society. Yet that society exists, because people rarely play optimally.

The problem is randomization. Even if people could find the optimal solution (and in the General Blotto game that’s quite unlikely), they can’t randomize well. They develop patterns, tip their moves, or fall into biases. In rock-paper-scissors, people rarely play the same move twice in a row, let alone three times, and forget about four (truly random play produces longer streaks of repeated moves than humans do). They’re also prone to play whatever would have won the last round. The World RPS Society keeps a list of tricks for gaining an edge on a human player.

People may also not fully understand what the optimal solution is. The classic example is a guessing game where people pick numbers between 1 and 100, and whoever picks the number closest to two-thirds of the average answer wins. According to game theory, the optimal answer is 0, found by multiplying by two-thirds an infinite number of times. But in practice the winning number is never zero, nor even one. It’s much larger: while most people appreciate that they should not choose above 66 (two-thirds of 100), nor above 44 (two-thirds of 66), they rarely go beyond a few steps of the iteration, and may end up picking 30 on average, so the winner is the one who picked 20. (Mr. Roeder, in the same Riddler post as the first Battle for Riddler Nation, actually proposed this game, but with the starting number being 1,000,000,000; the winning number, by Jeffery, was 196,352,731.)
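The iteration is easy to trace numerically (a small illustration of my own): level-one thinkers cap their guess at two-thirds of 100, level-two thinkers at two-thirds of that, and so on toward zero.

```python
# Trace the iterated two-thirds reasoning, level by level.
guess = 100.0
guesses = []
for level in range(1, 6):
    guess *= 2 / 3
    guesses.append(round(guess, 1))
    print(level, guesses[-1])  # 66.7, 44.4, 29.6, 19.8, 13.2
```

After only a few levels the guess falls below 30, matching the observation that winning answers hover around 20 rather than at the game-theoretic 0.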

Someone could win the Battle for Riddler Nation by better guessing what other players will do. Suppose 20% of players change nothing, 20% play what won the last iteration, 20% play what just barely beats last round’s winner, 20% play what beats that barely-winning strategy, and 20% try to play in a game-theoretically optimal style. A player who correctly guesses this distribution could calculate a response, play a strategy that’s good against humans, and beat 80% of the participants.

Even if the odds of my winning the game are low, this was a fun project. Winning the game would be icing on the cake.
