Walk-Forward Analysis Demonstration with backtrader

DISCLAIMER: Any losses incurred based on the content of this post are the responsibility of the trader, not me. I, the author, neither take responsibility for the conduct of others nor offer any guarantees. None of this should be considered as financial advice; the content of this article is only for educational/entertainment purposes.

Finally I can apply a walk-forward analysis!

My initial reason for abandoning quantstrat was frustration with its walk.forward() function, which eventually led me to conclude that walk-forward analysis was difficult, if not impossible, to perform with quantstrat. (Since then I’ve tried developing a “simple” example but have not managed even that, even when using a for loop to do the walk-forward analysis manually. No one has offered to help on the mailing list.) Part of the appeal of moving to backtrader was the possibility of doing walk-forward analysis easily.

After months, I’ve finally been able to do it. I now show how. But first, some background. (The next section was originally published in this post.)


Data scientists want to fit models to training data that will do a good job of predicting future, out-of-sample data points. This is not done by simply finding the model that performs best on the training data; doing so invites overfitting, where a model appears to do well on training data but does not generalize to out-of-sample data. Techniques need to be applied to guard against it.

A data scientist may first split her data set used for developing a predictive algorithm into a training set and a test set. The data scientist locks away the test set in a separate folder on the hard drive, never looking at it until she’s satisfied with the model she’s developed using the training data. The test set will be used once a final model has been found to determine the final model’s expected performance.

She would like to be able to simulate out-of-sample performance with the training set, though. The usual approach is to split the training set into, say, 10 different folds, or smaller data sets, so she can apply cross-validation. With cross-validation, she will choose one of the folds to be held out, and she fits a predictive model on the remaining nine folds. After fitting the model, she sees how the fitted model performs on the held-out fold, which the fitted model has not seen. She then repeats this process for the nine other folds, to get a sense of the distribution of the performance of the model, or average performance, on out-of-sample data. If she needs to specify hyperparameters (which are parameters that describe some higher-order aspect of a model and are not learned from the data in the usual way other parameters are; they’re difficult to define rigorously), she may try different combinations of them and see which combination leads to the model with the best predictive ability in cross-validation. After determining which predictive model generally leads to the best results and which hyperparameters lead to optimal results, she trains a model on the training set with those hyperparameters, evaluates its performance on the test set, and reports the results, perhaps deploying the model.
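To make the fold mechanics concrete, here is a minimal sketch of the procedure just described, using only the standard library. The helper names (k_fold_indices, cross_validate) and the toy "model" (the training mean, scored by mean absolute error) are made up for illustration; real cross-validation would usually shuffle the rows first, which this sketch does not do.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train, held_out) index lists for k roughly equal folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, held_out
        start += size


def cross_validate(data, k, fit, score):
    """Fit on k-1 folds, score on the held-out fold, and return all k scores."""
    scores = []
    for train, held_out in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train])
        scores.append(score(model, [data[i] for i in held_out]))
    return scores


# Toy usage: the "model" is just the training mean, scored by mean absolute error
data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
cv_scores = cross_validate(
    data,
    k=5,
    fit=lambda rows: mean(rows),
    score=lambda model, rows: mean(abs(x - model) for x in rows),
)
print(cv_scores)        # one out-of-sample score per held-out fold
print(mean(cv_scores))  # the average she would compare across hyperparameters
```

Comparing the average score across candidate hyperparameter settings is what "choosing the best combination in cross-validation" amounts to.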

For those developing trading algorithms, the goals are similar, but some key differences exist:

  1. We evaluate a trading method not by predictive accuracy but by some other measure of performance, such as profitability or profitability relative to risk. (Maybe predictive accuracy is profitable, maybe not; if the most profitable trading system always underestimates its profitability, we’re fine with it.)
  2. We are using data where time and the order in which data comes in is thought to matter. We cannot just reshuffle data; important features would be lost. We cannot divide up the data set randomly into different folds; order must be preserved.

I’ll talk more about how we could evaluate a trading system later; for now, let’s focus on how we can apply the idea of cross-validation analysis.

(Illustration by Max Kuhn in his book, The caret Package.)

For time-dependent data, we can employ walk-forward analysis or rolling forecast origin techniques. This comes in various flavors (see the above illustration), but I will focus on one flavor. We first divide up the data set into, say, ten periods. We fit a trading algorithm on the first period in the data set, then see how it performs on the “out-of-sample” second period. Then we repeat for the second and third periods, third and fourth periods, and so on until we’ve run to the end of the data set. The out-of-sample data sets are then used for evaluating the potential performance of the trading system in question. If we like, we may keep a final data set, perhaps the most recent one, for final evaluation of whatever trading system passes this initial test.
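The flavor described above can be sketched in a few lines. The function name walk_forward_periods is hypothetical, as is the choice to give any leftover samples to the first period; the data set size of 103 is just an example.

```python
def walk_forward_periods(n_samples, n_periods):
    """Yield (train, test) index ranges: each period is tested using only
    the period immediately before it for training."""
    size = n_samples // n_periods
    # Period boundaries; leftover samples go to the first period so that
    # all later periods have equal length (an arbitrary choice for this sketch)
    bounds = [0] + [n_samples - size * (n_periods - i) for i in range(1, n_periods + 1)]
    for k in range(1, n_periods):
        train = range(bounds[k - 1], bounds[k])
        test = range(bounds[k], bounds[k + 1])
        yield train, test


pairs = list(walk_forward_periods(n_samples=103, n_periods=10))
for train, test in pairs:
    print(f"train {train.start}-{train.stop - 1} -> test {test.start}-{test.stop - 1}")
```

Ten periods give nine train/test pairs, and each test period begins exactly where its training period ends, so order in time is preserved.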

Other variants include overlapping training/testing periods (as described, my walk-forward analysis approach does not overlap) or ones where the initial window grows from beginning to end. Of these, I initially think either the approach I’ve described or the approach with overlapping training/testing periods makes most sense, and fortunately these can be applied using backtrader.

The first step in a walk-forward analysis is splitting the data. I initially hoped that the TimeSeriesSplit class provided by Scikit Learn would do this, but this class provides only one data-splitting scheme (corresponding to the lower-left corner of the above illustration by Max Kuhn), and it was not the scheme I wanted. In response, I rewrote the splitting function in the class to be more flexible, calling the new class TimeSeriesSplitImproved (subclassed from TimeSeriesSplit).

The code for my class is below. I may issue a pull request to Scikit Learn’s Github repository if I ever get the time/courage/know-how.

from sklearn.model_selection import TimeSeriesSplit
from sklearn.utils import indexable
from sklearn.utils.validation import _num_samples
import numpy as np

class TimeSeriesSplitImproved(TimeSeriesSplit):
    """Time Series cross-validator

    Provides train/test indices to split time series data samples
    that are observed at fixed time intervals, in train/test sets.
    In each split, test indices must be higher than before, and thus shuffling
    in cross validator is inappropriate.

    This cross-validation object is a variation of :class:`KFold`.
    In the kth split, it returns first k folds as train set and the
    (k+1)th fold as test set.

    Note that unlike standard cross-validation methods, successive
    training sets are supersets of those that come before them
    (when ``fixed_length`` is ``False``).

    Read more in the scikit-learn User Guide.

    Parameters
    ----------
    n_splits : int, default=3
        Number of splits. Must be at least 1.

    Examples
    --------
    >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    >>> y = np.array([1, 2, 3, 4])
    >>> tscv = TimeSeriesSplitImproved(n_splits=3)
    >>> for train_index, test_index in tscv.split(X):
    ...    print("TRAIN:", train_index, "TEST:", test_index)
    ...    X_train, X_test = X[train_index], X[test_index]
    ...    y_train, y_test = y[train_index], y[test_index]
    TRAIN: [0] TEST: [1]
    TRAIN: [0 1] TEST: [2]
    TRAIN: [0 1 2] TEST: [3]
    >>> for train_index, test_index in tscv.split(X, fixed_length=True):
    ...     print("TRAIN:", train_index, "TEST:", test_index)
    ...     X_train, X_test = X[train_index], X[test_index]
    ...     y_train, y_test = y[train_index], y[test_index]
    TRAIN: [0] TEST: [1]
    TRAIN: [1] TEST: [2]
    TRAIN: [2] TEST: [3]
    >>> for train_index, test_index in tscv.split(X, fixed_length=True,
    ...     train_splits=2):
    ...     print("TRAIN:", train_index, "TEST:", test_index)
    ...     X_train, X_test = X[train_index], X[test_index]
    ...     y_train, y_test = y[train_index], y[test_index]
    TRAIN: [0 1] TEST: [2]
    TRAIN: [1 2] TEST: [3]

    Notes
    -----
    When ``fixed_length`` is ``False``, the training set has size
    ``i * train_splits * n_samples // (n_splits + 1) + n_samples %
    (n_splits + 1)`` in the ``i``th split, with a test set of size
    ``n_samples//(n_splits + 1) * test_splits``, where ``n_samples``
    is the number of samples. If fixed_length is True, replace ``i``
    in the above formulation with 1, and ignore ``n_samples %
    (n_splits + 1)`` except for the first training set. The number
    of test sets is ``n_splits + 2 - train_splits - test_splits``.
    """
    def split(self, X, y=None, groups=None, fixed_length=False,
              train_splits=1, test_splits=1):
        """Generate indices to split data into training and test set.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Training data, where n_samples is the number of samples
            and n_features is the number of features.
        y : array-like, shape (n_samples,)
            Always ignored, exists for compatibility.
        groups : array-like, with shape (n_samples,), optional
            Always ignored, exists for compatibility.
        fixed_length : bool, default=False
            Whether training sets should always have common length.
        train_splits : positive int, default=1
            The minimum number of splits to include in training sets.
        test_splits : positive int, default=1
            The number of splits to include in the test set.

        Yields
        ------
        train : ndarray
            The training set indices for that split.
        test : ndarray
            The testing set indices for that split.
        """
        X, y, groups = indexable(X, y, groups)
        n_samples = _num_samples(X)
        n_splits = self.n_splits
        n_folds = n_splits + 1
        train_splits, test_splits = int(train_splits), int(test_splits)
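        if n_folds > n_samples:
            raise ValueError(
                ("Cannot have number of folds ={0} greater"
                 " than the number of samples: {1}.").format(n_folds,
                                                             n_samples))
        # Reject non-positive split counts, and combinations that would
        # need more folds than exist
        if (n_folds - train_splits - test_splits) < 0 or not (
                train_splits > 0 and test_splits > 0):
            raise ValueError(
                ("Both train_splits and test_splits must be positive"
                 " integers, and together cannot exceed the number"
                 " of folds."))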
        indices = np.arange(n_samples)
        split_size = (n_samples // n_folds)
        test_size = split_size * test_splits
        train_size = split_size * train_splits
        test_starts = range(train_size + n_samples % n_folds,
                            n_samples - (test_size - split_size),
                            split_size)
        if fixed_length:
            # Sliding window: only the first training set absorbs the remainder
            for i, test_start in zip(range(len(test_starts)),
                                     test_starts):
                rem = 0
                if i == 0:
                    rem = n_samples % n_folds
                yield (indices[(test_start - train_size - rem):test_start],
                       indices[test_start:test_start + test_size])
        else:
            # Expanding window: train on everything before the test set
            for test_start in test_starts:
                yield (indices[:test_start],
                       indices[test_start:test_start + test_size])

We would use the indices provided by the generator created by the split() method to subset pandas DataFrames that contain stock data and serve as data feeds to a backtrader Cerebro object.
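To see which indices come out without installing scikit-learn, here is a standard-library sketch that mirrors the arithmetic of split(). The function name walk_forward_splits is made up for this illustration, and the sample count of 115 is hypothetical; it is not the class itself, just the same index calculations on plain lists.

```python
def walk_forward_splits(n_samples, n_splits, fixed_length=False,
                        train_splits=1, test_splits=1):
    """Yield (train, test) index lists using the same arithmetic as
    TimeSeriesSplitImproved.split(), on plain Python lists."""
    n_folds = n_splits + 1
    split_size = n_samples // n_folds
    rem = n_samples % n_folds          # leftover samples go to the first training set
    train_size = split_size * train_splits
    test_size = split_size * test_splits
    test_starts = range(train_size + rem,
                        n_samples - (test_size - split_size),
                        split_size)
    for i, test_start in enumerate(test_starts):
        if fixed_length:
            extra = rem if i == 0 else 0
            train_start = test_start - train_size - extra
        else:
            train_start = 0            # expanding window
        yield (list(range(train_start, test_start)),
               list(range(test_start, test_start + test_size)))


# Ten splits with training windows twice as long as test windows,
# on a hypothetical data set of 115 rows
demo = list(walk_forward_splits(115, 10, fixed_length=True, train_splits=2))
for train, test in demo:
    print(f"train rows {train[0]}-{train[-1]}, test rows {test[0]}-{test[-1]}")
```

With a pandas DataFrame df, each pair would then be used as df.iloc[train] and df.iloc[test], which is how the indices are consumed later in this post.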

Walking Forward

The first thing I will do is pick up where I left off in my introduction to backtrader. I’m implementing a simple moving average crossover (SMAC) strategy, and I judge a strategy’s performance by how much the final account has grown. The following code is slightly modified from my earlier post, both to reflect growing knowledge on my part and to prepare for what’s ahead.

import backtrader as bt
import backtrader.indicators as btind
import datetime as dt
import pandas as pd
import pandas_datareader as web
from pandas import Series, DataFrame
import random
from copy import deepcopy

class SMAC(bt.Strategy):
    """A simple moving average crossover strategy; crossing of a fast and slow moving average generates buy/sell
       signals"""
    params = {"fast": 20, "slow": 50,                  # The windows for both fast and slow moving averages
              "optim": False, "optim_fs": (20, 50)}    # Used for optimization; equivalent of fast and slow, but a tuple
                                                       # The first number in the tuple is the fast MA's window, the
                                                       # second the slow MA's window

    def __init__(self):
        """Initialize the strategy"""

        self.fastma = dict()
        self.slowma = dict()
        self.regime = dict()

        if self.params.optim:    # Use a tuple during optimization
            self.params.fast, self.params.slow = self.params.optim_fs    # fast and slow replaced by tuple's contents

        if self.params.fast > self.params.slow:
            raise ValueError(
                "A SMAC strategy cannot have the fast moving average's window be " + \
                 "greater than the slow moving average window.")

        for d in self.getdatanames():

            # The moving averages
            self.fastma[d] = btind.SimpleMovingAverage(self.getdatabyname(d),      # The symbol for the moving average
                                                       period=self.params.fast,    # Fast moving average
                                                       plotname="FastMA: " + d)
            self.slowma[d] = btind.SimpleMovingAverage(self.getdatabyname(d),      # The symbol for the moving average
                                                       period=self.params.slow,    # Slow moving average
                                                       plotname="SlowMA: " + d)

            # Get the regime
            self.regime[d] = self.fastma[d] - self.slowma[d]    # Positive when bullish

    def next(self):
        """Define what will be done in a single step, including creating and closing trades"""
        for d in self.getdatanames():    # Looping through all symbols
            pos = self.getpositionbyname(d).size or 0
            if pos == 0:    # Are we out of the market?
                # Consider the possibility of entrance
                # Notice the indexing; [0] always means the present bar, and [-1] the bar immediately preceding
                # Thus, the condition below translates to: "If today the regime is bullish (greater than
                # 0) and yesterday the regime was not bullish"
                if self.regime[d][0] > 0 and self.regime[d][-1] <= 0:    # A buy signal
                    self.buy(data=self.getdatabyname(d))

            else:    # We have an open position
                if self.regime[d][0] <= 0 and self.regime[d][-1] > 0:    # A sell signal
                    self.sell(data=self.getdatabyname(d))

class PropSizer(bt.Sizer):
    """A position sizer that will buy as many stocks as necessary for a certain proportion of the portfolio
       to be committed to the position, while allowing stocks to be bought in batches (say, 100)"""
    params = {"prop": 0.1, "batch": 100}

    def _getsizing(self, comminfo, cash, data, isbuy):
        """Returns the proper sizing"""

        if isbuy:    # Buying
            target = self.broker.getvalue() * self.params.prop    # Ideal total value of the position
            price = data.close[0]
            shares_ideal = target / price    # How many shares are needed to get target
            batches = int(shares_ideal / self.params.batch)    # How many batches is this trade?
            shares = batches * self.params.batch    # The actual number of shares bought

            if shares * price > cash:
                return 0    # Not enough money for this trade
            else:
                return shares

        else:    # Selling
            return self.broker.getposition(data).size    # Clear the position

class AcctValue(bt.Observer):
    alias = ('Value',)
    lines = ('value',)

    plotinfo = {"plot": True, "subplot": True}

    def next(self):
        self.lines.value[0] = self._owner.broker.getvalue()    # Get today's account value (cash + stocks)

class AcctStats(bt.Analyzer):
    """A simple analyzer that gets the gain in the value of the account; should be self-explanatory"""

    def __init__(self):
        self.start_val = self.strategy.broker.get_value()
        self.end_val = None

    def stop(self):
        self.end_val = self.strategy.broker.get_value()

    def get_analysis(self):
        return {"start": self.start_val, "end": self.end_val,
                "growth": self.end_val - self.start_val, "return": self.end_val / self.start_val}

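The two pieces of logic doing the real work above, the regime test in SMAC.next() and the batch arithmetic in PropSizer, can be checked by hand without backtrader. The sketch below uses plain lists and made-up prices; the function names are hypothetical, and it assumes an order fills on the bar where the signal appears (backtrader would normally fill on the next bar).

```python
def sma(values, window):
    """Simple moving average; None until `window` observations are available."""
    return [None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
            for i in range(len(values))]


def crossover_signals(closes, fast, slow):
    """Return 'buy', 'sell' or None per bar, using the same regime test as
    SMAC.next(). Assumes fast <= slow and same-bar fills (a simplification)."""
    fast_ma, slow_ma = sma(closes, fast), sma(closes, slow)
    regime = [None if s is None else f - s for f, s in zip(fast_ma, slow_ma)]
    signals = [None] * len(closes)
    in_market = False
    for i in range(1, len(closes)):
        today, yesterday = regime[i], regime[i - 1]
        if today is None or yesterday is None:
            continue    # Still in the warm-up period of the slow average
        if not in_market and today > 0 and yesterday <= 0:
            signals[i], in_market = "buy", True
        elif in_market and today <= 0 and yesterday > 0:
            signals[i], in_market = "sell", False
    return signals


def batch_size(account_value, cash, price, prop=0.1, batch=100):
    """Shares to buy so that roughly `prop` of the account is committed,
    rounded down to whole batches; 0 if cash cannot cover the purchase."""
    shares = int(account_value * prop / price / batch) * batch
    return shares if shares * price <= cash else 0


closes = [10, 9, 8, 7, 8, 10, 12, 13, 11, 8, 6, 5]    # Made-up prices
signals = crossover_signals(closes, fast=2, slow=4)
print(signals)
print(batch_size(account_value=1_000_000, cash=1_000_000, price=115.0))
```

Note how the first three bars can never produce a signal: the slow average has no value yet. That warm-up effect matters later in this post.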
I now re-run the strategy, this time using data from pandas DataFrames instead of having backtrader manage the download of Yahoo! Finance data. Unfortunately, we can no longer download Yahoo! Finance data using the DataReader from pandas-datareader. I will use Google data instead. Google adjusts prices for splits, but not dividends; we will need to make do for now.

start = dt.datetime(2010, 1, 1)
end = dt.datetime(2016, 10, 31)
# Different stocks from past posts because of different data source (no plot for NTDOY)
symbols = ["AAPL", "GOOG", "MSFT", "AMZN", "YHOO", "SNY", "VZ", "IBM", "HPQ", "QCOM", "NVDA"]
datafeeds = {s: web.DataReader(s, "google", start, end) for s in symbols}
for df in datafeeds.values():
    df["OpenInterest"] = 0    # PandasData reader expects an OpenInterest column;
                              # not provided by Google and we don't use it so set to 0

cerebro = bt.Cerebro(stdstats=False)

plot_symbols = ["AAPL", "GOOG", "NVDA"]
is_first = True
#plot_symbols = []
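for s, df in datafeeds.items():
    data = bt.feeds.PandasData(dataname=df, name=s)
    if s in plot_symbols:
        if is_first:
            data_main_plot = data
            is_first = False
        else:
            data.plotinfo.plotmaster = data_main_plot
    else:
        data.plotinfo.plot = False
    cerebro.adddata(data)    # Give the data to cerebro

cerebro.broker.setcash(1000000)    # Starting cash; matches the "start" column in the results further down
cerebro.addstrategy(SMAC)
cerebro.addobserver(AcctValue)
cerebro.addobservermulti(bt.observers.BuySell)    # Plots up/down arrows
cerebro.addsizer(PropSizer)
cerebro.addanalyzer(AcctStats)

cerebro.run()    # The backtest must be run before anything can be plotted
cerebro.plot(iplot=True, volume=False)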

Plot 1


Now let’s look at the walk-forward analysis. I first use TimeSeriesSplitImproved to get the splits. These will serve as the indices for training and testing periods. I’m requiring that there be ten splits, and training data will be twice as long as testing data (training data gets two splits, testing data gets one).

tscv = TimeSeriesSplitImproved(10)
split = tscv.split(datafeeds["AAPL"], fixed_length=True, train_splits=2)

These splits are then used as the indices in the walk-forward analysis. I loop through all training/testing combinations, optimizing on the training set then applying the strategy with the best-performing parameters to the test set. The training and testing sets are obtained simply by subsetting the pandas DataFrames with the appropriate indices.

I organize the results of the optimization in a DataFrame for presentation and analysis. This is all done below. (On my computer this loop took hours to complete.)
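Before the full loop, the parameter-sampling step and the "pick the best performer" step can be looked at in isolation. This is a sketch: sample_windows and pick_best are hypothetical helper names, the seed is arbitrary, and the "training returns" are placeholder numbers rather than backtest output.

```python
import random


def sample_windows(n_combos, rng):
    """Draw unique (fast, slow) pairs with fast < slow, the same way the
    walk-forward loop does: fast from multiples of 5, slow from multiples of 10."""
    windows = set()    # A set silently drops duplicate draws
    while len(windows) < n_combos:
        f = rng.randint(1, 10) * 5
        s = rng.randint(1, 10) * 10
        if f > s:      # Swap so the fast window is the shorter one
            f, s = s, f
        elif f == s:   # Equal windows never cross; discard and redraw
            continue
        windows.add((f, s))
    return sorted(windows)


def pick_best(results):
    """results maps (fast, slow) -> training-period return; return the best key."""
    return max(results, key=results.get)


rng = random.Random(0)    # Seeded only so the sketch is repeatable
windows = sample_windows(40, rng)

# Placeholder numbers standing in for training-period returns (NOT real results)
fake_returns = {w: 1.0 + (w[1] - w[0]) / 1000 for w in windows}
best = pick_best(fake_returns)
print(len(windows), best)
```

Only 85 distinct pairs are possible under this sampling scheme, so asking for 40 covers a little under half of the grid in each training period.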

walk_forward_results = list()
# Be prepared: this will take a while
for train, test in split:

    # Generate random combinations of fast and slow window lengths to test
    windowset = set()    # Use a set to avoid duplicates
    while len(windowset) < 40:
        f = random.randint(1, 10) * 5
        s = random.randint(1, 10) * 10
        if f > s:    # Cannot have the fast moving average have a longer window than the slow, so swap
            f, s = s, f
        elif f == s:    # Cannot be equal, so discard this draw and try again
            continue
        windowset.add((f, s))

    windows = list(windowset)

    trainer = bt.Cerebro(stdstats=False, maxcpus=1)
    trainer.broker.setcash(1000000)    # Every period starts from the same cash
    trainer.addanalyzer(AcctStats)     # Needed for analyzers.acctstats below
    trainer.addsizer(PropSizer)
    tester = deepcopy(trainer)         # Same broker, analyzer and sizer setup for the test run

    trainer.optstrategy(SMAC, optim=True,    # Optimize the strategy (use optim variant of SMAC)...
                        optim_fs=windows)    # ... over all sampled combinations of windows
    for s, df in datafeeds.items():
        data = bt.feeds.PandasData(dataname=df.iloc[train], name=s)    # Add a subset of data
                                                                       # to the object that
                                                                       # corresponds to training
        trainer.adddata(data)
    res = trainer.run()
    # Get optimal combination
    opt_res = DataFrame({r[0].params.optim_fs: r[0].analyzers.acctstats.get_analysis() for r in res}
                       ).T.loc[:, "return"].sort_values(ascending=False).index[0]

    tester.addstrategy(SMAC, optim=True, optim_fs=opt_res)    # Test with optimal combination
    for s, df in datafeeds.items():
        data = bt.feeds.PandasData(dataname=df.iloc[test], name=s)    # Add a subset of data
                                                                      # to the object that
                                                                      # corresponds to testing
        tester.adddata(data)

    res = tester.run()
    res_dict = res[0].analyzers.acctstats.get_analysis()
    res_dict["fast"], res_dict["slow"] = opt_res
    res_dict["start_date"] = datafeeds["AAPL"].iloc[test[0]].name
    res_dict["end_date"] = datafeeds["AAPL"].iloc[test[-1]].name
    walk_forward_results.append(res_dict)    # Keep this period's out-of-sample results

Notice the results:

wfdf = DataFrame(walk_forward_results)

          end    end_date  fast      growth    return  slow    start  start_date
0   699858.20  2011-11-14     5  -300141.80  0.699858    10  1000000  2011-04-05
1  1007201.00  2012-06-28    50     7201.00  1.007201   100  1000000  2011-11-15
2  1001420.12  2013-02-13    30     1420.12  1.001420   100  1000000  2012-06-29
3  1008344.50  2013-09-26    15     8344.50  1.008345    70  1000000  2013-02-14
4   990771.78  2014-05-12    50    -9228.22  0.990772    80  1000000  2013-09-27
5   946295.64  2014-12-22    25   -53704.36  0.946296    60  1000000  2014-05-13
6   939632.78  2015-08-06    35   -60367.22  0.939633    60  1000000  2014-12-23
7  1000000.00  2016-03-21    50        0.00  1.000000   100  1000000  2015-08-07
8  1000000.00  2016-10-31    25        0.00  1.000000   100  1000000  2016-03-22

This analysis says that, out of sample, the optimized strategies land somewhere between a negligible gain (under 1% in the best period) and a catastrophic loss (30% in the worst), with the last two apparently producing no trades at all in their out-of-sample periods. For this strategy and this data, optimization leads only to overfitting that barely produces a profit at best, and lights a bonfire with your cash at worst.
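One quick way to summarize the table is to chain the out-of-sample returns together. The sketch below copies the "return" column by hand and multiplies it out. This is only a back-of-the-envelope figure: every test period above restarted from 1,000,000 in cash with no open positions, so the product is not what a single continuously running account would have earned.

```python
from math import prod

# "return" column of the walk-forward results table, copied by hand
returns = [0.699858, 1.007201, 1.001420, 1.008345, 0.990772,
           0.946296, 0.939633, 1.000000, 1.000000]

compounded = prod(returns)                  # Growth factor if the periods were chained
winning = sum(r > 1 for r in returns)       # Periods that made money
flat = sum(r == 1 for r in returns)         # Periods with no trades at all
losing = sum(r < 1 for r in returns)        # Periods that lost money

print(f"compounded growth factor: {compounded:.3f}")
print(f"winning: {winning}, flat: {flat}, losing: {losing}")
```

The product comes out at roughly 0.63, and nearly all of the damage comes from the first period alone.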

Consequences of Overfitting

What I believe the walk.forward() function in quantstrat does is simulate a strategy that periodically re-optimizes itself. Having obtained the “optimal” parameters over certain periods above, we can simulate such a strategy as quantstrat does. This is done in the SMACWalkForward strategy below.

The strategy takes in more inputs, including the starting/ending dates for every period, and all the fast/slow parameters for the moving averages, each for its respective period. The strategy will apply different fast/slow moving average windows over the different periods. These parameters are not optional; there is no default here. What we will see in the end is how a strategy that periodically re-optimizes itself performs.

Interestingly, trades happen after August 2015 even though the earlier walk-forward analysis showed no trades during this time. Notice, though, that these trades occur during what would have been the 100-day moving average’s “warm-up” period. Since all the moving-average indicators are declared when the strategy is initialized, they all warm up at the same time. This was not the case before; a lot of data was lost to the warm-up period in every training/testing set. This highlights a flaw in my earlier analysis: I considered moving average windows that were large relative to the periods being considered, giving them few opportunities to launch trades. Realistically, since data for these companies extends back for years, we would not have to worry about warm-up periods, because we can get enough data to start the moving averages whenever we want. This means that the strategy below is actually more realistic than the walk-forward analysis done above. Correcting this is possible (allow larger slices or extend the test sets so all indicators are warmed up on day one), but I will not concern myself with that issue now. The fact I was able to get this far feels miraculous enough.

Notice that allowing for multiple periods in the strategy is handled similarly to how we handled multiple symbols; we index each combination of strategy and period with dictionaries. Then, when we are performing the backtest, we determine which period we currently are in, and depending on the result we may use different indicators.

class SMACWalkForward(bt.Strategy):
    """The SMAC strategy but in a walk-forward analysis context"""
    params = {"start_dates": None,    # Starting days for trading periods (a list)
              "end_dates": None,      # Ending day for trading periods (a list)
              "fast": None,           # List of fast moving average windows, corresponding to start dates (a list)
              "slow": None}           # Like fast, but for slow moving average window (a list)
    # All the above lists must be of the same length, and they all line up

    def __init__(self):
        """Initialize the strategy"""

        self.fastma = dict()
        self.slowma = dict()
        self.regime = dict()

        # Error checking
        if type(self.p.start_dates) is not list or type(self.p.end_dates) is not list or \
           type(self.p.fast) is not list or type(self.p.slow) is not list:
            raise ValueError("Must pass lists to params start_dates, end_dates, fast, slow.")
        elif len(self.p.start_dates) != len(self.p.end_dates) or \
            len(self.p.fast) != len(self.p.start_dates) or len(self.p.slow) != len(self.p.start_dates):
            raise ValueError("All lists passed to params must have same length.")

        # Built only after validation, since zip() would fail on the default None params
        self.date_combos = [c for c in zip(self.p.start_dates, self.p.end_dates)]

        for d in self.getdatanames():
            self.fastma[d] = dict()
            self.slowma[d] = dict()
            self.regime[d] = dict()

            # Additional indexing, allowing for differing start/end dates
            for sd, ed, f, s in zip(self.p.start_dates, self.p.end_dates, self.p.fast, self.p.slow):
                # More error checking
                if type(f) is not int or type(s) is not int:
                    raise ValueError("Must include only integers in fast, slow.")
                elif f > s:
                    raise ValueError("Elements in fast cannot exceed elements in slow.")
                elif f <= 0 or s <= 0:
                    raise ValueError("Moving average windows must be positive.")

                if type(sd) is not dt.date or type(ed) is not dt.date:
                    raise ValueError("Only datetime dates allowed in start_dates, end_dates.")
                elif ed - sd < dt.timedelta(0):
                    raise ValueError("Start dates must always be before end dates.")

                # The moving averages
                # Notice that different moving averages are obtained for different combinations of
                # start/end dates
                self.fastma[d][(sd, ed)] = btind.SimpleMovingAverage(self.getdatabyname(d),
                                                                     period=f)    # Fast window for this period
                self.slowma[d][(sd, ed)] = btind.SimpleMovingAverage(self.getdatabyname(d),
                                                                     period=s)    # Slow window for this period

                # Get the regime
                self.regime[d][(sd, ed)] = self.fastma[d][(sd, ed)] - self.slowma[d][(sd, ed)]
                # In the future, use the backtrader indicator btind.CrossOver()

    def next(self):
        """Define what will be done in a single step, including creating and closing trades"""

        # Determine which set of moving averages to use
        curdate = self.datetime.date(0)
        dtidx = None    # Will be index
        # Determine which period (if any) we are in
        for sd, ed in self.date_combos:
            # Debug output
            #print('{}: {} < {}: {}, {} < {}: {}'.format(
            #    len(self), sd, curdate, (sd <= curdate), curdate, ed, (curdate <= ed)))
            if sd <= curdate and curdate <= ed:
                dtidx = (sd, ed)
        # Debug output
        #print('{}: the dtixdx is {}, and curdate is {};'.format(len(self), dtidx, curdate))
        for d in self.getdatanames():    # Looping through all symbols
            pos = self.getpositionbyname(d).size or 0
            if dtidx is None:    # Not in any window
                break            # Don't engage in trades
            if pos == 0:    # Are we out of the market?
                # Consider the possibility of entrance
                # Notice the indexing; [0] always means the present bar, and [-1] the bar immediately preceding
                # Thus, the condition below translates to: "If today the regime is bullish (greater than
                # 0) and yesterday the regime was not bullish"
                if self.regime[d][dtidx][0] > 0 and self.regime[d][dtidx][-1] <= 0:    # A buy signal
                    self.buy(data=self.getdatabyname(d))

            else:    # We have an open position
                if self.regime[d][dtidx][0] <= 0 and self.regime[d][dtidx][-1] > 0:    # A sell signal
                    self.sell(data=self.getdatabyname(d))
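The period lookup at the top of next() is the part that makes this a "re-optimizing" strategy, so here it is on its own. This is a sketch with a hypothetical helper name (find_period); the dates and windows are the first three rows of the results table. Unlike the loop in the class, it returns the first matching period rather than the last, which makes no difference as long as periods do not overlap.

```python
import datetime as dt


def find_period(curdate, date_combos):
    """Return the (start, end) pair that contains curdate, or None if the
    date falls outside every trading period."""
    for sd, ed in date_combos:
        if sd <= curdate <= ed:
            return (sd, ed)
    return None


# First three test periods from the results table, with their optimized windows
date_combos = [(dt.date(2011, 4, 5), dt.date(2011, 11, 14)),
               (dt.date(2011, 11, 15), dt.date(2012, 6, 28)),
               (dt.date(2012, 6, 29), dt.date(2013, 2, 13))]
windows = {date_combos[0]: (5, 10),
           date_combos[1]: (50, 100),
           date_combos[2]: (30, 100)}

period = find_period(dt.date(2011, 12, 1), date_combos)
print(period, windows[period])                        # Which (fast, slow) pair is active
print(find_period(dt.date(2010, 6, 1), date_combos))  # Before any period: no trading
```

The (start, end) tuple doubles as the dictionary key, which is the same trick the strategy uses to index its moving averages.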

Now let’s run this strategy and see the plot.

cerebro_wf = bt.Cerebro(stdstats=False)

plot_symbols = ["AAPL", "GOOG", "NVDA"]
is_first = True
#plot_symbols = []
for s, df in datafeeds.items():
    data = bt.feeds.PandasData(dataname=df, name=s)
    if s in plot_symbols:
        if is_first:
            data_main_plot = data
            is_first = False
        else:
            data.plotinfo.plotmaster = data_main_plot
    else:
        data.plotinfo.plot = False
    cerebro_wf.adddata(data)    # Give the data to cerebro

cerebro_wf.broker.setcash(1000000)    # Same starting cash as in the walk-forward analysis
cerebro_wf.addstrategy(SMACWalkForward,
                       # Give the results of the above optimization to SMACWalkForward (NOT OPTIONAL)
                       fast=[int(f) for f in wfdf.fast],
                       slow=[int(s) for s in wfdf.slow],
                       start_dates=[sd.date() for sd in wfdf.start_date],
                       end_dates=[ed.date() for ed in wfdf.end_date])
cerebro_wf.addobserver(AcctValue)
cerebro_wf.addobservermulti(bt.observers.BuySell)    # Plots up/down arrows
cerebro_wf.addsizer(PropSizer)
cerebro_wf.addanalyzer(AcctStats)

cerebro_wf.run()    # The backtest must be run before anything can be plotted
cerebro_wf.plot(iplot=True, volume=False)

Plot 2


As the earlier walk-forward analysis suggested, our optimization procedure is a great way to lose money fast. Its propensity to overfit doesn’t lead to profits; it leads to losses. Our account is about 68% of its original value. We’re better off abandoning this strategy and looking for something else.


I sometimes listen to Chat with Traders, a trading podcast where the host interviews veteran traders. Some of the traders he invites on his show discuss optimization with an audible smirk, telling a similar story of an unsuspecting novice setting up a SMAC strategy, optimizing the fast and slow moving average windows, seeing good results in a backtest, applying the optimized strategy to future data, and failing to replicate the earlier stellar results. We have seen that phenomenon here, and I have demonstrated one way to guard against this type of overfitting.

Walk-forward analysis can be used to get a sense for what out-of-sample performance may look like while you are developing a strategy, but alone it is not enough to guard against overfitting. Here are some other things you may do:

  • Separate out a reasonably large sample of data (obviously more recent than that used in strategy development) to be truly out-of-sample. You can perform walk-forward analysis as much as you like when trying to find a profitable strategy and testing out ideas in your training set, but this final test set is seen once, and only once. This test set serves as a final line of defense against overfitting and one final estimate for the profitability of your strategy. If your strategy does not do well on this test set, you cannot go back and fix it, then see how it performs on the same data set. You have to start over, redesign from the ground up, and wait until you have enough data for a new test set. You cannot test on the test set twice; once you do, the test set is no longer out-of-sample, but in-sample. (If this intimidates you, perhaps create two test sets, one that you are allowed to look at multiple times after finding decent strategies performing well in cross-validation, and one that you can look at only once.)

  • Require that strategies in both training and testing perform some minimum number of trades. This is so you can have a large sample size to evaluate results. You may require, for example, that 100 trades take place during each cross-validation section, with at least 20 such sections. This would mean that you would need at least 2,000 trades for a strategy to be considered. The probability theorem known as the Law of Large Numbers inspires this approach[1]: according to the theorem, as the sample size increases, the sample mean of a random variable converges almost surely to the true population mean. Here, this suggests that lots of trades give you a better idea of a strategy’s expected profitability. Additionally, overfitting a few data points is easy, but overfitting many data points is hard.

  • The above two points together imply a third point: get lots of data. You need to have enough so that there’s plenty of data points in the folds during cross-validation and plenty of data points in your test sets, plus enough data points to have lots of trades. Again, larger data sets are harder to overfit.

  • Try tweaking some parameters your strategy uses, or the data it is applied to. For example, if you fit a strategy for Coca-Cola stock (KO), maybe try that same strategy on a similar stock, like Pepsi (PEP). Maybe change a 30-day moving average to a 32-day moving average. If you get radically different results, you may have overfit.

  • Does the strategy make sense? Strange and complicated strategies are more likely to be produced when overfitting. A good guard against overfitting is common sense. This also suggests you should prefer simple strategies and simple numbers to complex or exotic ones.

  • Even after all this, paper-trade a strategy first before committing money to it, thus getting more of a glimpse of out-of-sample results that could signal your strategy’s quality.
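A few of the checks in the list above can be sketched in plain Python. This is only a minimal sketch under stated assumptions: the function names, the cutoff date, and the 25% tolerance are hypothetical choices made for illustration, and none of them come from backtrader or from the analysis earlier in this post. The 100-trade and 20-section minimums are the example figures from the list.

```python
from datetime import date, timedelta


def split_holdout(dates, cutoff):
    """Split dates into a development set and a final hold-out set.

    Everything before `cutoff` may be used freely (walk-forward included);
    everything from `cutoff` onward is the test set that is looked at once.
    """
    development = [d for d in dates if d < cutoff]
    holdout = [d for d in dates if d >= cutoff]
    return development, holdout


def meets_trade_minimum(trades_per_window, min_trades=100, min_windows=20):
    """Check that there are enough windows and enough trades in each one."""
    return (len(trades_per_window) >= min_windows
            and all(n >= min_trades for n in trades_per_window))


def is_fragile(base_return, perturbed_returns, tolerance=0.25):
    """Flag a strategy whose return moves by more than `tolerance`
    (relative to the base return) after small parameter tweaks.

    The 25% default is an arbitrary, hypothetical threshold.
    """
    scale = max(abs(base_return), 1e-12)  # avoid dividing by zero
    return any(abs(r - base_return) / scale > tolerance
               for r in perturbed_returns)


# Ten consecutive days; the last three are reserved as the hold-out set.
days = [date(2010, 1, 1) + timedelta(days=i) for i in range(10)]
dev, hold = split_holdout(days, cutoff=date(2010, 1, 8))
print(len(dev), len(hold))

# 20 windows with exactly 100 trades each just satisfies the rule.
print(meets_trade_minimum([100] * 20))

# Returns after nudging a parameter (say, a 30-day window to 32 days).
print(is_fragile(0.10, [0.11, 0.09]))   # small changes
print(is_fragile(0.10, [0.11, -0.05]))  # one tweak flips the sign
```

In practice the returns passed to `is_fragile` would come from rerunning the backtest with the tweaked parameters or on the similar stock; how much variation counts as "radically different" is a judgment call, not something a single threshold settles.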

Notice that the above recommendations call for discipline on the practitioner’s part. Overfitting is tempting. The Siren’s song of overfit profits may leave you dead on the rocks. You should know when you are overfitting, and fight the biases that lead to it.

I have created a video course that Packt Publishing will be publishing later this month, entitled Unpacking NumPy and Pandas, the first volume in a four-volume set of video courses entitled, Taming Data with Python; Excelling as a Data Analyst. This course covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my course or at least spreading the word about it. You can buy the course directly or purchase a subscription to Mapt and watch it there (when it becomes available).

If you like my blog and would like to support it, spread the word (if not get a copy yourself)! Also, stay tuned for future courses I publish with Packt at the Video Courses section of my site.

  1. That said, I doubt the theorem applies directly; I don’t think trades are a stationary process. 

7 thoughts on “Walk-Forward Analysis Demonstration with backtrader”

  1. excellent writeup. Never thought of using optstrategy in walk forward analysis. Very creative.

    PS: i wouldn’t worry about not having enough capital.. if ur strategy can outperform any index consistently (i.e. high sharpe), there’s plenty of people willing to fork over the dough no matter where u are on earth


  2. I’m glad you liked it. Stats-wise I was disappointed that this post did not get a lot of attention/views, even though I think this approach may be essential to combat overfitting.

    Starting out I don’t want to use other people’s money. But I’ll keep that in mind for the future. 🙂


  3. Using the stock TimeSeriesSplit(), if you use the max_train_size parameter, it will allow you to get splits like the top row of the images. The only thing is there isn’t a min size, so if your max_train_size is greater than 1, you will have to skip the first few ones. For example:

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    # Four observations with two features each, in time order
    X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    y = np.array([1, 2, 3, 4])
    tscv = TimeSeriesSplit(n_splits=3, max_train_size=1)

    for train_index, test_index in tscv.split(X):
        print("TRAIN:", train_index, "TEST:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]


  4. Thank you very much for your article, it was really helpful to set up my WF on backtrader!
    I decided to use a simpler approach: instead of splitting the time series, I just run cerebro on the whole time series, forcing it to stay flat in USD outside the training window.

