# An Introduction to Stock Market Data Analysis with R (Part 2)

Around September of 2016 I wrote two articles on using Python for accessing, visualizing, and evaluating trading strategies (see part 1 and part 2). These have been my most popular posts, up until I published my article on learning programming languages (featuring my dad’s story as a programmer), and has been translated into both Russian (which used to be on backtest.ru at a link that now appears to no longer work) and Chinese (here and here). R has excellent packages for analyzing stock data, so I feel there should be a “translation” of the post for using R for stock data analysis.

This post is the second in a two-part series on stock data analysis using R, based on a lecture I gave on the subject for MATH 3900 (Data Science) at the University of Utah. (You can read the first post here.) In these posts, I discuss basics such as obtaining the data from Yahoo! Finance using pandas, visualizing stock data, moving averages, developing a moving-average crossover strategy, backtesting, and benchmarking. The final post will include practice problems. This post discusses moving average crossover strategies,backtesting, and benchmarking.

NOTE: The information in this post is of a general nature containing information and opinions from the author’s perspective. None of the content of this post should be considered financial advice. Furthermore, any code written here is provided without any form of guarantee. Individuals who choose to use it do so at their own risk.

Call an open position a trade that will be terminated in the future when a condition is met. A long position is one in which a profit is made if the financial instrument traded increases in value, and a short position is on in which a profit is made if the financial asset being traded decreases in value. When trading stocks directly, all long positions are bullish and all short position are bearish. That said, a bullish attitude need not be accompanied by a long position, and a bearish attitude need not be accompanied by a short position (this is particularly true when trading stock options).

Here is an example. Let’s say you buy a stock with the expectation that the stock will increase in value, with a plan to sell the stock at a higher price. This is a long position: you are holding a financial asset for which you will profit if the asset increases in value. Your potential profit is unlimited, and your potential losses are limited by the price of the stock since stock prices never go below zero. On the other hand, if you expect a stock to decrease in value, you may borrow the stock from a brokerage firm and sell it, with the expectation of buying the stock back later at a lower price, thus earning you a profit. This is called shorting a stock, and is a short position, since you will earn a profit if the stock drops in value. The potential profit from shorting a stock is limited by the price of the stock (the best you can do is have the stock become worth nothing; you buy it back for free), while the losses are unlimited, since you could potentially spend an arbitrarily large amount of money to buy the stock back. Thus, a broker will expect an investor to be in a very good financial position before allowing the investor to short a stock.

We will call a plan that includes trading signals for prompting trades, a rule for deciding how much of the portfolio to risk on any particular strategy, and a complete exit strategy for any trade an overall trading strategy. Our concern now is to design and evaluate trading strategies.

We will suppose that the amount of money in the portfolio involved in any particular trade is a fixed proportion; 10% seems like a good number. We will also say that for any trade, if losses exceed 20% of the value of the trade, we will exit the position. Now we need a means for deciding when to enter position and when to exit for a profit.

Here, I will be demonstrating a moving average crossover strategy. We will use two moving averages, one we consider “fast”, and the other “slow”. The strategy is:

• Trade the asset when the fast moving average crosses over the slow moving average.
• Exit the trade when the fast moving average crosses over the slow moving average again.

A long trade will be prompted when the fast moving average crosses from below to above the slow moving average, and the trade will be exited when the fast moving average crosses below the slow moving average later. A short trade will be prompted when the fast moving average crosses below the slow moving average, and the trade will be exited when the fast moving average later crosses above the slow moving average.

We now have a complete strategy. But before we decide we want to use it, we should try to evaluate the quality of the strategy first. The usual means for doing so is backtesting, which is looking at how profitable the strategy is on historical data. For example, looking at the above chart’s performance on Apple stock, if the 20-day moving average is the fast moving average and the 50-day moving average the slow, this strategy does not appear to be very profitable, at least not if you are always taking long positions.

(You may not see all of these packages being loaded in the session; they may get loaded when other packages are loaded.)

First, let’s recreate some of our earlier charts. (I’m going to use different methods for recreating these charts, both for diversity and because some functions we’ll be using later will be introduced this way. The method used is actually more general than what was used in part 1.)

if (!require("TTR")) {
install.packages("TTR")
library(TTR)
}
if (!require("quantstrat")) {
install.packages("quantstrat", repos="http://R-Forge.R-project.org")
library(quantstrat)
}
if (!require("devtools")) {
install.packages("devtools")
}
library(devtools)
}
library(quantmod)

start <- as.Date("2010-01-01")
end <- as.Date("2016-10-01")

# Let's get Apple stock data; Apple's ticker symbol is AAPL. We use the quantmod function getSymbols, and pass a string as a first argument to identify the desired ticker symbol, pass "yahoo" to src for Yahoo! Finance, and from and to specify date ranges

# The default behavior for getSymbols is to load data directly into the global environment, with the object being named after the loaded ticker symbol. This feature may become deprecated in the future, but we exploit it now.

getSymbols("AAPL", src="yahoo", from = start, to = end)

## [1] "AAPL"

# What is AAPL?
class(AAPL)

## [1] "xts" "zoo"

# Let's see the first few rows

##            AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume
## 2010-01-04    213.43    214.50   212.38     214.01   123432400
## 2010-01-05    214.60    215.59   213.25     214.38   150476200
## 2010-01-06    214.38    215.23   210.75     210.97   138040000
## 2010-01-07    211.75    212.00   209.05     210.58   119282800
## 2010-01-08    210.30    212.00   209.06     211.98   111902700
## 2010-01-11    212.80    213.00   208.45     210.11   115557400
## 2010-01-04      27.72704
## 2010-01-05      27.77498
## 2010-01-06      27.33318
## 2010-01-07      27.28265
## 2010-01-08      27.46403
## 2010-01-11      27.22176

candleChart(AAPL, up.col = "black", dn.col = "red", theme = "white", subset = "2016-01-04/")

AAPL_sma_20 <- SMA(
Cl(AAPL),  # The closing price of AAPL, obtained by quantmod's Cl() function
n = 20     # The number of days in the moving average window
)

AAPL_sma_50 <- SMA(
Cl(AAPL),
n = 50
)

AAPL_sma_200 <- SMA(
Cl(AAPL),
n = 200
)

zoomChart("2016")  # Zoom into the year 2016 in the chart
addTA(AAPL_sma_20, on = 1, col = "red")  # on = 1 plots the SMA with price
addTA(AAPL_sma_50, on = 1, col = "blue")
addTA(AAPL_sma_200, on = 1, col = "green")


We will refer to the sign of this difference as the regime; that is, if the fast moving average is above the slow moving average, this is a bullish regime (the bulls rule), and a bearish regime (the bears rule) holds when the fast moving average is below the slow moving average. I identify regimes with the following code.

AAPL_trade <- AAPL
AAPL_trade$20d <- AAPL_sma_20 AAPL_trade$50d <- AAPL_sma_50

regime_val <- sigComparison("", data = AAPL_trade,
columns = c("20d", "50d"), relationship = "gt") -
sigComparison("", data = AAPL_trade,
columns = c("20d", "50d"), relationship = "lt")

plot(regime_val["2016"], main = "Regime", ylim = c(-2, 2))


plot(regime_val, main = "Regime", ylim = c(-2, 2))


I could visualize the regime along with the main series with the following code:

candleChart(AAPL, up.col = "black", dn.col = "red", theme = "white", subset = "2016-01-04/")
addTA(regime_val, col = "blue", yrange = c(-2, 2))
addLines(h = 0, col = "black", on = 3)
addSMA(n = c(20, 50), on = 1, col = c("red", "blue"))
zoomChart("2016")


candleChart(AAPL, up.col = "black", dn.col = "red", theme = "white", subset = "2016-01-04/")
addTA(regime_val, col = "blue", yrange = c(-2, 2))
addLines(h = 0, col = "black", on = 3)
addSMA(n = c(20, 50), on = 1, col = c("red", "blue"))


table(as.vector(regime_val))

##
##  -1   1
## 663 987


The call above indicates that the market was bullish on Apple stock for 987 days, and bearish for 663 days. (If you were following along on the Python version, you may notice there is no 0. This may be due to different behavior by sigComparison().)

Trading signals appear at regime changes. When a bullish regime begins, a buy signal is triggered, and when it ends, a sell signal is triggered. Likewise, when a bearish regime begins, a sell signal is triggered, and when the regime ends, a buy signal is triggered (this is of interest only if you ever will short the stock, or use some derivative like a stock option to bet against the market).

It’s simple to obtain signals. Let $r_t$ indicate the regime at time $t$, and $s_t$ the signal at time $t$. Then:

$s_t = \text{sign}(r_t - r_{t - 1})$

$s_t \in \{-1, 0, 1\}$, with $-1$ indicating “sell”, $1$ indicating “buy”, and $0$ no action. We can obtain signals like so:

sig <- diff(regime_val) / 2
plot(sig, main = "Signal", ylim = c(-2, 2))


table(sig)

## sig
##   -1    0    1
##   19 1611   19


We would buy Apple stock 19 times and sell Apple stock 19 times. If we only go long on Apple stock, only 19 trades will be engaged in over the 6-year period, while if we pivot from a long to a short position every time a long position is terminated, we would engage in 19 trades total. (Bear in mind that trading more frequently isn’t necessarily good; trades are never free.)

You may notice that the system as it currently stands isn’t very robust, since even a fleeting moment when the fast moving average is above the slow moving average triggers a trade, resulting in trades that end immediately (which is bad if not simply because realistically every trade is accompanied by a fee that can quickly erode earnings). Additionally, every bullish regime immediately transitions into a bearish regime, and if you were constructing trading systems that allow both bullish and bearish bets, this would lead to the end of one trade immediately triggering a new trade that bets on the market in the opposite direction, which again seems finnicky. A better system would require more evidence that the market is moving in some particular direction. But we will not concern ourselves with these details for now.

Let’s now try to identify what the prices of the stock is at every buy and every sell.

# The Cl function from quantmod pulls the closing price from the object
# holding a stock's data
Cl(AAPL)[which(sig == 1)]

##            AAPL.Close
## 2010-06-18     274.07
## 2010-09-20     283.23
## 2011-05-12     346.57
## 2011-07-14     357.77
## 2011-12-28     402.64
## 2012-06-25     570.77
## 2013-05-17     433.26
## 2013-07-31     452.53
## 2013-10-16     501.11
## 2014-03-26     539.78
## 2014-04-25     571.94
## 2014-08-18      99.16
## 2014-10-28     106.74
## 2015-02-05     119.94
## 2015-04-28     130.56
## 2015-10-27     114.55
## 2016-03-11     102.26
## 2016-07-01      95.89
## 2016-07-25      97.34

# Sell prices
Cl(AAPL)[sig == -1]

##            AAPL.Close
## 2010-06-11     253.51
## 2010-07-22     259.02
## 2011-03-31     348.51
## 2011-05-27     337.41
## 2011-11-17     377.41
## 2012-05-09     569.18
## 2012-10-17     644.61
## 2013-06-26     398.07
## 2013-10-03     483.41
## 2014-01-28     506.50
## 2014-04-22     531.70
## 2014-06-11      93.86
## 2014-10-17      97.67
## 2015-01-05     106.25
## 2015-04-16     126.17
## 2015-06-25     127.50
## 2015-12-18     106.03
## 2016-05-05      93.24
## 2016-07-08      96.68

# Since these are of the same dimension, computing profit is easy
as.vector(Cl(AAPL)[sig == 1])[-1] - Cl(AAPL)[sig == -1][-table(sig)[["1"]]]

##             AAPL.Close
## 2010-06-11   29.720012
## 2010-07-22   87.549988
## 2011-03-31    9.259998
## 2011-05-27   65.230011
## 2011-11-17  193.360020
## 2012-05-09 -135.920013
## 2012-10-17 -192.080017
## 2013-06-26  103.040009
## 2013-10-03   56.369995
## 2014-01-28   65.440003
## 2014-04-22 -432.540016
## 2014-06-11   12.879997
## 2014-10-17   22.270004
## 2015-01-05   24.309998
## 2015-04-16  -11.619995
## 2015-06-25  -25.239998
## 2015-12-18  -10.140000
## 2016-05-05    4.099998


Above, we can see that on May 22nd, 2013, there was a massive drop in the price of Apple stock, and it looks like our trading system would do badly. But this price drop is not because of a massive shock to Apple, but simply due to a stock split. And while dividend payments are not as obvious as a stock split, they may be affecting the performance of our system.

candleChart(AAPL, up.col = "black", dn.col = "red", theme = "white")
addTA(regime_val, col = "blue", yrange = c(-2, 2))
addLines(h = 0, col = "black", on = 3)
addSMA(n = c(20, 50), on = 1, col = c("red", "blue"))
zoomChart("2014-05/2014-07")


We don’t want our trading system to be behaving poorly because of stock splits and dividend payments. How should we handle this? One approach would be to obtain historical stock split and dividend payment data and design a trading system for handling these. This would most realistically represent the behavior of the stock and could be considered the best solution, but it is more complicated. Another solution would be to adjust the prices to account for stock splits and dividend payments.

Yahoo! Finance only provides the adjusted closing price of a stock, but this is all we need to get adjusted opening, high, and low prices. The adjusted close is computed like so:

$\text{price}^{\text{adj}}_t = m_t \times \text{price}_t$

where $m_t$ is the multiplier used for the adjustment. Solving for $m_t$ requires only division and thus we can use the closing price and the adjusted closing price to adjust all prices in the series.

Let’s go back, adjust the apple data, and reevaluate our trading system using the adjusted data.

candleChart(adjustOHLC(AAPL), up.col = "black", dn.col = "red", theme = "white")
addLines(h = 0, col = "black", on = 3)
addSMA(n = c(20, 50), on = 1, col = c("red", "blue"))


As you can see, adjusting for dividends and stock splits makes a big difference. We will use this data from now on.

Let’s now create a simulated portfolio of $1,000,000, and see how it would behave, according to the rules we have established. This includes: • Investing only 10% of the portfolio in any trade • Exiting the position if losses exceed 20% of the value of the trade. When simulating, bear in mind that: • Trades are done in batches of 100 stocks (unfortunately, osMaxDollar() will not round to batches of 100, so for the sake of simplicity, this will be ignored). • Our stop-loss rule involves placing an order to sell the stock the moment the price drops below the specified level. Thus we need to check whether the lows during this period ever go low enough to trigger the stop-loss. Realistically, unless we buy a put option, we cannot guarantee that we will sell the stock at the price we set at the stop-loss, but we will use this as the selling price anyway for the sake of simplicity. • Every trade is accompanied by a commission to the broker, which should be accounted for. I do not do so here. quantmod is good for visualizing stock data, but if we want to start developing and testing strategies, we will need to rely more on other packages: • TTR contains functions for computing technical indicators (simple moving averages, or SMAs, are included). • blotter is intended to manage portfolios and positions created while developing trading strategies. It provides environments intended to help simplify portfolio tracking tasks. • FinancialInstruments is a companion package to blotter and stores meta-data about different financial instruments (stocks, options contracts, etc.). • quantstrat may be the work horse package here; it is used for defining and testing trading strategies. • IKTrading contains some additional useful functions for trading (in particular, osMaxDollar()). quantstrat and its companion packages provides a flexible framework for backtesting that, in principle, should allow for any trading strategy to be backtested. The following code must be executed for every testing session. For the sake of simplicity, I’m going to not force orders to be in batches of 100 shares, nor will I enforce the stop-loss rule mentioned above. It is possible to implement these rules, but doing so is not simple. Perhaps in a later article I will revisit these topics. Additionally, I am allowing the strategy to further enter a position in “bullish” markets. I would not recommend this in the real world, since this will drive up transaction costs for little profit. (Rule of thumb; the fewer transactions, the better.) rm(list = ls(.blotter), envir = .blotter) # Clear blotter environment currency("USD") # Currency being used  ## [1] "USD"  Sys.setenv(TZ = "MDT") # Allows quantstrat to use timestamps  Next, we declare the stocks we are going to be working with. AAPL_adj <- adjustOHLC(AAPL) stock("AAPL_adj", currency = "USD", multiplier = 1)  ## [1] "AAPL_adj"  initDate <- "1990-01-01" # A date prior to first close price; needed (why?)  Now we create a portfolio and account object that will be used in the simulation. strategy_st <- portfolio_st <- account_st <- "SMAC-20-50" # Names of objects rm.strat(portfolio_st) # Need to remove portfolio from blotter env rm.strat(strategy_st) # Ensure no strategy by this name exists either initPortf(portfolio_st, symbols = "AAPL_adj", # This is a simple portfolio # trading AAPL only initDate = initDate, currency = "USD")  ## [1] "SMAC-20-50"  initAcct(account_st, portfolios = portfolio_st, # Uses only one portfolio initDate = initDate, currency = "USD", initEq = 1000000) # Start with a million dollars  ## [1] "SMAC-20-50"  initOrders(portfolio_st, store = TRUE) # Initialize the order container; will # contain all orders made by strategy  Now we define the strategy. strategy(strategy_st, store = TRUE) # store = TRUE tells function to store in # the .strategy environment # Now define trading rules # Indicators are used to construct signals add.indicator(strategy = strategy_st, name = "SMA", # SMA is a function arguments = list(x = quote(Cl(mktdata)), # args of SMA n = 20), label = "fastMA")  ## [1] "SMAC-20-50"  add.indicator(strategy = strategy_st, name = "SMA", arguments = list(x = quote(Cl(mktdata)), n = 50), label = "slowMA")  ## [1] "SMAC-20-50"  # Next comes trading signals add.signal(strategy = strategy_st, name = "sigComparison", # Remember me? arguments = list(columns = c("fastMA", "slowMA"), relationship = "gt"), label = "bull")  ## [1] "SMAC-20-50"  add.signal(strategy = strategy_st, name = "sigComparison", arguments = list(columns = c("fastMA", "slowMA"), relationship = "lt"), label = "bear")  ## [1] "SMAC-20-50"  # Finally, rules that generate trades add.rule(strategy = strategy_st, name = "ruleSignal", # Almost always this one arguments = list(sigcol = "bull", # Signal (see above) that triggers sigval = TRUE, ordertype = "market", orderside = "long", replace = FALSE, prefer = "Open", osFUN = osMaxDollar, # The next parameter, which is a parameter passed to # osMaxDollar, will ensure that trades are about 10% # of portfolio equity maxSize = quote(floor(getEndEq(account_st, Date = timestamp) * .1)), tradeSize = quote(floor(getEndEq(account_st, Date = timestamp) * .1))), type = "enter", path.dep = TRUE, label = "buy")  ## [1] "SMAC-20-50"  add.rule(strategy = strategy_st, name = "ruleSignal", arguments = list(sigcol = "bear", sigval = TRUE, orderqty = "all", ordertype = "market", orderside = "long", replace = FALSE, prefer = "Open"), type = "exit", path.dep = TRUE, label = "sell")  ## [1] "SMAC-20-50"  With the strategy set up, we now execute it. (I’m allowing the function’s intermediate output to be included; I believe, in this case, it’s nice to see what is going on.) applyStrategy(strategy_st, portfolios = portfolio_st)  ## [1] "2010-03-17 00:00:00 AAPL_adj 3410 @ 29.4145214710173" ## [1] "2010-03-19 00:00:00 AAPL_adj 1 @ 29.4001360824518" ## [1] "2010-03-23 00:00:00 AAPL_adj 56 @ 29.5113078050153" ## [1] "2010-06-14 00:00:00 AAPL_adj -3467 @ 33.4768405519892" ## [1] "2010-06-21 00:00:00 AAPL_adj 2808 @ 36.3188897412688" ## [1] "2010-06-23 00:00:00 AAPL_adj 1 @ 35.9121377755245" ## [1] "2010-06-25 00:00:00 AAPL_adj 12 @ 35.3209704881171" ## [1] "2010-06-28 00:00:00 AAPL_adj 10 @ 34.9115992042882" ## [1] "2010-06-29 00:00:00 AAPL_adj 33 @ 34.5440814243432" ## [1] "2010-06-30 00:00:00 AAPL_adj 30 @ 33.5749295477998" ## [1] "2010-07-01 00:00:00 AAPL_adj 84 @ 33.2597286804071" ## [1] "2010-07-02 00:00:00 AAPL_adj 28 @ 32.761422084999" ## [1] "2010-07-06 00:00:00 AAPL_adj 46 @ 32.8281245169061" ## [1] "2010-07-15 00:00:00 AAPL_adj 13 @ 32.4658384412953" ## [1] "2010-07-16 00:00:00 AAPL_adj 15 @ 33.1132447519415" ## [1] "2010-07-21 00:00:00 AAPL_adj 67 @ 34.6709448593864" ## [1] "2010-07-23 00:00:00 AAPL_adj -3147 @ 33.6246305427903" ## [1] "2010-09-21 00:00:00 AAPL_adj 2769 @ 37.1258627071335" ## [1] "2011-04-01 00:00:00 AAPL_adj -2769 @ 45.9214462525203" ## [1] "2011-05-13 00:00:00 AAPL_adj 2209 @ 45.2086437030919" ## [1] "2011-05-16 00:00:00 AAPL_adj 2 @ 44.3637427445526" ## [1] "2011-05-17 00:00:00 AAPL_adj 43 @ 43.4220589833276" ## [1] "2011-05-18 00:00:00 AAPL_adj 48 @ 44.006688373276" ## [1] "2011-05-24 00:00:00 AAPL_adj 15 @ 43.8798216684994" ## [1] "2011-05-31 00:00:00 AAPL_adj -2317 @ 44.6122447113503" ## [1] "2011-07-15 00:00:00 AAPL_adj 2117 @ 47.2371856911493" ## [1] "2011-08-24 00:00:00 AAPL_adj 5 @ 48.8458929867096" ## [1] "2011-11-18 00:00:00 AAPL_adj -2122 @ 49.5586954053487" ## [1] "2011-12-29 00:00:00 AAPL_adj 1879 @ 52.7604184147789" ## [1] "2011-12-30 00:00:00 AAPL_adj 16 @ 52.7748073346566" ## [1] "2012-05-10 00:00:00 AAPL_adj -1895 @ 75.1489364843129" ## [1] "2012-06-26 00:00:00 AAPL_adj 1324 @ 74.7238700874815" ## [1] "2012-06-27 00:00:00 AAPL_adj 14 @ 75.2038727149497" ## [1] "2012-10-18 00:00:00 AAPL_adj -1338 @ 84.0107133628424" ## [1] "2013-05-20 00:00:00 AAPL_adj 1704 @ 57.702437434572" ## [1] "2013-05-21 00:00:00 AAPL_adj 29 @ 58.5360921156059" ## [1] "2013-06-18 00:00:00 AAPL_adj 1 @ 57.6556788337951" ## [1] "2013-06-20 00:00:00 AAPL_adj 1 @ 56.0177655048158" ## [1] "2013-06-21 00:00:00 AAPL_adj 50 @ 55.9095500863203" ## [1] "2013-06-24 00:00:00 AAPL_adj 3 @ 54.4279450227602" ## [1] "2013-06-25 00:00:00 AAPL_adj 49 @ 54.2008263223713" ## [1] "2013-06-26 00:00:00 AAPL_adj 7 @ 53.9603509990938" ## [1] "2013-06-27 00:00:00 AAPL_adj -1844 @ 53.3391172023021" ## [1] "2013-08-01 00:00:00 AAPL_adj 1645 @ 60.8874187232282" ## [1] "2013-09-18 00:00:00 AAPL_adj 14 @ 62.288630508577" ## [1] "2013-10-07 00:00:00 AAPL_adj -1659 @ 65.4327788398408" ## [1] "2013-10-17 00:00:00 AAPL_adj 1484 @ 67.2375075223774" ## [1] "2013-10-18 00:00:00 AAPL_adj 3 @ 68.0457369974481" ## [1] "2014-01-29 00:00:00 AAPL_adj -1487 @ 68.1670731190817" ## [1] "2014-03-12 00:00:00 AAPL_adj 1372 @ 72.733561996219" ## [1] "2014-03-13 00:00:00 AAPL_adj 2 @ 73.1322624931313" ## [1] "2014-03-17 00:00:00 AAPL_adj 15 @ 71.8068850638596" ## [1] "2014-03-18 00:00:00 AAPL_adj -1389 @ 71.5619513215039" ## [1] "2014-03-25 00:00:00 AAPL_adj 1364 @ 73.6847222604636" ## [1] "2014-03-31 00:00:00 AAPL_adj 1 @ 73.37583725238" ## [1] "2014-04-02 00:00:00 AAPL_adj 1 @ 73.8044711654273" ## [1] "2014-04-08 00:00:00 AAPL_adj 25 @ 71.4653411892903" ## [1] "2014-04-09 00:00:00 AAPL_adj 8 @ 71.1183468222455" ## [1] "2014-04-10 00:00:00 AAPL_adj 7 @ 72.2123947642041" ## [1] "2014-04-14 00:00:00 AAPL_adj 9 @ 71.0176525287248" ## [1] "2014-04-17 00:00:00 AAPL_adj 3 @ 70.7591073193403" ## [1] "2014-04-23 00:00:00 AAPL_adj -1418 @ 71.9919515657196" ## [1] "2014-04-28 00:00:00 AAPL_adj 1301 @ 77.943882953473" ## [1] "2014-10-20 00:00:00 AAPL_adj -1301 @ 94.6439193845976" ## [1] "2014-10-29 00:00:00 AAPL_adj 985 @ 102.662471436688" ## [1] "2015-01-06 00:00:00 AAPL_adj -985 @ 103.001288430376" ## [1] "2015-02-06 00:00:00 AAPL_adj 858 @ 116.491485509158" ## [1] "2015-02-10 00:00:00 AAPL_adj 11 @ 116.637076575269" ## [1] "2015-04-17 00:00:00 AAPL_adj -869 @ 121.858912853907" ## [1] "2015-04-29 00:00:00 AAPL_adj 766 @ 126.333382759857" ## [1] "2015-04-30 00:00:00 AAPL_adj 25 @ 124.858064939017" ## [1] "2015-05-01 00:00:00 AAPL_adj 9 @ 122.392738351109" ## [1] "2015-05-04 00:00:00 AAPL_adj 17 @ 125.69278245721" ## [1] "2015-05-08 00:00:00 AAPL_adj 5 @ 123.469279769953" ## [1] "2015-06-29 00:00:00 AAPL_adj -822 @ 122.280199845824" ## [1] "2015-10-28 00:00:00 AAPL_adj 885 @ 114.482259314017" ## [1] "2015-11-17 00:00:00 AAPL_adj 28 @ 112.995955565041" ## [1] "2015-12-17 00:00:00 AAPL_adj 2 @ 110.144507689672" ## [1] "2015-12-21 00:00:00 AAPL_adj -915 @ 105.483868873907" ## [1] "2016-03-11 00:00:00 AAPL_adj 997 @ 101.073743853996" ## [1] "2016-04-28 00:00:00 AAPL_adj 56 @ 96.496561342483" ## [1] "2016-05-02 00:00:00 AAPL_adj 23 @ 92.8980829110911" ## [1] "2016-05-06 00:00:00 AAPL_adj -1076 @ 92.8669223571517" ## [1] "2016-06-24 00:00:00 AAPL_adj 1047 @ 92.4094018468721" ## [1] "2016-06-27 00:00:00 AAPL_adj 35 @ 92.4989129454683" ## [1] "2016-06-28 00:00:00 AAPL_adj -1082 @ 92.3994537379766" ## [1] "2016-07-01 00:00:00 AAPL_adj 1064 @ 94.9754947544617" ## [1] "2016-07-12 00:00:00 AAPL_adj -1064 @ 96.6464428592831" ## [1] "2016-07-26 00:00:00 AAPL_adj 1023 @ 96.2983306600025" ## [1] "2016-07-27 00:00:00 AAPL_adj 15 @ 103.708186831476"  Here we effectvely conclude the study so we can perform some analytics. updatePortf(portfolio_st)  ## [1] "SMAC-20-50"  dateRange <- time(getPortfolio(portfolio_st)$summary)[-1]
updateAcct(portfolio_st, dateRange)

## [1] "SMAC-20-50"

updateEndEq(account_st)

## [1] "SMAC-20-50"


Here’s our first set of useful summary statistics.

tStats <- tradeStats(Portfolios = portfolio_st, use="trades",
inclZeroDays = FALSE)
tStats[, 4:ncol(tStats)] <- round(tStats[, 4:ncol(tStats)], 2)
print(data.frame(t(tStats[, -c(1,2)])))

##                     AAPL_adj
## Num.Txns               90.00
## Largest.Winner      42426.01
## Largest.Loser       -8386.18
## Gross.Profits      152876.04
## Gross.Losses       -40295.42
## Percent.Positive       57.14
## Percent.Negative       42.86
## Profit.Factor           3.79
## Avg.Daily.PL         4765.18
## Med.Daily.PL          856.79
## Std.Dev.Daily.PL    12868.07
## Ann.Sharpe              5.88
## Max.Drawdown       -27945.73
## Profit.To.Max.Draw      4.03
## Avg.WinLoss.Ratio       2.85
## Med.WinLoss.Ratio       3.08
## Max.Equity         120545.86
## Min.Equity          -1182.21
## End.Equity         112580.62

final_acct <- getAccount(account_st)
plot(final_acct$summary$End.Eq["2010/2016"], main = "Portfolio Equity")


Our portfolio’s value grew by 10% in about six years. Considering that only 10% of the portfolio was ever involved in any single trade, this is not bad performance.

A more realistic portfolio would not be betting 10% of its value on only one stock. A more realistic one would consider investing in multiple stocks. Multiple trades may be ongoing at any given time involving multiple companies, and most of the portfolio will be in stocks, not cash.

Fortunately, quantmod allows us to do this.

# Get new symbols
symbols = c("AAPL", "MSFT", "GOOG", "FB", "TWTR", "NFLX", "AMZN", "YHOO",
"SNY", "NTDOY", "IBM", "HPQ")
getSymbols(Symbols = symbols, from = start, to = end)
# Quickly define adjusted versions of each of these
%s% <- function(x, y) {paste(x, y)}
%s0% <- function(x, y) {paste0(x, y)}
for (s in symbols) {
eval(parse(text = s %s0% "_adj <- adjustOHLC(" %s0% s %s0% ")"))
}
symbols_adj <- paste(symbols, "adj", sep = "_")

stock(symbols_adj, currency = "USD", multiplier = 1)

strategy_st_2 <- portfolio_st_2 <- account_st_2 <- "SMAC-20-50_v2"
rm.strat(portfolio_st_2)
rm.strat(strategy_st_2)
initPortf(portfolio_st_2, symbols = symbols_adj,
initDate = initDate, currency = "USD")
initAcct(account_st_2, portfolios = portfolio_st_2,
initDate = initDate, currency = "USD",
initEq = 1000000)
initOrders(portfolio_st_2, store = TRUE)

strategy(strategy_st_2, store = TRUE)

add.indicator(strategy = strategy_st_2, name = "SMA",
arguments = list(x = quote(Cl(mktdata)),
n = 20),
label = "fastMA")
add.indicator(strategy = strategy_st_2, name = "SMA",
arguments = list(x = quote(Cl(mktdata)),
n = 50),
label = "slowMA")

# Next comes trading signals
add.signal(strategy = strategy_st_2, name = "sigComparison",  # Remember me?
arguments = list(columns = c("fastMA", "slowMA"),
relationship = "gt"),
label = "bull")
add.signal(strategy = strategy_st_2, name = "sigComparison",
arguments = list(columns = c("fastMA", "slowMA"),
relationship = "lt"),
label = "bear")

# Finally, rules that generate trades
add.rule(strategy = strategy_st_2, name = "ruleSignal",
arguments = list(sigcol = "bull",
sigval = TRUE,
ordertype = "market",
orderside = "long",
replace = FALSE,
prefer = "Open",
osFUN = osMaxDollar,
maxSize = quote(floor(getEndEq(account_st_2,
Date = timestamp) * .1)),
Date = timestamp) * .1))),
type = "enter", path.dep = TRUE, label = "buy")
add.rule(strategy = strategy_st_2, name = "ruleSignal",
arguments = list(sigcol = "bear",
sigval = TRUE,
orderqty = "all",
ordertype = "market",
orderside = "long",
replace = FALSE,
prefer = "Open"),
type = "exit", path.dep = TRUE, label = "sell")

applyStrategy(strategy_st_2, portfolios = portfolio_st_2)

# Now for analytics
updatePortf(portfolio_st_2)

## [1] "SMAC-20-50_v2"

dateRange <- time(getPortfolio(portfolio_st_2)$summary)[-1] updateAcct(account_st_2, dateRange)  ## [1] "SMAC-20-50_v2"  updateEndEq(account_st_2)  ## [1] "SMAC-20-50_v2"  tStats2 <- tradeStats(Portfolios = portfolio_st_2, use="trades", inclZeroDays = FALSE) tStats2[, 4:ncol(tStats2)] <- round(tStats2[, 4:ncol(tStats2)], 2) print(data.frame(t(tStats2[, -c(1,2)])))  ## AAPL_adj AMZN_adj FB_adj GOOG_adj HPQ_adj ## Num.Txns 90.00 100.00 52.00 82.00 101.00 ## Num.Trades 21.00 18.00 12.00 18.00 16.00 ## Net.Trading.PL 112580.62 131136.44 123027.65 34936.13 47817.45 ## Avg.Trade.PL 5360.98 7285.36 10252.30 1940.90 2988.59 ## Med.Trade.PL 1379.84 2734.07 4946.80 -1110.80 -2937.26 ## Largest.Winner 42426.01 39293.00 78430.09 31629.27 55409.83 ## Largest.Loser -8386.18 -17567.20 -18374.18 -10477.41 -16000.11 ## Gross.Profits 152876.04 190370.65 148742.50 78661.57 129111.69 ## Gross.Losses -40295.42 -59234.21 -25714.85 -43725.44 -81294.24 ## Std.Dev.Trade.PL 12835.99 18910.41 23637.58 9807.28 20596.60 ## Percent.Positive 57.14 61.11 75.00 38.89 37.50 ## Percent.Negative 42.86 38.89 25.00 61.11 62.50 ## Profit.Factor 3.79 3.21 5.78 1.80 1.59 ## Avg.Win.Trade 12739.67 17306.42 16526.94 11237.37 21518.62 ## Med.Win.Trade 9970.11 10904.54 9600.62 10363.86 12287.22 ## Avg.Losing.Trade -4477.27 -8462.03 -8571.62 -3975.04 -8129.42 ## Med.Losing.Trade -3234.16 -7586.30 -3885.64 -4277.02 -8975.12 ## Avg.Daily.PL 4765.18 4733.02 10715.22 1745.28 1887.01 ## Med.Daily.PL 856.79 2358.72 4733.39 -1206.41 -4857.62 ## Std.Dev.Daily.PL 12868.07 15980.17 24734.19 10072.85 20825.92 ## Ann.Sharpe 5.88 4.70 6.88 2.75 1.44 ## Max.Drawdown -27945.73 -50062.96 -44728.34 -31255.17 -67890.79 ## Profit.To.Max.Draw 4.03 2.62 2.75 1.12 0.70 ## Avg.WinLoss.Ratio 2.85 2.05 1.93 2.83 2.65 ## Med.WinLoss.Ratio 3.08 1.44 2.47 2.42 1.37 ## Max.Equity 120545.86 131136.44 127540.04 48651.04 47817.45 ## Min.Equity -1182.21 -11314.07 -15750.32 -23388.32 -63845.70 ## End.Equity 112580.62 131136.44 123027.65 34936.13 47817.45 ## IBM_adj MSFT_adj NFLX_adj NTDOY_adj SNY_adj ## Num.Txns 115.00 112.00 96.00 109.00 121.00 ## Num.Trades 21.00 20.00 18.00 20.00 22.00 ## Net.Trading.PL 9126.76 29513.06 306152.14 17217.18 -26278.53 ## Avg.Trade.PL 434.61 1475.65 17008.45 860.86 -1194.48 ## Med.Trade.PL -722.94 -2408.39 3954.98 -1467.95 474.75 ## Largest.Winner 22193.38 18573.05 177643.74 45165.04 15195.77 ## Largest.Loser -8188.45 -11537.97 -30183.02 -9052.12 -17715.75 ## Gross.Profits 50691.47 91471.86 372517.68 85502.47 64843.56 ## Gross.Losses -41564.71 -61958.80 -66365.54 -68285.29 -91122.09 ## Std.Dev.Trade.PL 6571.91 8946.29 44740.88 12590.03 8855.56 ## Percent.Positive 38.10 45.00 66.67 40.00 54.55 ## Percent.Negative 61.90 55.00 33.33 60.00 45.45 ## Profit.Factor 1.22 1.48 5.61 1.25 0.71 ## Avg.Win.Trade 6336.43 10163.54 31043.14 10687.81 5403.63 ## Med.Win.Trade 3593.06 10044.61 14941.58 2718.22 3385.61 ## Avg.Losing.Trade -3197.29 -5632.62 -11060.92 -5690.44 -9112.21 ## Med.Losing.Trade -2368.90 -5111.92 -7694.86 -6391.76 -8601.77 ## Avg.Daily.PL 434.61 1160.50 17854.27 -77.22 -1194.48 ## Med.Daily.PL -722.94 -2870.77 5280.37 -1587.30 474.75 ## Std.Dev.Daily.PL 6571.91 9076.67 45969.27 12195.79 8855.56 ## Ann.Sharpe 1.05 2.03 6.17 -0.10 -2.14 ## Max.Drawdown -42354.92 -32484.71 -57707.47 -55006.73 -38592.68 ## Profit.To.Max.Draw 0.22 0.91 5.31 0.31 -0.68 ## Avg.WinLoss.Ratio 1.98 1.80 2.81 1.88 0.59 ## Med.WinLoss.Ratio 1.52 1.96 1.94 0.43 0.39 ## Max.Equity 31766.86 44009.02 340579.29 35295.68 12237.77 ## Min.Equity -10588.07 -18649.24 -646.98 -39765.29 -34367.59 ## End.Equity 9126.76 29513.06 306152.14 17217.18 -26278.53 ## TWTR_adj YHOO_adj ## Num.Txns 31.00 108.00 ## Num.Trades 6.00 22.00 ## Net.Trading.PL 17321.55 110711.07 ## Avg.Trade.PL 2886.93 5032.32 ## Med.Trade.PL -3051.02 -1332.22 ## Largest.Winner 3058.35 55850.27 ## Largest.Loser -11305.68 -8359.86 ## Gross.Profits 45892.12 161301.36 ## Gross.Losses -28570.57 -50590.28 ## Std.Dev.Trade.PL 20412.79 15701.97 ## Percent.Positive 33.33 31.82 ## Percent.Negative 66.67 68.18 ## Profit.Factor 1.61 3.19 ## Avg.Win.Trade 22946.06 23043.05 ## Med.Win.Trade 22946.06 14477.80 ## Avg.Losing.Trade -7142.64 -3372.69 ## Med.Losing.Trade -8618.81 -3245.27 ## Avg.Daily.PL -5102.44 4582.54 ## Med.Daily.PL -6074.75 -1445.65 ## Std.Dev.Daily.PL 6490.57 15943.85 ## Ann.Sharpe -12.48 4.56 ## Max.Drawdown -61686.41 -34526.52 ## Profit.To.Max.Draw 0.28 3.21 ## Avg.WinLoss.Ratio 3.21 6.83 ## Med.WinLoss.Ratio 2.66 4.46 ## Max.Equity 33279.24 115011.39 ## Min.Equity -28407.16 -11616.95 ## End.Equity 17321.55 110711.07  final_acct2 <- getAccount(account_st_2) plot(final_acct2$summary$End.Eq["2010/2016"], main = "Portfolio Equity")  A more realistic portfolio that can invest in any in a list of twelve (tech) stocks has a final growth of about 100%. How good is this? While on the surface not bad, we will see we could have done better. ## Benchmarking Backtesting is only part of evaluating the efficacy of a trading strategy. We would like to benchmark the strategy, or compare it to other available (usually well-known) strategies in order to determine how well we have done. Whenever you evaluate a trading system, there is one strategy that you should always check, one that beats all but a handful of managed mutual funds and investment managers: buy and hold SPY. The efficient market hypothesis claims that it is all but impossible for anyone to beat the market. Thus, one should always buy an index fund that merely reflects the composition of the market. SPY is an exchange-traded fund (a mutual fund that is traded on the market like a stock) whose value effectively represents the value of the stocks in the S&P 500 stock index. By buying and holding SPY, we are effectively trying to match our returns with the market rather than beat it. I obtain data on SPY below, and look at the profits for simply buying and holding SPY. getSymbols("SPY", from = start, to = end)  ## [1] "SPY"  # A crude estimate of end portfolio value from buying and holding SPY 1000000 * (SPY$SPY.Adjusted["2016-09-30"][[1]] / SPY$SPY.Adjusted[[1]])  ## [1] 2189421  plot(final_acct2$summary$End.Eq["2010/2016"] / 1000000, main = "Portfolio Equity", ylim = c(0.8, 2.5)) lines(SPY$SPY.Adjusted / SPY\$SPY.Adjusted[[1]], col = "blue")


Buying and holding SPY beats our trading system, at least how we currently set it up, and we haven’t even accounted for how expensive our more complex strategy is in terms of fees. Given both the opportunity cost and the expense associated with the active strategy, we should not use it.

What could we do to improve the performance of our system? For starters, we could try diversifying. All the stocks we considered were tech companies, which means that if the tech industry is doing poorly, our portfolio will reflect that. We could try developing a system that can also short stocks or bet bearishly, so we can take advantage of movement in any direction. We could seek means for forecasting how high we expect a stock to move. Whatever we do, though, must beat this benchmark; otherwise there is an opportunity cost associated with our trading system.

Other benchmark strategies exist, and if our trading system beat the “buy and hold SPY” strategy, we may check against them. Some such strategies include:

• Buy SPY when its closing monthly price is aboves its ten-month moving average.
• Buy SPY when its ten-month momentum is positive. (Momentum is the first difference of a moving average process, or $MO^q_t = MA^q_t - MA^q_{t - 1}$.)

(I first read of these strategies here.) The general lesson still holds: don’t use a complex trading system with lots of active trading when a simple strategy involving an index fund without frequent trading beats it. This is actually a very difficult requirement to meet.

As a final note, suppose that your trading system did manage to beat any baseline strategy thrown at it in backtesting. Does backtesting predict future performance? Not at all. Backtesting has a propensity for overfitting, so just because backtesting predicts high growth doesn’t mean that growth will hold in the future.

## Conclusion

While this lecture ends on a depressing note, keep in mind that the efficient market hypothesis has many critics. My own opinion is that as trading becomes more algorithmic, beating the market will become more difficult. That said, it may be possible to beat the market, even though mutual funds seem incapable of doing so (bear in mind, though, that part of the reason mutual funds perform so poorly is because of fees, which is not a concern for index funds).

This lecture is very brief, covering only one type of strategy: strategies based on moving averages. Many other trading signals exist and employed. Additionally, we never discussed in depth shorting stocks, currency trading, or stock options. Stock options, in particular, are a rich subject that offer many different ways to bet on the direction of a stock.

Some resources I referred to when researching this subject include:

A resource I have not thoroughly checked out but looks promising is the ebook Backtesting Strategies with R, by Tim Trice, also discussing quantstrat.

Remember that it is possible (if not common) to lose money in the stock market. It’s also true, though, that it’s difficult to find returns like those found in stocks, and any investment strategy should take investing in it seriously. This lecture is intended to provide a starting point for evaluating stock trading and investments, and, more generally, analyzing temporal data, and I hope you continue to explore these ideas.

## 24 thoughts on “An Introduction to Stock Market Data Analysis with R (Part 2)”

1. Super. I really liked what you shared. I am new to the field but your contribution was really really exciting. Only one question: the entire shift from SMAC-20-50 to its version v2 got lost. It might be intuitive to set other variables and redo the sequence but I am missing the multiple call for different stock in a portfolio…. 😉
Thx !

Like

2. Hi, I am the follwing err when runing that part of the code:

library(devtools)

ERROR: dependency ‘quantstrat’ is not available for package ‘IKTrading’

Do you what is going on please? Rversion: 3.3.3 on Windows 10

Like

3. Great article to learn how to apply the analytics in the stock market. Just one question – once we identify the buy/sell call, how to set the order to the stock exchange or to the broker? How can this integration happen between the R code with the exchange?

Like

• I have no idea. Furthermore, I’m not sure you would want to, either. Others who use R for finance see it as a research tool rather than as a tool for actually doing the trading; that job may be left to other software packages, perhaps C++ (which is much faster).

Like

4. could you please illistrate how you got this, adjustOHLC, in R code? I kind of get lost when this object is created since you didnt list the code for it.

Like

5. Really great tutorial! Thanks for sharing. I started adding more symbols, and expanding the date range, but I’m getting the following error when running the applyStrategy() function now:

Error in addOrder(portfolio = portfolio, symbol = symbol, timestamp = timestamp, :
order at timestamp 2017-07-31 must not have price of NA

Do you have any workarounds for this?

Here’s my symbols:
symbols = c(“AAPL”, “MSFT”, “GOOG”, “FB”, “TWTR”, “AMZN”, “YHOO”,
“SNY”, “NTDOY”, “IBM”, “NVDA”, “AMD”, “INTC”)

And here’s my start and end dates:
start <- as.Date("2016-01-01")
end <- as.Date("2017-10-01")

Thanks again for the awesome post!

Like

• No none come to mind. Something you might want to watch for is where the data is coming from. I think those functions have Yahoo! Finance as the default data source. Yahoo! Finance isn’t what it used to be, so you should look at getting your data from somewhere else, such as Quandl.

Like

6. Great guide! Really well explained. However, I do not quite understand what your profit function is doing ?

as.vector(Cl(AAPL)[sig == 1])[-1] – Cl(AAPL)[sig == -1][-table(sig)[[“1”]]]

For example, for sig +1, in 2010-09-20 you go long at 283.23. Then, sell for 253.51. So, surely your profit is -29.72 and not +29.72? Or perhaps I am misinterpreting what exactly you are doing, but I would really appreciate some clarification on this point, thanks!

Like

7. Hello,
Thank you for these fantastic postings. I received the error below while trying to install the Quantstart package – and similarly with its dependent IKTrading package. Thanks

Warning in install.packages :
package ‘quantstrat’ is not available (for R version 3.3.3)
Error in library(quantstrat) : there is no package called ‘quantstrat’
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘quantstrat’

Like

• …never mind. I was able to install it using the following instructions that I found on the StackOverflow site. Thanks

install.packages(“devtools”)
require(devtools)
install_github(“braverock/blotter”) # dependency
install_github(“braverock/quantstrat”)

Like

8. Great guide, i have successfully ran your codes with AAPL and I’m very thankful for that. Since I’m a noob, I have no clue how to apply na.omit() on the getSymbols when I encountered some series containing NAs. The error:

> applyStrategy(strategy_st_2, portfolios = portfolio_st_2)
Error in runSum(x, n) : Series contains non-leading NAs

Any help is appreciated. Thanks.

Like