Introduction to Quantitative Equity Strategies¶

MGMT 767: Data-Driven Investments Lab¶

Kerry Back and Kevin Crotty, Rice University¶

Types of characteristics¶

  • Company financials
    • Financial ratios
    • Growth rates
  • Past returns
    • Momentum and reversal
    • Moving averages
    • Pairs trading
  • Trade data
    • Corporate insiders
    • Short sellers
    • Retail orders
  • Corporate events
    • Dividends, earnings, other
  • Nontraditional data
    • Social media
    • NLP of corporate announcements
    • Image data
    • Cellphone location data

Basic strategy¶

  • Combine characteristics to form a return predictor
  • Buy good stocks and maybe short sell bad stocks (see the toy sketch below)
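As a toy illustration of this recipe (a hypothetical four-stock DataFrame with made-up mom and pb columns, not the course's actual data or method), combining two characteristic ranks into a single score might look like:

import pandas as pd

# hypothetical data: one row per stock, two made-up characteristics
stocks = pd.DataFrame(
    {"mom": [0.30, -0.10, 0.05, 0.20],   # past return (higher is better)
     "pb": [0.8, 2.5, 1.1, 0.6]},        # price-to-book (lower is better)
    index=["A", "B", "C", "D"],
)

# rank each characteristic so that 1 = best, then average the ranks
score = (stocks.mom.rank(ascending=False) + stocks.pb.rank()) / 2

# buy the best-ranked half; a long-short version would also short the worst half
longs = score.nsmallest(2).index    # Index(['A', 'D'])
shorts = score.nlargest(2).index    # Index(['B', 'C'])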

Plan for the course¶

  • Backtest strategies and choose one
  • Use daily updated database to update return predictions at least weekly
  • Implement paper trades at Alpaca brokerage
  • Reassess and revise strategy regularly

Example for today¶

Momentum and value characteristics of small-cap stocks

Database¶

  • SQL database on Rice server. Must be on campus or on Rice VPN.
  • Data is downloaded from Nasdaq Data Link and updated on the server daily.
  • Access the database
    • In Python, using pyodbc
      • On a Mac, you will need to install Microsoft's ODBC driver for SQL Server
      • Everyone will need to pip install pyodbc
    • Or with Azure Data Studio
In [2]:
from sqlalchemy import create_engine

server = 'fs.rice.edu'
database = 'stocks'
username = 'stocks'
password = '6LAZH1'
driver1 = 'SQL+Server'                      # legacy driver that ships with Windows
driver2 = 'ODBC+Driver+17+for+SQL+Server'   # driver from Microsoft's installer (Mac/Linux)
string1 = f"mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver1}"
string2 = f"mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver2}"
# try the Windows driver name first, then fall back to ODBC Driver 17
try:
    conn = create_engine(string1).connect()
except Exception:
    conn = create_engine(string2).connect()
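The cell above connects through SQLAlchemy's mssql+pyodbc dialect. For reference, a direct pyodbc connection is also possible; the sketch below assumes ODBC Driver 17 is installed, and conn_pyodbc is a hypothetical name:

import pyodbc

# direct DBAPI connection using the same server and credentials as above
conn_pyodbc = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=fs.rice.edu;DATABASE=stocks;UID=stocks;PWD=6LAZH1"
)

pandas can read from either connection, though recent pandas versions warn and prefer SQLAlchemy connectables.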

Database tables¶

  • tickers has one row for each ticker, with general company information
  • indicators has one row for each variable in the other tables with definitions
  • sf1 has annual and quarterly reports for all NYSE/Nasdaq stocks back to 2000
  • sep has daily open, high, low, close and adjusted close for same stocks
  • daily has marketcap, pb, pe, ps, ev, evebit, evebitda for same stocks
  • sep_weekly is a weekly version of sep
  • weekly is a weekly version of daily

Basic SQL¶

  • select [] from [] join [] on [] where [] order by []
  • select * means select all columns
  • select top 3 * means select all columns for top 3 rows
  • join [] on [] where [] order by [] are all optional (a combined example appears below)
  • information_schema.tables lists the other tables.
In [3]:
import pandas as pd 
pd.read_sql("select * from information_schema.tables", conn)
Out[3]:
TABLE_CATALOG TABLE_SCHEMA TABLE_NAME TABLE_TYPE
0 stocks dbo today BASE TABLE
1 stocks dbo ghz BASE TABLE
2 stocks dbo indicators BASE TABLE
3 stocks dbo tickers BASE TABLE
4 stocks dbo prices_weekly BASE TABLE
5 stocks dbo sep2 BASE TABLE
6 stocks dbo weekly BASE TABLE
7 stocks dbo sep_weekly BASE TABLE
8 stocks dbo sf1 BASE TABLE
9 stocks dbo daily BASE TABLE
10 stocks dbo sep BASE TABLE
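To illustrate the optional clauses together, here is a sketch combining join, where, and order by; it uses the weekly and tickers tables described above (the sector column appears in the tickers sample below):

pd.read_sql(
    """
    select top 5 w.ticker, w.date, w.pb, t.sector
    from weekly w
    join tickers t on w.ticker = t.ticker
    where w.date >= '2023-01-01'
    order by w.pb
    """,
    conn,
)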
In [4]:
pd.read_sql("select top 3 * from tickers", conn)
Out[4]:
permaticker siccode lastupdated firstadded firstpricedate lastpricedate firstquarter lastquarter isdelisted ticker ... famaindustry sector industry scalemarketcap scalerevenue relatedtickers currency location secfilings companysite
0 196290 3826 2023-12-20 2014-09-26 1999-11-18 2024-01-05 1997-06-30 2023-09-30 N A ... Measuring and Control Equipment Healthcare Diagnostics & Research 5 - Large 5 - Large None USD California; U.S.A https://www.sec.gov/cgi-bin/browse-edgar?actio... https://www.agilent.com
1 124392 3334 2023-10-26 2016-11-01 2016-11-01 2024-01-05 2014-12-31 2023-09-30 N AA ... Steel Works Etc Basic Materials Aluminum 5 - Large 5 - Large None USD Pennsylvania; U.S.A https://www.sec.gov/cgi-bin/browse-edgar?actio... http://www.alcoa.com
2 122827 6022 2019-07-29 2017-09-09 1998-09-25 2003-01-28 1997-09-30 2002-09-30 Y AAAB ... Banking Financial Services Banks - Regional 2 - Micro 1 - Nano None USD Florida; U.S.A https://www.sec.gov/cgi-bin/browse-edgar?actio... None

3 rows × 26 columns

In [5]:
pd.read_sql("select top 3 * from indicators", conn)
Out[5]:
tbl indicator isfilter isprimarykey title description unittype
0 SF1 revenue N N Revenues [Income Statement] The amount of Revenue recog... currency
1 SF1 cor N N Cost of Revenue [Income Statement] The aggregate cost of goods... currency
2 SF1 sgna N N Selling General and Administrative Expense [Income Statement] A component of [OpEx] repre... currency
In [6]:
pd.read_sql("select top 3 * from sep", conn)
Out[6]:
ticker date lastupdated opn high low cls volume closeadj closeunadj
0 PTMN 2018-10-17 2023-11-17 31.900 32.200 31.800 31.900 4428.200 13.933 3.190
1 RELI 2019-06-14 2023-02-23 192.855 192.855 177.105 177.105 91.267 177.105 0.139
2 REX 2019-06-14 2022-08-08 23.607 23.667 22.630 22.663 142662.000 22.663 67.990
In [7]:
pd.read_sql("select top 3 * from daily", conn)
Out[7]:
ticker date lastupdated ev evebit evebitda marketcap pb pe ps
0 APCC 2000-05-19 2019-03-28 6145.0 21.0 19.0 6601.2 7.3 32.0 4.9
1 MYGN 2006-12-29 2018-10-18 1158.0 -28.0 -33.7 1246.6 5.2 -30.1 10.2
2 GNI 2006-07-21 2018-10-18 177.3 11.9 11.7 178.1 12.1 11.9 10.1
In [8]:
pd.read_sql("select top 3 * from sf1", conn)
Out[8]:
ticker dimension calendardate datekey reportperiod lastupdated accoci assets assetsavg assetsc ... sharesbas shareswa shareswadil sps tangibles taxassets taxexp taxliabilities tbvps workingcapital
0 DWAC ARQ 2021-06-30 2021-09-01 2021-06-30 2023-11-13 0.0 1.262350e+05 None None ... 38824109.0 7500000.0 7500000.0 0.000 1.262350e+05 0.0 0.0 0.0 0.017 None
1 WEX ARQ 2009-12-31 2010-02-26 2009-12-31 2023-10-28 -287000.0 1.499662e+09 None None ... 38213657.0 38240000.0 39379000.0 2.191 1.149620e+09 183602000.0 8379000.0 109511000.0 30.063 None
2 WEX ARQ 2010-03-31 2010-04-30 2010-03-31 2023-10-28 -568000.0 1.564785e+09 None None ... 38790106.0 38334000.0 39122000.0 2.187 1.216255e+09 176579000.0 11133000.0 109084000.0 31.728 None

3 rows × 111 columns

Weekly tables¶

  • Weekly versions of sep and daily (sep_weekly $\sim$ sep, weekly $\sim$ daily)
  • Convenient for looking at strategies that trade weekly
  • Calculate weekly returns from weekly adjusted closing prices
    • Prices are dividend and split adjusted
    • So % changes are total returns including dividends
  • For example, we can use end-of-prior-week pb (price-to-book) from the weekly table to pick stocks.

Momentum¶

  • Across equities and other markets, researchers have found (see "Value and Momentum Everywhere" by Asness, Moskowitz, and Pedersen of AQR):
    • long-term reversals (5-year returns reverse somewhat)
    • medium-term momentum (1-year or 6-month returns continue)
    • short-term reversals (1-month or 1-week returns reverse)
  • The conventional definition of momentum in academic work (including the Asness paper) is last year's return excluding the most recent month
    • In other words, the return over the first 11 of the previous 12 months.

Calculating momentum¶

  • Each week, we look back one year and compound the returns, excluding the most recent month.
  • Number the weeks in the prior year 1, 2, ..., 52, with week 52 the most recent.
  • We want to calculate $(1+r_1)\cdots (1+r_{48})$.
  • We can do this as
$$\frac{(1+r_1)\cdots (1+r_{52})}{(1+r_{49})\cdots (1+r_{52})}$$
  • In other words,
$$\frac{1 + \text{last year's return}}{1+\text{last month's return}}$$
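  • For example, if a stock returned 30% over the past year and 5% over the most recent month, momentum is $1.30/1.05 - 1 \approx 23.8\%$.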

Get data¶

  • Start in 2010 just to make the example quicker to run (data starts in 2000).
  • Get closeadj to compute returns.
  • Get closeunadj to filter out penny stocks (will impose price $\ge$ 5).
  • In case there are two rows for the same stock/date, keep the one with the latest "lastupdated."
In [9]:
prices = pd.read_sql(
    """ 
    select date, ticker, closeadj, closeunadj, lastupdated from sep_weekly 
    where date>='2010-01-01'
    order by ticker, date, lastupdated    
    """,
    conn,
)
# rows are sorted by lastupdated within each ticker/date,
# so .last() keeps the most recently updated row for each stock/date
prices = prices.groupby(["ticker", "date"]).last()
prices = prices.drop(columns="lastupdated")
In [10]:
# weekly total returns from dividend- and split-adjusted closes
rets = prices.groupby(
    "ticker", 
    group_keys=False
).closeadj.pct_change()

# trailing 52-week (one-year) return
rets_annual = prices.groupby(
    "ticker", 
    group_keys=False
).closeadj.pct_change(52)

# trailing 4-week (one-month) return
rets_monthly = prices.groupby(
    "ticker", 
    group_keys=False
).closeadj.pct_change(4)

# momentum = last year's return excluding the most recent month
mom = (1 + rets_annual) / (1 + rets_monthly) - 1

Value investing¶

  • Value means cheap relative to quality. Value investing has a very long tradition.
  • Conventional measures are price-to-earnings (PE) and price-to-book (PB).
  • Low PE or low PB stocks are value stocks. High PE or PB stocks are "growth stocks" or "glamour stocks."
  • We'll use PB in this example, but PE is also worth exploring (also price-to-sales, price-to-clicks, ...)

Get data¶

  • Follow same recipe as when getting prices but use the weekly table.
  • Get pb and marketcap (so we can filter to small caps)
In [11]:
df = pd.read_sql(
    """ 
    select date, ticker, pb, marketcap, lastupdated from weekly 
    where date>='2010-01-01'
    order by ticker, date, lastupdated    
    """,
    conn,
)
# as before, keep the most recently updated row for each stock/date
df = df.groupby(["ticker", "date"]).last()
df = df.drop(columns="lastupdated")

Merge and lag¶

  • The return shown at a given date is the return over the week ending on that date.
  • To pick stocks, we need to use characteristics known at the beginning of the week.
    • This is the same as the end of the prior week.
  • We will line up marketcap, closeunadj, mom, and pb from the prior week with the return of the current week.
In [12]:
df["close"] = prices.closeunadj
df["ret"] = rets
df["mom"] = mom
for col in ["marketcap", "close", "mom", "pb"]:
    df[col] = df.groupby("ticker", group_keys=False)[col].shift()
df = df.dropna()

Filter to small caps and exclude penny stocks¶

  • Rank on marketcap each week, with 1=largest, etc. Drop largest 1,000.
  • Drop all stocks with price < 5.
In [13]:
size_rank = df.groupby(
    "date", 
    group_keys=False
).marketcap.rank(ascending=False)   # 1 = largest market cap each week

df = df[size_rank > 1000]   # drop the 1,000 largest stocks
df = df[df.close > 5]       # keep stocks whose prior-week unadjusted price exceeds $5

Number of stocks by week¶

In [14]:
num_stocks = df.groupby("date", group_keys=True).ret.count()
num_stocks.index = pd.to_datetime(num_stocks.index)
num_stocks.plot()
[Figure: number of stocks in the sample each week]

Preliminary analysis¶

  • To understand how returns depend on momentum and value, we start by sorting into quintiles each week on each characteristic.
  • We intersect the two sorts, forming 25 groups each week.
  • We calculate average returns within each group. These are equally weighted portfolio returns.
In [15]:
df["pb_quintile"] = df.groupby("date", group_keys=False).pb.apply(
    lambda x: pd.qcut(x, 5, labels=range(1, 6))
)
df["mom_quintile"] = df.groupby("date", group_keys=False).mom.apply(
    lambda x: pd.qcut(x, 5, labels=range(1, 6))
)
sorted_rets = df.groupby(
    ["date", "pb_quintile", "mom_quintile"], 
    observed=True,
    group_keys=True
).ret.mean()
sorted_rets = sorted_rets.unstack(["pb_quintile", "mom_quintile"])
In [16]:
sorted_rets.head()
Out[16]:
pb_quintile 1 2 ... 4 5
mom_quintile 1 2 3 4 5 1 2 3 4 5 ... 1 2 3 4 5 1 2 3 4 5
date
2011-01-14 0.003848 -0.002287 0.006847 -0.000550 0.000852 -0.014294 -0.007331 -0.017853 -0.005665 -0.003503 ... -0.002017 -0.008608 -0.011751 -0.000972 0.004042 -0.010017 -0.016354 -0.011555 0.000095 0.011148
2011-01-21 0.012342 0.008409 0.009834 0.015451 0.002926 0.016729 0.017019 0.005577 0.011915 0.008766 ... 0.019471 0.011235 0.016113 0.012968 0.028114 0.003879 0.009990 -0.000224 0.019024 0.008260
2011-01-28 -0.019970 -0.003331 -0.005422 -0.011979 -0.021571 -0.018761 -0.013485 -0.011322 -0.022904 -0.024932 ... -0.028604 -0.025485 -0.029492 -0.015414 -0.036851 -0.029170 -0.041022 -0.028589 -0.024798 -0.041725
2011-02-04 -0.004283 0.006279 -0.005805 0.010473 0.014758 -0.005800 0.004836 -0.001052 0.006198 0.019444 ... -0.011771 -0.000415 0.003025 0.005043 0.010816 -0.001913 -0.014447 -0.007645 0.004717 0.006695
2011-02-11 0.015693 0.022936 0.020645 0.020812 0.025529 0.016449 0.015239 0.019683 0.026489 0.030514 ... 0.017176 0.031344 0.018916 0.036835 0.031097 0.025140 0.017540 0.026404 0.023187 0.041926

5 rows × 25 columns

In [17]:
mean_sorted_rets = sorted_rets.mean()
mean_sorted_rets = mean_sorted_rets.unstack()
(52*mean_sorted_rets).round(3)   # annualize the weekly means (52 weeks per year)
Out[17]:
mom_quintile 1 2 3 4 5
pb_quintile
1 0.038 0.134 0.159 0.163 0.140
2 0.057 0.105 0.116 0.110 0.128
3 0.068 0.095 0.106 0.103 0.132
4 0.084 0.091 0.109 0.127 0.128
5 0.001 0.077 0.110 0.094 0.152
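Returns generally rise with momentum within each PB quintile. As a quick sketch using the objects above, the annualized winner-minus-loser momentum spread within each PB quintile is:

# columns of mean_sorted_rets are momentum quintiles 1 (losers) to 5 (winners)
spread = 52 * (mean_sorted_rets[5] - mean_sorted_rets[1])
spread.round(3)   # e.g., about 0.102 in the lowest-PB quintile (0.140 - 0.038 above)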

Example strategy¶

  • Compute momentum rank each week (1 = highest = best)
  • Compute value rank each week (1 = lowest = best)
  • Average ranks
  • Hold the 50 stocks with the best (lowest) average ranks each week, equally weighted
  • Compare to the 50 stocks with the worst average ranks and to the equally weighted portfolio of all stocks

Best stocks¶

In [18]:
mom_rank = df.groupby("date", group_keys=False).mom.rank(ascending=False)  # 1 = highest momentum
pb_rank = df.groupby("date", group_keys=False).pb.rank()                   # 1 = lowest price-to-book
avg_rank = (mom_rank + pb_rank) / 2

# keep the 50 stocks with the smallest (best) average rank each week
starting_from_best = avg_rank.groupby("date", group_keys=False).rank()
best = df[starting_from_best <= 50]
best_rets = best.groupby("date", group_keys=True).ret.mean()   # equally weighted return
best_rets.index = pd.to_datetime(best_rets.index)

Worst and all stocks¶

In [19]:
# keep the 50 stocks with the largest (worst) average rank each week
starting_from_worst = avg_rank.groupby(
    "date", 
    group_keys=False
).rank(ascending=False)
worst = df[starting_from_worst <= 50]
worst_rets = worst.groupby("date", group_keys=True).ret.mean()
worst_rets.index = pd.to_datetime(worst_rets.index)

# equally weighted portfolio of all stocks passing the filters
all_rets = df.groupby("date", group_keys=True).ret.mean()
all_rets.index = pd.to_datetime(all_rets.index)

Cumulative returns¶

In [20]:
import matplotlib.pyplot as plt

# cumulative value of $1 invested in each equally weighted portfolio
(1+best_rets).cumprod().plot(label="best")
(1+worst_rets).cumprod().plot(label="worst")
(1+all_rets).cumprod().plot(label="all")
plt.legend()
plt.show()