Habit, Long-Run Risks, Prospect? A Statistical Inquiry

Aldrich, Eric M.; Gallant, A. Ronald

doi:10.1093/jjfinec/nbq034

Abstract

We use recently proposed Bayesian statistical methods to compare the habit persistence asset pricing model of Campbell and Cochrane, the long-run risks model of Bansal and Yaron, and the prospect theory model of Barberis, Huang, and Santos. We improve these Bayesian methods so that they can accommodate highly nonlinear models such as the three aforementioned. Our substantive results can be stated succinctly: If one believes that the extreme consumption fluctuations of 1930–1949 can recur, although they have not in the last sixty years even counting the current recession, then the long-run risks model is preferred. Otherwise, the habit model is preferred.

The goal of this article is to fill a void in the literature: there are, to our knowledge, no head-to-head, statistical (i.e., likelihood-based or asymptotically equivalent) comparisons of asset pricing models from macro/finance. The asset pricing models considered are the habit persistence model of Campbell and Cochrane (1999), CC hereafter, the long-run risks model of Bansal and Yaron (2004), BY hereafter, and the prospect theory model of Barberis, Huang, and Santos (2001), BHS hereafter. There are two reasons for this choice: these three models are arguably the leading contenders, and their authors describe their computational methods precisely enough to permit replication of their results.

The need for a statistical comparison of asset pricing models is underscored by the ongoing debate between advocates of the long-run risks and habit models. Beeler and Campbell (2008) claim that the long-run risks model is rejected by historical data on the basis of the predictability of excess returns, consumption growth, dividend growth, and their respective volatilities by the price to dividend ratio. Bansal, Kiku, and Yaron (2009) argue that the long-run risks model provides adequate predictability results when using a vector autoregression (VAR) based on consumption growth, price to dividend ratio, and the real risk-free rate. Bansal, Kiku and Yaron also argue that the habit model provides counterfactual predictability results for the price to dividend ratio when using lagged consumption growth as a regressor. Because the argument is based on the selective use of statistics by the advocates, it can be continued indefinitely without resolution. In contrast, the likelihood of model output contains all the information in these models. Therefore, likelihood-based inference, in principle, resolves the debate definitively.

We know of only one other study that attempts a head-to-head statistical comparison of asset pricing models: Bansal, Gallant, and Tauchen (2007), BGT hereafter. BGT compared the habit model to the long-run risks model using frequentist methods. Their methods could not distinguish between the two models because frequentist nonnested model comparison methods require abundant data. Abundant data are not available in macro/finance. The typical sampling frequency used to calibrate and assess macro/finance models is annual and there are only about 80 annual observations available on the U.S. economy. The papers cited above use annual data. BHS insist that annual is the only frequency that is appropriate to their model. Using more abundant higher-frequency data to compare models is not an option because the models were not designed to explain high-frequency data.

Failing to achieve a definitive statistical result, BGT proceeded to compare the models using the more traditional methods of macro/finance, which consist of enumerating some moments, evaluating them both from the data and from a model simulation, and comparing, often without taking sampling variability into account. On the basis of such comparisons, BGT conclude that the long-run risks model is preferred. Of these comparisons, they relied mostly on the fact that the habit model provides counterfactual predictability results for the price to dividend ratio when using lagged consumption growth as a regressor.

In addition to the fact that the BGT comparison, in the end, was not statistical, there are other concerns. BGT did not actually compare the models proposed by CC and BY. They modified them to impose cointegration on macrovariables that ought not diverge. They also used a general purpose method to solve them; specifically, a Bubnov–Galerkin method (Miranda and Fackler, 2002, 152–3). Our view is that fairness dictates that one should use the model that was actually proposed by the originator in comparisons, not a modified model, and that one should use the same solution method that was proposed. To state our view succinctly, the model is the simulation algorithm proposed by the originator; it is not the mathematical equations that suggested the algorithm.

The data we use are annual, per capita, real, U.S. stock returns, consumption growth, and the price to dividend ratio from 1925–2008. The comparisons are for the periods 1930–2008 and 1950–2008. The data from 1925–1929 are only used to prime recursions because they are of lower quality than the data from 1930 onward. The data are plotted in Figure 1. Note in the figure that consumption growth is far more volatile in the 1930–1949 period than in the period 1950–2008. It turns out that the difference in consumption growth volatility dramatically influences results.

Figure 1

Consumption growth, stock returns, and price to dividend ratio, 1925–2008. The left vertical line is at 1930 and the right at 1950. The data collection protocol is as described in Bansal, Gallant, and Tauchen (2007) for the period 1930–2008. The data from 1925–1929, which are only used to prime recursions, are the inflation-adjusted Dow–Jones industrial average, a real U.S. consumption growth series kindly supplied by Robert Barro, and backcasted real price and dividend levels.

Gallant and McCulloch (2009), GM hereafter, introduced a Bayesian method for fitting a model from a scientific discipline (scientific model for short) for which a likelihood is not readily available to sparse data. They synthesize a likelihood by means of an “auxiliary model” and simulation from the “scientific model.” GM used the term “statistical model” for auxiliary model. We do not because the consonance between scientific model and statistical model causes confusion and also because we call into question the premise on which GM chose the term statistical model.

In the GM framework, the auxiliary model must nest the scientific model for the methodology to be logically correct. GM argue that it is better to use an auxiliary model that represents the data well, even if it does not nest the scientific model. We conduct a sensitivity analysis employing six auxiliary models of differing complexities to determine if the choice of auxiliary model makes a difference to our results. The six models are shown in Table 1. Model f₁ is closest to that used by GM; f₅ is the nesting model for univariate (stock returns alone) and bivariate (consumption growth and stock returns) data.

Table 1

Auxiliary models

	f₀	f₁	f₂	f₃	f₄	f₅
Mean	1 lag	1 lag	1 lag	1 lag	1 lag	2 lags
Variance	Constant	GARCH	GARCH	GARCH	GARCH	GARCH
			leverage	leverage	leverage	leverage
Errors	Normal	Normal	Normal	Flexible	Flexible	Flexible
					nonlinear	nonlinear
Parms univar	3	5	6	10	11	12
Parms bivar	9	12	14	22	24	28
Parms trivar	18	22	25	37

Multivariate GARCH variance matrices are of the BEKK form (Engle and Kroner 1995) with one lag throughout. A nonlinear error density adds nonlinear terms that depend on one lag to the conditional mean and variance. When evaluated, data are centered and scaled and lags are attenuated by a spline transform. See Gallant and Tauchen (2009) for details. The functional form is displayed in Subsection 2.4. Parms is the number of parameters, which depends on the dimension of the data: univariate, bivariate, or trivariate. The habit persistence model has seven parameters, the long-run risks model has thirteen, and the prospect theory model has eleven.

We use the protocol set forth in GM to establish nesting, which, briefly, is as follows. The models shown in Table 1 are the beginning of a sequence (whose progression is described in Subsection 2.4) that, if continued, is dense for the space in which the scientific model must lie. Therefore, what needs to be done to find a nesting model is to select the correct truncation point. Here we use the Schwarz (1978) criterion to select it followed by some diagnostic checks that GM discuss.
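The truncation-point selection can be illustrated with a minimal sketch. It assumes the common form of the Schwarz criterion, −2 log L + p log n, with smaller values preferred. The parameter counts are the univariate counts from Table 1; the log-likelihood values and the sample size are hypothetical placeholders, not results from this paper, and the function names are illustrative.

```python
import math

def schwarz_criterion(log_lik, n_params, n_obs):
    """Schwarz (1978) criterion in the form -2 log L + p log n (smaller is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def select_truncation(candidates, n_obs):
    """Return the name of the candidate model with the smallest Schwarz value."""
    return min(
        candidates,
        key=lambda name: schwarz_criterion(*candidates[name], n_obs),
    )

# name -> (maximized log-likelihood, number of parameters)
# Parameter counts: Table 1, univariate row.
# Log-likelihoods: HYPOTHETICAL values chosen only to make the example run.
candidates = {
    "f0": (-1500.0, 3),
    "f1": (-1480.0, 5),
    "f2": (-1470.0, 6),
    "f3": (-1440.0, 10),
    "f4": (-1430.0, 11),
    "f5": (-1420.0, 12),
}

n_obs = 1000  # hypothetical sample size
chosen = select_truncation(candidates, n_obs)
print("selected auxiliary model:", chosen)
```

The criterion only proposes a truncation point; the diagnostic checks that GM discuss are a separate step and are not represented here.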

Increasing the dimension of a multivariate time series often simplifies the conditioning structure. That happens here: f₀ is the nesting model for trivariate (stock returns, consumption growth, and price to dividend ratio) data. We check our f₀ results using models f₁ through f₃. Computations to fit a nonlinear model to data that do not identify it well are often unstable, as they are for models f₄ and f₅ when fitted to trivariate simulations. Therefore, we do not consider f₄ and f₅ for trivariate data. The prospect theory model puts its mass on a two-dimensional subspace. This violates the GM regularity conditions. Therefore, we do not consider the prospect theory model for trivariate data.

We find that the computational methods that GM proposed are not sufficiently accurate to compare the habit, long-run risks, and prospect theory models. This is due to the fact that the auxiliary model f₅ that nests these three is more complex than the auxiliary models that GM considered. A contribution of this paper is a refinement of GM's methods that increases accuracy to the point that auxiliary models as complex as f₅ can be used in applications.

1 MODELS CONSIDERED

The intuitive notions behind any consumption-based asset pricing model are that agents receive wage, interest, and dividend income from which they purchase consumption. Agents seek to reallocate their consumption over time by trading shares of stock that pay a random dividend and bonds that pay interest with certainty. This reallocation is done for the purpose of insuring against spells of unemployment, providing for retirement, and so on. Trading activity enters the model via the constraint that an agent's purchase of consumption, bonds, and stock cannot exceed wage, interest, and dividend income in any period. When applying the model to a national economy, consumption and dividends can be used as the driving processes instead of wages and dividends because wages can be recovered by subtracting dividends from consumption. (Someone must own the stock so the dividends must be received, while for bonds someone pays interest and another receives so there are no net interest receipts.) Agents are endowed with a utility function that depends on the entire consumption process. The first-order conditions of their utility maximization problem determine a map from present and past values of the driving processes to the present price of a stock and a bond. These models are simulated by first simulating the driving processes and then evaluating the map that determines stock and bond prices. For each model, we shall describe the driving processes and the utility function, leaving a description of the algorithm for computing the map to the cited literature.

Our prior, which is the same for all models, is

(1)

where r_f^a = lim_{n→∞} (1/n) ∑_{t = 1}^{n} r_ft^a is the ergodic mean of the risk-free rate r_ft^a and the θ_i^* are the parameter values published by the proposer. Campbell (2003) notes that any reasonable asset pricing model must incorporate the indirect evidence that the risk-free rate is very low with low volatility. Campbell's evidence suggests that the mean risk-free rate for the United States is 0.896 percent per annum. BGT argue that imposing the risk-free rate a priori will produce better estimates than using an ex ante risk-free rate series because any empirical ex ante risk-free rate series is mostly noise due to the difficulty of determining ex ante inflation (Mishkin 1981). The proposers used their judgment to determine the θ_i^*. They took data, some of which differed from ours, partially into account in forming their judgment but none could have used data more recent than 2003. Thus our prior is partially, if not completely, independent of our data. This prior appears to strike the right balance. It is tight enough to achieve Markov chain Monte Carlo (MCMC) chains that mix well despite the use of sparse data. It is loose enough to allow the data to be influential. Making it looser still causes numerical problems. For the purposes of this paper, it is pointless to consider a tighter prior.
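A prior of this general shape can be sketched as follows. This is an illustration under stated assumptions, not a transcription of Equation (1): it assumes a normal term on the ergodic mean risk-free rate centered at 0.896 plus independent normal terms centered at the published values θ_i^*, with each standard deviation taken to be the scale factor times |θ_i^*|. The standard deviation of the risk-free term (`rf_sd`), the exact way the scale factor enters, and the numerical values of `theta_star` are hypothetical. The mean risk-free rate is passed in as a number; in practice it would come from a model simulation at θ, which is what ties the parameters together.

```python
import math

def log_normal_density(x, mean, sd):
    """Log of the normal density with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def log_prior(theta, theta_star, scale, mean_risk_free, rf_target=0.896, rf_sd=1.0):
    """Log prior: risk-free-rate term plus one normal term per parameter.

    theta, theta_star, scale are dicts keyed by parameter name.
    ASSUMPTIONS: sd_i = scale_i * |theta_star_i|; rf_sd is a hypothetical value.
    """
    total = log_normal_density(mean_risk_free, rf_target, rf_sd)
    for name, value in theta.items():
        sd = scale[name] * abs(theta_star[name])
        total += log_normal_density(value, theta_star[name], sd)
    return total

# HYPOTHETICAL centers for two parameters, for illustration only.
theta_star = {"gamma": 2.0, "phi": 0.988}
# Scale 0.1 is the default described in the text; 0.001 is the tighter
# scale the text reports for phi in the habit model.
scale = {"gamma": 0.1, "phi": 0.001}

at_center = log_prior(theta_star, theta_star, scale, mean_risk_free=0.896)
shifted = log_prior({"gamma": 2.2, "phi": 0.988}, theta_star, scale, mean_risk_free=0.896)
print(at_center - shifted)  # cost of moving gamma one prior sd from its center
```

Because the risk-free term depends on all parameters jointly through the simulated mean rate, a prior of this shape is not a product of independent marginals even though the remaining terms are.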

Throughout this section, lower case denotes the logarithm of an upper case quantity, for example, c_t = log(C_t), where C_t is consumption during time period t, and d_t = log(D_t), where D_t is dividends paid during period t. The exceptions are the geometric risk-free interest rate r_ft = −log P_{f,t − 1} and the geometric stock return inclusive of dividends r_dt = log(P_dt + D_t) − log P_{d,t − 1}, where P_{f,t − 1} is the price of a discount bond at the beginning of time period t that pays $1 with certainty at the end of period t, P_{d,t − 1} is the price of a stock at the beginning of time period t, and P_dt is its price at the end.
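The two return definitions can be checked with a short sketch. The formulas are those stated above; the function names and the prices and dividend used in the example are hypothetical.

```python
import math

def geometric_risk_free_rate(bond_price_start):
    """r_ft = -log P_{f,t-1}; the discount bond pays $1 at the end of period t."""
    return -math.log(bond_price_start)

def geometric_stock_return(price_start, price_end, dividend):
    """r_dt = log(P_dt + D_t) - log P_{d,t-1}, inclusive of dividends."""
    return math.log(price_end + dividend) - math.log(price_start)

# Hypothetical one-period example.
r_f = geometric_risk_free_rate(0.99)             # bond bought at 0.99, pays 1.00
r_d = geometric_stock_return(100.0, 104.0, 3.0)  # price 100 -> 104, dividend 3
excess = r_d - r_f

print(f"r_f = {r_f:.5f}, r_d = {r_d:.5f}, excess = {excess:.5f}")
```

Geometric returns add across periods, which is convenient when monthly simulations are aggregated to the annual frequency.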

1.1 The Habit Persistence Model

The driving processes for the habit persistence model are

(2)

The utility function is

(3)

where habit persistence is implemented by two equations:

(4)

(5)

γ is a measure of curvature, which scales attitudes toward risk, and δ is the agent's discount factor. ℰ_t is conditional expectation with respect to S_t, which is the state variable; s_t = log(S_t). The quantities S̄ and s_max can be computed from model parameters as S̄ = σ√(γ/(1 − φ)) and s_max = s̄ + (1 − S̄²)/2, where s̄ = log(S̄). The variable X_t = C_t(1 − S_t) is called external habit. By substituting S_tC_t = C_t − X_t in Equation (3), one can see that utility is extremely low when consumption is close to X_t for γ > 1. Habit X_t is determined by past consumption as is seen by noting that v_{t − 1} = log(C_{t − 1}/C_{t − 2}) − g in Equation (4). Given the habit model's parameters

(6)

{C_t,r_dt,r_ft}_{t = 1}^{12N} are simulated at the monthly frequency and aggregated to the annual frequency

(7)

(8)

(9)

(10)

where N is the annual simulation size.
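The aggregation step can be sketched as below. The rules used are assumptions made for illustration, since they are not spelled out in the surrounding text: annual consumption is taken to be the sum of the twelve monthly levels, annual consumption growth the log difference of annual consumption, and annual geometric returns the sums of the twelve monthly geometric returns. The function name and the input values are hypothetical.

```python
import math

def aggregate_to_annual(C, r_d, r_f):
    """Aggregate monthly series of length 12N to annual series of length N.

    ASSUMED rules: consumption levels and geometric returns are summed
    within each year; growth is the log difference of annual consumption.
    """
    if not (len(C) == len(r_d) == len(r_f)) or len(C) % 12 != 0:
        raise ValueError("monthly series must share a length divisible by 12")
    n_years = len(C) // 12
    C_a, r_d_a, r_f_a = [], [], []
    for year in range(n_years):
        months = slice(12 * year, 12 * (year + 1))
        C_a.append(sum(C[months]))
        r_d_a.append(sum(r_d[months]))
        r_f_a.append(sum(r_f[months]))
    # The first year has no predecessor, so growth has N - 1 entries.
    growth_a = [math.log(C_a[t] / C_a[t - 1]) for t in range(1, n_years)]
    return C_a, growth_a, r_d_a, r_f_a

# Two hypothetical years of monthly data.
C = [1.0] * 12 + [2.0] * 12
r_d = [0.005] * 24
r_f = [0.001] * 24
C_a, growth_a, r_d_a, r_f_a = aggregate_to_annual(C, r_d, r_f)
print(C_a, growth_a, r_d_a, r_f_a)
```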

The prior is Equation (1). For the habit model, the scale factor used for φ and δ is 0.001 rather than the 0.1 shown in Equation (1) to overcome an identification problem. The MCMC chain will not mix when the scale factor for φ and δ in Equation (1) is 0.1 because a move in φ can be nearly exactly offset by a move in δ. The value 0.001 is the largest value for which MCMC chains will mix. Because of the first term of the right-hand side of Equation (1), (1) is not an independence prior; in simulations from the prior no correlation between parameters is zero.

Measures of location and scale for the prior and posterior distributions of habit model parameters are shown in the top panel of Table 2. The prior and posterior densities of the risk-free rate, equity premium, equity returns, and the standard deviation of equity returns are plotted in the left column of Figure 2. Overall, Table 2 and Figure 2 suggest that the prior is sufficiently informative to fill in where data are sparse but it allows the data to move the posterior where data are informative.

Table 2

Prior and posterior model parameters

	Prior			Posterior
Parameter	Mode	Mean	SD	Mode	Mean	SD
Habit model
g	0.00157547	0.00156519	0.00008128	0.00166893	0.00159147	0.00007473
σ	0.00440979	0.00431169	0.00022113	0.00502777	0.00501054	0.00018533
ρ	0.20068359	0.20053348	0.01072491	0.19445801	0.19892873	0.00931413
σ_w	0.03228760	0.03247938	0.00169052	0.03193665	0.03175960	0.00138630
φ	0.98826599	0.98830499	0.00042475	0.98769760	0.98773761	0.00033629
δ	0.99046326	0.99041700	0.00043605	0.99033737	0.99033565	0.00044495
γ	2.04296875	2.04076156	0.08924751	1.97558594	1.96336336	0.07720679
r_f	0.97796400	1.07587200	0.13273052	1.02530400	0.96219600	0.12647089
r_d − r_f	6.04969200	5.98359600	0.07700698	6.26854800	6.23908800	0.07426341
σ_{r_d}	19.67246807	19.69228275	0.14078849	20.17062220	20.14121148	0.14442220
Long-run risks model
δ	0.99961090	0.99934096	0.00031172	0.99964905	0.99943058	0.00029362
γ	9.89062500	10.07348625	0.48583545	9.92187500	10.00010750	0.50121255
ψ	1.49609375	1.49614344	0.07859747	1.53906250	1.50321312	0.07244585
μ_c	0.00148392	0.00148142	0.00007031	0.00151825	0.00149122	0.00007685
ρ	0.98413086	0.98408021	0.00468241	0.98284912	0.98435210	0.00320064
φ_e	0.03204346	0.03202031	0.00160150	0.03204346	0.03202844	0.00162241
σ̄²	0.00004041	0.00004124	0.00000196	0.00004160	0.00004061	0.00000196
ν	0.98730469	0.98738766	0.00441105	0.98199463	0.98223563	0.00299350
σ_w	0.00000168	0.00000170	0.00000009	0.00000169	0.00000170	0.00000008
μ_d	0.00120926	0.00119140	0.00006114	0.00121307	0.00120186	0.00006030
φ_d	2.78906250	2.80749125	0.14620180	2.88281250	2.82820500	0.15095447
π_d	4.07031250	4.11655125	0.20586470	4.17187500	4.15665625	0.19923412
φ_u	6.14062500	6.27596375	0.31996896	6.45312500	6.19978500	0.30424633
r_f	0.94398000	1.16133600	0.12177703	0.90874800	1.11896400	0.11709356
r_d − r_f	4.30737600	4.98738000	0.48844526	4.11223200	4.59213600	0.28433000
σ_{r_d}	18.28002188	18.85677597	0.17586080	19.07839616	18.58935179	0.13239826
Prospect theory model
g_C	0.01828003	0.01792775	0.00093413	0.01846313	0.01795106	0.00095215
g_D	0.01870728	0.01833821	0.00095276	0.01849365	0.01845027	0.00097794
σ_C	0.03918457	0.03764040	0.00200690	0.03295898	0.03356905	0.00201110
σ_D	0.12231445	0.12023010	0.00611083	0.11962891	0.11738381	0.00597238
ω	0.14794922	0.15018164	0.00694094	0.14892578	0.15015283	0.00801015
γ	0.98632812	0.98511422	0.05145608	0.96484375	0.97603082	0.04958596
ρ	0.99972534	0.99783899	0.00163604	0.99969482	0.99783430	0.00202090
λ	2.17968750	2.24709750	0.11486810	2.23437500	2.18521953	0.11761822
k	9.82812500	9.86375625	0.53189914	9.90625000	9.84252984	0.53634137
b₀	2.00195312	2.00328703	0.10967111	1.89355469	1.93699477	0.12735310
η	0.91601562	0.89845969	0.04412695	0.85375977	0.85965642	0.02405305
r_f	1.75579200	1.91283600	0.05667617	1.76136000	1.91498400	0.06495191
r_d − r_f	5.92353600	5.49249600	0.19235810	4.88326800	4.78360800	0.12334973
σ_{r_d}	27.97748380	26.75881163	0.92424294	22.90177286	22.79236714	0.29273615

Parameter values are for the monthly frequency for the habit and long-run risks models and for the annual frequency for the prospect theory model. Mode is that of the multivariate density. Returns are geometric, annualized, and expressed as a percent for all models. In the data, returns are r_d − r_f = 5.59 − 0.89 = 4.7 and σ_{r_d} = 19.72. The auxiliary model is f₅ as described in Table 1. The data are annual consumption growth and stock returns for the years 1930–2008, see Figure 1.

Figure 2

Prior and posterior density estimates. The dashed line is the prior. The solid line is the posterior. Left column is for the habit model, middle for the long-run risks model, right for the prospect theory model. Returns are geometric, annualized, and expressed as a percent. Other details as in Table 2. Bandwidths are small to reduce smudging of isolated peaked modes.

The differences among the three models are most visually obvious in their out-of-sample forecasts for the next five years. The mean posterior forecast for the habit model, computed as described in Subsection 2.6, is plotted in the left column of Figure 3. The habit model predicts an end to the current recession in 2009 and a return to steady-state growth by 2010. This is dictated by Equations (2), (7), and (8), which imply that annual consumption growth for the habit model is a first-order autoregression with an autoregression parameter of about 0.25 (Working 1960). Stock returns are predicted to be high in 2009 with a return to steady-state returns by 2013.
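The value of about 0.25 can be checked by simulation. The sketch below assumes that monthly log consumption is a random walk with drift g and independent normal shocks with standard deviation σ, and that annual consumption is the sum of the twelve monthly levels; both are assumptions made for illustration. The values of g and σ are the habit model posterior means from Table 2; the function name, simulation length, and seed are arbitrary.

```python
import math
import random

def annual_growth_autocorrelation(n_years, g, sigma, seed=0):
    """First-order autocorrelation of annual consumption growth when monthly
    log consumption is a random walk with drift (ASSUMED process)."""
    rng = random.Random(seed)
    c = 0.0  # monthly log consumption at the start of the current year
    log_annual = []
    for _ in range(n_years):
        rel, total = 0.0, 0.0
        for _ in range(12):
            rel += g + rng.gauss(0.0, sigma)  # log level relative to year start
            total += math.exp(rel)
        # Log of summed annual consumption, computed relative to c to avoid overflow.
        log_annual.append(c + math.log(total))
        c += rel
    growth = [b - a for a, b in zip(log_annual, log_annual[1:])]
    mean = sum(growth) / len(growth)
    dev = [x - mean for x in growth]
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(dev)))
    den = sum(d * d for d in dev)
    return num / den

rho = annual_growth_autocorrelation(20000, g=0.00159147, sigma=0.00501054, seed=1)
print(f"lag-1 autocorrelation of annual growth: {rho:.3f}")
```

With an autoregression parameter near 0.25, a deviation from steady-state growth shrinks to roughly a quarter of its size after one year and to about six percent after two, which is consistent with the fast return to steady-state growth in the habit model forecast.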

Figure 3

Prior and posterior forecasts. The solid line is the mean of the posterior annualized and expressed as a percent. The dashed lines are ±1.96 posterior standard deviations. The left column is for the habit model, middle for the long-run risks model, and right for the prospect theory model. Other details as in Table 2.

1.2 The Long-run Risks Model

The driving processes for the long-run risks model are as follows:

The long-run risks model derives its name from the random shifts in the location of consumption and dividends due to x_t. Note that long-run risks x_t and stochastic volatility σ_t² evolve autonomously. The utility function is

(11)

γ and δ have the same interpretation as for the habit model; ψ, which summarizes preferences across time periods, is a separate parameter. Separation of attitudes toward risk and preferences across time is the main advantage of Equation (11). ℰ_t is the conditional expectation with respect to x_t and σ_t, which are the state variables.

The long-run risks model is richly parametrized as

(12)

It is so richly parametrized that identification would have to come from the prior even when data are abundant because half of the models in Table 1 have fewer parameters than θ. The time increment is one month. Aggregation of monthly {c_t,r_dt,r_ft}_{t = 1}^{12N} to the annual frequency {c_t^a,r_dt^a,r_ft^a}_{t = 1}^N is by means of Equations (7) through (10).

The prior is Equation (1). The autoregressive parameters ρ and ν cause problems. The solution method proposed by BY degrades as ρ and ν increase from their published values. Because the degradation is continuous in ρ and ν, there is no logical threshold that one can impose on ρ and ν to completely prevent degradation. Our solution to this problem is to set the scale factor for ρ and ν in Equation (1) to 0.01 rather than 0.1 and attenuate the tails for ρ and ν above 0.995.

Measures of location and scale for the prior and posterior distributions of the parameters of the long-run risks model are shown in the middle panel of Table 2. The prior and posterior densities of the risk-free rate, equity premium, equity returns, and the standard deviation of equity returns are plotted in the middle column of Figure 2. As for the habit model, Table 2 and Figure 2 suggest that the prior is sufficiently informative to fill in where data are sparse, but it allows the data to move the posterior where data are informative.

The mean posterior forecast for the long-run risks model is plotted in the middle column of Figure 3. The long-run risks model predicts an end to the current recession in 2010 and a slow increase in the growth rate thereafter. Stock returns are predicted to be at their steady-state values over the entire forecast period. A flat response of asset returns to a consumption growth and asset return shock is a structural characteristic of the long-run risks model. It is due to the fact that stochastic volatility is autonomous and therefore not affected by consumption growth and asset return shocks. The stochastic volatility process is the factor that affects the risk premium in the long-run risks model.

1.3 The Prospect Theory Model

The driving processes for the prospect theory model are as follows:

The prospect theory model distinguishes between aggregate consumption C̄_t, which is not a choice variable, and the agent's consumption C_t, which is. In addition to these variables, let R_t denote the gross stock return; R_f, the gross risk-free rate; S_t, the share of wealth allocated to the risky asset;

(13)

the relative gain or loss on the risky asset; and

(14)

the benchmark level, where R̄ is chosen to make median {z_t} = 1. The prospect theory utility function is

(15)

where

(16)

and

(17)

ℰ_t is the conditional expectation with respect to the benchmark level z_t, which is the state variable. In Equation (15), the agent's discount factor is ρ and risk aversion parameter is γ. The second term in Equation (15) is the utility from gains or losses, where b₀ is a scale factor. From the plot in figure 1 of BHS, one can see that when there are no prior gains and losses (z = 1), agents dislike losses more than they appreciate gains. When there are prior losses (z > 1), the dislike intensifies. When there are prior gains (z < 1), an agent is “playing on the house's money” and pain is delayed until the “house's money has been lost.” The parameter η in Equation (14) controls sensitivity to past gains and losses. When η is zero, its lower bound, the benchmark does not depend at all on past gains and losses. The dependence increases as η approaches its upper bound of one. Agents always dislike losses more than they appreciate gains; η just determines the delay. See BHS for the relation of the prospect theory utility function to the psychology literature.

Like the long-run risks model, the prospect theory model is richly parameterized:

(18)

Identification requires the prior for most of the auxiliary models considered. The time increment is one year: Simulate directly and set r_dt^a = r_dt, and r_ft^a = r_ft.

Measures of location and scale for the prior and posterior distributions of the parameters of the prospect theory model are shown in the bottom panel of Table 2. The prior and posterior densities of the risk-free rate, equity premium, equity returns, and the standard deviation of equity returns are plotted in the right column of Figure 2. As previously, Table 2 and Figure 2 suggest that the prior is sufficiently informative to fill in where data are sparse, but it allows the data to move the posterior where data are informative.

The mean posterior forecast for the prospect theory model is plotted in the right column of Figure 3. The prospect theory model predicts steady-state growth throughout the forecast period. This is dictated by the fact that annual consumption growth for the prospect theory model is independent and identically distributed. Stock returns are predicted to be double their steady-state value in 2009, reach steady-state by 2011, and remain at steady-state thereafter.

2 INFERENCE FOR GENERAL SCIENTIFIC MODELS

We describe the Bayesian methods proposed by GM and the modifications that we found necessary. Public domain code implementing the method for the auxiliary models in Table 1 and a User's Guide are available at http://econ.duke.edu/webfiles/arg/gsm.

2.1 Estimation of Scientific Model Parameters

Let the transition density of the scientific model be denoted by

p(y_t|x_{t − 1},θ), θ∈Θ, (19)

where x_{t − 1} = (y_{t − 1},…,y_{t − L}) if Markovian and x_{t − 1} = (y_{t − 1},…,y₁) if not. We presume that there is no straightforward algorithm for computing the likelihood, but we can simulate data from p(·|·,θ) for given θ. We presume that simulations from the scientific model are ergodic. We assume that there is a transition density

f(y_t|x_{t − 1},η), η∈ℋ, (20)

and a map

g:θ↦η (21)

such that

p(y_t|x_{t − 1},θ) = f(y_t|x_{t − 1},g(θ)), θ∈Θ. (22)

We assume that f(y|x,η) and its gradient (∂/∂η)f(y|x,η) are easy to evaluate. f is called the auxiliary model and g is called the implied map. When Equation (22) holds, f is said to nest p. Whenever we need the likelihood ∏_{t = 1}ⁿp(y_t|x_{t − 1},θ), we use

ℒ(θ) = ∏_{t = 1}ⁿf(y_t|x_{t − 1},g(θ)), (23)

where {y_t,x_{t − 1}}_{t = 1}ⁿ are the data and n is the sample size. After substituting ℒ(θ) for ∏_{t = 1}ⁿp(y_t|x_{t − 1},θ), standard Bayesian MCMC methods become applicable. That is, we have a likelihood ℒ(θ) from Equation (23) and a prior π(θ) from Equation (1) and need nothing beyond that to implement Bayesian methods by means of MCMC. A good introduction to these methods is Gamerman and Lopes (2006).

The difficulty is computing the implied map g accurately enough that the accept/reject decision in an MCMC chain (step 5 in the algorithm below) is correct when f is a nonlinear model. We describe the algorithms that we have found effective next.

Given θ, η = g(θ) is computed by minimizing Kullback–Leibler divergence

d(f,p) = ∫∫[logp(y|x,θ) − logf(y|x,η)]p(y|x,θ)dyp(x|θ)dx

with respect to η. The advantage of Kullback–Leibler divergence over other distance measures is that the part that depends on the unknown p(·|·,θ), ∫∫logp(y|x,θ)p(y|x,θ)dyp(x|θ)dx, does not have to be computed to solve the minimization problem. We approximate the integral that does have to be computed by

∫∫logf(y|x,η)p(y|x,θ)dyp(x|θ)dx ≈ (1/N)∑_{t = 1}^N logf(ŷ_t|x̂_{t − 1},η),

where {ŷ_t,x̂_{t − 1}}_{t = 1}^N is a simulation of length N from p(·|·,θ). Upon dropping the division by N, the implied map is computed as

g:θ↦argmax_η ∑_{t = 1}^N logf(ŷ_t|x̂_{t − 1},η). (24)

We use N = 5000, which requires 60,000 monthly simulations in the case of the habit and long-run risks models. Results (posterior mean, posterior standard deviation, etc.) are not sensitive to N; doubling N makes no difference other than doubling computational time. By accident we once set N = 60,000 in the prospect theory model; this also made no difference. It is essential that the same seed be used to start these simulations so that the same θ always produces the same simulation.
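
The role of the fixed seed and of the maximization over η can be illustrated with a toy example. Everything in the sketch below is an assumption made for illustration: the scientific model is a first-order autoregression with Student-t shocks, the auxiliary model is a Gaussian first-order autoregression, and the function names are hypothetical. For this particular auxiliary model the maximizer of the simulated log-likelihood is least squares, so no numerical optimizer is needed; with a BEKK auxiliary model a numerical method would be required instead.

```python
import numpy as np

def simulate_scientific(theta, n, seed=12345):
    """Toy scientific model (an assumption): AR(1) with Student-t(5) shocks."""
    rho, scale = theta
    rng = np.random.default_rng(seed)   # fixed seed: same theta, same simulation
    shocks = scale * rng.standard_t(5, size=n + 1)
    y = np.empty(n + 1)
    y[0] = shocks[0]
    for t in range(1, n + 1):
        y[t] = rho * y[t - 1] + shocks[t]
    return y

def implied_map(theta, n=5000):
    """eta maximizing sum_t log f(y_t | y_{t-1}, eta) for a Gaussian AR(1) f."""
    y = simulate_scientific(theta, n)
    lagged, current = y[:-1], y[1:]
    X = np.column_stack([np.ones(n), lagged])
    coef, *_ = np.linalg.lstsq(X, current, rcond=None)  # Gaussian MLE of location
    resid = current - X @ coef
    return np.array([coef[0], coef[1], resid.std()])    # intercept, slope, sd

eta = implied_map((0.5, 1.0))
```

Because the seed is fixed inside the simulator, repeated calls with the same θ return exactly the same η, which is what an accept/reject comparison across iterations relies on.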

GM run a Markov chain {η_t}_{t = 1}^K of length K to compute the η̂ that solves Equation (24). There are two other Markov chains discussed below so, to help distinguish among them, this chain is called the η-subchain. While the η-subchain must be run to provide the scaling for the model assessment method that GM propose, the η̂ that corresponds to the maximum of the objective in Equation (24) over the η-subchain is not a sufficiently accurate evaluation of g(θ) for our auxiliary models. This is mainly because our auxiliary models use a multivariate specification of generalized autoregressive conditional heteroskedasticity (GARCH) that Engle and Kroner (1995) call BEKK. Likelihoods incorporating BEKK are notoriously difficult to optimize. We use this η̂ as a starting value and maximize Equation (24) using the BFGS algorithm (Fletcher 1987, 26–40). This also is not a sufficiently accurate evaluation of g(θ). A second refinement is necessary. The second refinement is embedded within the MCMC chain {θ_t}_{t = 1}^R of length R that is used to compute the posterior distribution of θ. It is called the θ-chain. (We use R = 25,000 past the point transients have dissipated.) Its computation proceeds as follows.

The θ-chain is generated using the Metropolis algorithm. The Metropolis algorithm is an iterative scheme that generates a Markov chain whose stationary distribution is the posterior of θ. To implement it, we require a likelihood, a prior, and transition density in θ called the proposal density. The likelihood is Equation (23).

The prior may require quantities computed from the simulation used to compute Equation (23). Our prior requires r_f^a. The sequence {r̂_ft^a}_{t = 1}^N is available from the simulation. The risk-free rate for the prior is the average of this sequence. (For the habit and prospect models, r_ft^a is constant over the simulation.) Quantities computed in this fashion can be interpreted as the evaluation of a functional of the scientific model of the form Ψ:p(·|·,θ)↦ψ. Thus, our prior is a function of the form π(θ,ψ). However, the functional ψ is a composite function, θ↦p(·|·,θ)↦ψ, so that π(θ,ψ) is ultimately a function of θ only. Therefore, we will only write π(θ,ψ) when it is necessary to call attention to the subsidiary computation p(·|·,θ)↦ψ.

Let q denote the proposal density. For a given θ, q(θ,θ^*) defines a distribution of potential new values θ^*. We use a move-one-at-a-time, random-walk proposal density that puts its mass on discrete, separated points, proportional to a normal. Two aspects of the proposal scheme are worth noting. The first is that the wider the separation between the points in the support of q the less accurately g(θ) needs to be computed for α at step 5 of the algorithm below to be correct. As an example, the long-run risks model is not sensitive to the risk aversion parameter γ so that values of γ could be separated as much as one-fourth without making any difference to the usefulness of the θ-chain with respect to inference regarding economics. A practical constraint is that the separation cannot be much more than a standard deviation of the proposal density or the chain will not move. Our separations are typically one-eighth of a standard deviation of the proposal density. In turn, the standard deviations of the proposal density are typically no more than the standard deviations in Table 2 and no less than one order of magnitude smaller. The second aspect worth noting is that the prior is putting mass on these discrete points in proportion to π(θ). Because we never need to normalize π(θ), this does not matter. Similarly for the joint distribution f(y|x,g(θ))π(θ) considered as a function of θ. However, f(y|x,η) must be properly normalized as a function of y, at least to the extent that Equation (24) is computed correctly.
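
A proposal of this kind might be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name, the truncation at 32 grid steps, and the exclusion of a zero move are choices made here. The grid spacing of one-eighth of a proposal standard deviation follows the text.

```python
import numpy as np

def propose(theta, step, sd, rng, max_jump=32):
    """Move one coordinate of theta by k*step, with P(k) proportional to a normal density."""
    theta = np.asarray(theta, dtype=float)
    i = rng.integers(theta.size)                        # coordinate to move
    k = np.concatenate([np.arange(-max_jump, 0),        # candidate jumps, zero excluded
                        np.arange(1, max_jump + 1)])
    weights = np.exp(-0.5 * (k * step[i] / sd[i]) ** 2)  # normal shape on the grid
    new = theta.copy()
    new[i] += rng.choice(k, p=weights / weights.sum()) * step[i]
    return new

rng = np.random.default_rng(1)
sd = np.array([0.8, 0.08])      # hypothetical proposal standard deviations
step = sd / 8                   # separation of one-eighth of a standard deviation
theta = np.array([2.0, 0.5])
candidate = propose(theta, step, sd, rng)
```

Because the jump probabilities are symmetric in k, q(θ,θ^*) = q(θ^*,θ) for this sketch, so the proposal density cancels from a Metropolis acceptance ratio.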

The algorithm for the θ-chain is as follows. Given a current θ^o and the corresponding η^o = g(θ^o), we obtain the next pair (θ^′,η^′) as follows:

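1. Draw θ^* according to q(θ^o,θ^*).
2. Draw a simulation {ŷ_t,x̂_{t − 1}}_{t = 1}^N according to p(y_t|x_{t − 1},θ^*).
3. Compute η^* = g(θ^*) and the functional ψ^* from the simulation {ŷ_t,x̂_{t − 1}}_{t = 1}^N.
4. Compute α = min{1, [ℒ(θ^*)π(θ^*,ψ^*)q(θ^*,θ^o)]/[ℒ(θ^o)π(θ^o,ψ^o)q(θ^o,θ^*)]}.
5. With probability α, set (θ^′,η^′) = (θ^*,η^*), otherwise set (θ^′,η^′) = (θ^o,η^o).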

It is at step 3 that we make our second modification. At that point, we have putative pairs (θ^*,η^*) and (θ^o,η^o) and corresponding simulations {ŷ_t^*,x̂_{t − 1}^*}_{t = 1}^N and {ŷ_t^o,x̂_{t − 1}^o}_{t = 1}^N. We use η^* as a start and recompute η^o using the BFGS algorithm, obtaining η̂^o. If ∑_{t = 1}^N logf(ŷ_t^o|x̂_{t − 1}^o,η̂^o) > ∑_{t = 1}^N logf(ŷ_t^o|x̂_{t − 1}^o,η^o), then η̂^o replaces η^o. In the same fashion, η^* is recomputed using η^o as a start. Once computed, a (θ,η) pair is never discarded. Neither are the corresponding ℒ(θ) and π(θ,ψ). Because the support of the proposal density is discrete, points in the θ-chain will often recur, in which case g(θ), ℒ(θ), and π(θ,ψ) are retrieved from storage rather than computed afresh. If the modification just described results in an improved (θ^o,η^o), that pair and corresponding ℒ(θ^o) and π(θ^o,ψ^o) replace the values in storage; similarly for (θ^*,η^*). The upshot is that the values for g(θ) used at Step 4 will be optima computed from many different random starts after the chain has run awhile.
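
The interaction between a discrete proposal and storage can be sketched as below. This is a simplified illustration, not the authors' implementation: the log posterior is a hypothetical closed-form function standing in for log ℒ(θ) + log π(θ,ψ), the proposal is a symmetric jump of one or two grid points, and the refinement of stored optima described above is omitted.

```python
import numpy as np

def metropolis_on_grid(log_post, theta0, step, n_draws, seed=0):
    """Random-walk Metropolis on a grid; log_post is evaluated once per distinct point."""
    rng = np.random.default_rng(seed)
    store = {}                                    # grid index -> stored log posterior

    def lookup(idx):
        if idx not in store:                      # compute afresh only for new points
            store[idx] = log_post(np.array(idx) * step)
        return store[idx]

    cur = tuple(int(round(v)) for v in np.asarray(theta0) / step)
    draws = np.empty((n_draws, len(cur)))
    jumps = (-2, -1, 1, 2)
    for r in range(n_draws):
        i = int(rng.integers(len(cur)))           # move one coordinate at a time
        prop = list(cur)
        prop[i] += jumps[int(rng.integers(4))]    # symmetric discrete proposal
        prop = tuple(prop)
        log_alpha = lookup(prop) - lookup(cur)    # q cancels because it is symmetric
        if np.log(rng.uniform()) < log_alpha:
            cur = prop
        draws[r] = np.array(cur) * step
    return draws, store

target = np.array([1.0, -1.0])                    # hypothetical posterior location
draws, store = metropolis_on_grid(
    lambda th: -0.5 * np.sum((th - target) ** 2),
    theta0=[0.0, 0.0], step=np.array([0.25, 0.25]), n_draws=50_000)
```

In this toy run the number of stored points is far smaller than the number of draws, which is the saving that matters when each new evaluation requires a long simulation and an optimization.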

To provide the scaling for the prior used in absolute model assessment, there is a subsidiary computation that needs to be carried out at Step 3. It is as follows. Initialize S_η and L to zero. Each time the η-subchain {η_t}_{t = 1}^K is run, increment L, replace S_η by S_η + (η_{K/2} − η_K)(η_{K/2} − η_K)^′ and set

(25)

We use K = 200. All that is important is that transients have died out by the time the midpoint K/2 of the η-subchain has been reached and that η_{K/2} and η_K are nearly uncorrelated.

We compute posterior probabilities using a method that requires one to save the values θ^′,ℒ(θ^′),π(θ^′,ψ^′) available at step 5. It also requires that these same values for a chain that draws from the prior for θ be saved. To draw from the prior, replace α at Step 4 by α = min{1, [π(θ^*,ψ^*)q(θ^*,θ^o)]/[π(θ^o,ψ^o)q(θ^o,θ^*)]}.

The algorithm for the η-subchain is as follows. We use a move-one-at-a-time, random-walk proposal density with continuous support. Given the current η^o, obtain the next value η^′ in the chain as follows:

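1. Draw η^* according to q(η^o,η^*).
2. Compute α = min{1, [∏_{t = 1}^N f(ŷ_t|x̂_{t − 1},η^*)q(η^*,η^o)]/[∏_{t = 1}^N f(ŷ_t|x̂_{t − 1},η^o)q(η^o,η^*)]}.
3. With probability α, set η^′ = η^*, otherwise set η^′ = η^o.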

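In Subsection 2.3, we shall require another chain, called the η-chain, that is computed from the data and a prior π_κ. The algorithm for that chain replaces α with

α = min{1, [∏_{t = 1}ⁿf(y_t|x_{t − 1},η^*)π_κ(η^*)q(η^*,η^o)]/[∏_{t = 1}ⁿf(y_t|x_{t − 1},η^o)π_κ(η^o)q(η^o,η^*)]}.

Draws from the prior are also required. This is done by putting

α = min{1, [π_κ(η^*)q(η^*,η^o)]/[π_κ(η^o)q(η^o,η^*)]}.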

2.2 Relative Model Comparison

Relative model comparison is standard Bayesian inference although there are a few details that need to be discussed to connect it to Subsection 2.1.

One computes the marginal density, ∫∏_{t = 1}ⁿf(y_t|x_{t − 1},g(θ))π(θ)dθ, for the three scientific models p₁(y|x,θ₁), p₂(y|x,θ₂), p₃(y|x,θ₃) with respective priors π₁(θ₁), π₂(θ₂), π₃(θ₃) using method f₅ of Gamerman and Lopes (2006, section 7.2.1). The advantage of that method is that knowledge of the normalizing constants of f(·|·,η) and π(θ) is not required, and it appears to be accurate in tests that we conducted. The computation is straightforward because the relevant information from the θ-chains for the prior and posterior is available after completion of the computations discussed in Subsection 2.1. It is important, however, that the auxiliary model be the same for all three models when the computations in Subsection 2.1 are carried out. Otherwise the normalizing constant of f would be required. One divides the marginal density for each model by the sum for the three models to get the probabilities for relative model assessment.
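
The final division is simple, but marginal densities of a sample of this size are far too small to exponentiate directly, so it is usually carried out on the log scale. The sketch below assumes that log marginal densities are already available; the numerical values and the function name are hypothetical and are not results from this paper.

```python
import numpy as np

def model_probabilities(log_marginals):
    """Divide each marginal density by the sum, working on the log scale."""
    log_m = np.asarray(log_marginals, dtype=float)
    w = np.exp(log_m - log_m.max())     # subtract the max to avoid underflow
    return w / w.sum()

# Hypothetical log marginal densities for three models
probs = model_probabilities([-1234.5, -1230.0, -1241.2])
```

A difference of a few units in the log marginal density is enough to push a probability close to zero or one, which is one way that entries of 0.00 and 1.00 can arise in a table of posterior probabilities.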

Note that what one is actually doing is comparing the three models f(y|x,g₁(θ₁)), f(y|x,g₂(θ₂)), f(y|x,g₃(θ₃)), with respective priors π₁(θ₁), π₂(θ₂), π₃(θ₃). This is an important observation. Inference is actually being conducted with likelihoods ∏_{t = 1}ⁿf(y_t|x_{t − 1},g₁(θ₁)), ∏_{t = 1}ⁿf(y_t|x_{t − 1},g₂(θ₂)), ∏_{t = 1}ⁿf(y_t|x_{t − 1},g₃(θ₃)), not ∏_{t = 1}ⁿp₁(y_t|x_{t − 1},θ₁), ∏_{t = 1}ⁿp₂(y_t|x_{t − 1},θ₂), ∏_{t = 1}ⁿp₃(y_t|x_{t − 1},θ₃). If f nests all p_i, that is, if Equation (22) holds, then the former and the latter are the same. If not, the matter needs consideration. In GM's application, they give two examples. In the first, the presence or absence of GARCH in the auxiliary model makes a dramatic difference to habit model parameter estimates. In the second, changing the thickness of the tails of the auxiliary model makes no difference. They argue on the basis of common sense and their examples that what is actually required is that the auxiliary model fit the observed data, not that it nests p. That is why they use the term statistical model for f. However, their argument is not a proof. We examine this issue in Section 4.

The realization that what one is actually doing is comparing the three models f(y|x,g₁(θ₁)), f(y|x,g₂(θ₂)), f(y|x,g₃(θ₃)), with respective priors π₁(θ₁), π₂(θ₂), π₃(θ₃), allows us to perform a change of measure to η-space and illustrate relative model comparison graphically. In η-space, the prior π_i for model i restricts η to a manifold ℳ_i = {η∈ℋ:η = g_i(θ_i),θ_i∈Θ_i} with each η = g_i(θ_i) in ℳ_i receiving prior weight π_i(η) = π_i(θ_i) (recall that Θ_i is discrete). Think of this manifold as a line in η-space. This is shown graphically in Figure 4 for two hypothetical models. Although conceptually a line, what is plotted in Figure 4 has an area with gray fill to represent the density of the prior along the line. A density-weighted integral along the line would have approximately the same value as an integral over the area shown. Also shown in Figure 4 are the likelihood contours of a hypothetical auxiliary model f(y|x,η). The marginal density for model i is ∫_{ℳ_i}∏_{t = 1}ⁿf(y_t|x_{t − 1},η)π_i(η)dη, which is approximately the integral over the gray areas for the hypothetical models in Figure 4. As shown in the figure, the likelihood is larger over ℳ₂ than over ℳ₁, which implies the same for the integrals over ℳ₂ and ℳ₁. Thus, the second model has the higher marginal likelihood and is therefore preferred.

Figure 4

Relative model comparison. Shown is relative model comparison under a change of variables of integration θ↦η. The contours show the likelihood of the auxiliary model f(·|η). The curved lines, labeled “Prior of Model i”, show the manifolds ℳ_i = {η∈H:η = g_i(θ_i),θ_i∈Θ_i} for Model 1, i = 1, and Model 2, i = 2. Thickness, represented by gray fill, is proportional to the priors π₁ and π₂. Posterior probabilities are proportional to the integral of the likelihood over the manifold weighted by the prior. Model 2 is preferred because the likelihood values on its manifold are larger than the likelihood values on the manifold of Model 1.

2.3 Absolute Model Assessment

We now shift our focus. The model of interest is the auxiliary model f(·|·,η) and its parameter η. The role of the scientific model p(·|·,θ) is to define the implied map g(θ) and the manifold

ℳ = {η∈ℋ:η = g(θ),θ∈Θ}. (26)

The scientific model can be viewed as a sharp prior on f that restricts the posterior distribution of η to lie on the manifold ℳ. Think of it as a line in η-space. If this prior is relaxed, the line becomes a region with volume in η-space. Relaxation can be indexed by a scale parameter κ. As κ increases, the size of the region increases and the posterior for η moves along a path toward the likelihood of the data under f. Figure 5 is an illustration. One can select waypoints κ_i along this path, view them as the discrete values of a parameter, assign them equal prior probability, and compute their posterior probability. If waypoints near ℳ receive high posterior probability, then the data support the scientific model. If waypoints far from ℳ receive high posterior probability, then the data do not support the scientific model. The idea is illustrated graphically in Figure 5 for the case when a model is rejected and in Figure 6 for the case when a model is accepted. The formal development proceeds as follows.

Figure 5

Absolute model assessment—reject. The contours show the likelihood of the auxiliary model f(·|η). The shaded areas show the prior (27) for scale parameters κ₁ < κ₂ < κ₃. The smallest region corresponds to κ₁ and the largest to κ₃. The crosses show the mode of the posterior under κ₁, κ₂, κ₃. The posterior probabilities for absolute model assessment are proportional to the integrals of the likelihood over the respective shaded areas. The model is rejected because the likelihood, hence the integral, is larger over the κ₃-prior than over the κ₁-prior.

Figure 6

Absolute model assessment—accept. The contours show the likelihood of the auxiliary model f(·|η). The shaded areas show the prior (27) for scale parameters κ₁ < κ₂ < κ₃. The smallest region corresponds to κ₁ and the largest to κ₃. The crosses show the mode of the posterior under κ₁, κ₂, κ₃. The posterior probabilities for absolute model assessment are proportional to the integrals of the likelihood over the respective shaded areas. The model is accepted because the likelihood, hence the integral, is larger over the κ₁-prior than over the κ₃-prior.

We add the additional assumption that the auxiliary model is identified and has more parameters than the scientific model. This assumption implies that g^{− 1}(η) exists on ℳ. If the scientific model is identified g^{− 1}(η) will map to a single point; if not, g^{− 1}(η) will be a set.

We impose closeness to ℳ by means of the prior

(27)

where

(28)

π(θ) is the prior (1) for the scientific model, and Σ_η is given by Equation (25). It is easy and cheap to evaluate Equation (28) once the computations described in Subsection 2.1 have been carried out because the implied map g is represented by pairs (θ,η) stored together with π(θ) at the conclusion of Subsection 2.1 computations. The store is traversed to find the pair (θ^o,η^o) such that η^o solves Equation (28). Then π(g^{− 1}(η^o)) = π(θ^o). (If g^{− 1}(η^o) maps to a set, π(g^{− 1}(η^o)) is the sum of π(θ) over that set. Recall that Θ is discrete and normalization is not required.) The pairs (θ,η) and scale Σ_η used to compute Equations (28) and (27) are those for the θ-chain that draws from the prior because they are not tainted by data.

Choose three (for specificity) values κ₁, κ₂, and κ₃, ordered from small to large. Consider f under priors π_κ₁, π_κ₂, and π_κ₃ to be three different models and compute the posterior probability for the three models with each having prior probability of 1/3. That is, the pair (f(·|·,η),π_κ(η)) is considered to be a model and the posterior probability of each κ choice is proportional to the marginal likelihood ∫∏_{t = 1}ⁿf(y_t|x_{t − 1},η)π_κ(η)dη.

If the posterior probability of model κ₁ is small, that is evidence against the scientific model. Conversely, if it is large, that is evidence in favor of the scientific model.
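
The logic can be illustrated in one dimension where everything has a closed form. The sketch below is a toy, not Equation (27): it assumes that the auxiliary model is y_i ~ N(η, 1), that the scientific model pins η at a single point η₀, and that the relaxed prior is N(η₀, κ²). Under those assumptions the marginal density of the data depends on κ only through the sample mean, which is distributed N(η₀, κ² + 1/n). The function name and inputs are hypothetical.

```python
import numpy as np

def kappa_probabilities(ybar, n, eta0, kappas=(0.1, 1.0, 10.0)):
    """Posterior probabilities of each kappa under equal prior probability."""
    var = np.asarray(kappas) ** 2 + 1.0 / n             # marginal variance of ybar
    log_m = -0.5 * np.log(var) - 0.5 * (ybar - eta0) ** 2 / var
    w = np.exp(log_m - log_m.max())
    return w / w.sum()

near = kappa_probabilities(ybar=0.05, n=50, eta0=0.0)   # data close to the manifold
far = kappa_probabilities(ybar=3.0, n=50, eta0=0.0)     # data far from the manifold
```

When the data sit close to η₀ the tight prior receives most of the probability, because a diffuse prior spreads its mass over values of η that the data do not support. When the data sit far from η₀ the ordering reverses, which is the pattern read as evidence against the scientific model.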

2.4 The Auxiliary Model

The observed data are y_t for t = 1,…,n. We first discuss the case where y_t is multivariate. Lagged values of y_t are denoted as x_{t − 1}. For auxiliary models f₀ through f₄, x_{t − 1} = y_{t − 1}. For auxiliary model f₅, x_{t − 1} = (y_{t − 1},y_{t − 2}).

The data are modeled as

where

(29)

which is the location function of a VAR, and R_{x_{t − 1}} is the Cholesky factor of

(30)

(31)

(32)

(33)

In our specification, R₀ is an upper triangular matrix, P and V are diagonal matrices, and Q is scalar; max(0,x) is applied elementwise. This is the BEKK form of multivariate GARCH described in Engle and Kroner (1995) with an added leverage term (33). In computations, max(0,x) in Equation (33) is replaced by a twice differentiable cubic spline approximation that plots slightly above max(0,x) over (0,0.1) and coincides elsewhere. Auxiliary model f₀ has term (30) only, f₁ has terms (30), (31), and (32), and f₂ through f₅ have all four terms.
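
A one-step variance update with the four kinds of term described above might look as follows. This is a sketch under assumptions, since the functional form is not restated here: it takes the variance to be the sum of a constant term R₀R₀′, a lagged-variance term scaled by the scalar Q, a lagged-residual term scaled by the diagonal P, and a leverage term built from max(0,x) scaled by the diagonal V. It uses the exact max(0,x) rather than the spline approximation, does not assert which term carries which equation number, and uses hypothetical names and inputs.

```python
import numpy as np

def bekk_variance(R0, Q, P, V, prev_sigma, prev_resid):
    """Conditional variance from a constant, lagged variance, lagged residual, and leverage."""
    e = np.asarray(prev_resid, dtype=float)
    prev_sigma = np.asarray(prev_sigma, dtype=float)
    sigma = R0 @ R0.T                            # constant term, R0 upper triangular
    sigma = sigma + Q * prev_sigma * Q           # Q is a scalar
    sigma = sigma + P @ np.outer(e, e) @ P.T     # P is diagonal
    lev = np.maximum(0.0, V @ e)                 # max(0, x) applied elementwise
    return sigma + np.outer(lev, lev)            # leverage term

R0 = np.array([[0.5, 0.1],
               [0.0, 0.3]])                      # hypothetical values
sigma_next = bekk_variance(R0, 0.9, np.diag([0.2, 0.3]), np.diag([0.1, 0.1]),
                           prev_sigma=np.eye(2), prev_resid=[1.0, -1.0])
```

Each term is a positive semidefinite matrix, so the sum remains a valid covariance matrix whenever R₀ has full rank.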

The density h(z) of the iid z_t is the square of a Hermite polynomial times a normal density, the idea being that the class of such h is dense in Hellinger norm and can therefore approximate a density to within arbitrary accuracy in Kullback–Leibler distance (Gallant and Nychka 1987). The density h(z) is the normal when the degree of the Hermite polynomial is zero, which is the case for auxiliary models f₀ through f₂. For model f₃, the degree is 4. For models f₄ and f₅, the degree is 4, but the constant term of the Hermite polynomial is a linear function of y_{t − 1}. This has the effect of adding a nonlinear term to the location function (29) and the variance function (30). It also causes the higher moments of h(z) to depend on y_{t − 1} as well.
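
A univariate version of such a density can be written in a few lines. The sketch below is illustrative: it assumes that the polynomial is specified by ordinary monomial coefficients in ascending order, that z is scalar, and that the coefficients are fixed rather than functions of y_{t − 1}. The normalizing constant uses the moments of the standard normal, E[z^k] = (k − 1)!! for even k and zero for odd k. The function names and the coefficient values are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def gaussian_moment(k):
    """E[z^k] for z ~ N(0, 1): zero for odd k, (k - 1)!! for even k."""
    if k % 2:
        return 0.0
    return float(np.prod(np.arange(k - 1, 0, -2))) if k else 1.0

def snp_density(z, coef):
    """h(z) = P(z)^2 phi(z) / E[P(Z)^2], with P given by ascending coefficients."""
    coef = np.asarray(coef, dtype=float)
    squared = np.convolve(coef, coef)                   # coefficients of P(z)^2
    norm = sum(c * gaussian_moment(k) for k, c in enumerate(squared))
    phi = np.exp(-0.5 * np.asarray(z) ** 2) / np.sqrt(2 * np.pi)
    return npoly.polyval(z, coef) ** 2 * phi / norm

grid = np.linspace(-12.0, 12.0, 48001)
h = snp_density(grid, [1.0, 0.3, -0.2, 0.0, 0.05])      # degree 4, hypothetical values
```

With a polynomial of degree zero the function returns the standard normal density, matching the statement above for models f₀ through f₂.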

The univariate auxiliary models are the same as the above but μ_{x_{t − 1}} in Equation (29) has dimension one and becomes the location function of a first order autoregression and Σ_{x_{t − 1}} in (30) has dimension one and becomes a GARCH(1,1) with a leverage term added.

2.5 Diagnostic Checks

The idea behind diagnostic checking is straightforward: If one has compared two scientific models (p₁,π₁) and (p₂,π₂) using the same auxiliary model f(·|η) and the fit of (p₂,π₂) is preferred, then one can examine the posterior means (or modes) of the parameter η of f(·|η) corresponding to the two fits to see which elements changed. The same is true for absolute model assessment. If one fits (f,π_κ₁) and (f,π_κ₂) and concludes that (f,π_κ₁) fails to fit the data, then one can examine the posterior means (or modes) of the parameter η of f(·|η) corresponding to the two fits to see which elements changed.

The changes in the elements of the two posterior means (or modes), η̂₁ and η̂₂, need to be normalized to facilitate meaningful comparison. Let η̂_{1,i} and η̂_{2,i} denote the respective ith elements of η̂₁ and η̂₂. Let σ̂_{2,i} denote the ith posterior standard deviation of the second fit, that is, the preferred fit. The normalization we suggest is

t_i = (η̂_{1,i} − η̂_{2,i})/σ̂_{2,i}. (34)

Table 4 is an example.

There is a caveat. The t_i are often informative but are subject to the same risk as the interpretation of t-statistics in a regression, namely, a failure to fit one characteristic of the data can show up not in the parameters that describe that characteristic but elsewhere due to correlation (collinearity). Nonetheless, despite this risk, inspection of the t_i is often the most informative diagnostic available.

The methods proposed here are likelihood methods, which means that at the conclusion of an estimation exercise a transition density that represents the data under the fitted model is available. The most useful are f(y|x,g(θ)) with θ set to the posterior mode from a fit of (p,π) and f(y|x,η) with η set to the posterior mode from a fit of (f,π_κ). One can apply standard diagnostics to these transition densities such as comparative plots as in Figure 7.

Figure 7

Conditional volatility of the three models. The solid line is the conditional volatility of the data expressed as a percent for the long-run risks model with its parameters set to the posterior mode from fitting to the bivariate consumption growth and stock returns data over the period 1930–2008 using auxiliary model f₅. The dashed line is the same for the habit persistence model and the dot-dash line is the same for the prospect theory model. The shaded area shows ±1.96 posterior standard deviations about the conditional volatility of the long-run risks model.

2.6 Forecasts

A forecast can be viewed as a functional Y:f(·|·,η)↦υ of the auxiliary model that can be computed from f(·|·,η) either analytically or by simulation. If f(·|·,η) nests the scientific model p(·|·,θ) then, due to the map η = g(θ), this forecast can also be viewed both as a forecast from the scientific model and as a function of θ. As such, it can be computed at each draw in the θ-chain for the posterior and the posterior mean, mode, and standard deviation obtained. Similarly for draws from the prior. An example is Figure 3.
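
Computing a forecast at each draw and then summarizing across draws can be sketched as follows. The forecast functional here is a hypothetical stand-in, mean reversion in a first-order autoregression, and the draws are generated artificially rather than taken from a θ-chain. The ±1.96 standard deviation band mirrors the one described in the caption of Figure 3.

```python
import numpy as np

def forecast_path(mu, phi, last, horizon):
    """Hypothetical forecast functional: AR(1) mean reversion toward mu."""
    h = np.arange(1, horizon + 1)
    return mu + phi**h * (last - mu)

def forecast_summary(paths):
    """Mean and +/- 1.96 standard deviation band across draws (rows)."""
    paths = np.asarray(paths, dtype=float)
    mean = paths.mean(axis=0)
    sd = paths.std(axis=0, ddof=1)
    return mean, mean - 1.96 * sd, mean + 1.96 * sd

rng = np.random.default_rng(0)
mu_draws = rng.normal(2.0, 0.3, size=1000)      # stand-in posterior draws
phi_draws = rng.uniform(0.1, 0.4, size=1000)
paths = np.array([forecast_path(m, p, last=-1.0, horizon=5)
                  for m, p in zip(mu_draws, phi_draws)])
mean, lower, upper = forecast_summary(paths)
```

Applying the same two functions to draws from the prior chain gives the prior forecast.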

3 HABIT, LONG-RUN RISKS, PROSPECT?

Table 3 presents the posterior probabilities for a relative model comparison and an absolute model assessment of the three asset pricing models fitted to the three data series over the two sample periods.

Table 3

Posterior probabilities

	Relative model comparison
	Sample period
	1930–2008			1950–2008
Series	hab	lrr	pro	hab	lrr	pro
Trivariate	0.00	1.00		1.00	0.00
Bivariate	0.00	1.00	0.00	1.00	0.00	0.00
Univariate	0.28	0.48	0.24	0.44	0.42	0.14

Absolute model assessment
		Sample period
		1930–2008			1950–2008
Series	Prior	hab	lrr	pro	hab	lrr	pro
Trivariate	κ = 0.1	0.00	0.00		0.00	0.00
	κ = 1	0.00	0.00		0.33	0.00
	κ = 10	1.00	1.00		0.67	1.00
Bivariate	κ = 0.1	0.00	0.41	0.28	0.31	0.16	0.08
	κ = 1	0.00	0.36	0.28	0.31	0.21	0.08
	κ = 10	1.00	0.23	0.44	0.38	0.64	0.84
Univariate	κ = 0.1	0.29	0.36	0.10	0.40	0.39	0.29
	κ = 1	0.30	0.26	0.30	0.38	0.35	0.34
	κ = 10	0.41	0.38	0.60	0.22	0.26	0.37

Shown are posterior probabilities for a relative comparison and an absolute assessment of three asset pricing models fitted to three data series over two sample periods. The trivariate series is annual consumption growth, stock returns, and the price dividend ratio over the years shown. The bivariate series is consumption growth and stock returns. The univariate series is stock returns alone. Variables not in a series are treated as latent in model fits. For the univariate and bivariate series, the auxiliary model is f₅, which is described in Table 1; for the trivariate series it is f₀. κ is the standard deviation of a prior that imposes the habit model (hab), the long-run risks model (lrr), and the prospect theory model (pro), respectively, on the auxiliary model. The prior weakens as κ increases.

For the univariate series, all three models fit reasonably well and none of them strongly dominates. Our interest in the univariate case is due to the fact that this conclusion depends on the choice of auxiliary model. We explore this issue in Section 4.

Our interest in the trivariate series stems from the fact that BGT, Beeler and Campbell (2008), and Bansal, Kiku, and Yaron (2009) argue that the main difference between the habit model and the long-run risks model occurs with respect to predictability regressions that involve the price to dividend ratio series.

It is difficult to properly adjust dividend payouts for stock repurchases and other distortions caused by tax policy so as to make the measured data resemble the theoretical construct of asset pricing models. It is apparent from Figure 1 that the price to dividend series does not look like the realization of a stationary process. An informal check on the ability of a model to explain dividends is to simulate it for ten thousand years and see if the best match (with respect to a variance weighted mean squared error metric) of any contiguous segment of the simulation to the data looks like Figure 1. It does not. For all models and for both sample periods, the best match cannot produce the hump seen in the third panel of Figure 1.
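The informal check just described can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `best_segment_match`, the choice to weight each series' squared errors by the inverse of its sample variance in the data, and the AR(1) stand-in for a model simulation are all assumptions made for the example.

```python
import numpy as np

def best_segment_match(sim, data):
    """Find the contiguous window of `sim` that is closest to `data`.

    sim  : array of shape (T, k), a long simulation of k series
    data : array of shape (n, k), the observed series, n <= T
    Returns (start_index, score), where score is the mean squared error
    with each series' squared errors divided by that series' sample
    variance in the data (an assumed form of variance weighting).
    """
    sim = np.asarray(sim, dtype=float)
    data = np.asarray(data, dtype=float)
    if sim.ndim == 1:
        sim = sim[:, None]
    if data.ndim == 1:
        data = data[:, None]
    n = data.shape[0]
    if sim.shape[0] < n or sim.shape[1] != data.shape[1]:
        raise ValueError("sim must be at least as long as data, "
                         "with the same number of series")
    var = data.var(axis=0, ddof=1)                      # shape (k,)
    # windows has shape (T - n + 1, k, n): every contiguous segment of length n.
    windows = np.lib.stride_tricks.sliding_window_view(sim, n, axis=0)
    sq_err = (windows - data.T) ** 2                    # broadcast over segments
    scores = (sq_err / var[:, None]).mean(axis=(1, 2))
    start = int(np.argmin(scores))
    return start, float(scores[start])

# Hypothetical stand-in for a ten-thousand-year model simulation:
# two persistent AR(1) series on very different scales.
rng = np.random.default_rng(0)
T, n = 10_000, 79                                       # 79 years, as in 1930-2008
shocks = rng.normal(size=(T, 2)) * np.array([1.0, 10.0])
sim = np.zeros((T, 2))
for t in range(1, T):
    sim[t] = 0.9 * sim[t - 1] + shocks[t]

# Hypothetical "data": a segment planted in the simulation, so that an
# exact match exists and the search can be checked.
data = sim[1234:1234 + n].copy()
start, score = best_segment_match(sim, data)
print(start, score)
```

In this toy setup the search recovers the planted segment. With real data one would plot the best-matching segment against the observed series, which is the comparison the text describes.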

As mentioned earlier, because the prospect theory model puts its (conditional) mass on a two-dimensional subspace, it cannot be fit to the trivariate series. However, the informal check just described is possible and the prospect theory model fails more definitively under it than either the habit or long-run risks model.

With respect to the remaining two models, in the relative model comparison presented in Table 3, the long-run risks model is dominant over the 1930–2008 period, whereas the habit model is dominant over the 1950–2008 period. Both models fail to fit the data in the absolute model assessment. Interestingly, the failure of the habit persistence model in the 1950–2008 period is not as stark as for the long-run risks model. These results are for the nesting model f₀.

The fundamental objective of asset pricing models is to explain the relationship between asset prices and consumption. From this perspective, dividends can be treated as an internal construct that need not have any corresponding observable counterpart. Therefore, the bivariate results presented in Table 3 are the most important substantively.

As seen in Table 3, in the relative model comparison, the long-run risks model is dominant over the 1930–2008 period, whereas the habit model is dominant over the 1950–2008 period. In the absolute model assessment, the habit model fails in the 1930–2008 period, and the prospect theory model fails in the 1950–2008 period. These results are for the nesting model f₅.

It is of interest to determine why the habit persistence model fails to fit the bivariate series using the diagnostic checks described in Subsection 2.5. For this purpose, it is more informative to use the simpler auxiliary model f₁ rather than the nesting model f₅. Because the habit model has seven parameters and f₁ has twelve, this is a legitimate choice for the habit model.

Table 4 presents the diagnostics for the habit model. In the table the fit of (f₁,π_κ) with κ = 0.1 is compared to the fit with κ = 10 for the bivariate series over the period 1930–2008 and over the period 1950–2008. It is clear from the table what the problem is. The largest diagnostic, t = − 4.98, indicates that P₁₁, which is the feedback of consumption growth into its own volatility, is too small in absolute value. The habit model fails to put enough conditional heteroskedasticity into the consumption growth process. The problem disappears in the 1950–2008 period because, as seen by comparing the entries for κ = 10 across the row for P₁₁, the conditional volatility in the data drops substantially.

Table 4

Diagnostics for the habit persistence model

	1930–2008			1950–2008
	Mode	Mode	Diagnostic	Mode	Mode	Diagnostic
Parameter	κ = 0.1	κ = 10		κ = 0.1	κ = 10
b_0,1	− 0.08	− 0.05	− 1.30	− 0.06	− 0.05	− 0.21
b_0,2	0.07	0.04	0.53	0.06	0.04	0.34
B₁₁	0.08	0.16	− 1.62	0.09	0.15	− 1.21
B₂₁	− 0.16	− 0.09	− 0.94	− 0.15	− 0.22	0.64
B₁₂	0.29	0.32	− 0.80	0.29	0.23	1.58
B₂₂	0.02	0.02	− 0.10	0.02	0.00	0.35
R_0,11	− 0.03	− 0.01	− 0.23	− 0.03	− 0.06	0.41
R_0,12	0.23	0.27	− 0.85	0.23	0.22	0.29
R_0,22	0.21	0.21	− 0.07	0.20	0.26	− 0.74
P₁₁	− 0.06	0.17	− 4.98	− 0.05	− 0.02	− 0.55
P₂₂	− 0.21	− 0.22	0.16	− 0.21	− 0.24	0.93
Q₁₁	0.91	0.91	− 0.04	0.91	0.91	0.13

Shown are the posterior modes from fitting (f₁,π_κ) to the bivariate consumption growth and stock returns data over the periods and κ values shown together with the diagnostic checks described in Subsection 2.5.

Figure 7 plots the conditional volatility of the three models over the 1930–2008 period. The solid line in Figure 7 is the long-run risks model, which is the preferred model of the three according to the relative model comparison in Table 3. The shaded region shows ±1.96 posterior standard deviations about this line. All three models track the conditional volatility of stock returns about equally well after taking the standard deviations into account. Where they differ is in how they track the conditional volatility of consumption growth and the conditional correlation between consumption growth and stock returns. The same plot for the 1950–2008 period (not shown) looks qualitatively similar: agreement in the middle panel and disagreement in the other two.

Plots of the conditional mean of consumption growth (not shown) indicate that the habit model and long-run risks model agree over the 1930–2008 period except over the years 1930–1940. They both track the data moderately well. Because, as was seen in Figure 3, the conditional mean of consumption growth for the prospect theory model must plot as a straight line, it does not track well. All models disagree in plots (not shown) of the conditional mean of stock returns, with long-run risks plotting as a straight line; all track the data poorly.

4 SENSITIVITY ANALYSIS

There is much experience with the data shown in Figure 1. That experience suggests that about the richest model one would be willing to fit to these data is a model with one-lag VAR location, GARCH scale, and normal innovations. The exact specification one gets using standard model selection procedures, such as upward F-testing, is sensitive to the sample period used. One can get slightly richer or coarser specifications. It is fair to say that the consensus view is that a one-lag VAR location, GARCH scale, and normal innovations is the richest model one ought to entertain, which is model f₁ of Table 1.

We do not discuss the trivariate series in this section other than to remark that for it, f₀ is the nesting model, results for f₀ are reported in Table 3, conclusions do not change under auxiliary models f₁ through f₃, and computations for models f₄ through f₅ cannot be undertaken because simulations do not identify them (BFGS becomes unstable). Similarly, we do not discuss results for the bivariate consumption growth and stock returns data because the results shown in Table 3 are the same for all models in Table 1. Finally, we do not discuss absolute model assessment because there is no logical requirement that the auxiliary model nest the scientific model for the purpose of absolute model assessment. All that is logically required is that the auxiliary model have more parameters than the scientific model. Other than that, one is free to choose an auxiliary model as judgment suggests.

For the univariate (and bivariate) series, a model that will nest the three scientific models that we consider has the following characteristics: a two-lag linear conditional mean function with a one-lag nonlinear conditional mean term added to it, a one-lag GARCH conditional variance function with a one-lag leverage term and a one-lag nonlinear conditional variance term added, and a flexible innovation distribution that permits fat tails and bumps. We denote this model by f₅. It is the last of the six in Table 1. GM found the same to be true for the habit model, except that they used data from 1933–2001 with the years 1930–1932 used to prime recursions. They dismissed f₅ out of hand as absurd and worked with f₀ and f₁. They did verify that a fat-tailed innovation distribution did not change results. Using model f₀ most closely corresponds to calibration as customarily implemented in the macro/finance literature. The sufficient statistics for f₀ are the mean and variance of y_t and the first-order autocorrelations. One is, effectively, finding parameter values that best match three moments for univariate data, nine for bivariate, and eighteen for trivariate.
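The moment counts quoted for f₀ (three, nine, and eighteen) can be reproduced by a simple tally. The decomposition below is an assumption that is consistent with those counts, not a statement taken from Table 1: for a series of dimension m it counts m means, the m(m + 1)/2 distinct variances and covariances, and m² first-order auto- and cross-correlations. The function name `f0_moment_count` is hypothetical.

```python
def f0_moment_count(m):
    """Number of moments matched under f0 for an m-dimensional series
    (assumed decomposition; see the accompanying text)."""
    means = m                         # one mean per series
    covariances = m * (m + 1) // 2    # distinct entries of the covariance matrix
    autocorrelations = m * m          # first-order auto- and cross-correlations
    return means + covariances + autocorrelations

for m, label in [(1, "univariate"), (2, "bivariate"), (3, "trivariate")]:
    print(label, f0_moment_count(m))
```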

As discussed in Subsection 2.1 and in GM, the logically correct view toward using f₁, which fits the data, instead of f₅, which nests the scientific model, is that it is not the likelihood of the scientific model that is being used. It is some other likelihood. Therefore, it is not the scientific models that are actually being estimated and compared. Another point of view is the argument advanced by GM that using a sensible auxiliary model is akin to method of moments estimation. One only asks that the scientific models match certain features of the data and allows them to ignore others. What to do? About all one can do is try a battery of auxiliary model specifications and see what happens.

Table 5 displays the results for the relative comparisons for the univariate stock returns data over the periods 1930–2008 and 1950–2008. There is considerable sensitivity to specification of the auxiliary model over the 1930–2008 period. Conclusions are affected by the choice of auxiliary model. Our view is that, because there can be sensitivity to auxiliary model choice, and because one is not actually comparing scientific models if the auxiliary model is not nesting, it is best to use the nesting auxiliary model in general, which is f₅ in this instance.

Table 5

Posterior probability, relative comparison, stock returns

Model	f₀	f₁	f₂	f₃	f₄	f₅
1930–2008
Habit	0.47	0.71	0.28	0.36	0.28	0.28
LR risks	0.49	0.25	0.57	0.34	0.45	0.48
Prospect	0.04	0.04	0.15	0.30	0.27	0.24
1950–2008
Habit	0.51	0.49	0.44	0.42	0.46	0.44
LR risks	0.47	0.42	0.51	0.49	0.45	0.42
Prospect	0.02	0.10	0.05	0.09	0.09	0.14

The data are annual stock returns over years shown. Auxiliary models f₀ through f₅ are described in Table 1.
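The sensitivity visible in Table 5 can be summarized by listing, for each auxiliary model, which scientific model has the highest posterior probability. The snippet below only re-reads the numbers in Table 5; the names `table5`, `aux_models`, and `preferred` are hypothetical, and "preferred" here means nothing more than the largest entry in a column, which is often a narrow lead.

```python
# Posterior probabilities copied from Table 5 (columns f0 through f5).
aux_models = ["f0", "f1", "f2", "f3", "f4", "f5"]
table5 = {
    "1930-2008": {
        "Habit":    [0.47, 0.71, 0.28, 0.36, 0.28, 0.28],
        "LR risks": [0.49, 0.25, 0.57, 0.34, 0.45, 0.48],
        "Prospect": [0.04, 0.04, 0.15, 0.30, 0.27, 0.24],
    },
    "1950-2008": {
        "Habit":    [0.51, 0.49, 0.44, 0.42, 0.46, 0.44],
        "LR risks": [0.47, 0.42, 0.51, 0.49, 0.45, 0.42],
        "Prospect": [0.02, 0.10, 0.05, 0.09, 0.09, 0.14],
    },
}

def preferred(period):
    """Map each auxiliary model to the scientific model with the largest
    posterior probability in that column of Table 5."""
    rows = table5[period]
    return {aux: max(rows, key=lambda model: rows[model][j])
            for j, aux in enumerate(aux_models)}

for period in table5:
    habit = table5[period]["Habit"]
    print(period, preferred(period),
          "habit range:", min(habit), "to", max(habit))
```

Over 1930–2008 the habit model's probability ranges from 0.28 to 0.71 across auxiliary models, whereas over 1950–2008 it stays between 0.42 and 0.51.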

5 CONCLUSION

We used Bayesian statistical methods proposed by GM to compare the habit persistence asset pricing model of CC, the long-run risks model of BY, and the prospect theory model of BHS. This comparison fills a void in the literature.

We undertook two types of comparisons, relative and absolute, over two sample periods, 1930–2008 and 1950–2008, using three series, trivariate (consumption growth, stock returns, and the price to dividend ratio), bivariate (consumption growth and stock returns), and univariate (stock returns). The prior for each model is that the ergodic mean of the real interest rate be 0.896 within ±1 with probability 0.95 together with a preference for model parameters that are near their published values.
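One way to read the interest rate prior, assuming it is Gaussian in the ergodic mean, is as an implied standard deviation. The normal form and the symbols r̄ and σ_r are assumptions of this sketch. The text specifies only the center, the half-width, and the coverage probability.

```latex
\bar r \sim N\!\left(0.896,\ \sigma_r^2\right), \qquad
P\bigl(\,|\bar r - 0.896| \le 1\,\bigr) = 0.95
\;\Longrightarrow\; 1.96\,\sigma_r = 1
\;\Longrightarrow\; \sigma_r \approx 0.51 .
```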

For the univariate series and for both sample periods, the models perform about the same in the relative comparison and fit the data reasonably well in the absolute assessment.

For the bivariate series, in the relative comparison, the long-run risks model dominates over the 1930–2008 period, while the habit persistence model dominates over the 1950–2008 period; in the absolute assessment, the habit model fails in the 1930–2008 period, and the prospect theory model fails in the 1950–2008 period.

For the trivariate series, in the relative comparison the long-run risks model dominates over the 1930–2008 period, while the habit persistence model dominates over the 1950–2008 period; in the absolute assessment, both the habit model and the long-run risks model fail in both periods. The prospect theory model cannot be fitted to a trivariate series because it puts its (conditional) mass on a two-dimensional subspace.

The estimator proposed by GM is a simulation-based estimator. Simulations from a scientific model, which here is either the habit model, the long-run risks model, or the prospect theory model, are used to determine a map η = g(θ) from the parameters θ of the scientific model to the parameters η of an auxiliary model f(y_t|x_{t − 1},η), where y_t is the observed data and x_{t − 1} are lags. Thereafter, ℒ(θ) = ∏_{t = 1}ⁿf(y_t|x_{t − 1},g(θ)) is used whenever a likelihood is required. Theory requires that the auxiliary model nest the scientific model. GM argue that one is better served by an auxiliary model that represents the data well. We undertook a sensitivity analysis and recomputed our results for six auxiliary models ordered by complexity. The first produces estimates that mimic values obtained by methods customarily employed in macro/finance. The second represents the data. The sixth nests the three scientific models considered. We find that results can be sensitive to the choice of auxiliary models. Most importantly, results can differ between the model that represents the data well and the model that nests the scientific model. In view of this difference and the fact that theory supports the latter, our view is that the nesting auxiliary model ought to be used. Our substantive conclusions are based on the nesting model.
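The construction of ℒ(θ) from the map η = g(θ) can be illustrated with a deliberately small example. This is a sketch under stated assumptions, not GM's implementation: the "scientific model" is a hypothetical zero-mean AR(1) with θ = (ρ, σ), the auxiliary model is a Gaussian AR(1) with η = (b₀, B, R), and g is computed by simulating a long series from the scientific model and fitting the auxiliary model to it by least squares. All function names are made up for the example.

```python
import numpy as np

def simulate_scientific(theta, n, rng):
    """Toy 'scientific model' (hypothetical): y_t = rho * y_{t-1} + sigma * z_t."""
    rho, sigma = theta
    shocks = rng.normal(scale=sigma, size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + shocks[t]
    return y

def fit_auxiliary(y):
    """Fit y_t = b0 + B * y_{t-1} + R * z_t by least squares, which
    coincides with the conditional Gaussian maximum likelihood fit."""
    x = np.column_stack([np.ones(y.size - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(x, y[1:], rcond=None)
    resid = y[1:] - x @ coef
    return float(coef[0]), float(coef[1]), float(resid.std())

def g(theta, n_sim=50_000, seed=0):
    """Map eta = g(theta), approximated from one long simulation.
    The seed is fixed so that g is a deterministic function of theta."""
    rng = np.random.default_rng(seed)
    return fit_auxiliary(simulate_scientific(theta, n_sim, rng))

def log_likelihood(theta, y):
    """log L(theta) = sum_t log f(y_t | y_{t-1}, g(theta))."""
    b0, B, R = g(theta)
    resid = y[1:] - b0 - B * y[:-1]
    return float(np.sum(-0.5 * np.log(2 * np.pi * R**2)
                        - 0.5 * (resid / R) ** 2))

# Hypothetical "observed" data generated at theta = (0.5, 1.0).
data = simulate_scientific((0.5, 1.0), 200, np.random.default_rng(1))
ll_true = log_likelihood((0.5, 1.0), data)
ll_wrong = log_likelihood((0.9, 3.0), data)
print(ll_true, ll_wrong)
```

In this toy case the auxiliary model nests the scientific model, so the likelihood built from g(θ) is higher at the data-generating parameters than at a distant alternative. With the models of the paper the map g has no closed form, which is why it is determined by simulation.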

We found that the computational methods that GM proposed are not sufficiently accurate to compare the habit, long-run risks, and prospect theory models. A contribution of this paper is a refinement of GM's methods that allows comparison of these three models.

References

Bansal R, Yaron A. Risks For the Long Run: A Potential Resolution of Asset Pricing Puzzles, Journal of Finance, 2004, vol. 59 (pg. 1481-1509)

Bansal R, Gallant AR, Tauchen G. Rational Pessimism, Rational Exuberance, and Asset Pricing Models, Review of Economic Studies, 2007, vol. 74 (pg. 1005-1033)

Bansal R, Kiku D, Yaron A. An Empirical Evaluation of the Long-Run Risks Model for Asset Prices, Working Paper 15504, 2009, National Bureau of Economic Research, http://www.nber.org

Barberis N, Huang M, Santos T. Prospect Theory and Asset Prices, Quarterly Journal of Economics, 2001, vol. 116 (pg. 1-54)

Beeler J, Campbell JY. The Long-Run Risks Model and Aggregate Asset Prices: An Empirical Assessment, Working Paper 14788, 2008, National Bureau of Economic Research, http://www.nber.org

Campbell JY. Consumption-based Asset Pricing, in Constantinides GM, Harris M, Stulz RM (eds.), Handbook of the Economics of Finance, 2003, vol. 1, Amsterdam: Elsevier (pg. 803-887)

Campbell JY, Cochrane J. By Force of Habit: A Consumption-based Explanation of Aggregate Stock Market Behavior, Journal of Political Economy, 1999, vol. 107 (pg. 205-251)

Engle RF, Kroner KF. Multivariate Simultaneous Generalized ARCH, Econometric Theory, 1995, vol. 11 (pg. 122-150)

Fletcher R. Practical Methods of Optimization, 1987, 2nd ed., New York, NY: Wiley

Gallant AR, McCulloch RE. On the Determination of General Statistical Models with Application to Asset Pricing, Journal of the American Statistical Association, 2009, vol. 104 (pg. 117-131)

Gallant AR, Nychka DW. Semi-Nonparametric Maximum Likelihood Estimation, Econometrica, 1987, vol. 55 (pg. 363-390)

Gamerman D, Lopes HF. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2006, 2nd ed., Boca Raton, FL: Chapman and Hall

Miranda MJ, Fackler PL. Applied Computational Economics and Finance, 2002, Cambridge, MA: MIT Press

Mishkin FS. The Real Rate of Interest: An Empirical Investigation, Carnegie-Rochester Conference Series on Public Policy, The Cost and Consequences of Inflation, 1981, vol. 15, Amsterdam: Elsevier (pg. 151-200)

Schwarz G. Estimating the Dimension of a Model, Annals of Statistics, 1978, vol. 6 (pg. 461-464)

Working H. Note on the Correlation of First Differences of Averages in a Random Chain, Econometrica, 1960, vol. 28 (pg. 916-918)

Author notes

This work was supported by National Science Foundation Grant Number SES 0438174.
