Abstract

We use recently proposed Bayesian statistical methods to compare the habit persistence asset pricing model of Campbell and Cochrane, the long-run risks model of Bansal and Yaron, and the prospect theory model of Barberis, Huang, and Santos. We improve these Bayesian methods so that they can accommodate highly nonlinear models such as the three aforementioned. Our substantive results can be stated succinctly: If one believes that the extreme consumption fluctuations of 1930–1949 can recur, although they have not in the last sixty years even counting the current recession, then the long-run risks model is preferred. Otherwise, the habit model is preferred.

The goal of this article is to fill a void in the literature: there are, to our knowledge, no head-to-head, statistical (i.e., likelihood-based or asymptotically equivalent) comparisons of asset pricing models from macro/finance. The asset pricing models considered are the habit persistence model of Campbell and Cochrane (1999), CC hereafter, the long-run risks model of Bansal and Yaron (2004), BY hereafter, and the prospect theory model of Barberis, Huang, and Santos (2001), BHS hereafter. There are two reasons for this choice: these three models are arguably the leading contenders, and the authors describe their computational methods precisely enough to permit replication of their results.

The need for a statistical comparison of asset pricing models is underscored by the ongoing debate between advocates of the long-run risks and habit models. Beeler and Campbell (2008) claim that the long-run risks model is rejected by historical data on the basis of the predictability of excess returns, consumption growth, dividend growth, and their respective volatilities by the price to dividend ratio. Bansal, Kiku, and Yaron (2009) argue that the long-run risks model provides adequate predictability results when using a vector autoregression (VAR) based on consumption growth, price to dividend ratio, and the real risk-free rate. Bansal, Kiku and Yaron also argue that the habit model provides counterfactual predictability results for the price to dividend ratio when using lagged consumption growth as a regressor. Because the argument is based on the selective use of statistics by the advocates, it can be continued indefinitely without resolution. In contrast, the likelihood of model output contains all the information in these models. Therefore, likelihood-based inference, in principle, resolves the debate definitively.

We know of only one other study that attempts a head-to-head statistical comparison of asset pricing models: Bansal, Gallant, and Tauchen (2007), BGT hereafter. BGT compared the habit model to the long-run risks model using frequentist methods. Their methods could not distinguish between the two models because frequentist nonnested model comparison methods require abundant data, and abundant data are not available in macro/finance. The typical sampling frequency used to calibrate and assess macro/finance models is annual, and there are only about 80 annual observations available on the U.S. economy. The papers cited above use annual data. BHS insist that annual is the only frequency that is appropriate to their model. Using more abundant higher-frequency data to compare models is not an option because the models were not designed to explain high-frequency data.

Failing to achieve a definitive statistical result, BGT proceeded to compare the models using the more traditional methods of macro/finance, which consist of enumerating some moments, evaluating them both from the data and from a model simulation, and comparing, often without taking sampling variability into account. On the basis of such comparisons, BGT conclude that the long-run risks model is preferred. Of these comparisons, they relied mostly on the fact that the habit model provides counterfactual predictability results for the price to dividend ratio when using lagged consumption growth as a regressor.

In addition to the fact that the BGT comparison, in the end, was not statistical, there are other concerns. BGT did not actually compare the models proposed by CC and BY. They modified them to impose cointegration on macrovariables that ought not diverge. They also used a general purpose method to solve them; specifically, a Bubnov–Galerkin method (Miranda and Fackler, 2002, 152–3). Our view is that fairness dictates that one should use the model that was actually proposed by the originator in comparisons, not a modified model, and that one should use the same solution method that was proposed. To state our view succinctly, the model is the simulation algorithm proposed by the originator; it is not the mathematical equations that suggested the algorithm.

The data we use are annual, per capita, real, U.S. stock returns, consumption growth, and the price to dividend ratio from 1925–2008. The comparisons are for the periods 1930–2008 and 1950–2008. The data from 1925–1929 are only used to prime recursions because they are of lower quality than the data from 1930 onward. The data are plotted in Figure 1. Note in the figure that consumption growth is far more volatile in the 1930–1949 period than in the period 1950–2008. It turns out that the difference in consumption growth volatility dramatically influences results.

Figure 1. Consumption growth, stock returns, and price to dividend ratio, 1925–2008. The left vertical line is at 1930 and the right at 1950. The data collection protocol is as described in Bansal, Gallant, and Tauchen (2007) for the period 1930–2008. The data from 1925–1929, which are only used to prime recursions, are the inflation adjusted Dow–Jones industrial average, a real U.S. consumption growth series kindly supplied by Robert Barro, and from backcasted real price and dividend levels.

Gallant and McCulloch (2009), GM hereafter, introduced a Bayesian method for fitting a model from a scientific discipline (scientific model for short) to sparse data when a likelihood for that model is not readily available. They synthesize a likelihood by means of an “auxiliary model” and simulation from the “scientific model.” GM used the term “statistical model” for auxiliary model. We do not because the consonance between scientific model and statistical model causes confusion and also because we call into question the premise on which GM chose the term statistical model.

In the GM framework, the auxiliary model must nest the scientific model for the methodology to be logically correct. GM argue that it is better to use an auxiliary model that represents the data well, even if it does not nest the scientific model. We conduct a sensitivity analysis employing six auxiliary models of differing complexities to determine if the choice of auxiliary model makes a difference to our results. The six models are shown in Table 1. Model f1 is closest to that used by GM; f5 is the nesting model for univariate (stock returns alone) and bivariate (consumption growth and stock returns) data.

Table 1. Auxiliary models

| | f0 | f1 | f2 | f3 | f4 | f5 |
|---|---|---|---|---|---|---|
| Mean | 1 lag | 1 lag | 1 lag | 1 lag | 1 lag | 2 lags |
| Variance | Constant | GARCH | GARCH, leverage | GARCH, leverage | GARCH, leverage | GARCH, leverage |
| Errors | Normal | Normal | Normal | Flexible | Flexible, nonlinear | Flexible, nonlinear |
| Parms univar | 3 | 5 | 6 | 10 | 11 | 12 |
| Parms bivar | 9 | 12 | 14 | 22 | 24 | 28 |
| Parms trivar | 18 | 22 | 25 | 37 | | |

Multivariate GARCH variance matrices are of the BEKK form (Engle and Kroner 1995) with one lag throughout. A nonlinear error density adds nonlinear terms that depend on one lag to the conditional mean and variance. When evaluated, data are centered and scaled and lags are attenuated by a spline transform. See Gallant and Tauchen (2009) for details. The functional form is displayed in Subsection 2.4. Parms is the number of parameters, which depends on the dimension of the data: univariate, bivariate, or trivariate. The habit persistence model has seven parameters, the long-run risks model has thirteen, and the prospect theory model has eleven.

We use the protocol set forth in GM to establish nesting, which, briefly, is as follows. The models shown in Table 1 are the beginning of a sequence (whose progression is described in Subsection 2.4) that, if continued, is dense for the space in which the scientific model must lie. Therefore, what needs to be done to find a nesting model is to select the correct truncation point. Here we use the Schwarz (1978) criterion to select it followed by some diagnostic checks that GM discuss.
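The truncation step can be sketched as follows: compute the Schwarz criterion for each candidate auxiliary model and take the minimizer as the truncation point. This is only a minimal illustration. The parameter counts are the univariate counts from Table 1, but the maximized log-likelihood values are hypothetical placeholders, the function name is our own, and the number of observations is assumed to be the 79 annual observations from 1930 to 2008. The diagnostic checks that GM discuss are not shown.

```python
import math


def schwarz_criterion(loglik, n_params, n_obs):
    """Schwarz (1978) criterion in smaller-is-better form: -2*loglik + p*log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Univariate parameter counts taken from Table 1.
n_params = {"f0": 3, "f1": 5, "f2": 6, "f3": 10, "f4": 11, "f5": 12}

# HYPOTHETICAL maximized log-likelihoods, for illustration only.
loglik = {"f0": -120.0, "f1": -112.0, "f2": -111.5,
          "f3": -101.0, "f4": -99.0, "f5": -92.0}

# Assumed sample size: annual observations for 1930-2008.
n_obs = 79

bic = {name: schwarz_criterion(loglik[name], n_params[name], n_obs)
       for name in n_params}

# The selected truncation point is the model with the smallest criterion value.
selected = min(bic, key=bic.get)
print(selected, round(bic[selected], 3))
```

With real fits, the log-likelihoods would come from estimating each auxiliary model on the data or on a long simulation from the scientific model; only the ranking logic is illustrated here.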

Increasing the dimension of a multivariate time series often simplifies the conditioning structure. That happens here: f0 is the nesting model for trivariate (stock returns, consumption growth, and price to dividend ratio) data. We check our f0 results using models f1 through f3. Computations to fit a nonlinear model to data that do not identify it well are often unstable, as they are for models f4 and f5 when fitted to trivariate simulations. Therefore, we do not consider f4 and f5 for trivariate data. The prospect theory model puts its mass on a two-dimensional subspace. This violates the GM regularity conditions. Therefore, we do not consider the prospect theory model for trivariate data.

We find that the computational methods that GM proposed are not sufficiently accurate to compare the habit, long-run risks, and prospect theory models. This is due to the fact that the auxiliary model f5 that nests these three is more complex than the auxiliary models that GM considered. A contribution of this paper is a refinement of GM's methods that increases accuracy to the point that auxiliary models as complex as f5 can be used in applications.

1 MODELS CONSIDERED

The intuitive notions behind any consumption-based asset pricing model are that agents receive wage, interest, and dividend income from which they purchase consumption. Agents seek to reallocate their consumption over time by trading shares of stock that pay a random dividend and bonds that pay interest with certainty. This reallocation is done for the purpose of insuring against spells of unemployment, providing for retirement, and so on. Trading activity enters the model via the constraint that an agent's purchase of consumption, bonds, and stock cannot exceed wage, interest, and dividend income in any period. When applying the model to a national economy, consumption and dividends can be used as the driving processes instead of wages and dividends because wages can be recovered by subtracting dividends from consumption. (Someone must own the stock so the dividends must be received, while for bonds someone pays interest and another receives so there are no net interest receipts.) Agents are endowed with a utility function that depends on the entire consumption process. The first-order conditions of their utility maximization problem determine a map from present and past values of the driving processes to the present price of a stock and a bond. These models are simulated by first simulating the driving processes and then evaluating the map that determines stock and bond prices. For each model, we shall describe the driving processes and the utility function, leaving a description of the algorithm for computing the map to the cited literature.

Our prior, which is the same for all models, is
(1)
where $r_f^a = \lim_{n \to \infty} (1/n) \sum_{t=1}^{n} r_{ft}^a$ is the ergodic mean of the risk-free rate $r_{ft}^a$ and the $\theta_i^*$ are the parameter values published by the proposer. Campbell (2003) notes that any reasonable asset pricing model must incorporate the indirect evidence that the risk-free rate is very low with low volatility. Campbell's evidence suggests that the mean risk-free rate for the United States is 0.896 percent per annum. BGT argue that imposing the risk-free rate a priori will produce better estimates than using an ex ante risk-free rate series because any empirical ex ante risk-free rate series is mostly noise due to the difficulty of determining ex ante inflation (Mishkin 1981). The proposers used their judgment to determine the $\theta_i^*$. They took data, some of which differed from ours, partially into account in forming their judgment but none could have used data more recent than 2003. Thus our prior is partially, if not completely, independent of our data. This prior appears to strike the right balance. It is tight enough to achieve Markov Chain Monte Carlo (MCMC) chains that mix well despite the use of sparse data. It is loose enough to allow the data to be influential. Making it looser still causes numerical problems. For the purposes of this paper, it is pointless to consider a tighter prior.

Throughout this section, lower case denotes the logarithm of an upper case quantity, for example, $c_t = \log(C_t)$, where $C_t$ is consumption during time period $t$, and $d_t = \log(D_t)$, where $D_t$ is dividends paid during period $t$. The exceptions are the geometric risk-free interest rate $r_{ft} = -\log P_{f,t-1}$ and the geometric stock return inclusive of dividends $r_{dt} = \log(P_{dt} + D_t) - \log P_{d,t-1}$, where $P_{f,t-1}$ is the price of a discount bond at the beginning of time period $t$ that pays one dollar with certainty at the end of period $t$, $P_{d,t-1}$ is the price of a stock at the beginning of time period $t$, and $P_{dt}$ is its price at the end.
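The two return definitions can be written out directly. The following is a minimal sketch; the function names and the numerical inputs are hypothetical and serve only to illustrate the definitions.

```python
import math


def geometric_risk_free_rate(bond_price_start):
    """r_ft = -log P_{f,t-1}.

    bond_price_start is the price, at the beginning of period t, of a
    discount bond that pays one dollar with certainty at the end of period t.
    """
    return -math.log(bond_price_start)


def geometric_stock_return(price_start, price_end, dividend):
    """r_dt = log(P_dt + D_t) - log P_{d,t-1}, i.e., inclusive of dividends."""
    return math.log(price_end + dividend) - math.log(price_start)


# Hypothetical inputs, for illustration only.
rf = geometric_risk_free_rate(0.99)
rd = geometric_stock_return(price_start=100.0, price_end=104.0, dividend=3.0)
excess = rd - rf  # geometric excess return r_d - r_f
print(rf, rd, excess)
```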

1.1 The Habit Persistence Model

The driving processes for the habit persistence model are
(2)
The utility function is
(3)
where habit persistence is implemented by two equations:
(4)
(5)
$\gamma$ is a measure of curvature, which scales attitudes toward risk, and $\delta$ is the agent's discount factor. $\mathcal{E}_t$ is conditional expectation with respect to $S_t$, which is the state variable; $s_t = \log(S_t)$. The quantities $\bar{S}$ and $s_{\max}$ can be computed from model parameters (see CC for the expressions). The variable $X_t = C_t(1 - S_t)$ is called external habit. By substituting $S_t C_t = C_t - X_t$ in Equation (3), one can see that utility is extremely low when consumption is close to $X_t$ for $\gamma > 1$. Habit $X_t$ is determined by past consumption, as is seen by noting that $v_{t-1} = \log(C_{t-1}/C_{t-2}) - g$ in Equation (4). Given the habit model's parameters
(6)
$\{C_t, r_{dt}, r_{ft}\}_{t=1}^{12N}$ are simulated at the monthly frequency and aggregated to the annual frequency
(7)
(8)
(9)
(10)
where N is the annual simulation size.

The prior is Equation (1). For the habit model, the scale factor used for φ and δ is 0.001 rather than the 0.1 shown in Equation (1) to overcome an identification problem. The MCMC chain will not mix when the scale factor for φ and δ in Equation (1) is 0.1 because a move in φ can be nearly exactly offset by a move in δ. The value 0.001 is the largest value for which MCMC chains will mix. Because of the first term of the right-hand side of Equation (1), (1) is not an independence prior; in simulations from the prior no correlation between parameters is zero.

Measures of location and scale for the prior and posterior distributions of habit model parameters are shown in the top panel of Table 2. The prior and posterior densities of the risk-free rate, equity premium, equity returns, and the standard deviation of equity returns are plotted in the left column of Figure 2. Overall, Table 2 and Figure 2 suggest that the prior is sufficiently informative to fill in where data are sparse but it allows the data to move the posterior where data are informative.

Table 2. Prior and posterior model parameters

| Parameter | Prior Mode | Prior Mean | Prior SD | Posterior Mode | Posterior Mean | Posterior SD |
|---|---|---|---|---|---|---|
| Habit model | | | | | | |
| $g$ | 0.00157547 | 0.00156519 | 0.00008128 | 0.00166893 | 0.00159147 | 0.00007473 |
| $\sigma$ | 0.00440979 | 0.00431169 | 0.00022113 | 0.00502777 | 0.00501054 | 0.00018533 |
| $\rho$ | 0.20068359 | 0.20053348 | 0.01072491 | 0.19445801 | 0.19892873 | 0.00931413 |
| $\sigma_w$ | 0.03228760 | 0.03247938 | 0.00169052 | 0.03193665 | 0.03175960 | 0.00138630 |
| $\phi$ | 0.98826599 | 0.98830499 | 0.00042475 | 0.98769760 | 0.98773761 | 0.00033629 |
| $\delta$ | 0.99046326 | 0.99041700 | 0.00043605 | 0.99033737 | 0.99033565 | 0.00044495 |
| $\gamma$ | 2.04296875 | 2.04076156 | 0.08924751 | 1.97558594 | 1.96336336 | 0.07720679 |
| $r_f$ | 0.97796400 | 1.07587200 | 0.13273052 | 1.02530400 | 0.96219600 | 0.12647089 |
| $r_d - r_f$ | 6.04969200 | 5.98359600 | 0.07700698 | 6.26854800 | 6.23908800 | 0.07426341 |
| $\sigma_{rd}$ | 19.67246807 | 19.69228275 | 0.14078849 | 20.17062220 | 20.14121148 | 0.14442220 |
| Long-run risks model | | | | | | |
| $\delta$ | 0.99961090 | 0.99934096 | 0.00031172 | 0.99964905 | 0.99943058 | 0.00029362 |
| $\gamma$ | 9.89062500 | 10.07348625 | 0.48583545 | 9.92187500 | 10.00010750 | 0.50121255 |
| $\psi$ | 1.49609375 | 1.49614344 | 0.07859747 | 1.53906250 | 1.50321312 | 0.07244585 |
| $\mu_c$ | 0.00148392 | 0.00148142 | 0.00007031 | 0.00151825 | 0.00149122 | 0.00007685 |
| $\rho$ | 0.98413086 | 0.98408021 | 0.00468241 | 0.98284912 | 0.98435210 | 0.00320064 |
| $\phi_e$ | 0.03204346 | 0.03202031 | 0.00160150 | 0.03204346 | 0.03202844 | 0.00162241 |
| $\sigma^2$ | 0.00004041 | 0.00004124 | 0.00000196 | 0.00004160 | 0.00004061 | 0.00000196 |
| $\nu$ | 0.98730469 | 0.98738766 | 0.00441105 | 0.98199463 | 0.98223563 | 0.00299350 |
| $\sigma_w$ | 0.00000168 | 0.00000170 | 0.00000009 | 0.00000169 | 0.00000170 | 0.00000008 |
| $\mu_d$ | 0.00120926 | 0.00119140 | 0.00006114 | 0.00121307 | 0.00120186 | 0.00006030 |
| $\phi_d$ | 2.78906250 | 2.80749125 | 0.14620180 | 2.88281250 | 2.82820500 | 0.15095447 |
| $\pi_d$ | 4.07031250 | 4.11655125 | 0.20586470 | 4.17187500 | 4.15665625 | 0.19923412 |
| $\phi_u$ | 6.14062500 | 6.27596375 | 0.31996896 | 6.45312500 | 6.19978500 | 0.30424633 |
| $r_f$ | 0.94398000 | 1.16133600 | 0.12177703 | 0.90874800 | 1.11896400 | 0.11709356 |
| $r_d - r_f$ | 4.30737600 | 4.98738000 | 0.48844526 | 4.11223200 | 4.59213600 | 0.28433000 |
| $\sigma_{rd}$ | 18.28002188 | 18.85677597 | 0.17586080 | 19.07839616 | 18.58935179 | 0.13239826 |
| Prospect theory model | | | | | | |
| $g_C$ | 0.01828003 | 0.01792775 | 0.00093413 | 0.01846313 | 0.01795106 | 0.00095215 |
| $g_D$ | 0.01870728 | 0.01833821 | 0.00095276 | 0.01849365 | 0.01845027 | 0.00097794 |
| $\sigma_C$ | 0.03918457 | 0.03764040 | 0.00200690 | 0.03295898 | 0.03356905 | 0.00201110 |
| $\sigma_D$ | 0.12231445 | 0.12023010 | 0.00611083 | 0.11962891 | 0.11738381 | 0.00597238 |
| $\omega$ | 0.14794922 | 0.15018164 | 0.00694094 | 0.14892578 | 0.15015283 | 0.00801015 |
| $\gamma$ | 0.98632812 | 0.98511422 | 0.05145608 | 0.96484375 | 0.97603082 | 0.04958596 |
| $\rho$ | 0.99972534 | 0.99783899 | 0.00163604 | 0.99969482 | 0.99783430 | 0.00202090 |
| $\lambda$ | 2.17968750 | 2.24709750 | 0.11486810 | 2.23437500 | 2.18521953 | 0.11761822 |
| $k$ | 9.82812500 | 9.86375625 | 0.53189914 | 9.90625000 | 9.84252984 | 0.53634137 |
| $b_0$ | 2.00195312 | 2.00328703 | 0.10967111 | 1.89355469 | 1.93699477 | 0.12735310 |
| $\eta$ | 0.91601562 | 0.89845969 | 0.04412695 | 0.85375977 | 0.85965642 | 0.02405305 |
| $r_f$ | 1.75579200 | 1.91283600 | 0.05667617 | 1.76136000 | 1.91498400 | 0.06495191 |
| $r_d - r_f$ | 5.92353600 | 5.49249600 | 0.19235810 | 4.88326800 | 4.78360800 | 0.12334973 |
| $\sigma_{rd}$ | 27.97748380 | 26.75881163 | 0.92424294 | 22.90177286 | 22.79236714 | 0.29273615 |

Parameter values are for the monthly frequency for the habit and long-run risks models and for the annual frequency for the prospect theory model. Mode is that of the multivariate density. Returns are geometric, annualized, and expressed as a percent for all models. In the data, returns are $r_d - r_f = 5.59 - 0.89 = 4.7$ and $\sigma_{rd} = 19.72$. The auxiliary model is f5 as described in Table 1. The data are annual consumption growth and stock returns for the years 1930–2008; see Figure 1.

Figure 2. Prior and posterior density estimates. The dashed line is the prior. The solid line is the posterior. Left column is for the habit model, middle for the long-run risks model, right for the prospect theory model. Returns are geometric, annualized, and expressed as a percent. Other details as in Table 2. Bandwidths are small to reduce smudging of isolated peaked modes.

The differences among the three models are most obvious visually in their out-of-sample forecasts for the next five years. The mean posterior forecast for the habit model, computed as described in Subsection 2.6, is plotted in the left column of Figure 3. The habit model predicts an end to the current recession in 2009 and a return to steady-state growth by 2010. This is dictated by Equations (2), (7), and (8), which imply that annual consumption growth for the habit model is a first-order autoregression with an autoregression parameter of about 0.25 (Working 1960). Stock returns are predicted to be high in 2009 with a return to steady-state returns by 2013.
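The value of about 0.25 can be checked numerically. The sketch below rests on two assumptions that are not restated in the text above: monthly log consumption growth is taken to be a constant plus independent normal shocks, and annual consumption is taken to be the sum of the twelve monthly levels within the year. The function name is hypothetical, and the values of g and σ are rounded prior modes from Table 2. Under these assumptions the first-order autocorrelation of annual consumption growth should be close to the value that Working (1960) derived for time-averaged random walks.

```python
import numpy as np


def annual_growth_autocorrelation(n_years, g, sigma, seed=0):
    """First-order autocorrelation of annual log consumption growth.

    ASSUMPTIONS (illustrative): monthly log consumption is a random walk
    with drift g and normal shocks with standard deviation sigma; annual
    consumption is the sum of the twelve monthly levels in each year.
    """
    rng = np.random.default_rng(seed)
    monthly_growth = g + sigma * rng.standard_normal(12 * n_years)
    log_c = np.cumsum(monthly_growth)

    # Aggregate monthly levels to annual levels, then take log differences.
    annual_c = np.exp(log_c).reshape(n_years, 12).sum(axis=1)
    annual_growth = np.diff(np.log(annual_c))

    # Sample first-order autocorrelation.
    x = annual_growth - annual_growth.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))


# Rounded prior modes of g and sigma for the habit model (Table 2).
rho_hat = annual_growth_autocorrelation(n_years=20000, g=0.0016, sigma=0.0044)
print(rho_hat)
```

The long simulation is used only to make sampling error small; with the 79 annual observations in the data, the sample autocorrelation would be far noisier.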

Figure 3. Prior and posterior forecasts. The solid line is the mean of the posterior annualized and expressed as a percent. The dashed lines are ±1.96 posterior standard deviations. The left column is for the habit model, middle for the long-run risks model, and right for the prospect theory model. Other details as in Table 2.

1.2 The Long-run Risks Model

The driving processes for the long-run risks model are as follows:
The long-run risks model derives its name from the random shifts in the location of consumption and dividends due to $x_t$. Note that long-run risks $x_t$ and stochastic volatility $\sigma_t^2$ evolve autonomously. The utility function is
(11)
$\gamma$ and $\delta$ have the same interpretation as for the habit model; $\psi$, which summarizes preferences across time periods, is a separate parameter. Separation of attitudes toward risk and preferences across time is the main advantage of Equation (11). $\mathcal{E}_t$ is the conditional expectation with respect to $x_t$ and $\sigma_t$, which are the state variables.
The long-run risks model is richly parametrized as
(12)

It is so richly parametrized that identification would have to come from the prior even when data are abundant because half of the models in Table 1 have fewer parameters than $\theta$. The time increment is one month. Aggregation of monthly $\{c_t, r_{dt}, r_{ft}\}_{t=1}^{12N}$ to the annual frequency $\{c_t^a, r_{dt}^a, r_{ft}^a\}_{t=1}^{N}$ is by means of Equations (7) through (10).
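The autonomy of the long-run risk and volatility processes can be illustrated with a short simulation. The recursions below are an assumption: they follow the familiar long-run risks specification in the style of BY, with a dividend loading on the consumption shock, and the mapping of the Table 2 symbols onto them is our reading rather than a statement of the authors' code. The function name, the dictionary keys, and the small floor that keeps the variance positive are likewise hypothetical. Parameter values are rounded prior modes from Table 2.

```python
import numpy as np


def simulate_lrr_drivers(n_months, p, shocks):
    """Simulate ASSUMED monthly driving processes of a long-run risks model:

        dc[t]   = mu_c + x[t] + sigma[t] * eta[t]
        dd[t]   = mu_d + phi_d * x[t] + pi_d * sigma[t] * eta[t]
                  + phi_u * sigma[t] * u[t]
        x[t+1]  = rho * x[t] + phi_e * sigma[t] * e[t]
        s2[t+1] = sigma2 + nu * (s2[t] - sigma2) + sigma_w * w[t]
    """
    x = np.zeros(n_months + 1)
    s2 = np.full(n_months + 1, p["sigma2"])
    dc = np.zeros(n_months)
    dd = np.zeros(n_months)
    for t in range(n_months):
        sig = np.sqrt(s2[t])
        dc[t] = p["mu_c"] + x[t] + sig * shocks["eta"][t]
        dd[t] = (p["mu_d"] + p["phi_d"] * x[t]
                 + p["pi_d"] * sig * shocks["eta"][t]
                 + p["phi_u"] * sig * shocks["u"][t])
        x[t + 1] = p["rho"] * x[t] + p["phi_e"] * sig * shocks["e"][t]
        s2_next = (p["sigma2"] + p["nu"] * (s2[t] - p["sigma2"])
                   + p["sigma_w"] * shocks["w"][t])
        # Illustrative floor (our choice) so that the variance stays positive.
        s2[t + 1] = max(s2_next, 1e-12)
    return x, s2, dc, dd


# Rounded prior modes from Table 2 (long-run risks panel).
params = {"mu_c": 0.00148, "rho": 0.984, "phi_e": 0.032, "sigma2": 0.0000404,
          "nu": 0.987, "sigma_w": 0.00000168, "mu_d": 0.00121,
          "phi_d": 2.79, "pi_d": 4.07, "phi_u": 6.14}

rng = np.random.default_rng(1)
n = 120
shocks_a = {k: rng.standard_normal(n) for k in ("e", "w", "eta", "u")}
# Same e and w shocks, different consumption and dividend shocks.
shocks_b = dict(shocks_a, eta=rng.standard_normal(n), u=rng.standard_normal(n))

x_a, s2_a, dc_a, dd_a = simulate_lrr_drivers(n, params, shocks_a)
x_b, s2_b, dc_b, dd_b = simulate_lrr_drivers(n, params, shocks_b)
```

In this sketch, changing the consumption and dividend shocks leaves the simulated paths of the long-run risk and the volatility unchanged, which is what autonomous evolution means here.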

The prior is Equation (1). The autoregressive parameters $\rho$ and $\nu$ cause problems. The solution method proposed by BY degrades as $\rho$ and $\nu$ increase from their published values. Because the degradation is continuous in $\rho$ and $\nu$, there is no logical threshold that one can impose on $\rho$ and $\nu$ to completely prevent degradation. Our solution to this problem is to set the scale factor for $\rho$ and $\nu$ in Equation (1) to 0.01 rather than 0.1 and attenuate the tails for $\rho$ and $\nu$ above 0.995.

Measures of location and scale for the prior and posterior distributions of the parameters of the long-run risks model are shown in the middle panel of Table 2. The prior and posterior densities of the risk-free rate, equity premium, equity returns, and the standard deviation of equity returns are plotted in the middle column of Figure 2. As for the habit model, Table 2 and Figure 2 suggest that the prior is sufficiently informative to fill in where data are sparse, but it allows the data to move the posterior where data are informative.

The mean posterior forecast for the long-run risks model is plotted in the middle column of Figure 3. The long-run risks model predicts an end to the current recession in 2010 and a slow increase in the growth rate thereafter. Stock returns are predicted to be at their steady-state values over the entire forecast period. A flat response of asset returns to a consumption growth and asset return shock is a structural characteristic of the long-run risks model. It is due to the fact that stochastic volatility is autonomous and therefore not affected by consumption growth and asset return shocks. The stochastic volatility process is the factor that affects the risk premium in the long-run risks model.

1.3 The Prospect Theory Model

The driving processes for the prospect theory model are as follows:

The prospect theory model distinguishes between aggregate consumption $\bar{C}_t$, which is not a choice variable, and the agent's consumption $C_t$, which is. In addition to these variables, let $R_t$ denote the gross stock return; $R_f$, the gross risk-free rate; $S_t$, the share of wealth allocated to the risky asset;
(13)
the relative gain or loss on the risky asset; and
(14)
the benchmark level, where $\bar{R}$ is chosen to make median $\{z_t\} = 1$. The prospect theory utility function is
(15)
where
(16)
and
(17)
$\mathcal{E}_t$ is the conditional expectation with respect to the benchmark level $z_t$, which is the state variable. In Equation (15), the agent's discount factor is ρ and the risk aversion parameter is γ. The second term in Equation (15) is the utility from gains or losses, where $b_0$ is a scale factor. From the plot of this gain and loss utility in figure 1 of BHS, one can see that when there are no prior gains and losses (z = 1), agents dislike losses more than they appreciate gains. When there are prior losses (z > 1), the dislike intensifies. When there are prior gains (z < 1), an agent is “playing on the house's money” and pain is delayed until the “house's money has been lost.” The parameter η in Equation (14) controls sensitivity to past gains and losses. When η is zero, its lower bound, the benchmark does not depend at all on past gains and losses. The dependence increases as η approaches its upper bound of one. Agents always dislike losses more than they appreciate gains; η just determines the delay. See BHS for the relation of the prospect theory utility function to the psychology literature.
Like the long-run risks model, the prospect theory model is richly parameterized:
(18)

Identification requires the prior for most of the auxiliary models considered. The time increment is one year: Simulate directly and set $c_t^a = c_t$, $r_{dt}^a = r_{dt}$, and $r_{ft}^a = r_{ft}$.

Measures of location and scale for the prior and posterior distributions of the parameters of the prospect theory model are shown in the bottom panel of Table 2. The prior and posterior densities of the risk-free rate, equity premium, equity returns, and the standard deviation of equity returns are plotted in the right column of Figure 2. As previously, Table 2 and Figure 2 suggest that the prior is sufficiently informative to fill in where data are sparse, but it allows the data to move the posterior where data are informative.

The mean posterior forecast for the prospect theory model is plotted in the right column of Figure 3. The prospect theory model predicts steady-state growth throughout the forecast period. This is dictated by the fact that annual consumption growth for the prospect theory model is independent and identically distributed. Stock returns are predicted to be double their steady-state value in 2009, reach steady-state by 2011, and remain at steady-state thereafter.

2 INFERENCE FOR GENERAL SCIENTIFIC MODELS

We describe the Bayesian methods proposed by GM and the modifications that we found necessary. Public domain code implementing the method for the auxiliary models in Table 1 and a User's Guide are available at http://econ.duke.edu/webfiles/arg/gsm.

2.1 Estimation of Scientific Model Parameters

Let the transition density of the scientific model be denoted by
$$p(y_t \mid x_{t-1}, \theta), \qquad \theta \in \Theta, \tag{19}$$
where $x_{t-1} = (y_{t-1}, \ldots, y_{t-L})$ if Markovian and $x_{t-1} = (y_{t-1}, \ldots, y_1)$ if not. We presume that there is no straightforward algorithm for computing the likelihood, but we can simulate data from p(·|·,θ) for given θ. We presume that simulations from the scientific model are ergodic. We assume that there is a transition density
$$f(y_t \mid x_{t-1}, \eta), \qquad \eta \in H, \tag{20}$$
and a map
$$g : \theta \mapsto \eta \tag{21}$$
such that
$$p(y_t \mid x_{t-1}, \theta) = f(y_t \mid x_{t-1}, g(\theta)), \qquad \theta \in \Theta. \tag{22}$$
We assume that f(y|x,η) and its gradient $(\partial/\partial\eta) f(y \mid x, \eta)$ are easy to evaluate. f is called the auxiliary model and g is called the implied map. When Equation (22) holds, f is said to nest p. Whenever we need the likelihood $\prod_{t=1}^{n} p(y_t \mid x_{t-1}, \theta)$, we use
$$\mathcal{L}(\theta) = \prod_{t=1}^{n} f(y_t \mid x_{t-1}, g(\theta)), \tag{23}$$
where $\{y_t, x_{t-1}\}_{t=1}^{n}$ are the data and n is the sample size. After substituting $\mathcal{L}(\theta)$ for $\prod_{t=1}^{n} p(y_t \mid x_{t-1}, \theta)$, standard Bayesian MCMC methods become applicable. That is, we have a likelihood $\mathcal{L}(\theta)$ from Equation (23) and a prior π(θ) from Equation (1) and need nothing beyond that to implement Bayesian methods by means of MCMC. A good introduction to these methods is Gamerman and Lopes (2006).

The difficulty is computing the implied map g accurately enough that the accept/reject decision in an MCMC chain (step 5 in the algorithm below) is correct when f is a nonlinear model. We describe the algorithms that we have found effective next.

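Given θ, η = g(θ) is computed by minimizing the Kullback–Leibler divergence
$$d(f, p) = \iint \left[\log p(y \mid x, \theta) - \log f(y \mid x, \eta)\right] p(y \mid x, \theta)\, dy\, p(x \mid \theta)\, dx$$
with respect to η. The advantage of Kullback–Leibler divergence over other distance measures is that the part that depends on the unknown p(·|·,θ), $\iint \log p(y \mid x, \theta)\, p(y \mid x, \theta)\, dy\, p(x \mid \theta)\, dx$, does not have to be computed to solve the minimization problem. We approximate the integral that does have to be computed by
$$\iint \log f(y \mid x, \eta)\, p(y \mid x, \theta)\, dy\, p(x \mid \theta)\, dx \approx \frac{1}{N} \sum_{t=1}^{N} \log f(\hat{y}_t \mid \hat{x}_{t-1}, \eta),$$
where $\{\hat{y}_t, \hat{x}_{t-1}\}_{t=1}^{N}$ is a simulation of length N from p(·|·,θ). Upon dropping the division by N, the implied map is computed as
$$g : \theta \mapsto \operatorname*{argmax}_{\eta} \sum_{t=1}^{N} \log f(\hat{y}_t \mid \hat{x}_{t-1}, \eta). \tag{24}$$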

We use N = 5000, which requires 60,000 monthly simulations in the case of the habit and long-run risks models. Results (posterior mean, posterior standard deviation, etc.) are not sensitive to N; doubling N makes no difference other than doubling computational time. By accident we once set N = 60,000 in the prospect theory model; this also made no difference. It is essential that the same seed be used to start these simulations so that the same θ always produces the same simulation.
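The role of the fixed seed can be illustrated with a minimal sketch. Everything in it is a stand-in chosen for brevity rather than the setup used in this paper: the scientific model is a hypothetical AR(1), the auxiliary model is a Gaussian AR(1) without intercept, for which the maximizer of Equation (24) has a closed form, and the names `simulate` and `implied_map` are ours.

```python
import random

def simulate(theta, n=5000, seed=12345):
    """Simulate a toy AR(1) 'scientific model': y_t = theta * y_{t-1} + e_t."""
    rng = random.Random(seed)  # fixed seed: the same theta gives the same path
    y, path = 0.0, []
    for _ in range(n + 1):
        y = theta * y + rng.gauss(0.0, 1.0)
        path.append(y)
    return path

def implied_map(theta, n=5000, seed=12345):
    """Maximize sum_t log f(y_t | y_{t-1}, eta) over eta = (b, s2) for a
    Gaussian AR(1) auxiliary model; the maximizer is the least-squares
    slope and the mean squared residual."""
    path = simulate(theta, n, seed)
    x, y = path[:-1], path[1:]
    b = sum(u * v for u, v in zip(x, y)) / sum(u * u for u in x)
    s2 = sum((v - b * u) ** 2 for u, v in zip(x, y)) / len(y)
    return b, s2

eta_a = implied_map(0.5)
eta_b = implied_map(0.5)  # same theta and seed, so the same eta is returned
```

Without the fixed seed, two evaluations of the map at the same θ would differ by simulation noise, and the accept/reject comparison in an MCMC chain would be contaminated by that noise.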

GM run a Markov chain $\{\eta_t\}_{t=1}^{K}$ of length K to compute the $\hat{\eta}$ that solves Equation (24). There are two other Markov chains discussed below so, to help distinguish among them, this chain is called the η-subchain. While the η-subchain must be run to provide the scaling for the model assessment method that GM propose, the $\hat{\eta}$ that corresponds to the maximum of the objective in Equation (24) over the η-subchain is not a sufficiently accurate evaluation of g(θ) for our auxiliary models. This is mainly because our auxiliary models use a multivariate specification of generalized autoregressive conditional heteroskedasticity (GARCH) that Engle and Kroner (1995) call BEKK. Likelihoods incorporating BEKK are notoriously difficult to optimize. We use $\hat{\eta}$ as a starting value and maximize Equation (24) using the BFGS algorithm (Fletcher 1987, 26–40). This also is not a sufficiently accurate evaluation of g(θ). A second refinement is necessary. The second refinement is embedded within the MCMC chain $\{\theta_t\}_{t=1}^{R}$ of length R that is used to compute the posterior distribution of θ. It is called the θ-chain. (We use R = 25,000 past the point at which transients have dissipated.) Its computation proceeds as follows.

The θ-chain is generated using the Metropolis algorithm. The Metropolis algorithm is an iterative scheme that generates a Markov chain whose stationary distribution is the posterior of θ. To implement it, we require a likelihood, a prior, and transition density in θ called the proposal density. The likelihood is Equation (23).

The prior may require quantities computed from the simulation $\{\hat{y}_t, \hat{x}_{t-1}\}_{t=1}^{N}$ used to compute Equation (23). Our prior requires $r_f^a$. The sequence $\{\hat{r}_{ft}^a\}_{t=1}^{N}$ is available from the simulation. The risk-free rate for the prior is the average $\bar{r}_f^a = (1/N) \sum_{t=1}^{N} \hat{r}_{ft}^a$. (For the habit and prospect models, $r_{ft}^a$ is constant over the simulation.) Quantities computed in this fashion can be interpreted as the evaluation of a functional of the scientific model of the form Ψ:p(·|·,θ)↦ψ. Thus, our prior is a function of the form π(θ,ψ). However, the functional ψ is a composite function, θ↦p(·|·,θ)↦ψ, so that π(θ,ψ) is ultimately a function of θ only. Therefore, we will only write π(θ,ψ) when it is necessary to call attention to the subsidiary computation p(·|·,θ)↦ψ.

Let q denote the proposal density. For a given θ, q(θ,θ*) defines a distribution of potential new values θ*. We use a move-one-at-a-time, random-walk proposal density that puts its mass on discrete, separated points, proportional to a normal. Two aspects of the proposal scheme are worth noting. The first is that the wider the separation between the points in the support of q the less accurately g(θ) needs to be computed for α at step 5 of the algorithm below to be correct. As an example, the long-run risks model is not sensitive to the risk aversion parameter γ so that values of γ could be separated as much as one-fourth without making any difference to the usefulness of the θ-chain with respect to inference regarding economics. A practical constraint is that the separation cannot be much more than a standard deviation of the proposal density or the chain will not move. Our separations are typically one-eighth of a standard deviation of the proposal density. In turn, the standard deviations of the proposal density are typically no more than the standard deviations in Table 2 and no less than one order of magnitude smaller. The second aspect worth noting is that the prior is putting mass on these discrete points in proportion to π(θ). Because we never need to normalize π(θ), this does not matter. Similarly for the joint distribution f(y|x,g(θ))π(θ) considered as a function of θ. However, f(y|x,η) must be properly normalized as a function of y, at least to the extent that Equation (24) is computed correctly.

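The algorithm for the θ-chain is as follows. Given a current $\theta^o$ and the corresponding $\eta^o = g(\theta^o)$, we obtain the next pair $(\theta', \eta')$ as follows:

  1. Draw $\theta^*$ according to $q(\theta^o, \theta^*)$.

  2. Draw $\{\hat{y}_t, \hat{x}_{t-1}\}_{t=1}^{N}$ according to $p(y_t \mid x_{t-1}, \theta^*)$.

  3. Compute $\eta^* = g(\theta^*)$ and the functional $\psi^*$ from the simulation $\{\hat{y}_t, \hat{x}_{t-1}\}_{t=1}^{N}$.

  4. Compute $\alpha = \min\left(1, \dfrac{\mathcal{L}(\theta^*)\, \pi(\theta^*, \psi^*)\, q(\theta^*, \theta^o)}{\mathcal{L}(\theta^o)\, \pi(\theta^o, \psi^o)\, q(\theta^o, \theta^*)}\right)$, where $\mathcal{L}$ is the likelihood in Equation (23).

  5. With probability α, set $(\theta', \eta') = (\theta^*, \eta^*)$, otherwise set $(\theta', \eta') = (\theta^o, \eta^o)$.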
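The five steps can be sketched for a scalar θ as follows. This is an illustration under stated assumptions, not the implementation in the public domain code: the function name `metropolis_on_grid` is ours, a toy log-likelihood stands in for Equation (23) (so steps 2 and 3 are hidden inside `log_like`), and the proposal is symmetric so that q cancels in α. The chain is carried as an integer grid index so that proposed points stay exactly on the discrete support, separated by one-eighth of the proposal standard deviation.

```python
import math
import random

def metropolis_on_grid(log_like, log_prior, i0, step, sd, n_draws, seed=0):
    """Random-walk Metropolis on the grid theta = i * step."""
    rng = random.Random(seed)
    # Proposal: jump k grid points, with weight proportional to a normal
    # density with standard deviation sd (k = 0 is excluded).
    ks = [k for k in range(-24, 25) if k != 0]
    weights = [math.exp(-0.5 * (k * step / sd) ** 2) for k in ks]
    i = i0
    cur = log_like(i * step) + log_prior(i * step)
    chain = []
    for _ in range(n_draws):
        j = i + rng.choices(ks, weights)[0]               # step 1
        new = log_like(j * step) + log_prior(j * step)    # steps 2 and 3
        alpha = math.exp(min(0.0, new - cur))             # step 4
        if rng.random() < alpha:                          # step 5
            i, cur = j, new
        chain.append(i * step)
    return chain

# Toy target: normal log-likelihood centered at 1.0, flat prior.
chain = metropolis_on_grid(
    log_like=lambda th: -0.5 * ((th - 1.0) / 0.2) ** 2,
    log_prior=lambda th: 0.0,
    i0=0, step=0.025, sd=0.2, n_draws=20000)
posterior_mean = sum(chain[2000:]) / len(chain[2000:])
```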

It is at step 3 that we make our second modification. At that point, we have putative pairs $(\theta^*, \eta^*)$ and $(\theta^o, \eta^o)$ and corresponding simulations $\{\hat{y}_t^*, \hat{x}_{t-1}^*\}_{t=1}^{N}$ and $\{\hat{y}_t^o, \hat{x}_{t-1}^o\}_{t=1}^{N}$. We use $\eta^*$ as a start and recompute $\eta^o$ using the BFGS algorithm, obtaining $\tilde{\eta}^o$. If
$$\sum_{t=1}^{N} \log f(\hat{y}_t^o \mid \hat{x}_{t-1}^o, \tilde{\eta}^o) > \sum_{t=1}^{N} \log f(\hat{y}_t^o \mid \hat{x}_{t-1}^o, \eta^o),$$
then $\tilde{\eta}^o$ replaces $\eta^o$. In the same fashion, $\eta^*$ is recomputed using $\eta^o$ as a start. Once computed, a (θ,η) pair is never discarded. Neither are the corresponding $\mathcal{L}(\theta)$ and π(θ,ψ). Because the support of the proposal density is discrete, points in the θ-chain will often recur, in which case g(θ), $\mathcal{L}(\theta)$, and π(θ,ψ) are retrieved from storage rather than computed afresh. If the modification just described results in an improved $(\theta^o, \eta^o)$, that pair and the corresponding $\mathcal{L}(\theta^o)$ and $\pi(\theta^o, \psi^o)$ replace the values in storage; similarly for $(\theta^*, \eta^*)$. The upshot is that the values for g(θ) used at Step 4 will be optima computed from many different random starts after the chain has run awhile.
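The storage rule can be sketched as a dictionary keyed by the grid point of θ. This is a schematic under our own assumptions: `store` and `refine` are hypothetical names, the BFGS step that would produce each candidate η is omitted, and a toy objective stands in for the sum in Equation (24).

```python
store = {}  # theta grid point -> (eta, objective value at eta)

def refine(theta_key, eta_new, objective):
    """Keep eta_new for theta_key only if it beats the stored optimum."""
    value = objective(eta_new)
    if theta_key not in store or value > store[theta_key][1]:
        store[theta_key] = (eta_new, value)
    return store[theta_key]

# Toy objective with its maximum at eta = 2.0.
objective = lambda eta: -(eta - 2.0) ** 2

first = refine(40, 1.5, objective)   # first visit: the pair is stored
second = refine(40, 1.0, objective)  # worse candidate: stored pair is kept
third = refine(40, 1.9, objective)   # better candidate: stored pair replaced
```

Because a stored pair is only ever replaced by a better one, repeated visits to the same θ can improve the evaluation of g(θ) but never degrade it.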
To provide the scaling for the prior used in absolute model assessment, there is a subsidiary computation that needs to be carried out at Step 3. It is as follows. Initialize $S_\eta$ and L to zero. Each time the η-subchain $\{\eta_t\}_{t=1}^{K}$ is run, increment L, replace $S_\eta$ by $S_\eta + (\eta_{K/2} - \eta_K)(\eta_{K/2} - \eta_K)'$ and set
(25)

We use K = 200. All that is important is that transients have died out by the time the midpoint K/2 of the η-subchain has been reached and that $\eta_{K/2}$ and $\eta_K$ are nearly uncorrelated.

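We compute posterior probabilities using a method that requires one to save the values of θ, $\mathcal{L}(\theta)$, and π(θ) available at step 5. It also requires that these same values for a chain that draws from the prior for θ be saved. To draw from the prior, replace α at Step 4 by $\alpha = \min\left(1, \dfrac{\pi(\theta^*, \psi^*)\, q(\theta^*, \theta^o)}{\pi(\theta^o, \psi^o)\, q(\theta^o, \theta^*)}\right)$.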

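The algorithm for the η-subchain is as follows. We use a move-one-at-a-time, random-walk proposal density with continuous support. Given the current $\eta^o$, obtain the next value $\eta'$ in the chain as follows:

  1. Draw $\eta^*$ according to $q(\eta^o, \eta^*)$.

  2. Compute $\alpha = \min\left(1, \dfrac{\prod_{t=1}^{N} f(\hat{y}_t \mid \hat{x}_{t-1}, \eta^*)\, q(\eta^*, \eta^o)}{\prod_{t=1}^{N} f(\hat{y}_t \mid \hat{x}_{t-1}, \eta^o)\, q(\eta^o, \eta^*)}\right)$.

  3. With probability α, set $\eta' = \eta^*$, otherwise set $\eta' = \eta^o$.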
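A minimal sketch of such a subchain follows. It rests on assumptions of ours: the name `eta_subchain` is hypothetical, the coordinate to move is picked at random at each step (cycling through coordinates would be an equally plausible reading of "move-one-at-a-time"), the normal random-walk proposal is symmetric so that q cancels, and `log_f` is a toy stand-in for the log of the simulated auxiliary likelihood.

```python
import math
import random

def eta_subchain(log_f, eta0, scales, K=200, seed=0):
    """Move-one-at-a-time random-walk Metropolis with continuous support."""
    rng = random.Random(seed)
    eta = list(eta0)
    cur = log_f(eta)
    chain = []
    for _ in range(K):
        i = rng.randrange(len(eta))            # choose one coordinate
        prop = list(eta)
        prop[i] += rng.gauss(0.0, scales[i])   # step 1: symmetric move
        new = log_f(prop)
        alpha = math.exp(min(0.0, new - cur))  # step 2
        if rng.random() < alpha:               # step 3
            eta, cur = prop, new
        chain.append(list(eta))
    return chain

# Toy log-likelihood in two parameters with its maximum at (1, -2).
log_f = lambda e: -0.5 * ((e[0] - 1.0) ** 2 + (e[1] + 2.0) ** 2)
sub = eta_subchain(log_f, eta0=[0.0, 0.0], scales=[0.5, 0.5])
midpoint, endpoint = sub[len(sub) // 2 - 1], sub[-1]  # eta_{K/2} and eta_K
```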

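In Subsection 2.3, we shall require another chain, called the η-chain, that is computed from the data and a prior $\pi_\kappa$. The algorithm for that chain replaces α with
$$\alpha = \min\left(1, \frac{\prod_{t=1}^{n} f(y_t \mid x_{t-1}, \eta^*)\, \pi_\kappa(\eta^*)\, q(\eta^*, \eta^o)}{\prod_{t=1}^{n} f(y_t \mid x_{t-1}, \eta^o)\, \pi_\kappa(\eta^o)\, q(\eta^o, \eta^*)}\right).$$

Draws from the prior are also required. This is done by putting
$$\alpha = \min\left(1, \frac{\pi_\kappa(\eta^*)\, q(\eta^*, \eta^o)}{\pi_\kappa(\eta^o)\, q(\eta^o, \eta^*)}\right).$$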

2.2 Relative Model Comparison

Relative model comparison is standard Bayesian inference although there are a few details that need to be discussed to connect it to Subsection 2.1.

One computes the marginal density, $\int \prod_{t=1}^{n} f(y_t \mid x_{t-1}, g(\theta))\, \pi(\theta)\, d\theta$, for the three scientific models $p_1(y \mid x, \theta_1)$, $p_2(y \mid x, \theta_2)$, $p_3(y \mid x, \theta_3)$ with respective priors $\pi_1(\theta_1)$, $\pi_2(\theta_2)$, $\pi_3(\theta_3)$ using method f5 of Gamerman and Lopes (2006, section 7.2.1). The advantage of that method is that knowledge of the normalizing constants of f(·|·,η) and π(θ) is not required, and it appears to be accurate in tests that we conducted. The computation is straightforward because the relevant information from the θ-chains for the prior and posterior is available after completion of the computations discussed in Subsection 2.1. It is important, however, that the auxiliary model be the same for all three models when the computations in Subsection 2.1 are carried out. Otherwise the normalizing constant of f would be required. One divides the marginal density for each model by the sum for the three models to get the probabilities for relative model assessment.
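The final division is simple, but marginal densities of realistic samples are far too small to represent directly in floating point, so in practice one would work with their logarithms. The sketch below assumes that log marginal densities are already in hand (it does not implement the Gamerman and Lopes estimator), and the function name is ours.

```python
import math

def model_probabilities(log_marginals):
    """Divide each marginal density by the sum over models, working from
    logs and subtracting the maximum first to avoid underflow."""
    m = max(log_marginals)
    weights = [math.exp(v - m) for v in log_marginals]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log marginal densities for three models.
probs = model_probabilities([-1000.0, -1010.0, -1020.0])
```

Exponentiating the three values directly would return zero for each and make the division undefined; subtracting the maximum leaves the ratios unchanged.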

Note that what one is actually doing is comparing the three models $f(y \mid x, g_1(\theta_1))$, $f(y \mid x, g_2(\theta_2))$, $f(y \mid x, g_3(\theta_3))$, with respective priors $\pi_1(\theta_1)$, $\pi_2(\theta_2)$, $\pi_3(\theta_3)$. This is an important observation. Inference is actually being conducted with likelihoods $\prod_{t=1}^{n} f(y_t \mid x_{t-1}, g_1(\theta_1))$, $\prod_{t=1}^{n} f(y_t \mid x_{t-1}, g_2(\theta_2))$, $\prod_{t=1}^{n} f(y_t \mid x_{t-1}, g_3(\theta_3))$, not $\prod_{t=1}^{n} p_1(y_t \mid x_{t-1}, \theta_1)$, $\prod_{t=1}^{n} p_2(y_t \mid x_{t-1}, \theta_2)$, $\prod_{t=1}^{n} p_3(y_t \mid x_{t-1}, \theta_3)$. If f nests all $p_i$, that is, if Equation (22) holds, then the former and the latter are the same. If not, the matter needs consideration. In GM's application, they give two examples. In the first, the presence or absence of GARCH in the auxiliary model makes a dramatic difference to habit model parameter estimates. In the second, changing the thickness of the tails of the auxiliary model makes no difference. They argue on the basis of common sense and their examples that what is actually required is that the auxiliary model fit the observed data, not that it nests p. That is why they use the term statistical model for f. However, their argument is not a proof. We examine this issue in Section 4.

The realization that what one is actually doing is comparing the three models $f(y \mid x, g_1(\theta_1))$, $f(y \mid x, g_2(\theta_2))$, $f(y \mid x, g_3(\theta_3))$, with respective priors $\pi_1(\theta_1)$, $\pi_2(\theta_2)$, $\pi_3(\theta_3)$, allows us to perform a change of measure to η-space and illustrate relative model comparison graphically. In η-space, the prior $\pi_i$ for model i restricts η to a manifold $\mathcal{M}_i = \{\eta \in H : \eta = g_i(\theta_i),\ \theta_i \in \Theta_i\}$ with each $\eta = g_i(\theta_i)$ in $\mathcal{M}_i$ receiving prior weight $\pi_i(\eta) = \pi_i(\theta_i)$ (recall that $\Theta_i$ is discrete). Think of this manifold as a line in η-space. This is shown graphically in Figure 4 for two hypothetical models. Although conceptually a line, what is plotted in Figure 4 has an area with gray fill to represent the density of the prior along the line. A density-weighted integral along the line would have approximately the same value as an integral over the area shown. Also shown in Figure 4 are the likelihood contours of a hypothetical auxiliary model f(y|x,η). The marginal density for model i is $\int_{\mathcal{M}_i} \prod_{t=1}^{n} f(y_t \mid x_{t-1}, \eta)\, \pi_i(\eta)\, d\eta$, which is approximately the integral over the gray areas for the hypothetical models in Figure 4. As shown in the figure, the likelihood is larger over $\mathcal{M}_2$ than over $\mathcal{M}_1$, which implies the same for the integrals over $\mathcal{M}_2$ and $\mathcal{M}_1$. Thus, the second model has the higher marginal likelihood and is therefore preferred.

Figure 4

Relative model comparison. Shown is relative model comparison under a change of variables of integration θ↦η. The contours show the likelihood of the auxiliary model f(·|η). The curved lines, labeled “Prior of Model i”, show the manifolds $\mathcal{M}_i = \{\eta \in H : \eta = g_i(\theta_i),\ \theta_i \in \Theta_i\}$ for Model 1, i = 1, and Model 2, i = 2. Thickness, represented by gray fill, is proportional to the priors $\pi_1$ and $\pi_2$. Posterior probabilities are proportional to the integral of the likelihood over the manifold weighted by the prior. Model 2 is preferred because the likelihood values on its manifold are larger than the likelihood values on the manifold of Model 1.

2.3 Absolute Model Assessment

We now shift our focus. The model of interest is the auxiliary model f(·|·,η) and its parameter η. The role of the scientific model p(·|·,θ) is to define the implied map g(θ) and the manifold
$$\mathcal{M} = \{\eta \in H : \eta = g(\theta),\ \theta \in \Theta\}. \tag{26}$$

The scientific model can be viewed as a sharp prior on f that restricts the posterior distribution of η to lie on the manifold $\mathcal{M}$. Think of it as a line in η-space. If this prior is relaxed, the line becomes a region with volume in η-space. Relaxation can be indexed by a scale parameter κ. As κ increases, the size of the region increases and the posterior for η will move along a path toward the likelihood of the data under f. Figure 5 is an illustration. One can select waypoints $\kappa_i$ along this path, view them as the discrete values of a parameter, assign them equal prior probability, and compute their posterior probability. If waypoints near $\mathcal{M}$ receive high posterior probability, then the data support the scientific model. If waypoints far from $\mathcal{M}$ receive high posterior probability, then the data do not support the scientific model. The idea is illustrated graphically in Figure 5 for the case when a model is rejected and in Figure 6 for the case when a model is accepted. The formal development proceeds as follows.

Figure 5

Absolute model assessment—reject. The contours show the likelihood of the auxiliary model f(·|η). The shaded areas show the prior (27) for scale parameters κ1 < κ2 < κ3. The smallest region corresponds to κ1 and the largest to κ3. The crosses show the mode of the posterior under κ1, κ2, κ3. The posterior probabilities for absolute model assessment are proportional to the integrals of the likelihood over the respective shaded areas. The model is rejected because the likelihood, hence the integral, is larger over the κ3-prior than over the κ1-prior.

Figure 6

Absolute model assessment—accept. The contours show the likelihood of the auxiliary model f(·|η). The shaded areas show the prior (27) for scale parameters κ1 < κ2 < κ3. The smallest region corresponds to κ1 and the largest to κ3. The crosses show the mode of the posterior under κ1, κ2, κ3. The posterior probabilities for absolute model assessment are proportional to the integrals of the likelihood over the respective shaded areas. The model is accepted because the likelihood, hence the integral, is larger over the κ1-prior than over the κ3-prior.

We add the additional assumption that the auxiliary model is identified and has more parameters than the scientific model. This assumption implies that $g^{-1}(\eta)$ exists on $\mathcal{M}$. If the scientific model is identified, $g^{-1}(\eta)$ will map to a single point; if not, $g^{-1}(\eta)$ will be a set.

We impose closeness to $\mathcal{M}$ by means of the prior
(27)
where
(28)
π(θ) is the prior (1) for the scientific model, and $\Sigma_\eta$ is given by Equation (25). It is easy and cheap to evaluate Equation (28) once the computations described in Subsection 2.1 have been carried out because the implied map g is represented by pairs (θ,η) stored together with π(θ) at the conclusion of Subsection 2.1 computations. The store is traversed to find the pair $(\theta^o, \eta^o)$ such that $\eta^o$ solves Equation (28). Then $\pi(g^{-1}(\eta^o)) = \pi(\theta^o)$. (If $g^{-1}(\eta^o)$ maps to a set, $\pi(g^{-1}(\eta^o))$ is the sum of π(θ) over that set. Recall that Θ is discrete and normalization is not required.) The pairs (θ,η) and scale $\Sigma_\eta$ used to compute Equations (28) and (27) are those for the θ-chain that draws from the prior because they are not tainted by data.

Choose three (for specificity) values $\kappa_1$, $\kappa_2$, and $\kappa_3$, ordered from small to large. Consider f under priors $\pi_{\kappa_1}$, $\pi_{\kappa_2}$, and $\pi_{\kappa_3}$ to be three different models and compute the posterior probability for the three models with each having prior probability of 1/3. That is, the pair $(f(\cdot \mid \cdot, \eta), \pi_\kappa(\eta))$ is considered to be a model and the posterior probability of each κ choice is proportional to the marginal likelihood $\int \prod_{t=1}^{n} f(y_t \mid x_{t-1}, \eta)\, \pi_\kappa(\eta)\, d\eta$.

If the posterior probability of model κ1 is small, that is evidence against the scientific model. Conversely, if it is large, that is evidence in favor of the scientific model.
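Written out, the computation is the following, where the symbol $m_i$ for the marginal likelihood under $\kappa_i$ is shorthand introduced here for illustration and is not notation from GM. The equal prior probabilities of 1/3 cancel, so only the marginal likelihoods matter:

```latex
m_i = \int \prod_{t=1}^{n} f(y_t \mid x_{t-1}, \eta)\, \pi_{\kappa_i}(\eta)\, d\eta,
\qquad
P(\kappa_i \mid y_1, \ldots, y_n)
  = \frac{\tfrac{1}{3}\, m_i}{\sum_{j=1}^{3} \tfrac{1}{3}\, m_j}
  = \frac{m_i}{\sum_{j=1}^{3} m_j},
\qquad i = 1, 2, 3.
```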

2.4 The Auxiliary Model

The observed data are $y_t$ for t = 1,…,n. We first discuss the case where $y_t$ is multivariate. Lagged values of $y_t$ are denoted as $x_{t-1}$. For auxiliary models $f_0$ through $f_4$, $x_{t-1} = y_{t-1}$. For auxiliary model $f_5$, $x_{t-1} = (y_{t-1}, y_{t-2})$.

The data are modeled as
where
(29)
which is the location function of a VAR, and $R_{x_{t-1}}$ is the Cholesky factor of
(30)
(31)
(32)
(33)

In our specification, R0 is an upper triangular matrix, P and V are diagonal matrices, and Q is scalar; max(0,x) is applied elementwise. This is the BEKK form of multivariate GARCH described in Engle and Kroner (1995) with an added leverage term (33). In computations, max(0,x) in Equation (33) is replaced by a twice differentiable cubic spline approximation that plots slightly above max(0,x) over (0,0.1) and coincides elsewhere. Auxiliary model f0 has term (30) only, f1 has terms (30), (31), and (32), and f2 through f5 have all four terms.

The density h(z) of the iid $z_t$ is the square of a Hermite polynomial times a normal density, the idea being that the class of such h is dense in Hellinger norm and can therefore approximate a density to within arbitrary accuracy in Kullback–Leibler distance (Gallant and Nychka 1987). The density h(z) is the normal when the degree of the Hermite polynomial is zero, which is the case for auxiliary models $f_0$ through $f_2$. For model $f_3$, the degree is 4. For models $f_4$ and $f_5$, the degree is 4, but the constant term of the Hermite polynomial is a linear function of $y_{t-1}$. This has the effect of adding a nonlinear term to the location function (29) and the variance function (30). It also causes the higher moments of h(z) to depend on $y_{t-1}$ as well.
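For a scalar z, a density of this type can be sketched as follows. The sketch makes assumptions that should be read as ours: the polynomial is written in the monomial basis (which spans the same polynomials as a Hermite basis of the same degree), the coefficients are arbitrary illustrative values, and the function names are hypothetical. The normalizing constant is the expectation of the squared polynomial under the standard normal, which follows from the even moments 1, 1, 3, 15, 105.

```python
import math

def normal_moment(k):
    """E[z^k] for standard normal z: zero for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0.0
    m = 1.0
    for j in range(k - 1, 0, -2):
        m *= j
    return m

def snp_density(z, coef):
    """h(z) = P(z)^2 phi(z) / E[P(z)^2], with P(z) = sum_i coef[i] * z**i."""
    poly = sum(a * z ** i for i, a in enumerate(coef))
    norm = sum(a * b * normal_moment(i + j)
               for i, a in enumerate(coef)
               for j, b in enumerate(coef))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return poly * poly * phi / norm

coef = [1.0, 0.2, -0.1, 0.05, 0.02]   # degree 4, illustrative values only
normal_case = snp_density(0.3, [1.0])  # degree 0 reduces to the normal density
```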

The univariate auxiliary models are the same as the above but $\mu_{x_{t-1}}$ in Equation (29) has dimension one and becomes the location function of a first order autoregression and $\Sigma_{x_{t-1}}$ in (30) has dimension one and becomes a GARCH(1,1) with a leverage term added.

2.5 Diagnostic Checks

The idea behind diagnostic checking is straightforward: If one has compared two scientific models $(p_1, \pi_1)$ and $(p_2, \pi_2)$ using the same auxiliary model f(·|η) and the fit of $(p_2, \pi_2)$ is preferred, then one can examine the posterior means (or modes) $\hat{\eta}_1$ and $\hat{\eta}_2$ of f(·|η) corresponding to the two fits to see which elements changed. The same is true for absolute model assessment. If one fits $(f, \pi_{\kappa_1})$ and $(f, \pi_{\kappa_2})$ and concludes that $(f, \pi_{\kappa_1})$ fails to fit the data, then one can examine the changes in the elements of the posterior means (or modes) of f(·|η) corresponding to the two fits to see which elements changed.

The changes in the elements of $\hat{\eta}_1$ and $\hat{\eta}_2$ need to be normalized to facilitate meaningful comparison. Let $\hat{\eta}_{1,i}$ and $\hat{\eta}_{2,i}$ denote the respective ith elements of $\hat{\eta}_1$ and $\hat{\eta}_2$. Let $\hat{\sigma}_{2,i}$ denote the ith posterior standard deviation of the second fit, that is, the preferred fit. The normalization we suggest is
(34)

Table 4 is an example.

There is a caveat. The $t_i$ are often informative but are subject to the same risk as the interpretation of t-statistics in a regression, namely, a failure to fit one characteristic of the data can show up not in the parameters that describe that characteristic but elsewhere due to correlation (collinearity). Nonetheless, despite this risk, inspection of the $t_i$ is often the most informative diagnostic available.

The methods proposed here are likelihood methods, which means that at the conclusion of an estimation exercise a transition density that represents the data under the fitted model is available. The most useful are f(y|x,g(θ)) with θ set to the posterior mode from a fit of (p,π) and f(y|x,η) with η set to the posterior mode from a fit of $(f, \pi_\kappa)$. One can apply standard diagnostics to these transition densities such as comparative plots as in Figure 7.

Figure 7

Conditional volatility of the three models. The solid line is the conditional volatility of the data expressed as a percent for the long-run risks model with its parameters set to the posterior mode from fitting to the bivariate consumption growth and stock returns data over the period 1930–2008 using auxiliary model f5. The dashed line is the same for the habit persistence model and the dot-dash line is the same for the prospect theory model. The shaded area shows ±1.96 posterior standard deviations about the conditional volatility of the long-run risks model.

2.6 Forecasts

A forecast can be viewed as a functional Y:f(·|·,η)↦υ of the auxiliary model that can be computed from f(·|·,η) either analytically or by simulation. If f(·|·,η) nests the scientific model p(·|·,θ) then, due to the map η = g(θ), this forecast can also be viewed both as a forecast from the scientific model and as a function of θ. As such, it can be computed at each draw in the θ-chain for the posterior and the posterior mean, mode, and standard deviation obtained. Similarly for draws from the prior. An example is Figure 3.
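Evaluating a forecast at each draw and summarizing across draws can be sketched as below. The forecast functional here is a hypothetical AR(1) point forecast chosen only so that the example is self-contained, and the draws are made-up numbers; neither is taken from the models or chains in this paper.

```python
import statistics

def posterior_forecast(theta_draws, forecast, horizon):
    """Evaluate the forecast at every draw; return the mean and standard
    deviation across draws at each step of the horizon."""
    paths = [forecast(theta, horizon) for theta in theta_draws]
    means = [statistics.fmean(p[h] for p in paths) for h in range(horizon)]
    sds = [statistics.pstdev([p[h] for p in paths]) for h in range(horizon)]
    return means, sds

def ar1_forecast(theta, horizon, y_last=1.0):
    """Toy functional: the h-step-ahead AR(1) forecast theta**h * y_last."""
    return [y_last * theta ** (h + 1) for h in range(horizon)]

means, sds = posterior_forecast([0.4, 0.6], ar1_forecast, horizon=2)
```

Bands such as the ±1.96 posterior standard deviations in Figure 3 would then be `means[h] ± 1.96 * sds[h]` at each horizon h.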

3 HABIT, LONG-RUN RISKS, PROSPECT?

Table 3 presents the posterior probabilities for a relative model comparison and an absolute model assessment of the three asset pricing models fitted to the three data series over the two sample periods.

Table 3

Posterior probabilities

Relative model comparison

                                  Sample period
                         1930–2008               1950–2008
Series                hab    lrr    pro       hab    lrr    pro
Trivariate           0.00   1.00             1.00   0.00
Bivariate            0.00   1.00   0.00      1.00   0.00   0.00
Univariate           0.28   0.48   0.24      0.44   0.42   0.14

Absolute model assessment

                                  Sample period
                         1930–2008               1950–2008
Series      Prior     hab    lrr    pro       hab    lrr    pro
Trivariate  κ = 0.1  0.00   0.00             0.00   0.00
            κ = 1    0.00   0.00             0.33   0.00
            κ = 10   1.00   1.00             0.67   1.00
Bivariate   κ = 0.1  0.00   0.41   0.28      0.31   0.16   0.08
            κ = 1    0.00   0.36   0.28      0.31   0.21   0.08
            κ = 10   1.00   0.23   0.44      0.38   0.64   0.84
Univariate  κ = 0.1  0.29   0.36   0.10      0.40   0.39   0.29
            κ = 1    0.30   0.26   0.30      0.38   0.35   0.34
            κ = 10   0.41   0.38   0.60      0.22   0.26   0.37

Shown are posterior probabilities for a relative comparison and an absolute assessment of three asset pricing models fitted to three data series over two sample periods. The trivariate series is annual consumption growth, stock returns, and the price dividend ratio over the years shown. The bivariate series is consumption growth and stock returns. The univariate series is stock returns alone. Variables not in a series are treated as latent in model fits. For the univariate and bivariate series, the auxiliary model is f5, which is described in Table 1; for the trivariate series it is f0. κ is the standard deviation of a prior that imposes the habit model (hab), the long-run risks model (lrr), and the prospect theory model (pro), respectively, on the auxiliary model. The prior weakens as κ increases.

For the univariate series, all three models fit reasonably well and none of them strongly dominates. Our interest in the univariate case arises because this conclusion depends on the choice of auxiliary model. We explore this issue in Section 4.

Our interest in the trivariate series stems from the fact that BGT, Beeler and Campbell (2008), and Bansal, Kiku, and Yaron (2009) argue that the main difference between the habit model and the long-run risks model occurs with respect to predictability regressions that involve the price to dividend ratio series.

It is difficult to properly adjust dividend payouts for stock repurchases and other distortions caused by tax policy so as to make the measured data resemble the theoretical construct of asset pricing models. It is apparent from Figure 1 that the price to dividend series does not look like the realization of a stationary process. An informal check on the ability of a model to explain dividends is to simulate it for ten thousand years and see if the best match (with respect to a variance weighted mean squared error metric) of any contiguous segment of the simulation to the data looks like Figure 1. It does not. For all models and for both sample periods, the best match cannot produce the hump seen in the third panel of Figure 1.
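The informal check can be sketched as a sliding-window search. The snippet below is a minimal illustration, not the authors' code: the function name `best_match_segment`, the choice to weight each series by the inverse of its sample variance in the data, and the random-walk placeholder inputs are all assumptions made for the example.

```python
import numpy as np

def best_match_segment(sim, data):
    """Return (start, score) of the contiguous segment of `sim` closest to `data`.

    sim  : array of shape (T_sim, M), a long simulation from a model
    data : array of shape (n, M), the observed series
    The score is a mean squared error in which each series is weighted by
    the inverse of its sample variance in the data (an assumed weighting).
    """
    sim = np.asarray(sim, dtype=float)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    weights = 1.0 / data.var(axis=0)
    # sliding_window_view returns shape (T_sim - n + 1, M, n);
    # move the within-window time axis next to the window index.
    windows = np.lib.stride_tricks.sliding_window_view(sim, n, axis=0)
    windows = np.moveaxis(windows, -1, 1)          # (num_windows, n, M)
    sq_err = (windows - data[None, :, :]) ** 2
    scores = (sq_err * weights).mean(axis=(1, 2))
    start = int(np.argmin(scores))
    return start, float(scores[start])

# Usage with placeholder inputs (random walks standing in for model output).
rng = np.random.default_rng(0)
sim = rng.standard_normal((10_000, 3)).cumsum(axis=0)
data = sim[1234:1234 + 79].copy()      # plant 79 "years" of data in the simulation
print(best_match_segment(sim, data))   # recovers the planted segment at start 1234
```

In practice `sim` would be a ten-thousand-year simulation from a fitted model and `data` the observed series; the segment starting at `start` is what one would plot against Figure 1.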

As mentioned earlier, because the prospect theory model puts its (conditional) mass on a two-dimensional subspace, it cannot be fit to the trivariate series. However, the informal check just described is possible and the prospect theory model fails more definitively under it than either the habit or long-run risks model.

With respect to the remaining two models, in the relative model comparison presented in Table 3, the long-run risks model is dominant over the 1930–2008 period, whereas the habit model is dominant over the 1950–2008 period. Both models fail to fit the data in the absolute model assessment. Interestingly, the failure of the habit persistence model in the 1950–2008 period is not as stark as for the long-run risks model. These results are for the nesting model f0.

The fundamental objective of asset pricing models is to explain the relationship between asset prices and consumption. From this perspective, dividends can be treated as an internal construct that need not have any corresponding observable counterpart. Therefore, the bivariate results presented in Table 3 are the most important substantively.

As seen in Table 3, in the relative model comparison, the long-run risks model is dominant over the 1930–2008 period, whereas the habit model is dominant over the 1950–2008 period. In the absolute model assessment, the habit model fails in the 1930–2008 period, and the prospect theory model fails in the 1950–2008 period. These results are for the nesting model f5.

It is of interest to determine why the habit persistence model fails to fit the bivariate series using the diagnostic checks described in Subsection 2.5. For this purpose, it is more informative to use the simpler auxiliary model f1 rather than the nesting model f5. Because the habit model has seven parameters and f1 has twelve, this is a legitimate choice for the habit model.

Table 4 presents the diagnostics for the habit model. In the table the fit of (f1, πκ) with κ = 0.1 is compared to the fit with κ = 10 for the bivariate series over the period 1930–2008 and over the period 1950–2008. The table makes the problem clear. The largest diagnostic, t = −4.98, indicates that P11, which is the feedback of consumption growth into its own volatility, is too small in absolute value. The habit model fails to put enough conditional heteroskedasticity into the consumption growth process. The problem disappears in the 1950–2008 period because, as seen by comparing the entries for κ = 10 across the row for P11, the conditional volatility in the data drops substantially.

Table 4

Diagnostics for the habit persistence model

            1930–2008                        1950–2008
Parameter   Mode      Mode      Diagnostic   Mode      Mode      Diagnostic
            κ = 0.1   κ = 10                 κ = 0.1   κ = 10
b0,1        −0.08     −0.05     −1.30        −0.06     −0.05     −0.21
b0,2         0.07      0.04      0.53         0.06      0.04      0.34
B11          0.08      0.16     −1.62         0.09      0.15     −1.21
B21         −0.16     −0.09     −0.94        −0.15     −0.22      0.64
B12          0.29      0.32     −0.80         0.29      0.23      1.58
B22          0.02      0.02     −0.10         0.02      0.00      0.35
R0,11       −0.03     −0.01     −0.23        −0.03     −0.06      0.41
R0,12        0.23      0.27     −0.85         0.23      0.22      0.29
R0,22        0.21      0.21     −0.07         0.20      0.26     −0.74
P11         −0.06      0.17     −4.98        −0.05     −0.02     −0.55
P22         −0.21     −0.22      0.16        −0.21     −0.24      0.93
Q11          0.91      0.91     −0.04         0.91      0.91      0.13

Shown are the posterior modes from fitting (f1, πκ) to the bivariate consumption growth and stock returns data over the periods and κ values shown, together with the diagnostic checks described in Subsection 2.5.

Figure 7 plots the conditional volatility of the three models over the 1930–2008 period. The solid line in Figure 7 is the long-run risks model, which the relative model comparison in Table 3 favors over this period. The shaded region shows ±1.96 posterior standard deviations about this line. All three models track the conditional volatility of stock returns about equally well after taking the standard deviations into account. Where they differ is in how they track the conditional volatility of consumption growth and the conditional correlation between consumption growth and stock returns. The same plot for the 1950–2008 period (not shown) looks qualitatively similar: agreement in the middle panel and disagreement in the other two.

Plots of the conditional mean of consumption growth (not shown) indicate that the habit model and long-run risks model agree over the 1930–2008 period except over the years 1930–1940. They both track the data moderately well. Because, as was seen in Figure 3, the conditional mean of consumption growth for the prospect theory model must plot as a straight line, it does not track well. All models disagree in plots (not shown) of the conditional mean of stock returns, with long-run risks plotting as a straight line; all track the data poorly.

4 SENSITIVITY ANALYSIS

There is much experience with the data shown in Figure 1. That experience suggests that about the richest model one would be willing to fit to these data is a model with one-lag VAR location, GARCH scale, and normal innovations. The exact specification one gets using standard model selection procedures, such as upward F-testing, is sensitive to the sample period used. One can get slightly richer or coarser specifications. It is fair to say that the consensus view is that a one-lag VAR location, GARCH scale, and normal innovations is the richest model one ought to entertain, which is model f1 of Table 1.

We do not discuss the trivariate series in this section other than to remark that for it, f0 is the nesting model, results for f0 are reported in Table 3, conclusions do not change under auxiliary models f1 through f3, and computations for models f4 and f5 cannot be undertaken because simulations do not identify them (BFGS becomes unstable). Similarly, we do not discuss results for the bivariate consumption growth and stock returns data because the results shown in Table 3 are the same for all auxiliary models in Table 1. Finally, we do not discuss absolute model assessment because there is no logical requirement that the auxiliary model nest the scientific model for the purpose of absolute model assessment. All that is logically required is that the auxiliary model have more parameters than the scientific model. Other than that, one is free to choose an auxiliary model as judgment suggests.

For the univariate (and bivariate) series, a model that will nest the three scientific models that we consider has the following characteristics: a two-lag linear conditional mean function with a one-lag nonlinear conditional mean term added to it, a one-lag GARCH conditional variance function with a one-lag leverage term and a one-lag nonlinear conditional variance term added, and a flexible innovation distribution that permits fat tails and bumps. We denote this model by f5. It is the last of the six in Table 1. GM found the same to be true for the habit model, except that they used data from 1933–2001 with the years 1930–1932 used to prime recursions. They dismissed f5 out of hand as absurd and worked with f0 and f1. They did verify that a fat-tailed innovation distribution did not change results. Using model f0 most closely corresponds to calibration as customarily implemented in the macro/finance literature. The sufficient statistics for f0 are the mean and variance of yt and the first-order autocorrelations. One is, effectively, finding parameter values that best match three moments for univariate data, nine for bivariate, and eighteen for trivariate.
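The moment counts quoted above follow from simple bookkeeping. One reading that reproduces them: for an M-dimensional series, the mean contributes M entries, the variance matrix M(M+1)/2 distinct entries, and the first-order autocovariance matrix M² entries. A short check, with the function name `f0_moment_count` chosen here purely for illustration:

```python
def f0_moment_count(m: int) -> int:
    """Number of distinct first and second moments matched under f0.

    m means, m*(m+1)/2 distinct variances and covariances, and
    m*m first-order autocovariances (that matrix is not symmetric).
    """
    return m + m * (m + 1) // 2 + m * m

for name, m in [("univariate", 1), ("bivariate", 2), ("trivariate", 3)]:
    print(name, f0_moment_count(m))
# univariate 3
# bivariate 9
# trivariate 18
```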

As discussed in Subsection 2.1 and in GM, the logically correct view toward using f1, which fits the data, instead of f5, which nests the scientific model, is that it is not the likelihood of the scientific model that is being used. It is some other likelihood. Therefore, it is not the scientific models that are actually being estimated and compared. Another point of view is the argument advanced by GM that using a sensible auxiliary model is akin to method of moments estimation. One only asks that the scientific models match certain features of the data and allows them to ignore others. What to do? About all one can do is try a battery of auxiliary model specifications and see what happens.

Table 5 displays the results for the relative comparisons for the univariate stock returns data over the periods 1930–2008 and 1950–2008. There is considerable sensitivity to specification of the auxiliary model over the 1930–2008 period. Conclusions are affected by the choice of auxiliary model. Our view is that, because there can be sensitivity to auxiliary model choice, and because one is not actually comparing scientific models if the auxiliary model is not nesting, it is best to use the nesting auxiliary model in general, which is f5 in this instance.

Table 5

Posterior probability, relative comparison, stock returns

Model        f0     f1     f2     f3     f4     f5
1930–2008
  Habit      0.47   0.71   0.28   0.36   0.28   0.28
  LR risks   0.49   0.25   0.57   0.34   0.45   0.48
  Prospect   0.04   0.04   0.15   0.30   0.27   0.24
1950–2008
  Habit      0.51   0.49   0.44   0.42   0.46   0.44
  LR risks   0.47   0.42   0.51   0.49   0.45   0.42
  Prospect   0.02   0.10   0.05   0.09   0.09   0.14

The data are annual stock returns over the years shown. Auxiliary models f0 through f5 are described in Table 1.
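One way to read the sensitivity off Table 5 is to ask, for each auxiliary model, which scientific model has the highest posterior probability. The sketch below only transcribes the table entries; the variable and function names are ours and carry no meaning beyond this example.

```python
# Posterior probabilities transcribed from Table 5 (columns f0..f5).
aux_models = ["f0", "f1", "f2", "f3", "f4", "f5"]
table5 = {
    "1930-2008": {
        "Habit":    [0.47, 0.71, 0.28, 0.36, 0.28, 0.28],
        "LR risks": [0.49, 0.25, 0.57, 0.34, 0.45, 0.48],
        "Prospect": [0.04, 0.04, 0.15, 0.30, 0.27, 0.24],
    },
    "1950-2008": {
        "Habit":    [0.51, 0.49, 0.44, 0.42, 0.46, 0.44],
        "LR risks": [0.47, 0.42, 0.51, 0.49, 0.45, 0.42],
        "Prospect": [0.02, 0.10, 0.05, 0.09, 0.09, 0.14],
    },
}

def favored(period):
    """Map each auxiliary model to the scientific model with the largest probability."""
    rows = table5[period]
    return {
        aux: max(rows, key=lambda model: rows[model][j])
        for j, aux in enumerate(aux_models)
    }

for period in table5:
    print(period, favored(period))
```

Over 1930–2008 the habit model's probability ranges from 0.28 to 0.71 across auxiliary models, whereas over 1950–2008 the habit and long-run risks probabilities all lie between 0.42 and 0.51.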

5 CONCLUSION

We used Bayesian statistical methods proposed by GM to compare the habit persistence asset pricing model of CC, the long-run risks model of BY, and the prospect theory model of BHS. This comparison fills a void in the literature.

We undertook two types of comparisons, relative and absolute, over two sample periods, 1930–2008 and 1950–2008, using three series, trivariate (consumption growth, stock returns, and the price to dividend ratio), bivariate (consumption growth and stock returns), and univariate (stock returns). The prior for each model is that the ergodic mean of the real interest rate be 0.896 within ±1 with probability 0.95 together with a preference for model parameters that are near their published values.

For the univariate series and for both sample periods, the models perform about the same in the relative comparison and fit the data reasonably well in the absolute assessment.

For the bivariate series, in the relative comparison, the long-run risks model dominates over the 1930–2008 period, while the habit persistence model dominates over the 1950–2008 period; in the absolute assessment, the habit model fails in the 1930–2008 period, and the prospect theory model fails in the 1950–2008 period.

For the trivariate series, in the relative comparison the long-run risks model dominates over the 1930–2008 period, while the habit persistence model dominates over the 1950–2008 period; in the absolute assessment, both the habit model and the long-run risks model fail in both periods. The prospect theory model cannot be fitted to a trivariate series because it puts its (conditional) mass on a two-dimensional subspace.

The estimator proposed by GM is a simulation-based estimator. Simulations from a scientific model, which here is either the habit model, the long-run risks model, or the prospect theory model, are used to determine a map $\eta = g(\theta)$ from the parameters $\theta$ of the scientific model to the parameters $\eta$ of an auxiliary model $f(y_t \mid x_{t-1}, \eta)$, where $y_t$ is the observed data and $x_{t-1}$ are lags. Thereafter, $\mathcal{L}(\theta) = \prod_{t=1}^{n} f(y_t \mid x_{t-1}, g(\theta))$ is used whenever a likelihood is required. Theory requires that the auxiliary model nest the scientific model. GM argue that one is better served by an auxiliary model that represents the data well. We undertook a sensitivity analysis and recomputed our results for six auxiliary models ordered by complexity. The first produces estimates that mimic values obtained by methods customarily employed in macro/finance. The second represents the data. The sixth nests the three scientific models considered. We find that results can be sensitive to the choice of auxiliary models. Most importantly, results can differ between the model that represents the data well and the model that nests the scientific model. In view of this difference and the fact that theory supports the latter, our view is that the nesting auxiliary model ought to be used. Our substantive conclusions are based on the nesting model.
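The structure of this estimator can be illustrated with a toy example. The sketch below is not GM's implementation and uses none of the paper's models: the scientific model is a hypothetical nonlinear autoregression, the auxiliary model is a Gaussian AR(1), and the map g is approximated by fitting the auxiliary model to one long simulation by least squares. All function names and parameter values are assumptions made for illustration.

```python
import math
import numpy as np

def simulate_scientific(theta, n, seed=0):
    """Toy stand-in for a scientific model (hypothetical, not one of the paper's).

    y_t = a + b * tanh(y_{t-1}) + s * e_t, with e_t standard normal.
    """
    a, b, s = theta
    shocks = np.random.default_rng(seed).standard_normal(n)
    y = np.empty(n)
    prev = 0.0
    for t in range(n):
        prev = a + b * math.tanh(prev) + s * shocks[t]
        y[t] = prev
    return y

def fit_auxiliary(y):
    """Gaussian AR(1) auxiliary model: eta = (intercept, slope, sigma) by least squares."""
    x = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(x, y[1:], rcond=None)
    resid = y[1:] - x @ coef
    return np.array([coef[0], coef[1], resid.std()])

def g(theta, n_sim=50_000):
    """Approximate eta = g(theta) by fitting the auxiliary model to a long simulation.

    The seed is held fixed so that g is a deterministic function of theta.
    """
    return fit_auxiliary(simulate_scientific(theta, n_sim, seed=0))

def log_likelihood(theta, data):
    """log L(theta) = sum_t log f(y_t | x_{t-1}, g(theta)) under the auxiliary density."""
    b0, b1, sigma = g(theta)
    resid = data[1:] - (b0 + b1 * data[:-1])
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (resid / sigma) ** 2))

# Usage: placeholder "observed" data generated from the toy model itself.
data = simulate_scientific((0.1, 0.5, 1.0), 200, seed=1)
print(log_likelihood((0.1, 0.5, 1.0), data))   # likelihood at the generating value
print(log_likelihood((0.1, 0.5, 3.0), data))   # lower: shock scale is badly wrong
```

In this toy case the auxiliary model does not nest the scientific model, which is exactly the situation in which the likelihood being evaluated is not that of the scientific model itself.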

We found that the computational methods that GM proposed are not sufficiently accurate to compare the habit, long-run risks, and prospect theory models. A contribution of this paper is a refinement of GM's methods that allows comparison of these three models.

References

Bansal R, Yaron A. Risks For the Long Run: A Potential Resolution of Asset Pricing Puzzles. Journal of Finance, 2004, vol. 59 (pg. 1481-1509).

Bansal R, Gallant AR, Tauchen G. Rational Pessimism, Rational Exuberance, and Asset Pricing Models. Review of Economic Studies, 2007, vol. 74 (pg. 1005-1033).

Bansal R, Kiku D, Yaron A. An Empirical Evaluation of the Long-Run Risks Model for Asset Prices. Working Paper 15504, 2009, National Bureau of Economic Research.

Barberis N, Huang M, Santos T. Prospect Theory and Asset Prices. Quarterly Journal of Economics, 2001, vol. 116 (pg. 1-54).

Beeler J, Campbell JY. The Long-Run Risks Model and Aggregate Asset Prices: An Empirical Assessment. Working Paper 14788, 2008, National Bureau of Economic Research.

Campbell JY. Consumption-based Asset Pricing. In: Constantinides GM, Harris M, Stulz RM (eds.), Handbook of the Economics of Finance, 2003, vol. 1. Amsterdam: Elsevier (pg. 803-887).

Campbell JY, Cochrane J. By Force of Habit: A Consumption-based Explanation of Aggregate Stock Market Behavior. Journal of Political Economy, 1999, vol. 107 (pg. 205-251).

Engle RF, Kroner KF. Multivariate Simultaneous Generalized ARCH. Econometric Theory, 1995, vol. 11 (pg. 122-150).

Fletcher R. Practical Methods of Optimization, 1987, 2nd ed. New York, NY: Wiley.

Gallant AR, McCulloch RE. On the Determination of General Statistical Models with Application to Asset Pricing. Journal of the American Statistical Association, 2009, vol. 104 (pg. 117-131).

Gallant AR, Nychka DW. Semi-Nonparametric Maximum Likelihood Estimation. Econometrica, 1987, vol. 55 (pg. 363-390).

Gamerman D, Lopes HF. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2006, 2nd ed. Boca Raton, FL: Chapman and Hall.

Miranda MJ, Fackler PL. Applied Computational Economics and Finance, 2002. Cambridge, MA: MIT Press.

Mishkin FS. The Real Rate of Interest: An Empirical Investigation. Carnegie-Rochester Conference Series on Public Policy, The Cost and Consequences of Inflation, 1981, vol. 15. Amsterdam: Elsevier (pg. 151-200).

Schwarz G. Estimating the Dimension of a Model. Annals of Statistics, 1978, vol. 6 (pg. 461-464).

Working H. Note on the Correlation of First Differences of Averages in a Random Chain. Econometrica, 1960, vol. 28 (pg. 916-918).

Author notes

This work was supported by National Science Foundation Grant Number SES 0438174.