Oxford Textbook of Palliative Medicine (5 edn)
Disclaimer
Oxford University Press makes no representation, express or implied, that the drug dosages in this book are correct. Readers must therefore always check the product information and clinical procedures with the most up-to-date published product information and data sheets provided by the manufacturers and the most recent codes of conduct and safety regulations. The authors and the publishers do not accept responsibility or legal liability for any errors in the text or for the misuse or misapplication of material in this work. Except where otherwise stated, drug dosages and recommendations are for the non-pregnant adult who is not breastfeeding.

Components of the randomized, controlled trial (RCT) have been part of clinical research for several centuries but the modern concept of the importance of the random selection of a control group can be traced to one of the first studies of streptomycin for pulmonary tuberculosis conducted by the British Medical Research Council and published in 1948 (Anonymous, 1948, 1998). Prior to that time, the evaluation of health care relied on what today would be considered anecdotal evidence. The RCT represents the gold standard for the evaluation of the efficacy of new medical therapies. Despite their status, several potential scientific and ethical difficulties continue to limit the use of RCTs in some clinical contexts such as palliative care, and hinder the generalizability of their results in others. Understanding these limitations and how to apply the results of such studies to clinical care is important.

By presenting the strengths and limitations common to all RCTs and how they apply to trials of palliative care interventions, we will consider how decisions made regarding trial design, conduct, and analysis can influence a trial’s results. Basic issues in the analysis of clinical trial data, as they apply to the interpretation of RCT results, will also be considered. This chapter aims to provide palliative care clinicians with a proper understanding of the structure of, and inherent problems with, clinical trials. No research experiment can ever be perfect, but the information provided by RCTs is extremely useful as a basis for evidence-based approaches to clinical care. With this knowledge, the reader should be able to ascertain whether the results of published trials are (1) likely to be valid and (2) likely to apply to their patients.

To maximize the usefulness of clinical trials, study design issues must be considered from conceptualization, through implementation, and ending with the interpretation of results. Decisions made regarding every component of an RCT can dramatically influence the quality of the data and the outcome of a trial. Subtle flaws at any stage may lead to inappropriate conclusions.

Even when a clinical trial is perfectly designed, there is no guarantee of finding the right answer to a specific research question. Because any trial studies only a sample of the whole population of interest, random variation means there is always some probability of reaching a false positive or a false negative conclusion simply by chance. Statistical analyses are necessary to define the extent of this probability.

A statistical analysis is conducted primarily (1) to summarize the data by estimating the size of the observed effect that can be attributed to the treatment being tested, and (2) to estimate the probability that the results obtained occurred simply by chance. The conventional selection of a p-value cut-off point of 0.05 means that we are willing to accept a 1/20 probability of getting a false positive answer by chance alone. The selection of the power for the study to detect a true change if one exists (often set at 90%) accepts a 10% chance of a false negative. As a result, replication of a trial’s results is always preferable before clinicians can confidently make decisions about patient care, since no single trial should ever be considered definitive proof of the presence or absence of efficacy.
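The 1-in-20 false-positive rate can be made concrete by simulation. The sketch below is illustrative only (not from the chapter): it runs many two-arm "trials" in which the treatment truly has no effect, using a simple z-test with known unit variance, and counts how often chance alone produces a "significant" result.

```python
import random
import statistics
from math import sqrt

random.seed(42)  # fixed seed so the run is reproducible

def one_null_trial(n_per_arm=30):
    # Both arms are drawn from the same distribution: no true effect.
    a = [random.gauss(0, 1) for _ in range(n_per_arm)]
    b = [random.gauss(0, 1) for _ in range(n_per_arm)]
    # z-statistic for a difference in means with known unit variance
    z = (statistics.mean(a) - statistics.mean(b)) / sqrt(2 / n_per_arm)
    return abs(z) > 1.96  # "significant" at the two-sided 0.05 level

n_sims = 2000
false_positives = sum(one_null_trial() for _ in range(n_sims))
rate = false_positives / n_sims
print(rate)  # expected to be close to 0.05
```

Roughly 5% of these null trials come out "positive", which is exactly the type I error rate that the 0.05 cut-off accepts.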

The initial step in designing any trial or understanding the results is to define what research question is being asked. This can seem like a relatively simple process, but it is often a tremendous challenge to replicate a clinical reality in the research setting. To properly design a clinical trial requires reducing an important clinical question into a testable hypothesis. Many clinically relevant questions cannot be studied because the appropriate population is not available, the number of participants required may be prohibitively large, or ethical concerns preclude the design.

In such situations it is often necessary to modify the research question to one that is more readily answerable or to choose a different clinical study design, such as an observational cohort study. It is important to understand that an answer to a different question may or may not apply to the original clinical scenario. Attempting to answer the right question may result in compromises in the study design, sacrificing either precision or protection from bias. In some cases, alteration of the design may reduce the value of the knowledge, and hence alter the risk–benefit calculation used to justify the research (Freedman, 1987; Emanuel et al., 2000). From the outset, an alternative design should be considered if the investigators are not clear about the clinical importance of answering the proposed question.

The choice of an appropriate outcome and choice of the measure to be used to collect the data are important steps. To ensure that the research question can be answered, a clinically relevant primary outcome must be selected that can be appropriately measured and analysed, and whose results can be interpreted. This must be done a priori, that is, before the data are collected or analysed. In order to maintain acceptable limits on the probability of arriving at an incorrect result by chance, both the clinical importance of the effect of an intervention and the characteristics of the data to be collected must be defined. Although multiple secondary outcomes are often also tested within a single trial, each should be identified at the outset, and none should subsequently be designated as the primary outcome after the data have been collected and analysed (a posteriori).

Choosing outcomes is often dependent on the disease area to be studied. A recently published summary of outcomes created in conjunction with the Agency for Healthcare Research and Quality (AHRQ) has specified areas, including symptom management, quality of life (QOL), function and satisfaction, family burden, and quality-of-death measures. There are a number of evaluative tools that have been used over time or recommended by various groups, but they are beyond the scope of this chapter and well described elsewhere (Mularski et al., 2007). Almost all of the measures used for palliative care studies are patient-reported outcomes, which require a thoughtful approach to collection, analysis, and interpretation.

Part of the process of answering the appropriate clinical question is choosing the correct design of the trial. While there are many different variations on an RCT, the two basic formats are a trial designed to demonstrate an effect (either beneficial or harmful) of an exposure (or treatment) over no exposure, and a trial designed to demonstrate equivalence between two different treatments. In both cases, the goal is to demonstrate a difference in outcome between the treatment groups that usually has a less than 5% probability of occurring by chance (i.e. a p-value < 0.05) and with an appropriate probability of detecting a difference, if one truly exists (i.e. a power of 80% or 90%). While the statistical issues need to be determined appropriately, the more difficult concerns usually involve other aspects of clinical trial design, and the majority of this chapter will focus on these aspects of the design and interpretation of clinical trials.

An efficacy trial is designed to demonstrate that the two groups are not the same (i.e. to reject the null hypothesis). This design benefits from the fact that demonstrating that two things are different is substantially easier than proving that they are the same. The determination of an appropriate sample size is based on established probability calculations that take into consideration the underlying variability of the measurement and the size of the effect to be evaluated, in addition to the appropriate p-value and power.

When there are significant concerns about the difficulties in conducting a placebo-controlled trial, investigators may design a clinical trial to show that a new drug is ‘no worse than’ a treatment that is commonly accepted as effective (i.e. a non-inferiority or equivalency trial). In evaluating therapeutics for conditions in which the risks of placebo assignment are widely regarded as too great, such as thrombolytic agents for acute myocardial infarction or stroke, active-controlled, non-inferiority trials are the standard (Anonymous, 1997a, 1997b).

These trials are presently not considered standard for problems such as hypertension, hyperlipidaemia, and pain, because there are several potential problems with interpreting such studies (Jones et al., 1996; Temple, 1996, 1997; Van De Werf et al., 1999; Fleming, 2000). The first problem is that equivalence trials essentially aim to confirm the conventional null hypothesis of no treatment difference. Since a lack of difference in the outcomes detected between treatment groups can be the result of an inappropriate study design, problems with the study conduct, or other unexpected difficulties with the clinical trial, questions frequently remain about the validity of the results of such trials. This may create inappropriate incentives for conducting ‘sloppy’ research (Ellenberg and Temple, 2000). Secondly, such trials generally require larger numbers of participants, because equivalence or non-inferiority must be documented within relatively narrow margins to be clinically relevant. Thirdly, demonstrating that two treatments have the same outcome does not show that either of them works. This problem is related to ‘assay sensitivity’, because such trials require the assumption that the standard therapy would have proven superior to placebo had a placebo arm been included in the study design (Temple, 1996). Because of these concerns, current regulatory guidelines still call for placebo-controlled trials to evaluate treatments for problems such as pain and other symptoms where no established gold standard therapy exists (Food and Drug Administration, 2001).
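The decision rule in a non-inferiority comparison is usually a confidence-interval check: the new treatment is declared ‘no worse than’ the standard if the lower bound of the confidence interval for the difference lies above a pre-specified margin. The helper name and all numbers below are illustrative assumptions, not from any trial discussed here.

```python
from statistics import NormalDist

def non_inferior(mean_new, mean_std, se_diff, margin, alpha=0.05):
    """Declare non-inferiority if the lower CI bound for
    (new - standard) lies above -margin. Illustrative sketch."""
    z = NormalDist().inv_cdf(1 - alpha / 2)       # 1.96 for alpha = 0.05
    lower = (mean_new - mean_std) - z * se_diff   # CI lower bound
    return lower > -margin

# Hypothetical trial: new drug scores 0.1 points worse on average,
# SE of the difference is 0.2, and a 0.5-point margin is deemed
# clinically acceptable.
print(non_inferior(mean_new=7.4, mean_std=7.5, se_diff=0.2, margin=0.5))
```

Note how everything hinges on the pre-specified margin: the same data that support non-inferiority under a 0.5-point margin would fail under a narrower one, which is one reason such trials require larger samples.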

The most important feature of an experimental clinical trial is the equivalence of the comparison groups at baseline, such that any differences measured at the end can be attributed only to the differences between the treatments administered. Random allocation of subjects drawn from a sufficiently large single population equally distributes known and unknown factors that might otherwise influence the outcome, such as age, sex, race, and disease severity.

This is in contrast to observational studies (e.g. case–control, cohort, or cross-sectional studies) which depend on nature to set up the experiment. In these types of studies, there is a substantial possibility that known or unknown factors may create differences between the comparison groups at baseline, limiting our ability to attribute any subsequent changes to the treatment. By using statistical methods to adjust for potentially important confounders, some of these factors can be taken into consideration; however, unmeasured confounding or bias can potentially lead to the wrong conclusion from the study. In experimental studies, randomization is the primary mechanism used to create equivalence between the comparison groups. By minimizing the possibility of differences at baseline, an RCT enables investigators to more confidently attribute observed changes over time to the assigned treatments.

True randomization is accomplished by generating a set of random numbers and distributing them via a mechanism that protects the integrity of the random assignment. A centrally managed randomization scheme may help to ensure consistent application of the procedure across sites and staff. Central control of the randomization will also prevent members of the study team from knowingly or unknowingly influencing the assignment, especially if they are not blinded to a patient’s treatment allocation.

Randomization works correctly only when sufficient numbers of patients are enrolled to ensure an equal distribution of all important factors. In smaller trials, or in large, multicentre trials with few participants from a given centre, chance alone may cause significant differences in the distributions of important demographic or disease-related characteristics between groups. In order to reduce the likelihood of such occurrences, investigators can use a block randomization scheme to ensure that selected participant characteristics will be equally distributed. For example, if investigators wished to guarantee an equal sex distribution among two treatment arms at multiple sites, they may randomize in blocks of six participants each, within each of which three participants would be male and three would be female. To assess the success of the randomization process, the analysis of all clinical trials should include a careful comparison of the baseline characteristics of the treatment groups to assure that they are approximately equivalent.
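The blocked scheme described above can be sketched in a few lines. This is a hypothetical two-arm example using permuted blocks of four; the stratified blocks-of-six example in the text would simply run a separate block sequence per sex (and per site).

```python
import random

random.seed(7)  # fixed seed so the example is reproducible

def block_randomize(n_participants, block_size=4, arms=("A", "B")):
    """Permuted-block randomization: within every block, each arm
    appears equally often, so group sizes can never drift far apart."""
    assignments = []
    while len(assignments) < n_participants:
        block = list(arms) * (block_size // len(arms))
        random.shuffle(block)  # random order within the block
        assignments.extend(block)
    return assignments[:n_participants]

allocation = block_randomize(40)
# Because 40 is a multiple of the block size, the arms are exactly equal
print(allocation.count("A"), allocation.count("B"))  # 20 20
```

In practice the block size is often varied or concealed from site staff, since a clinician who knows the block size could deduce the final assignment in each block.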

The purpose of the control group is to provide an appropriate comparison for the treatment group, in order to be able to attribute causality to the treatment if a difference is found. The treatment can be as specific as an individual medication such as an opioid (Abernethy et al., 2003), a physical treatment modality such as massage therapy (Kutner et al., 2008), or as complex as a whole-system approach such as inpatient hospice care (Gade et al., 2008). Assuming that the treatment groups are equivalent at baseline (through randomization), then the differences seen in outcome of the groups can be attributed to differences in the treatment. The degree to which blinding is applied to each group is important in defining what aspect of care is being studied and for the ability to interpret the resulting data (see following paragraphs).

There are three primary types of control groups that have a role in clinical trials: (1) a no-treatment control, (2) a placebo control, and (3) an active control. Each type provides different information in comparison to the active treatment group, and the usefulness will depend on the research question being explored by the study. The unblinded no-treatment control group can provide information about changes that happen because of (1) the natural history of the disease (i.e. normal variation in the status of any disease state) and (2) regression to the mean (i.e. patients with severe symptoms tend to get better), but does not control for the effect of patients’ knowing their treatment group (i.e. a mind–body interaction). The placebo-treated control group, when properly blinded, also controls for the mind–body interaction that occurs either from participating in a clinical trial or because of the subject’s belief in the therapy. The mind–body response is an especially important part of many symptomatic therapy trials. An active control group is best thought of as a standard to test whether the design and conduct of the clinical trial were adequate to detect a difference if the study treatment has a real effect. By administering a drug known to be active in the disease being tested, a positive result provides evidence that the study has been properly designed and conducted. If the experimental agent then does not demonstrate an effect, the negative result is more convincing. Conversely, if the active agent does not produce an effect, the design and conduct of the trial are called into question. In this situation, a negative result with the experimental agent is as likely to be due to problems in the design as to a true lack of efficacy.

An unblinded no-treatment or standard of care control group, where participants receive no intervention or a delayed intervention, is applicable in two primary situations. The first situation arises when there are practical and/or ethical problems with using a placebo or sham control. For example, it is often difficult to construct an appropriate sham intervention for many trials of surgical interventions. Even if adequate shams could be constructed, some feel that assigning patients to receive an invasive, but non-active, intervention is unethical (Hrobjartsson and Gotzsche, 2001). A randomly assigned control group is still a major advantage over not having a control group, but the results must be viewed cautiously, since there are many factors that can affect a subject’s response to a treatment when both the subject and the investigator know the group assignment. As discussed in the following paragraphs, it is important that the study staff who collect and record the outcome measures be blinded to the subject’s group.

The second situation occurs when one of the goals of the trial is to determine the magnitude of the mind–body placebo effect. The placebo-treated group will have a response that is a mix of the natural history of disease and regression to the mean, along with the mind–body placebo effect. By including a no-treatment control group, the mind–body placebo effect can be estimated. In a meta-analysis of 114 trials employing both placebo and no-treatment controls, the placebo effect was smaller than might be expected, but for symptomatic relief the effect can be large (Hardy et al., 2012). In pain research, patients in placebo control groups typically have a more favourable outcome than those in no-treatment control groups (Chaput de Saintonge and Herxheimer, 1994).

A blinded placebo control group is the best known and most widely used of the possible control groups. A placebo is defined as an inactive treatment designed to mimic, as closely as possible, the characteristics of the active treatment. The purpose is to have the control group treated exactly the same as the treatment group except for the specific component being tested. The usefulness of a placebo assumes that at least the study subject and the data collector will be blinded to the type of therapy being administered. Creating a placebo for a drug trial is relatively straightforward: an inactive substance is formulated to have a similar appearance, route of administration, and, if appropriate, taste as the active treatment. Procedure-oriented therapies are much harder to mimic, and it is therefore significantly harder to obtain true blinding (see following paragraphs). In the absence of blinding, the placebo group is equivalent to the no-treatment group.

The primary benefit of using a blinded placebo control group, rather than a no-treatment control group, is that it enables the specific efficacy of the new intervention to be distinguished from the many non-specific effects that occur with most therapies, including the well-known mind–body interaction, also called the placebo effect (Freedman et al., 1996a, 1996b). The response measured in the placebo control group results from three separate processes, namely (1) the natural history of the disease, (2) regression to the mean, and (3) the mind–body interaction. The mind–body interaction is a change in brain function that, at least temporarily, leads to improvement in bodily signs or symptoms. The mind–body interaction is also sometimes known as a non-specific action of treatment, while the direct effect of the therapy on the disease is known as a specific action of the therapy.

Assuming a simple additive model of treatment effects, the magnitude of the placebo effect in a given study can be estimated by subtracting the mean (or median) response in the no-treatment group from that in the placebo group. In addition, the placebo group response can be subtracted from the mean response in the active treatment group to estimate the specific efficacy of the new intervention. Though the existence of true placebo effects across a broad range of clinical interventions can vary depending on the disease being treated, the treatment modality, and the outcome expectations of the patients, the effect is generally larger in studies of the treatment of symptoms and in the management of pain (Hrobjartsson and Gotzsche, 2001). In such therapy trials, it is unclear if this additive model applies, complicating the assessment of the specific effect (Enck et al., 2011).
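Under this additive model, and with purely hypothetical group means, the decomposition might look like the following (the numbers are illustrative only, not drawn from any real study):

```python
# Hypothetical mean pain scores (0-10 scale) at the end of a trial.
mean_no_treatment = 6.0   # natural history + regression to the mean
mean_placebo = 5.0        # ...plus the mind-body (placebo) effect
mean_active = 3.5         # ...plus the drug's specific effect

# Additive-model decomposition of the observed responses
placebo_effect = mean_no_treatment - mean_placebo   # 1.0 point
specific_effect = mean_placebo - mean_active        # 1.5 points
total_effect = mean_no_treatment - mean_active      # 2.5 = 1.0 + 1.5
```

A trial with only an active arm and a placebo arm can estimate `specific_effect`, but it takes a no-treatment arm as well to separate `placebo_effect` from natural history and regression to the mean.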

A placebo control group assumes the study will be conducted in a double-blind fashion. This helps to avoid the biases that may ensue if patients, investigators, or both, knew who would be receiving which treatment. But there are also costs to using a placebo control. The first, and most obvious, is that placebo-controlled trials require that some patients be given an inactive treatment. This is ethically questionable if there are effective interventions in existence, especially in palliative care trials, and remains a hotly debated topic (Rothman and Michels, 1994; Freedman et al., 1996a, 1996b; Temple and Ellenberg, 2000), and is considered further elsewhere in this book. The second cost to conducting placebo-controlled trials is that, while they remain the gold standard for documenting absolute efficacy, they do not always answer a clinically relevant question. For practising clinicians, who have several symptomatic therapies at their disposal, knowing whether another medication works better than nothing is not as important as knowing how the new therapy compares to the existing standard of care (Halpern and Karlawish, 2000).

Another critical decision for investigators designing trials, and for clinicians who use the data, regards the selection of study participants. There are two conflicting priorities: (1) ensuring similarities between participants in the experimental and control groups and (2) testing a new treatment in a sample of patients likely to reflect all those who could benefit from using the intervention. To meet the first goal, investigators would attempt to enrol patients who are relatively homogeneous, so there are fewer differences to equalize with randomization. Strict inclusion and exclusion criteria allow greater confidence that the observed differences in outcomes are attributable to the treatments being compared, rather than to undetected confounding variables related to the compositions of participants in each group.

By contrast, meeting the second goal requires enrolling participants from a more heterogeneous population. Because of the large interpersonal variability inherent in such a population, this approach can substantially increase the number of participants required to assure that the trial has adequate statistical power to document a treatment difference, if one exists. If a large enough sample size is available, a heterogeneous sample allows subgroup analyses to be conducted, and so potential variations in a treatment’s efficacy among higher- and lower-risk patients may be identified. As a result, early investigations of efficacy are commonly conducted using a select group of participants, whereas later, more definitive trials attempt to enrol more broadly representative patient samples often termed effectiveness studies. Physicians should, therefore, consider the composition of a given trial’s sample in order to determine the extent to which the results are applicable to their own patients.

In palliative care populations, there is the additional issue of the frailty of the population and the potential lack of stability in their disease state over time. Finding patients who will remain relatively stable for the duration of the trial can be a difficult challenge. In addition, vulnerable populations may make choices that are not always consistent with the goals of a trial, either participating out of desperation or declining because they do not want to be part of an experiment. There is frequently the additional problem of whether some patients can understand enough about their disease to give informed consent. When patients are, in addition, cognitively impaired, the process of recruitment can become a seemingly overwhelming task. The ethical issues surrounding these problems are considered elsewhere in this book (White and Hardy, 2010).

Over the last century, a growing understanding of the ability of the mind to influence functions of the body, along with the desire to enhance the experimental rigour of clinical trials, has increased appreciation of the need for blinding. Recall that the primary goal of a clinical trial is to ensure that any changes between groups seen at the end of the trial may be attributed to a specific treatment being studied. To accomplish this, not only must all comparison groups be similar at the start, but participants in all groups should feel that they have the same probability of getting the real treatment. Thus, the blinding of the study participants is of substantial importance, and investigators must design the study to prevent the participants from unblinding themselves. In particular, if a medication has a specific taste, common side effects, or other distinctive traits, it is important that the placebo treatment mimic these characteristics as closely as possible. Although the evaluation of more invasive interventions for some medical conditions has occasionally used sham procedures (Cobb et al., 1959; Macklin, 1999), this has no place in palliative care research.

In addition to creating a suitable placebo, investigators should plan to determine whether the blinding was maintained by asking participants what treatment they think they received and why. Such questions should be posed to participants occasionally during the trial, and at the trial’s completion (Morin et al., 1995). If the blinding is successful, the participants’ guesses should be no more accurate than chance (e.g. 50% in a typical two-arm trial). Blinding can be difficult to maintain (Karlowski et al., 1975; Brownell and Stunkard, 1982; Howard et al., 1982; Byington et al., 1985; Rabkin et al., 1986; Moscucci et al., 1987; Fisher and Greenberg, 1993; Basoglu et al., 1997). Study participants can often predict their receipt of placebo due to the absence of side effects, or their receipt of the real treatment by noting adverse effects of the intervention. However, common side effects often occur in the placebo group by chance, helping to keep patients blinded (Sanderson et al., 2013).
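Whether participants’ guesses beat chance can be checked with an exact binomial test against the fair-guessing null of 50%. A standard-library sketch with hypothetical counts (the function name and numbers are illustrative, not from the studies cited above):

```python
from math import comb

def binomial_two_sided_p(correct, n, p=0.5):
    """Exact two-sided binomial p-value under a fair-guessing null.

    For p = 0.5 the distribution is symmetric, so the two-sided value
    is twice the more extreme one-sided tail (capped at 1.0).
    """
    k = max(correct, n - correct)  # the more extreme tail
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical blinding check: 8 of 10 participants guessed their
# assignment correctly -- suggestive, but not beyond chance at 0.05.
p_value = binomial_two_sided_p(8, 10)
print(round(p_value, 4))  # 0.1094
```

With such small samples the test has little power, which is one reason blinding assessments are often repeated during the trial rather than performed once at its end.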

When only participants are blinded to the treatment received, the study is labelled as a single-blind trial. The standard use of the term double-blind applies when both the participants and investigators are blinded, assuming that the investigator is collecting the outcome data. If the investigator is not collecting the data, it is critical to blind the person who is, so as to minimize the chance that evaluators will more favourably rate those receiving the innovative treatment, thereby biasing the trial toward finding a benefit of that treatment. Even in studies where the subject fills out their own forms, blinding of the investigator remains important to minimize the possibilities that they would impart different levels of enthusiasm, or prescribe different co-interventions, to patients in the different treatment groups.

The statistical power of an RCT to show a difference between treatments is determined by:

1. The size of the effect—the treatment effect (i.e. the difference in response among groups) that is deemed clinically important to be able to detect.

2. The variance—the variability of the outcomes in each treatment group.

3. The p-value (α, or type I error rate)—the probability of finding, by chance alone, a difference in the treatment effect of the size detected in the study or larger when there is no true effect. This is the threshold chosen to connote statistical significance and is typically set at 0.05.

4. The power (1 − β, where β is the type II error rate)—β is the probability of failing, by chance alone, to detect a treatment effect of the specified size when a true effect exists. Adequate power is usually set at 80–90% (i.e. β is set to 10–20%).

In general, the size of the sample to be tested (i.e. the sample size) is the variable investigators most commonly adjust to obtain adequate power to detect a meaningful treatment difference when one truly exists; the p-value is the outcome of the statistical test conducted upon completion of the clinical trial. It is a truism that with a sufficiently large sample size, any real difference between groups, no matter how small or clinically irrelevant, can be shown to be statistically significant. The converse is also true: a large, clinically important difference (CID) between treatments can fail to reach statistical significance when inadequate numbers of participants are enrolled.

The most common method of calculating the sample size required to achieve 80% power (or greater) is to first determine (1) the size of the effect that would be considered clinically important, (2) the anticipated response in the control group, and (3) the expected variability of the outcomes among the groups. This last determination may be particularly difficult to estimate, and should, when possible, be based on evidence from prior studies of similar diseases and/or treatments. An alternate method used when the sample size is fixed is to calculate the size of the effect that would need to be present to produce a statistically significant outcome. This approach is rarely preferable to setting the sample size to detect a specified difference but is commonly used when the available population is fixed.
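The normal-approximation formula behind such calculations can be sketched as follows. All inputs are illustrative; exact t-based methods give slightly larger values, and real trials also inflate the result to allow for dropout.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sample
    comparison of means:

        n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2

    delta: smallest clinically important difference in means
    sigma: expected standard deviation of the outcome in each group
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = nd.inv_cdf(power)           # 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# A half-standard-deviation effect (delta = 0.5, sigma = 1.0) needs
# roughly 63 participants per arm at 80% power.
print(n_per_group(delta=0.5, sigma=1.0))  # 63
```

The formula makes the trade-offs in the list above explicit: halving the detectable difference quadruples the required sample, while raising power from 80% to 90% increases it by roughly a third.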

Another critical decision to be made in planning an RCT involves how to measure the chosen outcome of interest. For example, if investigators are interested in studying the effects of a new antihypertensive agent on systolic and diastolic blood pressure, should they measure these values with a mercury sphygmomanometer or via an arterial line? In addition to how the outcome will be measured, investigators must further consider when and how often to measure the outcome. Are single readings once each week adequate, or should participants be equipped with ambulatory blood pressure monitors to obtain multiple readings throughout the day? Finally, investigators must consider how to account for other variables that could alter the measurement, such as body position, when the blood pressure is assessed. In palliative care, a frequent concern about the outcome is whether to measure a specific symptom (e.g. pain) or sign (e.g. physical function) or the more general outcome of quality of life. Similar to the hypertension model, the question of which measure to use and time period to consider is an important decision to consider in the design of a trial. Ultimately this decision is best made based on the clinical or scientific question that is the primary reason for conducting the clinical trial.

Regardless of what measurement technique is chosen, it should be characterized by three features. Firstly, the measurement should be reliable—if the same measure is used repetitively in the same person under identical conditions without this person’s condition changing, then the measure should produce the same results each time. Secondly, the measure should be valid—it should measure exactly what it is intended to measure. Thirdly, the measure should be responsive—it should change over time if the condition being measured has truly changed. Though a full discussion of these concepts is beyond the scope of this chapter, the topics are well covered in many textbooks (Streiner and Norman, 2003). If the outcome measure has not been routinely used in other similar research, its reliability, validity, and responsiveness should be formally tested and documented in the intended population.

The criteria of reliability, validity, and responsiveness also depend on what form the outcome takes. For example, in pain management, the primary goal is to improve the patient’s subjective sense of comfort. For this purpose, investigators might ask a simple question, such as, ‘Do you feel better, yes or no?’ Because such a measure has only two possible responses, it may not provide an adequately responsive measure of pain relief.

To help differentiate the level of response, investigators might ask, ‘What percentage of pain relief do you get from the treatment?’ However, such questions require patients to remember their previous conditions in order to report the change over time. Alternatively, investigators could use a 0–10 numerical rating scale at both the beginning and end of the study to calculate the change in pain over time. Deciding which measurement is most appropriate for a given situation should be informed by considerations of how much change in the measure would be important to the patient, and the ability of the chosen scale to detect such a change.

Another measurement concern in palliative care trials relates to the fact that a change in pain or nausea may only provide one component of an overall change in quality of life. Thus, symptomatic reports may be incomplete surrogate markers for changes in the more complete outcome, the overall quality of life. The use of surrogate markers is a widespread practice in clinical trials. For example, investigators routinely monitor changes in serum cholesterol as a surrogate measure for one of the risk factors for myocardial infarction. However, using this surrogate measure requires making the assumption that a reduction in cholesterol will lead to a reduced risk of myocardial infarction. Similarly, if the use of an experimental analgesic agent relieves pain but produces substantial side effects, the patient may not consider its use as an improvement in quality of life. Therefore, if investigators wish to know an intervention’s effects on both the level of pain and the overall quality of life, then they must employ tools to measure both. Since there is no single measurement strategy that is universally applicable, it is important to carefully consider whether the measured outcome is appropriate to answer the research question being posed in the clinical trial. The systematic reporting of the benefits and harms of an intervention allows a net effect to be calculated.

Like other aspects of a clinical trial, the specific analytic strategy should be defined before commencing the study. Many different analytic approaches are possible and each will produce an answer to a slightly different research question. It is important that the chosen strategy be appropriate to evaluate the primary research question, and be compatible with the numerical distribution of the data collected. The primary role of the analysis is to summarize the data (size of the effect) and to provide an estimate of the likelihood that the result was obtained by chance alone (i.e. statistical significance).

The first, and most important, result of any analysis is a summary value of the size of the effect resulting from the experimental therapy. In RCTs, the size of the effect is estimated by determining a summary value for the primary outcome in each treatment group, and then calculating the difference between these values to estimate the treatment effect. There are only two primary forms for the summary value for a set of trial data: (1) the central tendency (e.g. mean, median, or mode) of the response among participants, or (2) the proportion of participants who achieve a defined level of response.

For example, in a hypertension trial, investigators can report the mean change in diastolic blood pressure (central tendency), or the proportion of patients who achieve a diastolic blood pressure below 90 mmHg (proportion of responders). If one were interested in the effect of a hospice intervention on hospital length of stay, it might be acceptable to report either the median time spent in the hospital for each group (central tendency), or the proportion of patients in each group who die before discharge or before some other predefined time point. Finally, in trials of pain management, in which the outcome of reported pain symptoms is provided on a numeric scale, investigators might either report the mean response in each group, or the percentage of patients in each group reporting reductions of 33% (or 50%) or greater in pain intensity. In each case, the units of these summary values should correspond to the units of the outcome measure.
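As a minimal sketch of these two summary forms, the following uses entirely hypothetical per-patient data and an illustrative 33% responder threshold:

```python
# Hypothetical per-patient pain reductions (%) in two trial arms.
active = [40, 35, 10, 0, 55, 20, 33, 5]
control = [10, 0, 5, 15, 0, 20, 8, 2]

def mean_change(scores):
    """Central-tendency summary: the mean response in a group."""
    return sum(scores) / len(scores)

def responder_rate(scores, threshold=33):
    """Proportion-of-responders summary: the fraction of patients
    achieving a predefined level of response (here >= 33% relief)."""
    return sum(s >= threshold for s in scores) / len(scores)

# The treatment effect is the difference between the group summaries.
effect_mean = mean_change(active) - mean_change(control)    # percentage points
effect_prop = responder_rate(active) - responder_rate(control)
```

Note that the two summaries answer different questions: the first describes the average patient, the second the number of patients likely to benefit.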

Choices regarding how to best present the summary measures should reflect the type of information that is most relevant for the scientific or clinical question. In addition, the results will ultimately need to be applicable to practising clinicians. For most health-care providers, the question of interest is whether a given treatment will work for a given patient rather than the average change that is likely in a population. The average change does not provide a unique answer to the question of the number of people who are likely to improve. For example, suppose investigators reported that the mean response in the active treatment group was an improvement of 10% on a standard pain scale. This same result could apply to data indicating that (1) every patient in the active treatment group improved by 10% (a unimodal distribution), (2) half of the patients in the treatment group improved by 20% and half had no improvement (a bimodal distribution), or (3) half of the patients in the active treatment group improved by 40%, and half deteriorated by 20% (a bimodal distribution in which some patients improve and others deteriorate). Because these three descriptions of the underlying data could yield strikingly different clinical decisions, it is important to present an analysis of the proportions of patients in each group who improve or deteriorate by a clinically important amount.
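The three hypothetical distributions described above can be made concrete in a short sketch; the data and the 20% responder threshold are purely illustrative:

```python
# Three hypothetical distributions of pain-scale change, each with the
# same mean improvement of 10%, corresponding to cases (1)-(3) above.
uniform = [10] * 10               # (1) every patient improves by 10%
bimodal = [20] * 5 + [0] * 5      # (2) half improve by 20%, half not at all
divergent = [40] * 5 + [-20] * 5  # (3) half improve by 40%, half worsen by 20%

mean = lambda xs: sum(xs) / len(xs)
assert mean(uniform) == mean(bimodal) == mean(divergent) == 10

# A responder analysis (here, >= 20% improvement) separates them:
improved = {name: sum(x >= 20 for x in xs) / len(xs)
            for name, xs in [("uniform", uniform),
                             ("bimodal", bimodal),
                             ("divergent", divergent)]}
```

The mean alone cannot distinguish the three groups, while the proportion improving (and, for the third case, deteriorating) by a clinically important amount does.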

A common concern about presenting the proportion of ‘responders’ in each group is the need to define a level of response to be considered clinically important. Thus, the determination of a clinically important difference (CID) in a patient’s symptoms plays a key role in the interpretation of symptomatic studies. Two methods for determining the CID are ‘expert opinion’, and an assessment of how changes in symptom scales correspond to responses to global questions (Jaeschke et al., 1989, 1991; Todd, 1996). Regardless of the method used, however, each requires that a somewhat arbitrary decision be made in defining the scale to be considered the standard.

Recent studies of pain have adopted an alternative method of displaying response data, by graphing the proportion of responders at each possible outcome level for all the groups in a clinical trial (Farrar et al., 2006). This display is a form of a cumulative distribution and allows the readers of the published report to select the level of improvement that they feel is clinically important and then determine the difference between the various groups at that level.
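A minimal sketch of this cumulative proportion-of-responders display, using hypothetical patient data, might look as follows:

```python
# For each candidate cut-off, compute the fraction of patients whose
# pain reduction meets or exceeds it (a cumulative responder curve).
def cumulative_responders(pct_reductions, cutoffs=range(0, 101, 10)):
    n = len(pct_reductions)
    return {c: sum(r >= c for r in pct_reductions) / n for c in cutoffs}

active_arm = [55, 40, 35, 33, 20, 10, 5, 0]  # hypothetical % reductions
curve = cumulative_responders(active_arm)
# A reader who regards 30% relief as clinically important inspects
# curve[30]; one who prefers 50% inspects curve[50].
```

Plotting such curves for each trial arm lets readers compare groups at whatever response level they consider clinically meaningful.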

A p-value of less than 0.05 is the most commonly accepted threshold for concluding that a given result is unlikely to have occurred by chance. However, this value is strictly arbitrary and indicates that we are willing to accept a 1 in 20 chance of getting the wrong answer. This traditional method of hypothesis testing, in which p-values are reported to quantify the significance of a result, is gradually being replaced by methods that gauge the range of plausible results that are compatible with the data. The most common method for presenting this range is to report a point estimate of the effect size, along with a 95% confidence interval around this estimate. A 95% confidence interval will include the true population value of the effect 19 times out of 20 (95%). Thus, it can help readers determine the uncertainty inherent in any result—the narrower the interval, the more precise the estimate of the true effect, and thus, the more confident readers can be that the reported result is ‘correct’.
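As an illustrative sketch, a 95% confidence interval for a difference in group means can be computed with the normal approximation (a t-distribution would be preferred for small samples); the data below are hypothetical:

```python
import math

def ci_diff_means(a, b, z=1.96):
    """95% confidence interval for the difference in means of two
    groups, using the normal approximation (z = 1.96)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))          # standard error
    diff = ma - mb
    return diff - z * se, diff + z * se

lo, hi = ci_diff_means([5, 7, 9, 11], [1, 3, 5, 7])   # hypothetical scores
```

The interval is centred on the point estimate; larger samples or less variable data narrow it, which is what makes a narrow interval more reassuring to readers.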

It is also important to realize that when investigators choose a p-value of 0.05 as an acceptable type I (false positive) error rate, this value only applies to a single comparison between groups. In most clinical trials, however, performing multiple comparisons can be informative. The greater the number of comparisons, the more likely it is that at least one of them will be spuriously positive by chance alone. If an a priori decision is made to perform multiple comparisons, the significance threshold must be adjusted. Of the several available methods for adjusting this value, the simplest, known as the Bonferroni adjustment, is to divide the significance threshold for one comparison (e.g. 0.05) by the number of comparisons to be performed, and to then use this new value as the cut-off for statistical significance across all analyses (Hilsenbeck and Clark, 1996). While valid, this is a very conservative approach, and alternative methods have been developed to deal with multiple comparisons (Liu et al., 1997).
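The Bonferroni adjustment amounts to a one-line calculation; the p-values below are hypothetical:

```python
# Bonferroni adjustment: with an overall type I error rate of 0.05 and
# k planned comparisons, each comparison is tested at 0.05 / k.
def bonferroni_threshold(alpha=0.05, n_comparisons=1):
    return alpha / n_comparisons

# For example, with five planned comparisons:
threshold = bonferroni_threshold(0.05, 5)   # 0.05 / 5 = 0.01
significant = [p < threshold for p in [0.004, 0.03, 0.20]]
```

Here only the first comparison remains significant; under the unadjusted 0.05 cut-off the second would also have appeared significant, illustrating how the adjustment guards against spurious positives.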

A related issue regards the distinction between comparisons chosen a priori, and those which investigators choose to conduct post hoc, or after the data have been collected. There are times when post hoc comparisons can be informative, but the results of such analyses should never be considered conclusive because they were not explicitly planned at the outset. Rather, results of post hoc analyses may be considered exploratory, intended to guide future investigations by defining or refining new hypotheses. Authors can help highlight this distinction by reporting which comparisons were chosen a priori, and which were not.

In addition to the reported effect size and statistical significance of the primary outcome results, corroborative evidence from secondary outcome analyses should be used to support a study’s hypothesis. If multiple related measures are obtained, and the analyses of each show similar results, then it is less likely that any one of the positive results arose by chance. While there is no specific statistical test to document this phenomenon, showing that multiple related measures, all producing similar types of effects, lends support to the conclusions drawn from the primary outcome.

Evaluating the side effects of interventions tested in clinical trials is subject to the same considerations as those used to evaluate measures of efficacy. It is important to compare the relative incidence in the active treatment and control groups. Differing rates of side effects must be evaluated with caution, since clinical trials are rarely powered to detect such differences, and the large number of possible side effects makes it likely that one or more of the observed differences will be due to chance. Such differences should not be ignored, but observing similar results in multiple trials can increase one’s confidence that the findings may be specifically attributed to the treatment received.

Given that many components of a trial are central to interpreting the results, it is vital that trial reports be accurate, complete, and objective in their presentation of all important aspects of the trial. In particular, the a priori hypothesis should be clearly stated, and the discovery of other findings properly identified. All randomized participants must be accounted for in the publication, and an intention-to-treat analysis of all participants is typically appropriate. Subsequent subgroup analyses can focus on those who complete the trial, but these should not be considered as the primary result. A careful description of the randomization and blinding procedures is also important to assure readers that the trial was properly conducted. Finally, brief descriptions of the rationale behind the choice of measurement tools and analytic strategies can be helpful.

There is now good evidence for a publication bias against negative studies, since authors prefer to write up positive ones and editors prefer to publish them (Begg and Berlin, 1988; Reidenberg, 1998). This can lead to difficulties for clinicians who want a true picture of the nature of the evidence for a particular treatment.

There are several issues inherent in the design and conduct of RCTs that may threaten the internal validity of the results—that is, the likelihood that the treatment comparison is free from bias. Furthermore, even when the comparison is internally valid, the external validity, or generalizability of the results, can be limited. Finally, because the conditions in which trials are conducted only weakly approximate clinical reality, physicians must be cautious in using the results as the only guide in clinical decision-making. We will briefly discuss each of these potential problems in the following paragraphs. More detailed discussions of these issues are provided by Feinstein (1983) and by Kramer and Shapiro (1984).

Under-enrolment occurs when too few research participants are enrolled to provide adequate statistical power to answer the study’s primary research questions. The inability to recruit sufficient numbers of eligible patients is the most common cause of insufficient statistical power in RCTs (Freiman et al., 1978; Altman, 1980; Collins et al., 1980; Meinert, 1986; Hunninghake et al., 1987; Nathan, 1999). Such under-enrolment has been attributed to characteristics of (1) clinicians who refer their patients (Taylor et al., 1984, 1994; Taylor, 1992), (2) patients who choose to be screened (Greenlick et al., 1979) or enrolled (Barofky and Sugarbaker, 1979), (3) investigators who design the trials (Collins et al., 1984), and (4) institutions at which the trials are conducted (Begg et al., 1982; Shea et al., 1992).

Among the challenges to adequate participant recruitment, potential participants’ reluctance to enrol in RCTs is likely to be the most formidable, especially in palliative care populations. Ways of addressing these issues specifically in palliative care studies have been carefully codified (LeBlanc et al., 2013). It has been observed that patients are generally less willing to participate in RCTs than in non-randomized, observational studies (Kramer and Shapiro, 1984). In addition to yielding unacceptably high probabilities for type II errors, the resulting under-enrolment substantially reduces the trial’s precision in quantifying the treatment effect.

Even when properly designed and carefully conducted, clinical trials can only provide information specific to the population from which the study participants were drawn. Selective enrolment occurs when particular subgroups within the target population enrol in proportions greater or lesser than their representation in that population (Mant, 1999; Halpern et al., 2001). If this population does not include, for example, elderly patients, or children, then applying the results to these clinical populations requires extrapolation. While extrapolating results may sometimes be reasonable, it must always be done cautiously because both the beneficial and adverse effects of an intervention can vary across populations.

The level of response detected by a single trial will depend on the patient population enrolled. For example, when a novel treatment is first studied for a condition for which no adequately effective treatment exists, all patients with the condition are more likely to be willing to volunteer. Because such trials may include people with relatively early or mild symptoms, the response rate may be higher than expected, although the response in the placebo group may also be larger. In contrast, when a treatment is tested in a population for which an effective treatment already exists, only people who do not obtain a response to the available treatments are likely to enrol. This more recalcitrant group may have a lower response rate than expected in the total population, thereby underestimating the treatment’s potential usefulness.

In RCTs, participants may not adhere completely to their prescribed treatment regimens (Kramer and Shapiro, 1984). Especially if participants believe they are receiving a non-preferred treatment, their enthusiasm for the trial, and subsequent adherence to their assigned treatment, may wane. This is further complicated if the participants have access to and decide to take either the experimental therapy or a concomitant additional therapy outside of the trial. This occurs more frequently in trials where participants are able to overcome the blinding. There is accumulating evidence that a significant number of study participants make concerted efforts to unblind themselves, and that participants who become aware of their treatment assignment may be more likely to drop out of the study. For example, many participants assigned to the placebo groups in both the initial phase II trial of AZT (azidothymidine/zidovudine) for patients with AIDS (Fischl et al., 1987), and in a randomized trial of vitamin E for patients with Alzheimer’s disease (Sano et al., 1997), appear to have become unblinded, and even to have obtained the active agents outside the trial (Kodish et al., 1990; Epstein, 1996; Karlawish and Whitehouse, 1998). Even more problematically, widespread unblinding in one AIDS Clinical Trial Group study (Volberding et al., 1990) not only allowed approximately 9% of those assigned to the placebo to receive AZT, but contributed to the drop-out rate in the placebo group being one-third higher than it was in the active treatment group (Merigan, 1990).

Participant non-adherence and drop-out can substantially bias the results of a trial (Peto et al., 1995). Though intention-to-treat analyses may mitigate this bias, if non-adherence or drop-out rates are higher in one group than in the other, such analyses may also prevent a true effect of treatment from being detected. Thus, investigators should make concerted efforts to monitor participant adherence and drop-outs. When such problems exist, the results of the trial must be interpreted cautiously.

As with all clinical research, palliative care studies require informed consent of the participants and, when cognitive impairment is an issue, from the appropriate family member or medical surrogate. Especially in situations where curative therapies are not likely to be effective, there are a number of important issues to consider, and detailed discussions of this topic are covered in Section 5 of this book. The most important issue is the balance between the right of the individual to receive compassionate care and the need of the population for information on the efficacy and safety of specific therapies. When conducting clinical trials, the investigator must carefully protect the rights and well-being of the subjects in the study. One possible alternative is the use of innovative approaches to the conduct of clinical trials (Streiner, 2007). Although beyond the scope of this chapter, trial designs such as response-adaptive randomization procedures (e.g. ‘play-the-winner’ or ‘drop-the-loser’) may be more ethically appropriate for testing therapies for conditions with significant consequences for the quantity and/or quality of life in the palliative care population. Such designs focus on minimizing the expected number of treatment failures while maintaining the power and randomization benefits (Rosenberger and Hu, 2004). ‘Add-on’ trials, in which a new treatment is added to the current treatments the patient is receiving, may reduce the consequences to individual patients of being randomized to a placebo treatment. Building rescue strategies into the trial design can also reduce the potential risk to study participants (Boers, 2003). Crossover trial designs may also be useful, but are only applicable when symptoms are relatively stable.
Using patients as their own control markedly increases the power of the study, but concerns about carryover effects between treatment periods are a serious risk to the validity of the study (Garcia et al., 2004; Simon and Chinchilli, 2007).

This chapter has outlined several fundamental considerations for investigators planning clinical trials, and for clinicians attempting to discern the applicability of such trials to their practices. Special consideration has been given to the nuances of clinical trials of palliative care interventions. In summary, randomized, controlled trials remain the best available means of evaluating novel palliative care interventions, and of determining how these interventions may be optimally used. Despite the strengths of the design, readers of trial reports should be mindful of the many difficulties inherent in extrapolating from results obtained in a trial setting to the use of these same interventions in clinical practice. Further advances in our understanding of how best to apply clinical trials in the evaluation of pain and palliative care will depend on an improved understanding of the underlying pathophysiology and of the anatomy of clinical trials (Farrar, 2010).

Anonymous (1948). Streptomycin treatment of pulmonary tuberculosis. British Medical Journal, 2, 769–782.
Chaput De Saintonge, D.M. and Herxheimer, A. (1994). Harnessing placebo effects in health care. The Lancet, 344, 995–998.
Farrar, J.T. (2010). Advances in clinical research methodology for pain clinical trials. Nature Medicine, 16, 1284–1293.
Freiman, J.A., Chalmers, T.C., Smith, H., Jr., and Kuebler, R.R. (1978). The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 ‘negative’ trials. The New England Journal of Medicine, 299, 690–694.
Hardy, J., Quinn, S., Fazekas, B., et al. (2012). Randomized, double-blind, placebo-controlled study to assess the efficacy and toxicity of subcutaneous ketamine in the management of cancer pain. Journal of Clinical Oncology, 30, 3611–3617.
Jaeschke, R., Singer, J., and Guyatt, G.H. (1989). Measurement of health status. Ascertaining the minimal clinically important difference. Controlled Clinical Trials, 10, 407–415.
LeBlanc, T.W., Lodato, J.E., Currow, D.C., and Abernethy, A.P. (2013). Overcoming recruitment challenges in palliative care clinical trials. Journal of Oncology Practice, 9, 277–282.
Mularski, R.A., Rosenfeld, K., Coons, S.J., et al. (2007). Measuring outcomes in randomized prospective trials in palliative care. Journal of Pain and Symptom Management, 34, S7–S19.
Sanderson, C., Hardy, J., Spruyt, O., and Currow, D.C. (2013). Placebo and nocebo effects in randomized controlled trials: the implications for research and practice. Journal of Pain and Symptom Management, 46, 722–730.
Streiner, D.L. (2007). Alternatives to placebo-controlled trials. Canadian Journal of Neurological Sciences, 34(Suppl. 1), S37–S41.
Taylor, K.M., Feldstein, M.L., Skeel, R.T., Pandya, K.J., Ng, P., and Carbone, P.P. (1994). Fundamental dilemmas of the randomized clinical trial process: results of a survey of the 1,737 Eastern Cooperative Oncology Group investigators. Journal of Clinical Oncology, 12, 1796–1805.
Temple, R. and Ellenberg, S.S. (2000). Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 1: ethical and scientific issues. Annals of Internal Medicine, 133, 455–463.
White, C. and Hardy, J. (2010). What do palliative care patients and their relatives think about research in palliative care? A systematic review. Supportive Care in Cancer, 18, 905–911.

Abernethy, A.P., Currow, D.C., Frith, P., Fazekas, B.S., McHugh, A., and Bui, C. (2003). Randomised, double blind, placebo controlled crossover trial of sustained release morphine for the management of refractory dyspnoea. BMJ, 327, 523–528.
Altman, D.G. (1980). Statistics and ethics in medical research: III. How large a sample? British Medical Journal, 281, 1336–1338.
Anonymous (1997a). A comparison of continuous infusion of alteplase with double-bolus administration for acute myocardial infarction. The Continuous Infusion versus Double-Bolus Administration of Alteplase (COBALT) Investigators. The New England Journal of Medicine, 337, 1124–1130.
Anonymous (1997b). A comparison of reteplase with alteplase for acute myocardial infarction. The Global Use of Strategies to Open Occluded Coronary Arteries (GUSTO III) Investigators. The New England Journal of Medicine, 337, 1118–1123.
Anonymous (1998). Fifty years of randomised controlled trials. BMJ, 317, 7167.
Barofky, I. and Sugarbaker, P.H. (1979). Determinants of patient nonparticipation in randomized clinical trials for the treatment of sarcomas. American Journal of Clinical Oncology, 2, 237–246.
Basoglu, M., Marks, I., Livanou, M., and Swinson, R. (1997). Double-blindness procedures, rater blindness, and ratings of outcome. Observations from a controlled trial. Archives of General Psychiatry, 54, 744–748.
Begg, C.B. and Berlin, J.A. (1988). Publication bias: a problem in interpreting medical data. Journal of the Royal Statistical Society. Series A (Statistics in Society), 151, 419–463.
Begg, C.B., Carbone, P.P., Elson, P.J., and Zelen, M. (1982). Participation of community hospitals in clinical trials: analysis of five years of experience in the Eastern Cooperative Oncology Group. The New England Journal of Medicine, 306, 1076–1080.
Boers, M. (2003). Add-on or step-up trials for new drug development in rheumatoid arthritis: a new standard? Arthritis & Rheumatism, 48, 1481–1483.
Brownell, K.D. and Stunkard, A.J. (1982). The double-blind in danger: untoward consequences of informed consent. American Journal of Psychiatry, 139, 1487–1489.
Byington, R.P., Curb, J.D., and Mattson, M.E. (1985). Assessment of double-blindness at the conclusion of the beta-Blocker Heart Attack Trial. Journal of the American Medical Association, 253, 1733–1736.
Cobb, L.A., Thomas, G.I., Dillard, D.H., Merendino, K.A., and Bruce, R.A. (1959). An evaluation of internal-mammary-artery ligation by a double-blind technic. The New England Journal of Medicine, 260, 1115–1118.
Collins, J.F., Bingham, S.F., Weiss, D.G., Williford, W.O., and Kuhn, R.M. (1980). Some adaptive strategies for inadequate sample acquisition in Veterans Administration cooperative clinical trials. Controlled Clinical Trials, 1, 227–248.
Collins, J.F., Williford, W.O., Weiss, D.G., Bingham, S.F., and Klett, C.J. (1984). Planning patient recruitment: fantasy and reality. Statistics in Medicine, 3, 435–443.
Ellenberg, S.S. and Temple, R. (2000). Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 2: practical issues and specific cases. Annals of Internal Medicine, 133, 464–470.
Emanuel, E.J., Wendler, D., and Grady, C. (2000). What makes clinical research ethical? Journal of the American Medical Association, 283, 2701–2711.
Enck, P., Klosterhalfen, S., Weimer, K., Horing, B., and Zipfel, S. (2011). The placebo response in clinical trials: more questions than answers. Philosophical Transactions of the Royal Society of London—Series B: Biological Sciences, 366, 1889–1895.
Epstein, S. (1996). Impure Science: AIDS, Activism, and the Politics of Knowledge. Berkeley, CA: Berkeley Press.
Farrar, J.T., Dworkin, R.H., and Max, M.B. (2006). Use of the cumulative proportion of responders analysis graph to present pain data over a range of cut-off points: making clinical trial data more understandable. Journal of Pain & Symptom Management, 31, 369–377.
Feinstein, A.R. (1983). An additional basic science for clinical medicine: II. The limitations of randomized trials. Annals of Internal Medicine, 99, 544–550.
Fischl, M.A., Richman, D.D., Grieco, M.H., et al. (1987). The efficacy of azidothymidine (AZT) in the treatment of patients with AIDS and AIDS-related complex. A double-blind, placebo-controlled trial. The New England Journal of Medicine, 317, 185–191.
Fisher, S. and Greenberg, R.P. (1993). How sound is the double-blind design for evaluating psychotropic drugs? Journal of Nervous & Mental Disease, 181, 345–350.
Fleming, T.R. (2000). Design and interpretation of equivalence trials. American Heart Journal, 139, S171–S176.
Food and Drug Administration (2001). Guidance for Industry: E 10: Choice of Control Group and Related Issues in Clinical Trials. Rockville, MD: Department of Health and Human Services.
Freedman, B. (1987). Scientific value and validity as ethical requirements for research: a proposed explication. IRB, 9, 7–10.
Freedman, B., Glass, K.C., and Weijer, C. (1996a). Placebo orthodoxy in clinical research. II: Ethical, legal, and regulatory myths. Journal of Law, Medicine & Ethics, 24, 252–259.
Freedman, B., Weijer, C., and Glass, K.C. (1996b). Placebo orthodoxy in clinical research. I: Empirical and methodological myths. Journal of Law, Medicine & Ethics, 24, 243–251.
Gade, G., Venohr, I., Conner, D., et al. (2008). Impact of an inpatient palliative care team: a randomized control trial. Journal of Palliative Medicine, 11, 180–190.
Garcia, R., Benet, M., Arnau, C., and Cobo, E. (2004). Efficiency of the cross-over design: an empirical estimation. Statistics in Medicine, 23, 3773–3780.
Greenlick, M.R., Bailey, J.W., Wild, J., and Grover, J. (1979). Characteristics of men most likely to respond to an invitation to be screened. American Journal of Public Health, 69, 1011–1015.
Halpern, S.D. and Karlawish, J.H. (2000). Placebo-controlled trials are unethical in clinical hypertension research. Archives of Internal Medicine, 160, 3167–3169.
Halpern, S.D., Metzger, D.S., Berlin, J.A., and Ubel, P.A. (2001). Who will enroll? Predicting participation in a phase II AIDS vaccine trial. Journal of Acquired Immune Deficiency Syndromes: JAIDS, 27, 281–288.
Hilsenbeck, S.G. and Clark, G.M. (1996). Practical p-value adjustment for optimally selected cutpoints. Statistics in Medicine, 15, 103–112.
Howard, J., Whittemore, A.S., Hoover, J.J., and Panos, M. (1982). How blind was the patient blind in AMIS? Clinical Pharmacology & Therapeutics, 32, 543–553.
Hrobjartsson, A. and Gotzsche, P.C. (2001). Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. The New England Journal of Medicine, 344, 1594–1602.
Hunninghake, D.B., Darby, C.A., and Probstfield, J.L. (1987). Recruitment experience in clinical trials: literature summary and annotated bibliography. Controlled Clinical Trials, 8, 6S–30S.
Jaeschke, R., Guyatt, G.H., Keller, J., and Singer, J. (1991). Interpreting changes in quality-of-life score in N of 1 randomized trials. Controlled Clinical Trials, 12, 226S–233S.
Jones, B., Jarvis, P., Lewis, J.A., and Ebbutt, A.F. (1996). Trials to assess equivalence: the importance of rigorous methods. [Erratum appears in BMJ 1996, 313(7056), 550]. BMJ, 313, 36–39.
Karlawish, J.H. and Whitehouse, P.J. (1998). Is the placebo control obsolete in a world after donepezil and vitamin E? Archives of Neurology, 55, 1420–1424.
Karlowski, T.R., Chalmers, T.C., Frenkel, L.D., Kapikian, A.Z., Lewis, T.L., and Lynch, J.M. (1975). Ascorbic acid for the common cold. A prophylactic and therapeutic trial. Journal of the American Medical Association, 231, 1038–1042.
Kodish, E., Lantos, J.D., and Siegler, M. (1990). Ethical considerations in randomized controlled clinical trials. Cancer, 65, 2400–2404.
Kramer, M.S. and Shapiro, S.H. (1984). Scientific challenges in the application of randomized trials. Journal of the American Medical Association, 252, 2739–2745.
Kutner, J.S., Smith, M.C., Corbin, L., et al. (2008). Massage therapy versus simple touch to improve pain and mood in patients with advanced cancer: a randomized trial. Annals of Internal Medicine, 149, 369–379.
Liu, Q., Li, Y., and Boyett, J.M. (1997). Controlling false positive rates in prognostic factor analyses with small samples. Statistics in Medicine, 16, 2095–2101.
Macklin, R. (1999). The ethical problems with sham surgery in clinical research. The New England Journal of Medicine, 341, 992–996.
Mant, D. (1999). Can randomised trials inform clinical decisions about individual patients? The Lancet, 353, 743–746.
Meinert, C.L. (1986). Patient recruitment and enrollment. In Clinical Trials: Design, Conduct, and Analysis, pp. 149–158. New York: Oxford University Press.
Merigan, T.C. (1990). You can teach an old dog new tricks. How AIDS trials are pioneering new strategies. The New England Journal of Medicine, 323, 1341–1343.
Morin, C.M., Colecchi, C., Brink, D., Astruc, M., Mercer, J., and Remsberg, S. (1995). How ‘blind’ are double-blind placebo-controlled trials of benzodiazepine hypnotics? Sleep, 18, 240–245.
Moscucci, M., Byrne, L., Weintraub, M., and Cox, C. (1987). Blinding, unblinding, and the placebo effect: an analysis of patients’ guesses of treatment assignment in a double-blind clinical trial. Clinical Pharmacology & Therapeutics, 41, 259–265.
Nathan, R.A. (1999). How important is patient recruitment in performing clinical trials? Journal of Asthma, 36, 213–216.
Peto, R., Collins, R., and Gray, R. (1995). Large-scale randomized evidence: large, simple trials and overviews of trials. Journal of Clinical Epidemiology, 48, 23–40.
Rabkin, J.G., Markowitz, J.S., Stewart, J., et al. (1986). How blind is blind? Assessment of patient and doctor medication guesses in a placebo-controlled trial of imipramine and phenelzine. Psychiatry Research, 19, 75–86.
Reidenberg, M.M. (1998). Decreasing publication bias. Clinical Pharmacology & Therapeutics
, 63, 1–3.

Rosenberger, W.F. and Hu, F. (2004). Maximizing power and minimizing treatment failures in clinical trials. Clinical Trials, 1, 141–147.

Rothman, K.J. and Michels, K.B. (1994). The continuing unethical use of placebo controls. The New England Journal of Medicine, 331, 394–398.

Sano, M., Ernesto, C., Thomas, R.G., et al. (1997). A controlled trial of selegiline, alpha-tocopherol, or both as treatment for Alzheimer’s disease. The Alzheimer’s Disease Cooperative Study. The New England Journal of Medicine, 336, 1216–1222.

Shea, S., Bigger, J.T., Jr., Campion, J., et al. (1992). Enrollment in clinical trials: institutional factors affecting enrollment in the Cardiac Arrhythmia Suppression Trial (CAST). Controlled Clinical Trials, 13, 466–486.

Simon, L.J. and Chinchilli, V.M. (2007). A matched crossover design for clinical trials. Contemporary Clinical Trials, 28, 638–646.

Streiner, D.L. and Norman, G.R. (2003). Health Measurement Scales: A Practical Guide to Their Development and Use. New York: Oxford University Press.

Taylor, K.M. (1992). Physician participation in a randomized clinical trial for ocular melanoma. Annals of Ophthalmology, 24, 337–344.

Taylor, K.M., Margolese, R.G., and Soskolne, C.L. (1984). Physicians’ reasons for not entering eligible patients in a randomized clinical trial of surgery for breast cancer. The New England Journal of Medicine, 310, 1363–1367.

Temple, R. (1996). Problems in interpreting active control equivalence trials. Accountability in Research, 4, 267–275.

Temple, R.J. (1997). When are clinical trials of a given agent vs. placebo no longer appropriate or feasible? Controlled Clinical Trials, 18, 613–620.

Todd, K.H. (1996). Clinical versus statistical significance in the assessment of pain relief. Annals of Emergency Medicine, 27, 439–441.

Van De Werf, F., Adgey, J., Ardissino, D., et al. (1999). Single-bolus tenecteplase compared with front-loaded alteplase in acute myocardial infarction: the ASSENT-2 double-blind randomised trial. The Lancet, 354, 716–722.

Volberding, P.A., Lagakos, S.W., Koch, M.A., et al. (1990). Zidovudine in asymptomatic human immunodeficiency virus infection. A controlled trial in persons with fewer than 500 CD4-positive cells per cubic millimeter. The AIDS Clinical Trials Group of the National Institute of Allergy and Infectious Diseases. The New England Journal of Medicine, 322, 941–949.
