Reading Is Believing? A Guide to Clinical Trial Reports

Matt Sydes and Janet Darbyshire, MBChB

Anyone who reads medical journals or other health-related publications, such as BETA, will have come across reports of clinical trials, an essential tool in evidence-based medicine. Clinical trials are a crucial step in medical research and development (R&D); they are used to test new drugs or treatments for a disease or condition, novel assays (biological tests), and interventions such as vaccines to prevent disease. Before full approval can be granted for any new treatment, procedure, or vaccine, its potential risks and benefits must be carefully evaluated. Potential benefits may include curing the disease, prolonging life, preventing or treating complications of the disease, or improving quality of life. These must be weighed against potential risks such as side effects, both serious and minor, and their impact on quality of life.

The term clinical trial is used to cover a wide range of different types of study. This article aims to give a brief overview of the elements required for a well-designed clinical trial and criteria readers can use to assess the quality of clinical trial data they come across in the literature.

Clinical Trial Stages

Phase I trial: A new treatment or procedure is tested for the first time in people (after laboratory and animal testing) primarily to determine how safe it is. Researchers also study how best to administer the new treatment (e.g., by mouth or by injection), its side effects, and what dose should be given. Phase I trials generally involve a small number (10–50) of healthy volunteers, that is, people who do not have the disease for which the treatment is being tested. These trials also may be done—particularly with potentially toxic drugs, such as those used for cancer—in people with a disease, for example, those who are no longer responding to other treatment options.

Phase II trial: Next, the experimental treatment is tested on a larger group (40–100) of participants with the condition to assess its activity against some measure of a disease (e.g., CD4 cell count or viral load in HIV infection) and to collect more details on its safety and toxicity. Although these trials are often not randomized (treatment arms assigned by chance; see page XXX), more can be learned about the potential new treatment if they are.

Phase III trial: In the next step, the treatment is tested against current standard therapy in a large number (hundreds or thousands) of volunteers with the disease in a randomized controlled trial. In some trials, the experimental treatment is compared to a placebo (a ‘dummy’ treatment such as a sugar pill, see page XXX). Whether a participant receives the test (experimental) or standard (control) treatment or the placebo is decided at random, ensuring that similar groups of people will receive each treatment. These trials primarily compare how effective the treatments are, as well as comparing side effects, adherence, and effect on quality of life.

Phase IV: These trials are done once the treatment has been approved (in the U.S., by the Food and Drug Administration [FDA]) and marketed. They collect information on the use of the treatment in real-life situations in very large numbers of people, for example, long-term side effects or interactions with other treatments. In some cases uncommon side effects may not show up during a Phase III trial, and may only become apparent once a treatment is in widespread use.

How Are Treatments Tested?

Treatments can be tested in many ways, some of which are much more reliable than others. Those that are less reliable, and which may even give erroneous results, are often said to be subject to bias. Bias is any factor that favors some outcomes over others and that may produce misleading results. For example, if a physician selects the treatment for each of her patients, she may give a new treatment to those who are sicker and the standard treatment to those who are healthier. In this scenario, the results for the new treatment may not look as good as they might have if the treatment had been given to healthier subjects, because people who are less healthy tend to do less well on treatment. This type of subject selection is a form of bias. In a clinical trial, the experimental and control treatments should be randomized, that is, given to similar groups of people (see page XXX). Bias sometimes can arise unintentionally and is most likely to occur in nonrandomized trials. The more common types of nonrandomized study are described below.

Nonrandomized Studies

Cross-Sectional Analyses

A cross-sectional analysis is effectively a survey of current treatment with regard to efficacy and toxicity, and is based on data collected retrospectively (i.e., by looking at events that happened in the past). This backward-looking approach can result in a lack of consistency in the available data, leading to more missing data and increasing the potential for misleading results. Conversely, prospective trials detail in advance the type of information that will be collected and participants are followed forward in time, thus allowing systematic recording of the desired data.

Case Series

Case series involve the follow-up of a number of people, generally from a single health center, who have all been treated in the same way. Although this method may give some idea of the activity and safety of a particular treatment as administered in a particular place, it does not offer data on other treatments with which to compare these results, and therefore cannot help to clarify if the treatment in question is better or worse than any others.

Comparisons of case series (between-group or between-center comparisons) typically involve comparing case series from two or more health-care providers or clinical centers that use different treatments (e.g., a comparison of people receiving Treatment A at Clinic 1 with those receiving Treatment B at Clinic 2), or comparing two groups of participants given two different treatments by one physician. Suppose the results for Treatment A appeared better than those for B. Does this mean that Treatment A is better than Treatment B? Not necessarily. This may not have been a fair comparison because biases may have crept into the design. For example, it might be that physicians at Clinic 2 were treating a greater proportion of people with more advanced disease or people who were older. Or Clinic 2 might be a specialist center where more difficult cases were likely to be referred. Any of these factors, and many more, could have biased the results of the comparison against Treatment B, making Treatment A look better.

Historical Controls

In a study using historical controls, a group of people receiving a certain treatment at one or more clinical centers is compared with an apparently similar group of people in the past who received other treatments in order to answer the question, ‘Is the treatment which is being used now better than the treatment that was used before?’ A limitation of this type of study is that any differences seen may actually be due to factors that were not controlled for, or taken into account, in the comparison.

For example, suppose that the development of AIDS were selected as the primary endpoint (a marker of disease progression) to assess a treatment for HIV infection. If the efficacy of a new anti-HIV drug were evaluated by comparing it with AZT (Retrovir) therapy in the early days of the epidemic, when most of the initial manifestations of AIDS were episodes of Pneumocystis carinii pneumonia (PCP) or Kaposi’s sarcoma (KS), the new treatment might look much better than AZT—not because it is better, but because many episodes of PCP are now prevented by effective primary prophylaxis (preventive therapy).

Case series, comparisons between groups or centers, and historical controls each clearly have limitations and may produce inaccurate data. While they might identify useful questions for future research, they do not provide reliable answers themselves.

Randomized Controlled Trials

Researchers need to know if a new treatment works, whether it is as good as or better than the current treatment, and how safe it is. Fortunately, there is a very good method for comparing treatments that adequately addresses the concerns raised above. This method, considered to be the gold standard of clinical research, is the randomized controlled trial (RCT). Randomized means that the decision about which treatment each trial volunteer receives is made not by the researcher or the participant, but at random, like tossing a coin. Controlled means that one of the trial arms (study groups) is given the current standard (control) treatment to provide a comparable reference group. Thus a new, promising treatment typically will be compared with the standard treatment. New combinations or dosages of already approved drugs also may be compared with standard combinations or dosages; this is often the case in HIV-related trials. If there is no effective standard treatment, the control group may receive no treatment or a placebo (described on page XXX).

By removing the potentially confounding, or corrupting, influences that can be introduced through personal choice (e.g., physician bias), randomization minimizes the potential biases associated with the selection of participants for the different study arms. A number of randomization methods are available, some of which are better than others and some of which are more appropriate for different types of trials. The most secure method, because it is difficult to tamper with, uses a computer-based system that automatically randomizes individuals based on telephone calls or faxes to a central office (details are faxed to a secure location without explicitly revealing an individual’s identity).

The possible differences between persons treated at Clinics 1 and 2 were a potential problem in the theoretical scenario discussed earlier. By using randomization, researchers can ensure that Clinics 1 and 2 each will give both Treatments A and B. Furthermore, the randomization can be stratified so that they give Treatments A and B to similar groups of subjects. As a result, any differences seen subsequently are more likely to be due to true differences between the treatments than to variability among the participants. Not only does randomization balance known factors that may affect trial outcomes, but it also balances unanticipated and therefore often unmeasured factors.
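
For readers who want to see the idea in miniature, the short Python sketch below illustrates simple and stratified randomization. It is purely illustrative: the participant IDs, clinic names, and block size of two are hypothetical, and real trials use secure, tamper-resistant central systems like those described above.

import random

def simple_randomize(participant_ids):
    # Simple randomization: each volunteer is assigned to arm A or B
    # by the computer equivalent of a coin toss.
    return {pid: random.choice(["A", "B"]) for pid in participant_ids}

def stratified_randomize(participants):
    # Stratified randomization: within each stratum (here, the clinic),
    # assign people in shuffled blocks of two so that every clinic gives
    # both Treatment A and Treatment B to similar numbers of people.
    assignments = {}
    strata = {}
    for pid, clinic in participants:
        strata.setdefault(clinic, []).append(pid)
    for clinic, pids in strata.items():
        for i in range(0, len(pids), 2):
            block = ["A", "B"]
            random.shuffle(block)
            for pid, arm in zip(pids[i:i + 2], block):
                assignments[pid] = arm
    return assignments

volunteers = [(1, "Clinic 1"), (2, "Clinic 1"), (3, "Clinic 2"), (4, "Clinic 2")]
print(simple_randomize([pid for pid, _ in volunteers]))  # e.g., {1: 'A', 2: 'A', ...}
print(stratified_randomize(volunteers))                  # each clinic gets one A, one B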

Blinding and Placebos

Trials are blinded, or masked, if the participants do not know which treatment they are receiving. A single-blind trial means that the volunteers do not know which trial treatment they are assigned to take, but their physicians (and/or those conducting the study) do know. In a double-blind trial, neither volunteers nor physicians nor those conducting the study know which treatment an individual is taking. When the trial reaches a predefined point, the code is broken and the trial is unblinded; from that point, those analyzing the study data, of course, know which trial treatments were being used by whom. Double-blind clinical trials are considered to be the most free of bias.

Placebos, or dummy treatments, are often used to blind a trial. Sometimes the fact of receiving a treatment – even if it is only a sugar pill – seems to help some people; this is known as the placebo effect. Placebos ideally should be similar in every way – appearance, taste, smell, etc. – to the actual experimental treatment so that decisions made by trial participants and their physicians about the progress of their disease or apparent side effects are not biased by knowing what they are taking. For example, if a person in a trial who knows she is not taking the new drug wakes up in the morning with a headache, she will probably decide it is due to a reason not related to the trial – for instance, too much red wine the night before – and not report it to her physician. If the trial volunteer knows she is taking the experimental drug, however, she may decide that the headache was caused by the new drug, even if it indeed was due to the red wine. However, if she is in a blinded trial and does not know which treatment she is taking, she will be more likely to report the event as possibly due to the treatment, and an unbiased assessment can be made by her physician or those conducting the trial, who also may be blinded.

If there are no other available, approved treatments for the disease under study, the control group may receive a placebo. A study in which one group receives the investigational treatment and the other group, for comparison, receives a placebo is known as the classic placebo-controlled trial. Today it is considered unethical in most cases to give a trial volunteer a placebo if an effective standard treatment is available.

Even if there is a standard treatment, however, a trial may still be blinded, either by making the two treatments (standard and experimental) look the same or, more often, by making a placebo to match each treatment. In such studies, each group receives two trial treatments, one an active drug and the other a placebo (i.e., participants take either Treatment A plus Placebo B, or Placebo A plus Treatment B), the so-called ‘double dummy’ approach.

Sometimes it is impossible to blind a trial because the drug has a characteristic color or taste, or because the treatment is very complex, for example, involving surgery or radiotherapy (radiation therapy, usually with X-rays). These trials are called open-label, since no blinding is undertaken. Other reasons for choosing an open-label design include situations in which it would be considered unethical to use blinding, for example, a treatment that involves multiple injections (blinding would require subjecting the comparison group to multiple dummy injections). Another reason, which has emerged through HIV treatment advocacy, is when an urgently needed new therapy for a life-threatening disease with no known effective treatment is studied under less rigorous conditions; in such trials, both volunteers and their physicians know that the experimental treatment is being taken.

Endpoints

Endpoints are the outcomes or events that are used to judge how effective or safe a treatment is. Ideally a trial would have an objective (or ‘hard’) endpoint such as survival or development of a major opportunistic illness (OI) for which there is unlikely to be any bias in reporting by the observers, that is, the event is equally likely to be reported in all groups if the experimental treatment were not effective. This is particularly important if the trial is not double-blind and trial participants and their physicians know which treatment group they are in.

However, in many cases objective endpoints are not feasible. For example, people with HIV infection may continue to live 20 years or longer and therefore are likely to receive a number of different treatments during their lifetime; it would be unethical to withhold a new treatment from an HIV-positive participant if the trial treatment is not working. It is difficult to compare the effects on survival of Treatment A vs Treatment B when participants subsequently use Treatments C, D, or E. In this instance the trial would likely have a different type of endpoint, such as an AIDS-defining event. However, this also may be problematic.

First, a large number of different illnesses define AIDS, and they may vary in severity. (Nevertheless, delaying the onset of AIDS-related events clearly is of benefit.) Second, markers of HIV disease progression, such as viral load and CD4 cell count, are now monitored closely, and most people will be switched to a new treatment before a diagnosis of AIDS is imminent. Laboratory markers (e.g., the time from randomization to a specified increase in viral load or fall in CD4 cell count) themselves are therefore widely used in trials as surrogate endpoints (surrogate markers). Although the measurements are objective, the definitions of these endpoints are open to debate. What markers should be used? At what time should levels be tested? And what laboratory values signal disease progression or treatment failure? Researchers also are uncertain as to how, and indeed whether, surrogate endpoints reflect underlying clinical endpoints.

Designing an RCT

Although there are several stages of clinical trials (see above), Phase III trials are pivotal in the drug development and approval process, and in answering questions about clinical management. They yield the most definitive and valuable data to the scientific community as well as to lay readers (i.e., nonscientists).

Large Phase III RCTs require much planning and coordination, and are expensive to run. They generally involve large numbers of participants at several study sites and may even be multinational. Trial participants usually undergo additional tests before and after randomization, and are usually followed more closely by physicians (who may or may not be their usual clinicians) than people who are not participating in a trial.

Phase III trials generally are designed by a team that includes experts in the disease in question and experts in designing and running clinical trials. Each clinical trial has a protocol, or document that describes the study in detail. The trial protocol must state:

  • approximately how many participants are needed to have reasonable statistical power to detect an important clinical difference between treatments or to be confident that there is no or little difference (a rough calculation of this kind is sketched after this list)
  • how and when the experimental and control treatments should be given
  • what data must be collected
  • when participants should receive follow-up
  • what should be done in the event of problems such as adverse events, especially unforeseen significant side effects
  • how the trial should be monitored.
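
As a rough illustration of the first point in the list above, the number of participants needed grows as the expected difference between treatments shrinks. The Python sketch below uses a standard approximate sample-size formula for comparing two proportions; the 70% and 80% success rates and the choice of 80% power are hypothetical, not taken from any particular trial.

from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    # Approximate number of participants per arm needed to detect a true
    # difference between success rates p1 and p2, with a two-sided
    # significance level alpha and the stated power.
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

# Detecting an improvement from a 70% to an 80% success rate with 80%
# power calls for roughly 290 people per arm:
print(round(n_per_arm(0.70, 0.80)))  # about 290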

One way of assessing the quality of a trial is to look at the review processes it has been through. Academic trials, publicly funded trials (i.e., those funded by a federal entity such as the U.S. National Institutes of Health [NIH]), and trials funded by nonprofit or charity groups (i.e., nongovernmental organizations, or NGOs, such as amfAR) undergo a full scientific peer review process in which the relevance, timeliness, and methodology of the study are assessed by independent experts before the trial begins.

All trials also should be assessed, before subjects are enrolled, by a designated independent ethics committee or institutional review board (IRB), depending on the country; these bodies usually comprise physicians, medical experts, researchers, and community members. In the U.S., federally funded AIDS trial protocols also often are assessed (ideally before subjects are enrolled) by community advisory boards, or CABs, which frequently include people with HIV. The order in which different groups review trial protocols varies according to the organization(s) sponsoring the trial and the proposed trial’s funding source(s).

Although trial protocol review requires extra time and effort, it is vital for ensuring the quality of trials – in terms of safety, ethics, and data collection – offered to potential volunteers and reported in the scientific literature. Trials sponsored by the pharmaceutical industry may not go through the scientific peer review process, but are required by law to undergo an ethics review. (For more information about the ethics of conducting clinical trials, see ‘AIDS Vaccines: The Ethical and Social Issues’ – on pages 41-45 of the Summer/Autumn 2001 issue of BETA.)

Informed Consent

The ultimate decision about whether or not to join a trial is made by the potential participant, who may be a person with the disease under investigation or a healthy individual (e.g., for a Phase I or vaccine trial), and who is offered the opportunity to join by the clinician(s) enrolling participants. Potential study subjects must be given sufficient information, at an appropriate level of technical detail (which may vary among people, e.g., according to education or reading level), with which to make their enrollment decision, and they must be given adequate time to ask questions and discuss the trial with health-care providers, family, and friends before making up their minds about enrolling. Potential participants should be informed of any costs (including any expenses that will be billed to their insurance) or remuneration, what types of tests and medical care they will receive during the trial, and whether they may be randomized to receive a standard treatment or placebo rather than the experimental treatment.

Individuals who decide to join a trial are asked to confirm that they are willing to take part by giving consent, usually in writing, which records that they have been fully informed about the trial and freely choose to participate. The concept and process of informed consent are fundamental to participation in RCTs and other clinical research projects.

Elements of a Clinical Trial Report

The main components of a full clinical trial report are the abstract, introduction, methods, results, and discussion or conclusion.

Abstract

The abstract is a summary, or abridged outline, of the trial. Ideally it begins with a short background section that establishes the context of the investigation and often includes a brief statement of purpose, which clarifies the principal question or questions the trial is designed to answer. (The statement of purpose, also known as the objective or hypothesis, may follow the background as a separate section.) Other elements of the abstract include highly condensed versions of the methods, results, and conclusion sections found in the main body of the trial report (described below).

Because they summarize the main details of a trial and its results, abstracts often stand alone as proxy trial reports, e.g., in medical conference reference materials and on online medical research sites.

Introduction

In the introduction, the context of the investigation is described in greater detail. The context may include a history of treatments for the particular illness(es) in question and, if appropriate, a review of outcomes of previous trials.

Methods

Important factors relating to the design and running of a trial are found in the methods (or methodology) section, which describes the means by which the study objective was investigated. This section usually is divided into subsections, which may include study design, study population, procedures (both clinical procedures such as treatments and trial practicalities such as the timing of follow-up visits), outcomes, and statistical analysis. While the various subsections are named and divided in different ways, depending on the report, all should include similar relevant information.

For example, the type of people who were eligible to participate in the trial should be described in detail in this section of the trial report. Potential participants are screened according to inclusion and exclusion criteria (i.e., personal factors that make people suitable for or disqualify them from the trial) determined in advance by the research team. For instance, were volunteers in an AIDS-related trial asymptomatic or had they already developed AIDS-defining illnesses? Had they received previous treatments? If so, which ones? Sometimes a schema, or diagram, is included in the report to show graphically how potential subjects were screened.

An important part of the study design subsection focuses on treatment; it should feature specific information on the treatment being tested including details on the drugs, dosages, timing, and administration route. This section should describe all trial arms, including whether a standard treatment or placebo was used as a control, as well as the reasons for the choice of the experimental and control treatments. There should also be details on whether the trial was blinded, and if so which type of blinding was implemented.

In the subsection on statistical analysis, the reader should find details about what statistical methods were used to interpret the trial data, including information about the trial’s endpoints.

Finally, the methods section of a trial report should mention whether there were plans for interim analyses (i.e., done at intervals before study completion) and whether the trial was reviewed by independent committees. Trials with long recruitment phases or long-term follow-up should be reviewed by an independent body to see if the preliminary (early) results are sufficiently convincing to warrant early closure of the trial or early reporting of data. This body is usually called a Data Monitoring Committee or a Data and Safety Monitoring Board (DSMB), which is made up of clinicians not involved in the trial, statisticians with expertise in trials, and, with increasing frequency, people affected by the disease under study.

Results

The results section should complement the methods section by indicating how the trial progressed from beginning to end according to the trial design. Like the methods section, the results section often is subdivided into different subsections such as enrollment, side effects, and treatment outcomes. The results section should include a description of the characteristics of the trial participants – e.g., age, sex, stage of disease. It also should indicate the number of subjects entering the study, the length of the trial, and specific information regarding subjects’ adherence to the trial medication. Did participants remain on treatment for as long as expected? If not, why did they stop and what, if anything, did they change to? What proportion of subjects switched over to the other treatment arm? Such deviations from the planned treatment regimen have crucial implications for the analysis and interpretation of the results.

The trial should report, and primarily refer to, the results of the primary analyses. Any analyses that were not specified before the trial should be considered exploratory and treated with caution. The primary analysis should be reported as an intent-to-treat or intention-to-treat (ITT) analysis, which is an assessment of the results according to the treatment the participants were originally assigned to receive (not the treatment they actually did receive). That is, all participants originally assigned to each study arm are analyzed, including those who dropped out or switched treatments due to nonresponse to the original treatment, side effects, or other reasons. An ITT analysis is the only bias-free way—and also the most pragmatic way—to report trial results, as it best reflects how people take medication in real life. However, as-treated or on-treatment analyses, in which the data are analyzed according to the treatment(s) the participants actually received, are important for assessing toxicity and even for cautiously estimating the treatment effects for those who were able to stay on treatment. But this may be a very atypical group of people; in particular, they are likely to have had better outcomes than those who stopped their initially assigned treatment.
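
A toy example makes the contrast concrete. In the Python sketch below, the four participant records are entirely hypothetical; it simply shows how the same data can give different answers when results are grouped by assigned treatment (ITT) rather than by received treatment (as-treated).

# Four hypothetical participant records; one person assigned to A
# switched to B during the trial and did not respond.
participants = [
    {"assigned": "A", "received": "A", "responded": True},
    {"assigned": "A", "received": "B", "responded": False},
    {"assigned": "B", "received": "B", "responded": True},
    {"assigned": "B", "received": "B", "responded": False},
]

def response_rates(records, key):
    # Response rate per arm, grouping by `key`: "assigned" gives an
    # intent-to-treat analysis, "received" an as-treated analysis.
    rates = {}
    for arm in ("A", "B"):
        arm_records = [r for r in records if r[key] == arm]
        rates[arm] = sum(r["responded"] for r in arm_records) / len(arm_records)
    return rates

print("Intent-to-treat:", response_rates(participants, "assigned"))  # A: 0.50, B: 0.50
print("As-treated:     ", response_rates(participants, "received"))  # A: 1.00, B: 0.33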

The primary result of a trial is the difference in outcomes observed between the various treatment arms. Importantly, a difference in outcomes may not reflect a true underlying difference between the treatments; it merely represents the best estimate of a true difference given the available data. In clinical trials, and the field of statistics as a whole, uncertainties abound. Researchers deal with them as best they can with p-values and confidence intervals (see “Interpreting Trial Results,” below).

An important part of every trial is to assess the adverse events, or side effects, of a treatment as well as the benefits. The types of events that must be reported, and the maximum delay in reporting them, are determined in advance for every trial and should be included in the trial report. Adverse events are usually graded based on a classification system that may differ somewhat between trials, but is broadly similar to the systems developed by the World Health Organization (WHO) and other national and international agencies (see sidebar on page XX). The classification applies to both clinical and laboratory (i.e., detected by a laboratory test, usually on a blood specimen) events.

Classification of Adverse Events

  • Grade 1 reactions are described as mild and transient, do not limit usual activities, and require no treatment (e.g., red or itchy skin).
  • Grade 2 reactions are described as moderate, are somewhat disruptive of usual activity, and require minimal treatment (e.g., a rash that breaks the skin, with light peeling or scaling).
  • Grade 3 reactions are severe, interfere considerably with normal life, and require medical intervention (e.g., blistering skin, open ulcers, extensive peeling).
  • Grade 4 reactions are incapacitating or life-threatening and require significant medical intervention (e.g., severe rash or broken skin, Stevens-Johnson syndrome).

Serious adverse events may be reportable to regulatory authorities, depending on the drug’s stage of development and whether the reaction is expected or unexpected. Serious adverse events include severe, life-threatening, or fatal side effects (grades 3 or 4), those that require or prolong hospitalization, cancer (i.e., non-AIDS-defining cancers in people with HIV infection), those that result in permanent disability, those associated with congenital anomalies (birth defects), and those associated with other significant medical conditions.

Interpreting Trial Results

In a clinical trial, each comparison of the treatments will typically include a statement on the statistical significance of the results. Statistical significance—a measure of how unlikely it is that an observed outcome in a study is due to chance alone—often is represented as a probability value, or p-value. The p-value compares a particular result against a prespecified expectation, or hypothesis. (In the process of designing an RCT, investigators generally have prior knowledge and data upon which to base their hypotheses; the specific hypothesis to be tested should have been stated clearly in the background section.)

For example, suppose the hypothesis is that there is no difference with respect to the primary endpoint between Treatments A and B, or A=B. The alternative hypothesis is that there is a difference between Treatment A and Treatment B, or A≠B. The p-value is the probability of seeing, by chance, a difference between the treatments at least as large as that observed if there is really no difference between the treatments. If Treatment A has a treatment success rate 8% higher than Treatment B (e.g., 90% vs 82%) with a p-value of 0.05, this means that the probability of seeing a difference of at least 8% between the two arms in the trial if there is really no difference between them is 5%, or 1 in 20. In most research, a p-value of 0.05 (i.e., 5%) is considered the conventional threshold for statistical significance. That is, any result likely to have arisen by chance alone with a probability of more than 1 in 20 is not seen as sufficiently reliable evidence. However, a p-value of less than 0.05 (e.g., 0.01 or 0.001) means that the probability of the result occurring by chance is much less (1 in 100 or 1 in 1,000, respectively), that is, the observed difference is more likely to be due to a true difference between the treatments.
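
For the statistically curious, the arithmetic behind such a comparison can be sketched with a standard two-proportion z-test. The counts below (135 of 150 vs 123 of 150 successes, i.e., 90% vs 82%) are hypothetical, chosen so that the result lands near the p-value of 0.05 used in the example above.

from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(x1, n1, x2, n2):
    # Two-sided z-test for the difference between two proportions,
    # using the pooled standard error under the null hypothesis A=B.
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts: 90% vs 82% success with 150 people per arm.
print(round(two_proportion_p_value(135, 150, 123, 150), 3))  # about 0.046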

Confidence intervals (CIs) are a more informative way of expressing the uncertainty surrounding the results of a trial than p-values, although both are calculated using the same data. A CI is another statistical measure of the likelihood that an experimental outcome is real and not due to chance. Typically, the 95% CI is quoted. This gives the range of values within which the investigator is 95% certain (19 in 20) that the true value falls. Ideally, CIs should be reported in all trial reports.

Using the above example, say that the difference in outcomes between Treatment A and Treatment B is 8%, and the 95% CI around this estimate is 1% to 15%. This means that the best estimate of the difference between Treatment A and Treatment B (based on the trial results) is that Treatment A is 8% better, but investigators are 95% certain that the real difference falls between a 1% and a 15% advantage for Treatment A. So the investigators would be confident that Treatment A is better, but uncertain by exactly how much—it could be as little as 1% or as much as 15%. If the difference were 8% but there was greater uncertainty around this value, the 95% CI would be wider—perhaps –2% to 18%. (CIs may include values in the opposite direction to the best [or point] estimate.) The best guess still would be that Treatment A is 8% better than Treatment B, but one could not rule out that Treatment B actually may be better, by up to 2%. If a CI includes zero, the null hypothesis (generally, that there is no difference between the treatments) must be considered possible as well.
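
Using the same hypothetical counts as in the p-value sketch above, a 95% CI for the difference in success rates can be computed with the usual normal approximation; the resulting interval is close in spirit to the 1% to 15% range quoted above.

from math import sqrt
from statistics import NormalDist

def diff_ci(x1, n1, x2, n2, level=0.95):
    # Approximate confidence interval for the difference between two
    # proportions (normal approximation with an unpooled standard error).
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

low, high = diff_ci(135, 150, 123, 150)  # the same hypothetical counts as above
print(f"{low:.1%} to {high:.1%}")        # roughly 0.2% to 15.8% in favor of A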

The more information that is included in the statistical analysis (that is, the greater the number of outcome events or endpoints of interest), the narrower, or tighter, the confidence interval around the point estimate will be, and the greater confidence the reader can have in the reliability of the researchers’ estimates.

While trial reports usually refer to the significance of the results, there is an important distinction between statistical significance and clinical significance. Clinical significance refers to whether the results of a trial are clinically worthwhile in terms of the benefits that have been demonstrated. Given the observed differences between the treatment arms, does one of the arms offer an overall advantage that would make a worthwhile difference in clinical practice (e.g., lower reported adverse events)? Such an advantage needs to be balanced against the risks of the treatment to assess its overall benefit.

Reporting Trials

At various points during the course of a clinical trial, researchers may report results at scientific conferences, as either oral or poster presentations. But the “gold standard” for reporting trials is publication in a well-respected, peer-reviewed journal (e.g., The Lancet, the Journal of the American Medical Association [JAMA], the New England Journal of Medicine [NEJM]), which indicates that a report has been subjected to scrutiny by experts in the field who may have asked for further details or clarifications prior to publication. Reports published in the main medical journals (i.e., those with high science citation scores) will all have undergone peer review. However, not all available trial reports will have been subjected to such scrutiny. For example, many presentations (as well as most posters) at conferences, and some web-only publications, will not have been reviewed so stringently. These sources of information should be interpreted with greater care.

What can a reader tell about the quality of a trial by reading a report, whether in a newspaper or medical journal? In 1996, guidelines were published in several medical journals (e.g., JAMA) on how RCTs should be reported; the guidelines are known collectively as the CONSORT (Consolidated Standards Of Reporting Trials) statement. A revised and updated version of these guidelines was published (again, in a number of medical journals) in May 2001 and can be found at www.consort-statement.org. The aim of these guidelines is to ensure that all the information needed to assess the quality of an RCT is given in the trial report. The CONSORT statement covers most of the key areas, such as reporting whether a trial was run according to the principles of Good Clinical Practice (GCP); the most widely used guidelines are those based on the International Conference on Harmonisation of GCP (ICH GCP). The aims of GCP are primarily to protect trial volunteers and to ensure the quality of the trial (see sidebar on page XX).

Guidelines for Good Clinical Practice

The guidelines published by the International Conference on Harmonisation of Good Clinical Practice (available at www.emea.eu.int/pdfs/human/ich/013595en.pdf) state the following:

Good Clinical Practice (GCP) is an international ethical and scientific quality standard for designing, conducting, recording, and reporting trials that involve the participation of human subjects. Compliance with this standard provides public assurance that the rights, safety, and well-being of trial subjects are protected, consistent with the principles that have their origin in the Declaration of Helsinki, and that the clinical trial data are credible…. The guidance was developed with consideration of the current good clinical practices of the European Union, Japan, and the United States, as well as those of Australia, Canada, the Nordic countries, and the World Health Organization (WHO).

For more information on GCP and the Declaration of Helsinki, an important research ethics document first adopted in 1964, see “AIDS Vaccines: The Ethical and Social Issues” on pages 41–45 of the Summer/Autumn 2001 issue of BETA.

Academic institutions, such as universities and hospitals, usually will have no direct financial interest in establishing whether a treatment works. Conversely, the pharmaceutical industry needs to know about the efficacy and safety of its new products precisely because of economic concerns. Drug development is a costly process, and companies must recoup their expenditures and generate sufficient profits, some of which will be used to fund future research. The different motivations for running trials (profit-making vs nonprofit) therefore can lead to a variety of angles from which the results may be reported, particularly in terms of the relative emphasis placed on positive or negative aspects. With these factors in mind, the published report should also indicate the funding body, the sponsor of the trial, the coordinating centers, the participating centers, and the affiliations of each of the authors.

Many of the major medical journals have agreed to the CONSORT statement, which includes guidelines about potential financial conflicts of interest. Some journals ask authors to record all monies, grants, and awards that they have received recently so that any potential conflict of interests can be identified. (For more on conflicts of interest in the publishing of clinical trial results, see “Science, Money, and Industry: Is Commerce Corrupting AIDS Research?” on pages XXX in this issue of BETA.)

The Perils of Abstracts

There is an increasingly large amount of available scientific material on HIV/AIDS, but only a limited time in which to read it. It would be very easy, for the sake of expediency, to rely purely on abstracts. But does the abstract always reflect the full published report of a trial? The answer is no, although this is generally not deliberate.

As previously discussed, new treatments are likely to have both benefits and risks. Because abstracts have limited amounts of space in which to summarize trial results, many important details—such as the choice of methods of analysis—may not be provided. Possibly as a consequence of space considerations, abstracts are much more likely to include or even overplay the benefits of an experimental treatment while downplaying or even failing to mention the risks. Inevitably some information about some of the participants will be omitted. Yet it is important for readers to know how all data were dealt with in the analyses. Consider as an example a hypothetical anti-HIV treatment trial in which the endpoint is undetectable viral load one year after randomization. Five things may have happened to participants: they may 1) be tested at one year and have an undetectable viral load; these people (group V) are the “successes”; 2) be tested at one year and have a high viral load; these people (group W) are the “nonresponders” or “treatment failures”; 3) develop progressive disease and require further treatment before one year (group X); 4) stop treatment early because of toxicity, and may or may not start new treatment (group Y); or 5) be lost to follow-up and not seen at one year (group Z).

Given 100 people in one treatment arm (called arm A), suppose that 40 responded (group V), 30 did not respond (group W), 15 developed progressive disease prior to one year and had to start new therapy early (group X), ten stopped treatment early because of toxicity (group Y), and five were lost to follow-up (group Z). So, what is the response rate at one year for this group of participants? A number of approaches could be employed. If all the participants are included, the response rate is 40% (40 of 100). Alternatively, only the 70 participants who were assessed at one year (groups V and W) may be considered. This would put the response rate at 57% (40 of 70). We may additionally carry forward the 15 early nonresponders (group X) as “treatment failures” at one year. In this case, the response rate would be lower at 47% (40 of 85). Or group Y, the people who stopped early because of toxicity, could be considered as “treatment failures” because they could not take the trial treatment for the full year. This would bring the response rate to 42% (40 of 95). It could be argued that, pragmatically, how group Y were actually doing at one year should be considered, as they may have responded to the alternative treatments they tried. If four responded and six did not, then the overall response rate would be 46% (44 of 95).

The results above then must be compared with those of participants in the same trial taking Treatment B, where again 40 people responded (group V), ten had treatment failure at one year (group W), 50 had early treatment failure (group X), and none stopped due to toxicity (group Y) or were lost to follow-up (group Z). Again, if all participants are included, the response rate is 40%. Looking at only those people with results at one year (groups V and W), the response rate of 80% (40 of 50) for Treatment B appears much better than the rate for Treatment A (57%). However, if participants who failed early are included, the response rate for Treatment B drops to 40% (40 of 100), compared with 47% (40 of 85) for Treatment A by the same criteria.
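
The arithmetic for arm A above can be tabulated directly; the short Python sketch below simply reproduces the response rates quoted in the text, making clear that only the choice of denominator changes.

# Arm A groups from the example above: V responders, W nonresponders,
# X early progression, Y stopped for toxicity, Z lost to follow-up.
arm_a = {"V": 40, "W": 30, "X": 15, "Y": 10, "Z": 5}
v, w, x, y, z = (arm_a[g] for g in "VWXYZ")

def rate(successes, denominator):
    return f"{successes}/{denominator} = {100 * successes / denominator:.0f}%"

print("All randomized participants: ", rate(v, v + w + x + y + z))  # 40/100 = 40%
print("Assessed at one year (V+W):  ", rate(v, v + w))              # 40/70  = 57%
print("Plus early failures (V+W+X): ", rate(v, v + w + x))          # 40/85  = 47%
print("Plus toxicity stops (+Y):    ", rate(v, v + w + x + y))      # 40/95  = 42%
print("Counting 4 responders in Y:  ", rate(v + 4, v + w + x + y))  # 44/95  = 46%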

This is all rather confusing, but the bottom line is clear: results can look quite different depending on the method of analysis that was chosen. Therefore, the methodology should always be specified before the analysis. Another important point is that even if the abstract reports a good response rate, without knowing how it was calculated the reader cannot know whether the researchers perhaps were being overly optimistic. Furthermore, the reader cannot compare the results against a response rate quoted in the abstract of a different trial. A better approach would be to give basic data to allow the reader to look at the effect of different methods of analysis.

Abstracts tend to depict trials in black and white, while full reports add details that give a more accurate picture—a realistic shade of gray. Health-related articles in the popular media (e.g., newspapers) should be reviewed in much the same way as abstracts, since such articles are even less likely to give details on how endpoints and missing data were handled.

Conclusion

It is impossible in such a short article to cover everything that could help lay readers review clinical trial reports. Many books and a journal called Controlled Clinical Trials specifically focus on clinical trials. While they may appear complicated to lay readers who simply want to comprehend significant results, randomized controlled trials are rightly established as the “gold standard” in the evaluation of new interventions to prevent or treat disease.

Matt Sydes is a medical statistician at the Medical Research Council Clinical Trials Unit (MRC CTU) in London. Janet Darbyshire, MBChB, is the director of the MRC CTU and has been an advocate of RCTs for the past 25 years.

Additional Clinical Trial Resources

AIDS Clinical Trials Information Service (ACTIS):
http://www.actis.org/learnCT.html

AIDS Community Research Initiative of America (ACRIA). Clinical Trials Explained:
http://www.criany.org/clinical/clinical_res_explained.html

CONSORT statement:
http://www.consort-statement.org

Consumers for Ethics in Research (CERES):
http://www.ceres.org.uk

International Conference on Harmonisation:
http://www.ifpma.org

MRC Clinical Trials Unit:
http://www.ctu.mrc.ac.uk/ctu_patient.asp

National Institutes of Health (NIH). What is a Clinical Trial?:
http://www.thebody.com/niaid/trial.html

Additional reading materials related to clinical trials:

Introductory Reference Books

  • “Clinical trials: A practical approach” – S Pocock (Wiley, 1983)
  • “A Dictionary of epidemiology” – JM Last (editor) (OUP, 2001, 5th ed)
  • “Consider it pure joy – an introduction to clinical trials” – A
    Raven (Cambridge Healthcare Research Ltd, 1997, 3rd ed)
  • “Handbook of clinical trials” – M Flather, H Aston & R Stables
    (ReMedica, 2001)

Further Reference Books and Articles

  • “Practical statistics for medical research” – DG Altman (Chapman &
    Hall, 2001, 2nd ed)
  • “Design and analysis of clinical experiments” – JL Fleiss (Wiley, 1986)
  • “Fundamentals of clinical trials” – LM Friedman, CD Furberg & DL
    DeMets (Springer-Verlag, 1998, 3rd ed)
  • “Statistics in practice” – SM Gore, DG Altman (BMA, 1982)
  • “Bradford Hill’s principles of medical statistics” – AB Hill, ID
    Hill (Arnold, 1991)
  • “Randomised controlled trials: a user’s guide” – AJ Jadad (BMJ
    books, 1998) (Available online at www.bmjpg.com/rct/index.html)
  • “Introduction to randomised controlled trials” – JNS Matthews (Arnold, 2001)
  • “Clinical trials” – CL Meinert (OUP, 1986)
  • “Statistical aspects of the design and analysis of clinical trials”
    – A Pickles & B Everitt (Imp. College Press, 2000)
  • “The consumer’s guide to clinical trial results” – R Pringle Smith,
    et al. (AIDS/HIV Treatment Directory;v7;#3;Jun94)
  • “Clinical trials” – D Schwartz, R Flamant & J Lellouch (Academic Press, 1981)
  • “Statistical issues in drug development” – S Senn (Wiley, 1997)

Booklets

  • National AIDS Manual: “3. Clinical trials: information series for
    positive people” (Available online)
  • CancerBacup: “Understanding clinical trials” (ISBN 1-870403-78-9)
  • Royal Marsden Hospitals NHS Trust: “Patient information series #21:
    Clinical trials”
    http://www.royalmarsden.org/patientinfo/booklets/index.asp

Published: May 31, 2002
Last edited: January 21, 2011