Analyzing Phase 3 COVID-19 Vaccine Trials

A scientist prepares a dose of a vaccine in an injection needle for a clinical trial.

On June 15, the University of Illinois at Chicago (UIC) announced it will take part in a phase 3 clinical study of a COVID-19 vaccine candidate developed by Moderna, a biotech company. UIC is expected to be the only site in Chicago selected to launch the trial, which is being administered by the National Institute of Allergy and Infectious Diseases.

Chris Barker, PhD in Biostatistics '81 and adjunct professor of biostatistics at the UIC School of Public Health, offers this analysis of what is taking place during these clinical trials. This analysis was originally posted at

Moderna announced a Phase III vaccine clinical trial

Moderna (links below) has finalized plans (likely this means the FDA agrees with the plan) for a 30,000-patient vaccine trial. How do they determine the need for exactly 30,000 patients? I give the background below. The clinical trial is double-blind, randomized and placebo-controlled. This is a “gold standard” clinical trial design. The Moderna press release had sections that clearly (to me) were written by a statistician. Those sections refer to the “event driven” analysis.

How and why are 30,000 people required? Is it exactly 30,000? In statistical jargon, we call this the “sample size,” which simply means the “number of patients.”

Patients, volunteers or people?

In clinical trials we are always careful to consistently state whether the clinical trial has:

  • patients
  • healthy volunteers
  • people
  • other variations

Moderna’s press release refers to people. In the context of the vaccine clinical trial, “people” are infected but do not yet have symptoms of disease.

This is straight out of the statistics textbooks. For this clinical trial, the company must find 30,000 patients who are infected. They will likely scour every country in the world to find them. The patients will be infected and at the time of “informed consent” and “randomization” will not have symptoms.

Are exactly 30,000 required?

Until the clinical trial is listed in, it is not clear if Moderna requires exactly 30,000. The statistical calculations do not typically result in tidy “round” numbers, like 30,000. Possibly the number was “rounded” from, say (hypothetically), 29,351 patients. If the company rounded, they rounded “up,” not down.

Are there 30,000 people who might be eligible?

As of writing this analysis (June 12), the CDC says about 10 million people in the US have been tested and about 1.4 million were positive.  The CDC’s dashboard offers updated totals.

Details on the clinical trials

Here’s a rundown from Fierce Biotech.

Moderna plans to randomize 30,000 people in the U.S. on a one-to-one basis to receive either 100 μg of mRNA-1273 or placebo.

The company press release is substantially more detailed, and clearly they had a statistician to help write some of the technical parts.

The Moderna vaccine, called mRNA-1273, is an “mRNA” vaccine; mRNA stands for “messenger RNA.” The vaccine was tested in both “phase I” and “phase II,” and the “dose” for phase III was determined in phase II.

The phase II trial was preceded by a “mouse study”.

In the excerpt from their press release below, “sequence selection” means Moderna “decoded” the virus RNA “gene sequence”.

About mRNA-1273

From the Moderna press release:

mRNA-1273 is an mRNA vaccine against SARS-CoV-2 encoding for a prefusion stabilized form of the Spike (S) protein, which was selected by Moderna in collaboration with investigators from Vaccine Research Center (VRC) at the National Institute of Allergy and Infectious Diseases (NIAID), a part of the NIH. The first clinical batch, which was funded by the Coalition for Epidemic Preparedness Innovations, was completed on February 7, 2020 and underwent analytical testing; it was shipped to NIH on February 24, 42 days from sequence selection.

The first participant in the NIAID-led Phase 1 study of mRNA-1273 was dosed on March 16, 63 days from sequence selection to Phase 1 study dosing.  On May 6, the U.S. Food and Drug Administration (FDA) completed its review of the Company’s Investigational New Drug (IND) application for mRNA-1273 allowing it to proceed to a Phase 2 study, which is expected to begin shortly. On May 12, the FDA granted mRNA-1273 Fast Track designation.

The biostatistical angle

From the release:

The trial’s primary endpoint will be the prevention of symptomatic COVID-19 disease; key secondary endpoints include prevention of severe COVID-19 disease (as defined by the need for hospitalization) and prevention of infection by SARS- CoV-2, the virus that causes COVID-19. The primary efficacy analysis will be an event-driven analysis based on the number of participants with symptomatic COVID-19 disease.

Event Driven

“Event driven” is statistics jargon meaning the analysis is based on the number of people (a count) who develop COVID-19 disease. In the study planning stages (work done 98 percent by the statistician), one must design a study with enough patients to achieve the required “number of events.”

Translating, this study will have “one event per person” (either the patient got symptoms or not). If the vaccine works, then fewer patients on vaccine will have “events” than on placebo. Oversimplified: if the calculation shows that “2 events” are needed, then at least 2 patients are needed. Some patients will not have events; that is factored into the technical calculations, and the jargon term is “inflating” the sample size.
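The “inflating” step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `inflate_sample_size` and the numbers are hypothetical and are not taken from Moderna's protocol, and a real calculation would also account for dropout and follow-up time.

```python
import math


def inflate_sample_size(required_events: int, event_probability: float) -> int:
    """Participants needed so that, on average, `required_events` events occur.

    `event_probability` is the assumed (hypothetical) chance that any one
    enrolled participant has an event during follow-up.
    """
    if required_events < 0:
        raise ValueError("required_events must be non-negative")
    if not 0 < event_probability <= 1:
        raise ValueError("event_probability must be in (0, 1]")
    # Round up: a fraction of a participant cannot be enrolled.
    return math.ceil(required_events / event_probability)


# If every participant had an event, 2 events would need only 2 participants.
print(inflate_sample_size(2, 1.0))  # 2

# Hypothetical planning numbers: 150 events needed, 1% event probability.
print(inflate_sample_size(150, 0.01))
```

The rarer the event, the more the sample size is inflated relative to the number of events, which is one way a trial that needs a modest number of events can end up enrolling tens of thousands of people.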

Math-y Stuff

The sample size calculation (sample size is the technical term for number of patients) produces the required number of events for some assumed “effect size.” This is a randomized study with a placebo group, and the “effect size” is how many fewer patients (a count) have symptoms in the vaccine group than in the placebo group. For example, if 10 total events are required, then at least 10 patients are required. The fact that patients can be infected and never show symptoms complicates the calculations.
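One common textbook way to get a required number of events is Schoenfeld's approximation for a one-to-one randomized trial. The sketch below uses that formula as a stand-in; it is not necessarily the method Moderna's statisticians used. The effect size is expressed here as an assumed hazard ratio (vaccine versus placebo), and the function name, the 5 percent two-sided significance level, the 90 percent power and the example hazard ratio are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio: float, alpha: float = 0.05,
                    power: float = 0.90) -> int:
    """Schoenfeld approximation: total events for a 1:1 randomized trial.

    hazard_ratio: assumed event rate on vaccine relative to placebo (< 1).
    alpha: two-sided significance level. power: desired statistical power.
    """
    if not 0 < hazard_ratio < 1:
        raise ValueError("hazard_ratio must be between 0 and 1")
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_beta = normal.inv_cdf(power)
    events = 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2
    # Round up, never down.
    return math.ceil(events)


# Hypothetical effect size: the vaccine cuts the event rate by 60%.
print(required_events(hazard_ratio=0.4))
```

A smaller assumed effect (a hazard ratio closer to 1) drives the required number of events up sharply, which is why the assumed effect size matters so much at the planning stage.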

At-risk clinical trials

Moderna is basically testing something called an mRNA vaccine (that translates to messenger RNA).

Other pharmaceutical companies say in their press releases that they are planning and conducting these clinical trials “at risk,” meaning at risk that the vaccine doesn’t work. But every clinical trial that is done for drug or vaccine approval is conducted at risk. Ordinarily in vaccine trials, a study would have more information about infection and immunity: infection dynamics, how many people were infected but asymptomatic, how much virus it takes to get infected, whether people develop immunity to the virus and how long immunity lasts.

I’m reasonably sure that this clinical trial will find somebody who’s infected, vaccinate the subject and then wait 14 to perhaps 30 days to see if they develop symptoms or not. Can anyone participate in the trial? Probably yes – check the Moderna website and ask your doctor for details.

Right now, scientists don’t know how many people get infected and don’t show symptoms. That is factored into the calculation of the 30,000 people.

Jargon-wise, this is a “fixed sample size” study. And before it starts, Moderna knows they are going to enroll 30,000 patients. In the world of clinical trial statistics, they could build in an option to increase it to more than 30,000 based on accumulating data in the trial.

They will likely have something called an interim analysis, examining the data after perhaps 15,000 have enrolled, or perhaps 10,000. This analysis investigates whether the vaccine has enough of an effect; if it does, they can reach a decision and stop the clinical trial early. That’s good for everyone. It is not something that would be known before the study started.

The statisticians will have technical reasons for exactly how they picked that number (10,000 or 15,000 or some other number) of patients for that interim look.
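To give a flavor of those technical reasons, here is a minimal sketch of one widely used approach to interim looks, the Lan-DeMets “alpha spending” function of the O'Brien-Fleming type. Nothing in the press release says Moderna uses this particular rule; the function name, the 5 percent two-sided significance level and the halfway interim look are assumptions for illustration.

```python
import math
from statistics import NormalDist


def obf_alpha_spent(information_fraction: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent by a given information fraction.

    Uses the Lan-DeMets O'Brien-Fleming-type spending function:
        alpha(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))
    where t is the fraction of the planned statistical information
    (roughly, the fraction of planned events) observed so far.
    """
    if not 0 < information_fraction <= 1:
        raise ValueError("information_fraction must be in (0, 1]")
    normal = NormalDist()
    z = normal.inv_cdf(1 - alpha / 2)
    return 2 * (1 - normal.cdf(z / math.sqrt(information_fraction)))


# Hypothetical interim look when half of the planned events have occurred.
print(obf_alpha_spent(0.5))
# At the final analysis the full alpha has been spent.
print(obf_alpha_spent(1.0))
```

With this kind of rule, very little of the allowed false-positive rate is spent at an early look, so the trial stops early only if the vaccine effect is very convincing.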

Calculus math-y stuff

The number of patients at interim is based on the second derivative of the log-likelihood function, which gives what is called the “Fisher information matrix,” and there’s a modification of that matrix that has an even more interesting name: “information time.”

However, the likelihood function is something I would have to get a textbook or publication to write down, because it’s pretty complicated, with multiple integrals (e.g. a double or triple integral).
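The generic textbook definitions of these two terms can still be written down without the trial-specific likelihood. In the sketch below, the symbols are my own notation for illustration: \ell(\theta) is the log-likelihood for the vaccine-effect parameter \theta, \mathcal{I}_k is the information available at interim look k, and \mathcal{I}_{\max} is the information planned for the final analysis. The last approximation, with d_k events observed at look k out of D planned, is a common rule of thumb for event-driven trials and is an assumption here, not something stated in the press release.

```latex
% Fisher information: negative expected second derivative of the log-likelihood
\mathcal{I}(\theta) = -\,\mathbb{E}\!\left[\frac{\partial^{2}\ell(\theta)}{\partial\theta^{2}}\right]

% Information time (information fraction) at interim look k
t_k = \frac{\mathcal{I}_k}{\mathcal{I}_{\max}}, \qquad 0 < t_k \le 1

% Rule of thumb for an event-driven trial
t_k \approx \frac{d_k}{D}
```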

About the author

Chris Barker, PhD in Biostatistics ’81 and adjunct professor of biostatistics, has more than 20 years of experience in pharmaceutical drug development, with a unique background in the lifecycle of clinical trials for drug development and health economics. He designed and analyzed the phase II and III trials which led to the successful drug registration for the CELLCEPT Kidney Transplant program. In addition, he has prepared novel statistical analyses used in health economics value messages and models of cost-effectiveness used in reimbursement negotiations. He currently collaborates with drug discovery and development scientists and with medical affairs, commercial and health economics experts.
