COVID-19 Forecast & Data Models

(Last Updated On: October 31, 2020)

The Past, Present & Future of Coronavirus…

COVID-19 forecasts try to predict the future. Accurate predictions would help hospitals and emergency rooms plan staffing and order supplies. Even better would be a model that could guide public policy, telling officials how proposed rules or closures would affect economic activity and infections. But despite many smart people involved in modeling, the COVID crystal ball is still cloudy.

Today’s blog explains the different ways that scientists are trying to make sense of coronavirus data, and peeks behind the curtain to suggest why such a simple-sounding task is so very difficult. This serves as an update to my earlier blog on Understanding Forecasts, which discussed only the earliest version of the University of Washington model. (For a more detailed dive into coronavirus data modeling see the IEEE Spectrum article by Matthew Hutson.)

COVID data presentations seem to fit into three categories:

  • Past: Data that describes hospitalizations and deaths after they occur.
  • Present: Models that try to predict how people and the virus react in response to healthcare advice and rules.
  • Future: Forecasts of what lies ahead according to various scenarios.

– What Is Life, Without a Touch of Art In It?

These topics reminded me of a famous painting created by Paul Gauguin in Tahiti in 1897. Its title is D’où Venons Nous / Que Sommes Nous / Où Allons Nous, generally translated as Where Do We Come From? What Are We? Where Are We Going? Gauguin stated that the painting should be read from right to left, communicating the artist’s messages about birth, adulthood and the approach of death.
We can do no better than borrow from Gauguin as we describe the types of COVID-19 forecast and data models that are dominating the attention of many smart people. And here are the sections of this blog:

COVID-19 Data Models – Where Do We Come From?

– Excess Deaths, from Centers for Disease Control

– State Cases Video, from Benjamin Renton / Flourish

COVID-19 Analysis Models – What Are We?

– Neural Network Models

– Agent Based Models

COVID-19 Forecast Models – Where Are We Going?

– COVID-19 Forecast of Daily Deaths, from IHME

– Ensemble COVID-19 Forecast at CDC


COVID-19 Data Models – Where Do We Come From?

Data collections that cover Where Do We Come From tell us what has already happened as coronavirus marched through our population.

– Excess Deaths, from Centers for Disease Control

In an earlier blog I described the Excess Deaths plots available from CDC, like the one here.

CDC Excess Deaths in 2020

By October 27, there had been 311,882 more US deaths in 2020 than expected from previous years. At least 231,952 of those deaths can be assigned with high confidence to COVID-19 coronavirus.

Another CDC chart tracks excess deaths by week. Peak deaths occurred in the week ending April 18, when there were 78,989 deaths, substantially more than the expected death count of 55,640. Recent weeks, for which records are still incomplete, show 3,000 to 5,000 excess deaths per week.
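
The arithmetic behind a weekly excess-death figure is plain subtraction. Here is a minimal sketch using only the two numbers quoted above. The function name is made up for illustration, and CDC's expected count is itself a statistical estimate, so treat the result as approximate.

```python
def excess_deaths(observed, expected):
    """Return (absolute excess, percent above expected) for one week."""
    excess = observed - expected
    percent = 100.0 * excess / expected
    return excess, percent

# Figures quoted above for the week ending April 18, 2020
excess, percent = excess_deaths(78_989, 55_640)
print(f"{excess:,} excess deaths, {percent:.0f}% above expected")
# prints "23,349 excess deaths, 42% above expected"
```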

CDC shows this data, and more, not only for the US as a whole, but also state by state. Although deaths lag infections and hospitalizations by several weeks, deaths are the most reliable measure of whether we are making progress against the virus.

– State Cases Video, from Benjamin Renton / Flourish

The Flourish Design Studio displays a video by Benjamin Renton on their website. The video shows the cumulative COVID-19 cases per million population by state, from March 1 until now. Because the case count is scaled by population, smaller states are more prominent than in other displays. Here’s an iFrame showing the video on the Flourish website:

Above: Coronavirus Cases per Million by Benjamin Renton via Flourish

The states are coded red and blue, according to how they voted for President in 2016. The chart draws on data from Johns Hopkins University. Certainly, it is amazing to see how the dominance in cases shifted from blue to red states as the pandemic developed.

COVID-19 Analysis Models – What Are We?

COVID-19 models that address What Are We attempt to predict how we and society will behave, given various health recommendations and rules. Such models allow researchers to see how changes in health rules or public behavior might affect virus spread for good or ill. And they tend to require a lot of computer time.

Here are two examples of this type of model:

– Neural Network Models

Neural network models crunch a great deal of data and create rules that describe the apparent relationships between parameters. For example, the networks might construct relationships “between input data (such as mobility, testing, and social media) and pandemic outcomes (such as hospitalizations and deaths).” Prof. Prakash’s group at Georgia Tech terms their model “DeepCOVID.”

Neural network models allow tinkering with some factors to see how other factors change. However, the rules developed by the neural networks are very complex, which makes them hard to understand and non-intuitive. In addition, we can’t be sure of the range of parameters over which the trained network will give valid results.

– Agent Based Models

While neural networks juggle data without considering individuals, agent based models do just the opposite. Amazingly, a group at the University of Sydney, Australia built a model that digitally represents 24 million people, Australia’s 2016 census count. These simulated people were assigned demographically accurate ages, family sizes and jobs, then allowed to mix in daytime and in nighttime venues.

The researchers adjusted the parameters to best match real life coronavirus data. They then varied factors like air travel, isolation of victims, home quarantine, social distancing and school closures to see how they affected the spread of infections. They found some non-obvious results, among them that school closures themselves were less important than the level of compliance with social distancing.
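
To give a feel for how an agent based model works, here is a toy sketch. It is not the Sydney model: the population size, contact counts, transmission probability and the rule that a distancing agent initiates a quarter as many contacts are all invented for illustration.

```python
import random

def simulate(n_agents=500, days=60, contacts_per_day=8, p_transmit=0.05,
             compliance=0.0, infectious_days=7, seed=1):
    """Toy agent-based epidemic; returns how many agents were ever infected.

    compliance is the fraction of agents who practice social distancing.
    Assumption: a distancing agent, while infectious, initiates only a
    quarter as many contacts per day as a non-distancing agent.
    """
    rng = random.Random(seed)
    # state[i]: None = susceptible, >0 = infectious days left, 0 = recovered
    state = [None] * n_agents
    distancing = [rng.random() < compliance for _ in range(n_agents)]
    for i in rng.sample(range(n_agents), 5):  # five initial infections
        state[i] = infectious_days
    for _ in range(days):
        newly_infected = set()
        for i in range(n_agents):
            if not state[i]:  # susceptible or recovered agents infect no one
                continue
            n_contacts = contacts_per_day // 4 if distancing[i] else contacts_per_day
            for _ in range(n_contacts):
                j = rng.randrange(n_agents)  # meet a random agent
                if state[j] is None and rng.random() < p_transmit:
                    newly_infected.add(j)
        for i in range(n_agents):  # infectious agents move a day closer to recovery
            if state[i]:
                state[i] -= 1
        for j in newly_infected:
            state[j] = infectious_days
    return sum(s is not None for s in state)

# Compare no distancing with 90% compliance (illustrative settings only)
print(simulate(compliance=0.0), simulate(compliance=0.9))
```

Real models of this kind replace the random mixing with households, schools and workplaces, which is a large part of why they need so much computer time.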

COVID-19 Forecast Models – Where Are We Going?

A COVID-19 forecast tries to predict how the virus will advance or retreat in the coming weeks or months. These models mainly deal with the current set of health rules and public behavior. And they don’t generally try to predict what will happen if we change one rule or another.

Here are two examples of such forecast models:

– COVID-19 Forecast of Daily Deaths, from IHME

IHME, the Institute for Health Metrics and Evaluation at the University of Washington, has been modeling and forecasting since early in 2020. My earlier blog described their original model, which assumed that hospital use would follow a rise-and-fall curve similar to that seen in China, Italy and Spain. Because the US and its people responded differently to the challenge of the virus, we did not follow the same curves and for that reason the IHME model initially under-forecast the impact of the disease.

Since then, additional data generated by the virus in the US has allowed IHME to evolve to a “compartmental” model. Such a model separately describes people who are Susceptible, Exposed, Infected, or Removed (via recovery or death), and how quickly people move from one category to another. IHME continually adjusts the model to match real-world data.
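
The compartmental idea can be sketched in a few lines. This is a textbook-style SEIR loop with made-up rates, not IHME's actual model, which is far more elaborate and is fitted to real data. All parameter values below are assumptions chosen only to produce a recognizable curve.

```python
def seir(population, beta, sigma, gamma, days, initial_infected=1.0):
    """Step a basic SEIR model forward one day at a time.

    beta:  new infections caused per infected person per day
    sigma: rate of moving from Exposed to Infected (1 / incubation days)
    gamma: rate of moving from Infected to Removed (1 / infectious days)
    Returns a list of (S, E, I, R) tuples, one per day, starting with day 0.
    """
    s, e, i, r = population - initial_infected, 0.0, initial_infected, 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        new_exposed = beta * s * i / population
        new_infected = sigma * e
        new_removed = gamma * i
        s -= new_exposed
        e += new_exposed - new_infected
        i += new_infected - new_removed
        r += new_removed
        history.append((s, e, i, r))
    return history

# Illustrative values only, not fitted to any real data
history = seir(population=1_000_000, beta=0.3, sigma=1/5, gamma=1/10, days=300)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print(f"Infections peak on day {peak_day}")
```

Fitting a model like this to real-world data means adjusting rates such as beta until the simulated curve tracks the reported numbers.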

IHME Daily Deaths

As of October 22, IHME’s model anticipated a total of 385,611 COVID deaths in the US by February 1. Like all projections, the number is uncertain: IHME suggested it might be as low as 322,836 with universal masking, or as high as 485,607 with wide easing of mandates. The daily death rate by then might be anywhere from 1,299 to 5,562 persons per day.

A later IHME article (October 23) extends the forecast through the end of February 2021. It finds that an expected 511,000 US deaths could be reduced by 96,000 to 130,000 with 85% to 95% mask wearing.

– Ensemble COVID-19 Forecast at CDC

An early success in coronavirus modeling was the work by data scientist Youyang Gu. He built an artificial intelligence model driven by daily deaths plus parameters such as reproduction number, infection mortality rate and lockdown fatigue. And it was very successful in predicting future COVID deaths.

Now that there are dozens of pretty good models, Gu is retiring from updating his coronavirus model. In its place, he recommends a collection of models known as the Ensemble Forecast which CDC collects and presents.

CDC Combined Forecast, Weekly Deaths

The graph above shows a portion of CDC’s collected data, which limits its fortune-telling to four weeks in the future.
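
One simple way to combine many forecasts into an ensemble is to take the median across models for each week. The sketch below shows that idea only. It is not necessarily the procedure CDC uses, and the model names and numbers are invented.

```python
from statistics import median

def ensemble_forecast(forecasts):
    """Combine per-model forecasts by taking the median for each week.

    forecasts maps a model name to a list of weekly death forecasts,
    with every list covering the same weeks in the same order.
    """
    per_week = zip(*forecasts.values())
    return [median(week) for week in per_week]

# Made-up numbers for three hypothetical models, four weeks ahead
forecasts = {
    "model_a": [5200, 5400, 5900, 6300],
    "model_b": [5000, 5500, 6000, 6400],
    "model_c": [5600, 6200, 7000, 8100],
}
print(ensemble_forecast(forecasts))  # prints [5200, 5500, 6000, 6400]
```

A median has the advantage that one wildly pessimistic or optimistic model, like the hypothetical model_c here, cannot drag the combined forecast very far.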


This discussion has just scratched the surface of the huge number of active modeling efforts going on. I count 44 different models on one of CDC’s summary pages!

How does a thinking person, a non-expert in epidemiology, deal with this glut of material? It depends on your personal goal:

  • If you want to know whether the US (or an individual state) is making overall progress, track its Excess Deaths count.
  • To see how red versus blue states have fared through the year, check the Flourish video.
  • If a long-term forecast interests you, the IHME model tries to scratch that itch.
  • To see a range of near-term forecasts from dozens of research groups, the CDC Ensemble is a good source.
  • Or, you may want to keep your eyes on the computation-intensive Analysis Models as their creators try to explore what, if anything, public health agencies can do to quench this persistent social emergency.

The scientific effort on COVID-19 forecasts and models rivals the huge efforts to develop vaccines, to find effective treatment protocols, and to give patient care during a state of continuous emergency. And all of these folks deserve our thanks and kudos for doing their part to return us to a better, more normal world.

Image Credits:
– Paul Gauguin’s Where do we come from? Who are we? Where are we going? is in the public domain under US law
– The other figures are referenced in the text near each image
– Extra thanks to Jim Zucchetto, who alerted Linos Jacovides who alerted me to the hypnotically fascinating Flourish “racing bar chart” video above


COVID-19 Forecast & Data Models — 10 Comments

  1. Thanks.

    For us senior citizens, I would suppose your recommendation is to: go back to March, put on a mask, don’t touch your face until you’ve washed your hands or used hand sanitizer, wash your hands, use hand sanitizer, don’t shake hands and don’t eat out, don’t travel, don’t take chances, quarantine yourself.

    I might add this: get ready to lockdown again, toilet paper and Ramen Noodles.

    Miss you a ton, Art. Best to Nola.

    • Hi Steve, it’s good to hear from you! I take your comment as sarcasm, in which case I say LOL.

      In reality, as Nicholas Kristof has said, protecting from infection is like a sandwich with a stack of Swiss cheese slices: each slice has a few random holes in it, but when you stack up several of them you have a solid barrier. Similarly, it’s not necessary to become a hermit to stay safe. All that’s needed is to be cautious when we can, while continuing to engage in life and community. From my point of view, a reasonable amount of masking and distance are 80% of the answer. The only important thing beyond that is to limit your time at crowded indoor venues that have stagnant air. After all, 80% of cases are traceable to super spreader events, and those venues are where they mostly occur. And, of course, get vaccinated when there’s a reasonably effective vaccine available. I’m not a fan of lockdowns personally. They would be totally unnecessary if people would simply take a pause in big group events. – Art

      • We’re fun people, and maybe a little sarcastic. But I remember when you wrote about what to do, early on, I thought it was a little over the top. And now, as this virus is running rampant, it seems your chance of catching the virus today is twice what it was 2 months ago. And for those who might be health compromised (seniors, and especially you because you’ve had some bad luck lately). I was thinking about taking the Metro, Red Line Subway to downtown LA. Would you, Art Chester, do that? Is it OK for me? Mask and all? Here in CA? I guess there is a continuum at play here. At 40 years old, if you get it, it won’t be too bad; 50 years old, you’ll be slowed down a little bit; 60 years old, you might have to go to the hospital; 70 years old, you might not be coming back from the hospital.

        I got a haircut on Thursday. I wasn’t all that pleased by the precautions being taken by my barbershop. In fact, I’m going to text my barber and tell him that everybody needs to be wearing a mask, except for the person getting the haircut.

        Looking at what Charles South said. Two tiers: tier one, not so bad; tier two, not so good. But take a tier one person and put them in an indoor bar scene, not good. Tier two person in an indoor bar scene, really not good.

        • Steve, my advice has evolved as we have learned more about the virus. Back before we appreciated the importance of aerosols and super spreader events it seemed that cleaning surfaces might be important. Now, surfaces have declined in importance and it’s more critical to avoid crowded spaces full of recirculated air. My early advice was obsessive and I felt people needed that to address their general fear factor. Nothing I said there would be considered wrong today, merely incomplete based on today’s understanding.

          But it’s still a crap shoot, rolling the dice. Age is not as serious a concern as you think if you look at, which analyzes excess deaths, which sweeps away any controversy about whether docs are gaming the cause of death. People over 25 are 30% more likely to die with the virus around, people over 65, 40%. Whether that’s significant depends on your point of view, but my point is that you’re not 10 times more likely to die because of the virus. So the precautions people need to consider are not strongly age dependent.

          You’re right to be concerned about the barbershop. If the state rules say for people to mask in public, why not go to a barber who tries to protect his customers? Your risk may not be high there since you’re in and out quickly and people are probably not talking loudly as in a bar, but the masks still reduce transmission. Here again, it depends on your perception of risk. Some would say, hey, my chance of catching it in the barbershop is only 1% or 0.1%, so I don’t care, emphasize the positive. Others would say, whatever the risk is, if I can reduce it by half without great personal sacrifice, why shouldn’t I do that?

          Concerning the Metro: CDC used to advise limiting close contact with other people to 15 minutes or less, because 15 minutes was long enough to ensure that if the other person was shedding viruses, you would pick up enough of them to be a concern. But recently they have said, well, your exposure to other people is cumulative, and if you’re exposed to several different people with each of them shedding virus, a total of 15 minutes for all of them is a risk. So it’s just like the 6 foot rule, the more we know, the less reliable the rule turns out to be.

          To me there are several things to think about. How much fresh air comes into the Metro car, is it pretty well changed at every stop when the doors open or does the same air stay there for many rides? How many people are inside it, how close together? What’s the percentage of mask wearing? How long are you in the car from your home stop to downtown? And what’s the chance that there will be one person shedding virus in that large a group of people? Of course, if the air doesn’t freshen very often, then the group of people you are exposed to is the total number who have ridden in that car since early morning, which really raises the stakes! (Overnight, any lingering virions have probably settled and died.)

          Personally, I would drive my own car, traffic and all, until some significant portion of the population around me have been vaccinated. Mass transport is really relaxing and comfortable, and certainly less risky than a crowded bar. However, it puts you in a confined space with people whom you don’t know, and if the virus is booming in your area, and you can afford to avoid public transportation, that might be best right now.

          The IHME data seems to show that California has pretty steady numbers. 70% mask wearing, a reasonable amount of social distancing and not a heavy load on hospitals ( So even though they project cases are likely to rise in the near future, perhaps it’s not too risky right now where you are. Certainly, safer than 3 months ago! But each person has to make their own calculation of risk versus reward…


  2. The “racing bar chart” from Benjamin Renton is truly fascinating. However, though I love looking at these models, and as you know have my own graphical method where I track California death rates, I have a concern about all such models. In what way will they be useful to non-scientists (the “average citizen”)? I don’t just mean the non-scientist can understand what the model presents, I mean a person can get useful information from viewing it so they can make better decisions about their life and the lives of others they care about.

    I would argue that this nightmare won’t end until one or both of two things are true:

    1) We develop a treatment that will keep a sub-population of citizens from checking into a hospital if Covid symptoms show up.

    2) We develop a vaccine good enough to bring a specific sub-population to herd immunity status.

    My weasel-worded phrase “sub-population” should be explained. We have variations in vulnerability to Covid in so many dimensions — age, gender, genetic background, socio-economic status, and current health, among others — that it doesn’t do anyone any good to publish an overall statistic (like death rate for the USA). Anyone trying to make sense of or compare model results has to be able to identify their own specific vulnerability aspects for a model to be useful.

    I believe the general population has been deluged with fancy graphs since the pandemic began, but almost none of them help the average citizen to make smarter choices — they mostly sound the alarm that something serious is going on. I would like to see models which attempt to characterize the Type-1 and Type-2 issues above, in terms useful for the average citizen.

    The Type-1 model might attempt to show someone in a sub-population how likely it would be for them to need a hospital after testing positive for Covid. This in turn might affect their behavior in ways that might modify their risk. If you knew you had an 80% chance of going to a hospital if you developed Covid, you might be more inclined to wear a mask and social-distance, for example. But without a model like this, it’s too easy for a skeptical citizen to simply state the risk status for a favored sub-group like children or teens, and claim the risk is too low to worry about for anyone.

    The Type-2 model might tell you whether a specific vaccine (with its protection probability for your sub-population) combined with a compliance rate (for your geographic area) results in a high, medium, or low chance of encountering a Covid-positive person in your community. Knowing a vaccine is 75% effective might lead some people to conclude the virus concern is over, but if it’s only taken by 50% in that area, you might end up with only 37% who are protected from Covid, even before introducing sub-populations and their greater or lesser vulnerability if they become infected.

    • Charles, I love the fact that you can engage with the data and clearly express some goals for what we want it to tell us.

      Let’s consider what value the various kinds of models have.

      For the past “where do we come from” models, I see them as useful for policy makers and healthcare leaders. They show whether our current policies are winning or losing, especially when you compare trends in different states with different restriction policies. A policymaker might use this data to decide when to add a restriction on some class of activity, and to help sell the public on its reasonableness.

      For the current “what are we” models, some useful results are beginning to come out — I think of the Australian conclusion that closing schools is not a powerful tool to control the virus but that social distancing is much stronger. This can guide policy, but it could also help a parent feel more comfortable with moving toward in-person education for K-12. In school systems that offer parents a choice, this might tip the balance of the parent’s decision on what options to accept for their child.

      For the future “where are we going” models the personal value might come from looking at the projections in various states. If I were planning a vacation trip to a state whose forecast looks really scary, I might change my plans to a safer destination.

      But none of these models produce the kind of data you are looking for. Your “Type 1” solution might be addressed by Z-Pak or something similar. It’s only one of the many treatment options being tested on large numbers of people to see what works, for what kinds of people. I quoted two such studies in but there are many more. The progress of such research is captured by treatment guides such as the West Virginia protocol ( which is only one of many such guides used by hospitals. The researchers look for correlations within the data to help derive risk factors for subpopulations. However, telling a subpopulation their own risk may not be enough, because the bigger problem is the people whom they may infect. The college student group typifies this problem.

      The type 2 solution is the stronger, better one. However, here again, although a subpopulation may acquire herd immunity through infection, how do you prevent their infection from outside the subpopulation? I find it hard to imagine a subpopulation that can be adequately isolated, given the many ways that people mix and mingle in our society, and given the high percentage of infectious people who are asymptomatic. The subpopulation approach undergirds the controversial advice by some folks to open up society but selectively protect the most vulnerable people (e.g. nursing homes). But healthcare experts don’t like that idea, because those vulnerable people need care and assistance and services, and those will be provided by people who are part of the bigger society in which we are allowing the virus to spread.

      I guess I see the main hope to be a variation of Type 2: that is, a vaccine that is effective enough, and accepted by enough people, to slow down the reproduction rate of the virus. The eventual goal is for it to die out, but a more realistic near-term goal may be to reduce the likelihood that a random individual will catch it, while still improving the early treatment options (your Type 1) and reducing the mortality rate after getting to the hospital. This combination of things might allow us to live with the virus as we already live with measles and other diseases. However, we the US and we the world are not nearly there yet.

      • We are going to have to do what the Chinese did… And we’d better start lining up our testing kits, etc. (I wonder if Stormy Daniels is free, unattached, and available for future lockdowns, I certainly am.)

        • Unfortunately Steve, we are not the Chinese. Completely different group dynamics, habits, obedience to authority and experience with past deadly viruses. That’s why the IHME projections under-projected deaths earlier this year – they assumed that the US would react and recover the same way as the Asian countries. Attempting to copy China or any other country will not work for us. We are different, we are ourselves, we will have to find a way out of the pandemic that suits our people, our traditions and our strong independence.
