What can we learn from the Delhi seroprevalence study and a single proportion that was reported?

Jul 31, 2020

A list of contributors to this piece is provided at the end of this report. For detailed analytic formulation and code, please click here for the technical preprint [direct download link, please check downloads folder].

Almost one in four people in Delhi had been infected by the novel coronavirus by July 10, according to a serological survey conducted by the National Centre for Disease Control (NCDC) [The Hindu; Indian Express 1; Indian Express 2]. The survey found that 22.86% of the 21,387 serology samples collected across 11 districts of Delhi had detectable IgG antibodies against SARS-CoV-2. This implies that Delhi, with approximately 20 million (2 crore) people, had a total of around 4.6 million (46 lakh) cases by July 10. Delhi has reported only 133,310 cases and 3,907 deaths through July 29 [covid19india.org]. On July 10, the number of cases reported for Delhi stood at 109,140 and the number of deaths at 3,300. This gives an underreporting factor of 42 for case counts: only 2.4% of cases are being detected. It also implies that the infection fatality rate (IFR) for Delhi is of the order of 0.07% (3300/4.6M), or 717 per million, if we believe the reported death data. The IFR seems low compared to estimates worldwide [median IFR 0.27% across 36 studies worldwide according to Ioannidis 2020], which is indeed good news. However, it may be reasonable to argue that, just like cases, deaths could also be largely unreported in India [Financial Times; VICE; Times of India] and/or the cause of death could be misclassified.
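The arithmetic behind these headline figures can be reproduced in a few lines. The sketch below uses only the numbers quoted above; the variable names are ours, and the population is the rounded 20 million. Because it works with the unrounded 4.57 million infections rather than 4.6 million, the IFR comes out a few per million higher than the 717 quoted in the text.

```python
# Back-of-the-envelope figures from the paragraph above.
# All inputs are numbers quoted in the text; the population is a rounded value.
population = 20_000_000      # approximate population of Delhi
seroprevalence = 0.2286      # share of samples with IgG antibodies
reported_cases = 109_140     # reported cases on July 10
reported_deaths = 3_300      # reported deaths on July 10

implied_infections = population * seroprevalence
underreporting_factor = implied_infections / reported_cases
detected_share = reported_cases / implied_infections
ifr = reported_deaths / implied_infections

print(f"Implied infections: {implied_infections / 1e6:.2f} million")
print(f"Underreporting factor: {underreporting_factor:.0f}x")
print(f"Share of infections detected: {detected_share:.1%}")
print(f"IFR from reported deaths: {ifr:.3%} ({ifr * 1e6:.0f} per million)")
```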

There are several questions this seroprevalence study is raising: Should the alarmingly high number of asymptomatic cases worry us or make us happy that we are closer to herd immunity? Does being antibody positive mean you have immunity to the COVID-19 virus and are not contagious or at risk of re-infection? Should we trust the reported death counts and take comfort in the projected low IFR, and let life unfold and resume normalcy in Delhi? Should we be cavalier about non-pharmaceutical interventions such as: social distancing, mask wearing, hand washing, quarantine and self-isolation upon potential exposure, avoiding large gatherings and crowded indoor spaces? Should we now feel complacent as the chance of an infected person meeting a susceptible person has gone down and you are more likely to meet an antibody positive person? Let us take a deep dive through the lens of data and mathematical modeling into these numbers and see what we can say.

Which tests were used for COVID-19?

It is important to clarify that two tests are discussed in the COVID-19 literature: a diagnostic test and an antibody test. The diagnostic test, an RT-PCR test, is used to identify the presence of SARS-CoV-2, indicating an active infection. The antibody test, a serology test like the one performed in the NCDC study, looks for the presence of antibodies, the immune system’s response to fight off SARS-CoV-2, indicating a past infection. Figure 1 shows when these tests are typically done during the time course of an infection. Due to the large number of asymptomatic cases and the limited number of tests, many COVID infections do not get detected. A population-based seroprevalence survey gives us an idea of the “true number of infections”, including both reported and unreported cases. The discrepancy between these two numbers gives a sense of the “ascertainment rate”. For Delhi it appears that only 1 out of 42 cases (109140/4.6M) is detected.

**Figure 1. Timeline of RT-PCR diagnostic test and Antibody test for SARS-CoV-2 infection.**

Are these tests perfect?

No. Every test has false negatives and false positives. We worry more about false negatives on the RT-PCR test (than false positives) because that is the situation in which you are told you do not have the virus and go about business as usual (possibly spreading the infection to others) when in fact you do have the virus. Similarly, we worry more about false positives on an antibody (Ab) test (than false negatives) because you may think you have already had a past infection, are not likely to get re-infected, and take chances when you really do not have the antibody. Figure 2 gives a schematic describing the error rates and properties of a test. The RT-PCR test is quoted to have a high false negative (FN) rate (as high as 30%, with corresponding sensitivity as low as ~70%) and a low false positive (FP) rate (less than 5%, with corresponding specificity > 95%). The Ab test assays are more accurate. The commercial assays have sensitivity around 97.6% and specificity of 99.3% at about 15 days after infection [Tran, Cohen, Waldman & May; US Food & Drug Administration (DiaSorin)]. The assay used by NCDC is a customized ELISA assay; we could not find its exact error rates and came across three different estimates. The bottom line is that we need to adjust for these testing error rates when we report case counts (RT-PCR test) or seroprevalences (Ab test). This has not been accounted for in most studies.

**Figure 2. Schematic explanation of properties of a diagnostic test. The same scheme applies to serological tests for presence of IgG antibodies.**
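To make the error rates concrete, here is a minimal sketch of how sensitivity and specificity translate into the chance that a positive antibody result is a true positive (the positive predictive value). It uses the commercial-assay figures quoted above (97.6% and 99.3%), not the NCDC assay, whose exact properties we do not know. The 1% prevalence scenario is a hypothetical added only for contrast, and the function names are ours.

```python
def observed_positive_rate(prevalence, sensitivity, specificity):
    """Expected fraction of tests that come back positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos + false_pos


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive result reflects a real past infection."""
    true_pos = sensitivity * prevalence
    return true_pos / observed_positive_rate(prevalence, sensitivity, specificity)


SENSITIVITY = 0.976  # commercial assay value quoted in the text
SPECIFICITY = 0.993  # commercial assay value quoted in the text

# 0.2286 is the Delhi survey figure; 0.01 is a hypothetical low-prevalence setting.
for prevalence in (0.2286, 0.01):
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence {prevalence:.2%}: PPV = {ppv:.1%}")
```

The same test is far less trustworthy for an individual when few people have been infected, which is why false positives on an antibody test are a concern.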

Can we correct for these testing errors?

We performed a series of analyses assuming different test sensitivities and specificities. Based on these calculations we believe that between 4.4 and 4.6 million individuals in Delhi had had COVID-19 by July 10 and that the number of actual cases is somewhere between 30 and 42 times the number of reported cases (Table 1). If we believe the observed death data, the infection fatality rate is around 0.07–0.08%. Just like cases, deaths could be underreported as well. Assuming only one out of ten COVID-related deaths in Delhi is reported, we get an infection fatality rate of around 0.7–0.8%. These are NOT small numbers on the absolute scale given India’s population.

**Table 1. True cases, reported cases, underreporting factors, case-fatality rates and infection fatality rates across various misclassification error rates.**
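One standard way to adjust an observed positive fraction for test error is the Rogan-Gladen estimator. We show it here as an illustration of the kind of correction involved; it is not necessarily the exact formulation used in the technical preprint. The "commercial" scenario uses the sensitivity and specificity quoted earlier, while the "hypothetical" scenario is an invented pair of values, included only to show how the estimate moves.

```python
def rogan_gladen(observed_rate, sensitivity, specificity):
    """Adjust an observed positive fraction for imperfect sensitivity/specificity."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    adjusted = (observed_rate + specificity - 1) / denom
    return min(max(adjusted, 0.0), 1.0)  # keep the estimate inside [0, 1]


OBSERVED = 0.2286          # NCDC survey positive fraction
POPULATION = 20_000_000    # approximate population of Delhi
REPORTED_CASES = 109_140   # reported cases on July 10

scenarios = {
    "perfect test": (1.0, 1.0),
    "commercial assay": (0.976, 0.993),       # values quoted in the text
    "hypothetical assay": (0.92, 0.97),       # invented for illustration only
}

for name, (sens, spec) in scenarios.items():
    prevalence = rogan_gladen(OBSERVED, sens, spec)
    infections = prevalence * POPULATION
    print(f"{name}: adjusted prevalence {prevalence:.2%}, "
          f"{infections / 1e6:.2f} million infections, "
          f"underreporting {infections / REPORTED_CASES:.0f}x")
```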

At this observed IFR, if 50% of people in India (with a population of 1.38 billion) get infected and if we believe the reported death data, there will be 552,000 deaths. If we believe only one out of 10 deaths is reported, there will be somewhere around 5.5 million deaths. Thus the country-wide unrestricted spread has to be arrested. Every death counted is a life loved by others and a life lost. Let us do our best to persevere till scientists get to a vaccine and treatments for COVID with inspiring research that is moving at war-time speed [COVID-19 vaccine tracker].
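The projection is simple multiplication, sketched below. The figure of 552,000 corresponds to the upper end of the IFR range above (0.08%); the variable names are ours.

```python
# Projected deaths in India under the scenario described in the text.
india_population = 1_380_000_000
attack_rate = 0.50            # scenario: half the population gets infected
ifr_reported = 0.0008         # 0.08%, upper end of the range from reported deaths
death_reporting_share = 0.10  # scenario: only 1 in 10 deaths is reported

infections = india_population * attack_rate
deaths_reported_data = infections * ifr_reported
deaths_if_underreported = deaths_reported_data / death_reporting_share

print(f"Infections: {infections / 1e6:.0f} million")
print(f"Deaths at reported IFR: {deaths_reported_data:,.0f}")
print(f"Deaths if 1 in 10 is reported: {deaths_if_underreported / 1e6:.2f} million")
```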

Is this seroprevalence in Delhi an outlier in the world?

It is definitely on the higher side, even among worldwide epicenters and hotspots. Many serology studies have been conducted across the world (see Table 2). While the 22.86% found in the NCDC survey is the highest, New York City, a densely populated area much like Delhi, found a similar seroprevalence of 22.7%. The NCDC seroprevalence estimate implies that the actual number of COVID-19 cases is about 42 times the number reported. This is indicative of low testing and a poor ascertainment rate. Of course, we would never expect to identify every single case, particularly when diagnostic testing is scarce and many individuals do not show symptoms. And while this number demonstrates that testing has been very low, it is reassuring to know that a large proportion of these individuals presumably did not show severe symptoms and went through the infection without seeking medical care. A similar seroprevalence study in Mumbai shows that the prevalence is as high as 57% in the slum areas, while in other areas it is about 16%.

**Table 2. A summary of seroprevalence estimates for SARS-CoV-2 infections worldwide.**

What does this mean about community transmission?

We did not really need the seroprevalence study to confirm that there has been community transmission in Delhi. Community transmission means that there are cases in the community where it is not known how or where they became infected [CDC; Indian Express]. Following this classic textbook definition there is undoubtedly community transmission in Delhi and possibly in all populous states in India. When community transmission happens, we cannot map the cases to a tagged source and contact tracing becomes less effective. With 562,605 active confirmed cases in India, and 10,743 in Delhi, we definitely are not tracking/mapping many of the cases to an identifiable source of infection [as of July 30; covid19india.org].

So now only herd immunity will save us?

Experts have said that the herd immunity threshold is between 40% and 70%. This is inferred from the value of the basic reproduction number R0, the range for which is quoted as 1.8 to 3.6 worldwide [Petersen et al. 2020], giving us a range for the herd immunity threshold (1 − 1/R0) of 44% to 72%. Even in Delhi, it will take us some time to get to herd immunity. The rest of India (rural India included) is possibly still quite far from the herd immunity threshold. Herd immunity may be attained over time but should not be actively pursued or chased without a vaccine.
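The threshold formula is easy to evaluate for any value of R0. A minimal sketch using the two ends of the quoted range (the function name is ours); it gives roughly 44% for R0 = 1.8 and 72% for R0 = 3.6.

```python
def herd_immunity_threshold(r0):
    """Classic threshold 1 - 1/R0; zero when the epidemic cannot grow (R0 <= 1)."""
    if r0 <= 1:
        return 0.0
    return 1 - 1 / r0


for r0 in (1.8, 3.6):  # range quoted from Petersen et al. 2020
    print(f"R0 = {r0}: threshold = {herd_immunity_threshold(r0):.1%}")
```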

While almost one quarter of Delhi is likely to have some protection against the virus, there is still a large proportion of uninfected individuals who are susceptible. We also know little about how long the protection lasts and about the contagiousness of Ab-positive people. In short, anything that can be done to slow the spread of the virus so that healthcare capacity is not overwhelmed should continue to be done. India has 1.38 billion people, and while the curve is tapering off in places like Delhi, Mumbai and Tamil Nadu, several other states are on an upward trajectory (see the metrics tab at covind19.org). With increased mobility and relaxation of lockdown, remote villages are now at risk, with almost no testing and fragile healthcare infrastructure. Let us buckle up and do our best to be vigilant, socially responsible and cautiously optimistic based on these numbers.

What else do we need to know to understand the full implications of this study?

There are some important pieces of information that were not publicly reported in the NCDC serology survey but would be extremely helpful: response and positivity rates stratified by age, sex, job type and district; the sampling design; and the exact sensitivity and specificity of the customized assay. Just releasing one number, 22.86%, without a complete report is not the best practice for propelling science and policy.

Most importantly, we do not know if individuals with antibodies have immunity (are protected from re-infection) or how long this protection lasts [NPR; Ibarrondo et al.]. We only know that people with antibodies are possibly protected in some fashion for some amount of time [Deng et al.]. We need to know more about the longevity of the ephemeral antibody response and the levels needed to protect us from re-infections. We also do not know much about whether an antibody-positive person is contagious or the severity of the clinical symptoms that they themselves may show. When in doubt, protect yourself and others.

Can models predict asymptomatic cases when we do not have a serosurvey?

Yes. A recent paper concluded that 87% of cases in the early phases in Wuhan were unascertained [Hao et al. 2020]. The authors used a modification of an SEIR model to illustrate the high transmissibility and covertness of this virus. We extended this model to incorporate the false negative rate of an RT-PCR test. A schematic diagram of this model follows (Figure 3). We introduce compartments for the reported/tested number of cases, recoveries and deaths as well as for the unreported number of cases, recoveries and deaths.

**Figure 3. SEIR model with false negative rate for the RT-PCR diagnostic test.**
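To give a feel for the idea, here is a heavily simplified sketch, not the model in the technical preprint. It is a discrete-time SEIR in which each new infectious case is split into "reported" and "unreported" according to the chance of being tested multiplied by the RT-PCR sensitivity. Every parameter value below is a hypothetical placeholder rather than a fitted estimate, reported and unreported cases are assumed to be equally infectious, and recoveries and deaths are lumped into a single "removed" compartment for brevity.

```python
def simulate_seir(days, population, beta, incubation_days, infectious_days,
                  tested_share, sensitivity, initial_exposed=100.0):
    """Daily-step SEIR with infections split into reported and unreported."""
    # A case is reported only if it is tested AND the test is a true positive.
    detect_prob = tested_share * sensitivity
    s = population - initial_exposed
    e = initial_exposed
    i_reported = i_unreported = removed = 0.0
    cum_reported = cum_unreported = 0.0
    for _ in range(days):
        infectious = i_reported + i_unreported
        new_exposed = beta * s * infectious / population
        new_infectious = e / incubation_days
        leaving_reported = i_reported / infectious_days
        leaving_unreported = i_unreported / infectious_days
        s -= new_exposed
        e += new_exposed - new_infectious
        i_reported += detect_prob * new_infectious - leaving_reported
        i_unreported += (1 - detect_prob) * new_infectious - leaving_unreported
        removed += leaving_reported + leaving_unreported
        cum_reported += detect_prob * new_infectious
        cum_unreported += (1 - detect_prob) * new_infectious
    return {
        "susceptible": s, "exposed": e,
        "infectious": i_reported + i_unreported, "removed": removed,
        "cum_reported": cum_reported, "cum_unreported": cum_unreported,
    }


# Hypothetical parameters, chosen for illustration only (not fitted to Delhi data).
for sensitivity in (1.0, 0.85, 0.7):
    out = simulate_seir(days=120, population=20_000_000, beta=0.3,
                        incubation_days=5, infectious_days=7,
                        tested_share=0.05, sensitivity=sensitivity)
    total = out["cum_reported"] + out["cum_unreported"]
    print(f"sensitivity {sensitivity}: underreporting factor "
          f"{total / out['cum_reported']:.1f}x")
```

In this toy version the underreporting factor is simply the inverse of the detection probability, so a less sensitive test mechanically hides more infections. A fitted model would estimate these quantities from reported case, recovery and death data.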

What we find under this model is a striking agreement with the seroprevalence study. This model estimates 4.8 million true infections (compared to the 4.6 million implied by the Delhi Seroprevalence Study) and 33,165 deaths (reported + unreported) on July 10 if we assume the RT-PCR test has a sensitivity of 0.85. We present model predictions for RT-PCR test sensitivities of 1, 0.85, and 0.7 in Figure 4. The underreporting factor for infections is between 35–53x depending on the sensitivity of the test.

**Figure 4. Predicted total number of cases and deaths (reported+unreported) by SEIR model incorporating imperfect RT-PCR testing.**

This is encouraging, because seroconversions keep happening and seroprevalence studies need to be repeated, which is expensive in terms of cost and time. In contrast, fitting a model, solving equations and learning from data are much more affordable. Usually we cannot validate the estimate of asymptomatic cases in SEIR models as we do not get to observe it in reality. In this particular case the parallel seroprevalence data gives us a validation for our model. This model also predicts 8–13 times more deaths than have been observed, for which we do not have any validation data.

Conclusion

The virus feels like a storm in which we are lost. This report tells us that 96–97% of cases in Delhi remained undetected. Yet, there is reason to be hopeful. With extensive testing, contact tracing, collective vigilance and social distancing we can change the narrative of this virus. The people and government of Delhi have demonstrated that. The curve has turned the corner and the reproduction number R has stayed below 1.0 for almost a month now (since June 29). Figure 5 shows the virus trajectory for Delhi on July 30.

**Figure 5. The reported case trajectory in Delhi and the effective reproduction number over time.**

Though there is more hope with what we know now, there is no room for complacency. Each state in India has to reach a state of containment (defined as a few cases per million) for us to safely release the brake. Till then we will need to live a life of confinement, restriction and collective sacrifice. The task list is now known to all of us:

Government:

Testing and contact tracing
Protective gear and equipment for frontline healthcare workers
Symptom tracking
Centralized isolation and quarantine
Work with health care and public health experts
Ramping up healthcare capacity
Affordable access to healthcare and tests for the most vulnerable
Replicating/borrowing models of success from some states/low-resource communities in states where the curve is rising
Modulated-punctuated lockdowns in a data-agile way
Continue well-designed large-scale seroprevalence surveys
Most importantly, integrative leadership and coordination

The leaders and policymakers need to bring together health care, public health, social support, financial sectors and scientific experts to rally around our common good. We need a long term strategy, not headline management through discrete tactics.

Public:

Community engagement and participation
Trust in Science and Public Health
Wear masks, wash hands, and socially distance
Strike a balance between alarmism and denial while minimizing personal risks; accept the long-term nature of the pandemic
Make collective sacrifices in terms of our social life to avoid large indoor gatherings and non-essential travel
Prioritize well-being and mental health
Protect the most vulnerable in your own social radius

We are not defenseless against this virus. Let us buy scientists some time to get to the vaccines and treatments where there has been remarkable accelerated progress. Sacrifice, social responsibility, vigilance, resilience, acceptance, solidarity, discovery: lofty words, but history says that monumental crises have always needed these attributes from the human race. We are in this together, we can do better. We cannot take a fatalistic approach and surrender to destiny.

The COV-IND-19 Study Group is comprised of an interdisciplinary group of scholars and data scientists. The following members contributed to this piece:

Rupam Bhattacharyya — Department of Biostatistics, University of Michigan
Ritwik Bhaduri — Indian Statistical Institute, Kolkata
Ritoban Kundu —Indian Statistical Institute, Kolkata
Maxwell Salvatore — Departments of Biostatistics and Epidemiology and Center for Precision Health Data Science, University of Michigan
Bhramar Mukherjee — Departments of Biostatistics and Epidemiology and Center for Precision Health Data Science, University of Michigan

Contact Bhramar Mukherjee (bhramar@umich.edu) with questions and inquiries.