Prediction of hospital-onset COVID-19 infections using dynamic networks of patient contact: an international retrospective cohort study
Introduction
Hospital-onset COVID-19 infections (HOCIs) have been reported to account for 12·0–15·0% of all COVID-19 cases in health-care settings and up to 16·2% at the peaks of the pandemic (Barranco et al).
Although their effect is yet to be fully quantified, HOCIs amplify the pandemic by seeding further outbreaks.
Although these approaches alone can perform reasonably well in identifying predictive risk factors of HCAIs, they overlook the fact that nosocomial spread of infection depends largely on the patient’s contacts, which are heterogeneous and vary over time.
Contact tracing of infected patients is effective at identifying disease super-spreaders (Lloyd-Smith et al), who are strong HOCI drivers (Illingworth et al; Lumley et al), and secondary cases, and has played a pivotal role in national COVID-19 responses (Ge et al; Ferretti et al; Kendall et al).
However, exploiting the entire contact network, rather than direct contacts to individuals with known infection alone, provides greater information to characterise transmission.
Indeed, early in the COVID-19 pandemic, population mobility and interactions guided national policy to reduce transmission (Liu et al).
In health-care settings, the overall number of direct contacts of a patient is predictive of HCAI (Rewley et al; Hamel et al; Shaughnessy et al; Karan et al). Yet, these studies (Rewley et al; Hamel et al; Shaughnessy et al; Karan et al) fail to use the full dynamic information of contacts (Pastor-Satorras and Vespignani).
Evidence before this study
Throughout the COVID-19 pandemic, health-care facilities have had considerable numbers of hospital-onset COVID-19 infections (HOCIs). Despite substantially higher rates of COVID-19 morbidity and mortality among hospitalised patients, predictive models of HOCI are yet to be fully used in health-care settings. To address this gap, we have designed a machine-learning framework that integrates dynamic patient contact-networks with traditional patient clinical risk factors and contextual hospital variables. Patient contact networks are a natural approach to model the contact-mediated transmission of COVID-19 and other infectious diseases. Our study investigates the use of contact-network variables in predicting HOCIs at the patient level and their generalisability to various hospital settings. We performed two searches on PubMed (Sept 22, 2021) for English-language articles. Search one was on prediction of HOCIs, using the search terms “hospital-onset COVID-19 infections”, “nosocomial COVID-19”, “prediction”, and “forecasting”; search two was on the use of contact-networks for prediction of infections acquired in health-care settings, based on the search terms “healthcare-acquired infections”, “nosocomial infections”, “prediction”, “forecasting”, “contact networks”, and “dynamic contact networks”. Search one identified no studies performing a comprehensive investigation into risk factors of HOCI at the patient level. Although several works examined HOCI epidemiology, providing characterisation of contacts, these studies were performed at single hospital sites, with few patients, and did not include a risk-factor analysis. Other studies examined risk factors for predicting patient risk of COVID-19 on hospital admission; however, by definition, these studies target only community-onset COVID-19 infections and not HOCIs and thus do not capture the in-hospital sources of exposure risk. 
Search two identified studies that used the total number of patient contacts or the total number of contacts with infectious cases. However, no studies of infections acquired in health-care settings incorporated contact connectivity beyond a patient’s immediate contacts to predict infection risk. Furthermore, the studies found in our searches did not use sophisticated network-theoretical measures or modelling techniques to predict individual patient risk, nor did they account for the time-varying nature of the contacts.
Added value of this study
To our knowledge, this is the first study to forecast HOCIs at the patient level by constructing contact-networks from routinely collected hospital bed records. To investigate the predictive use of patient contact-networks, we used a large multinational hospital dataset collected throughout extended periods of the COVID-19 pandemic in two hospital groups; one in London, UK, and one in Geneva, Switzerland. Using these datasets, we constructed and generalised models to predict HOCIs at the patient level both with and without measures of patient centrality calculated using the dynamic patient contact-networks. Our results show that variables extracted from patient contact-networks are strong predictors of HOCI in both testing and validation. Such network measures lead to improved prediction over standard risk-factor models on the basis of patient clinical data or hospital contextual variables. Most network-derived variables were significantly elevated in HOCIs, emphasising their importance as risk factors.
Implications of all the available evidence
This study shows that dynamic contact-networks provide novel sources of predictive power for respiratory infections acquired in health-care settings, improving the performance of traditional risk-factor prediction models for HOCIs. Contact-network-derived risk factors have the potential to enhance individualised infection prevention and early diagnosis. We designed a machine-learning framework to extract contact risk factors using routinely available bed administrative data and showed its novel and generalisable prediction power. The framework can be used in real time to generate daily risk predictions as part of a suite of surveillance tools in modern, data-driven infection prevention and control strategies.
In this study, we combine dynamic networks of patient contacts (based on bed allocation records) with clinical attributes and hospital contextual data into a novel forecasting framework to predict patient risk of HOCI acquisition for targeting preventive interventions. As a proof of principle, we perform a retrospective cohort study to assess the predictive power of risk factors that were extracted from patient-contact networks, constructed from routinely collected hospital data. We train and test models on a large London hospital dataset spanning the first two major UK surges of COVID-19 (ie, March 23–May 30, 2020 and Sept 7, 2020–April 24, 2021). We then validate the predictive gain from contact-network risk factors by applying the framework to an external dataset from a university-affiliated geriatric hospital in Geneva during surge one (ie, March 1–May 31, 2020) and to data from the same London hospital group after surge two (ie, after April 2–Aug 13, 2021) in the UK, when COVID-19 had become endemic.
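As an illustration of how dynamic contact networks can be derived from bed allocation records, the following minimal sketch links two patients on a given day if they occupied the same ward on that day. The record layout, the ward-level definition of contact, and the names `bed_records`, `daily_contact_networks`, and `degree` are assumptions made for this example and are not taken from the study's implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bed-allocation records: (patient_id, ward, first_day, last_day).
# Days are integers and both endpoints are inclusive (an assumption of this sketch).
bed_records = [
    ("p1", "ward_A", 1, 3),
    ("p2", "ward_A", 2, 4),
    ("p3", "ward_B", 1, 3),
    ("p3", "ward_A", 4, 5),
    ("p4", "ward_B", 3, 3),
]


def daily_contact_networks(records):
    """Return {day: set of frozenset({a, b})} for patients sharing a ward on that day."""
    occupants = defaultdict(set)  # (day, ward) -> patients present
    for patient, ward, first_day, last_day in records:
        for day in range(first_day, last_day + 1):
            occupants[(day, ward)].add(patient)

    networks = defaultdict(set)
    for (day, _ward), patients in occupants.items():
        for a, b in combinations(sorted(patients), 2):
            networks[day].add(frozenset((a, b)))
    return dict(networks)


def degree(networks, patient, first_day, last_day):
    """Count distinct contacts of `patient` within an inclusive day window."""
    contacts = set()
    for day in range(first_day, last_day + 1):
        for edge in networks.get(day, ()):
            if patient in edge:
                contacts |= edge - {patient}
    return len(contacts)


networks = daily_contact_networks(bed_records)
print(degree(networks, "p2", 1, 5))  # distinct contacts of p2 over days 1 to 5
```

Because the edges are stored per day, any network variable can be recomputed over a sliding window, which is one simple way to reflect the time-varying nature of contacts described above.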
Results

Figure 1: Background hospital infections and contact structure across the study period
The daily number of new patients who tested positive for COVID-19 within the hospital (COCI and HOCI) varied substantially across the study period. A peak of 59 cases was reached on March 30, 2020, and a peak of 64 cases was reached on Jan 6, 2021, with new daily cases dipping to zero on some days during July, August, September, and October. The patient-contact network also varied across the study period, with differences in connectivity and size of patient-contact clusters between each of the infection surges and during the summer period. COCI=community-onset COVID-19 infection. HOCI=hospital-onset COVID-19 infection.
Table 1: Univariate analysis of variable sets for control versus HOCI data
Table 2: Summary of test and validation set performance across variable groups

Figure 2: Model performance by variable set

Figure 3: Epidemiology curves of study validation data
Newly identified COVID-19 cases are reported across time and are broken down by HOCI and COCI case types. (A) Non-UK (ie, Geneva) hospital caseload during an epidemic surge of cases. (B) UK hospital group after pandemic surges 1 and 2, when COVID-19 became endemic and non-surging. COCI=community-onset COVID-19 infection. HOCI=hospital-onset COVID-19 infection.
Discussion
We used network analysis in combination with machine learning to predict patient-level HOCI using routinely captured hospital data. To our knowledge, this is the first study to forecast individual patient HOCIs by extracting patient contact networks from bed records. Together with hospital contextual variables, we report patient contact-network centrality as a significant HOCI risk factor, able to increase predictive performance across all datasets analysed.
physical distancing, presenteeism, environmental ventilation, and contaminated fomites, which can all be linked to particular patient groups (Kampf et al).
In our training and testing data, patients managed in elderly care, general medicine, renal, and surgical units were significantly over-represented in the HOCI group (table 1). Staffing levels and stress in critical care; complex pathways and excess movements, resulting in high numbers of contacts among surgery patients; and the strong community links in renal wards might have exacerbated transmission. The significant over-representation of older patients and of male gender identity in HOCIs reflects known features of the wider pandemic (Wenham et al).
Although IPC focuses on demographic and individual clinical risk variables (Sun et al; Soltan et al), our results show that such fixed variables are least predictive overall. Modern IPC might therefore improve management of outbreaks by including contextual and dynamic risk factors.
These factors are consistent with the hospital contextual risk factors identified in our work. We found that background COVID-19 prevalence within the hospital group was the most predictive variable in our training and test data collected during pandemic surges. Although high case numbers increase transmission sources, background prevalence can also be a proxy for staffing stress and density changes, acting as potential exacerbators. Similarly, high HOCI risk from increased hospital-bed occupancy could be due to high patient loads, increased density, and staffing pressures, which make IPC challenging. Similar to other HCAIs, length of stay was significantly higher for HOCIs (table 1; Pastor-Satorras and Vespignani).
Length of stay and consecutive length of stay both being significantly longer in HOCIs than in controls also supports genomic analysis suggesting COVID-19 acquisition can be linked to previous admissions (Lumley et al).
Increased movement rates (ie, bed, room, ward, and site moves) were reported as a risk factor for HCAI locally (Boncea et al), yet they were not significantly different for HOCIs in our data (table 1). The risk from movement rates alone is likely to be too general for HOCI, without specificity, and better captured via measures of contact-network centrality. Altogether, models based on hospital contextual variables showed strong predictive performance across epidemic surges. However, including network variables increased performance most notably in the endemic validation data (table 2).
HOCIs were significantly more central in contact networks. Few studies have used contact data to investigate HCAI, and most have considered only direct contacts (ie, network degree; Rewley et al; Hamel et al; Shaughnessy et al; Karan et al; Sun et al).
Similarly, COVID-19 transmission analysis outside hospital settings has been limited to direct contacts (Ferretti et al; Sun et al).
Consistent with these studies, our results show direct contacts as a strong risk factor of infection. Yet, the infected contact network (ward), measuring network connectedness to all known infections, was more predictive than direct infectious contacts (ie, infected degree), suggesting the presence of longer and indirect transmission chains that can affect contact tracing. Alternatively, disrupting underlying network connectivity by targeting patients with high centrality, together with screening and isolation based on risk factors, could be effective to reduce onward transmission.
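The contrast between direct infectious contacts and connectedness to all known infections can be illustrated with a small sketch. Here, infected degree counts infected direct neighbours, whereas a closeness-style score sums the inverse shortest-path distance to every reachable known infection. This closeness score is an assumed stand-in chosen for illustration; it is not the study's definition of the infected contact network variable, and the graph and names below are hypothetical.

```python
from collections import deque

# Hypothetical undirected contact network as adjacency sets: a chain a-b-c-d plus isolated e.
contacts = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": set(),
}
known_infected = {"a"}


def infected_degree(graph, node, infected_nodes):
    """Number of direct contacts of `node` with known infection."""
    return len(graph[node] & infected_nodes)


def infected_closeness(graph, node, infected_nodes):
    """Sum of 1/distance from `node` to each reachable known infection (breadth-first search)."""
    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return sum(
        1.0 / d for other, d in distance.items() if other in infected_nodes and d > 0
    )


for patient in ("b", "c", "e"):
    print(
        patient,
        infected_degree(contacts, patient, known_infected),
        infected_closeness(contacts, patient, known_infected),
    )
```

In this toy network, patient `c` has no direct infectious contact yet still receives a non-zero closeness score through the indirect path via `b`, which is the kind of signal that a degree-only variable cannot capture.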
Aimed at exploiting these emerging sources of data, our dynamic disease forecasting framework is designed to be portable to a range of settings and variables. The framework offers precise individual predictions of risk of infection acquisition and is thus amenable to risk stratification in real time, which can serve to guide dynamic IPC resource allocation for rapid screening, isolation, and grouping of patients at high risk of infection acquisition. By incorporating complex multimodal data sources into a single measure of predicted risk, our framework produces relevant and actionable outputs aimed at preventing disease acquisition.
Major challenges to effective IPC activity are low bed capacity and inadequate and overwhelmed isolation capacity, in addition to insufficient staffing and microbiological testing resources. These challenges to IPC were vastly exacerbated by the COVID-19 pandemic. We envisage the proposed framework to be used within a modern, data-driven IPC patient management system and able to assist optimal decisions in real-world scenarios. The predicted risk score for each patient can be used by clinicians to rank and prioritise (eg, identify patients at high risk for infection for isolation or grouping followed by targeted enhanced testing). In this way, HOCIs could be identified at the earliest opportunity, which in turn could optimise IPC measures and treatment. Patients at low risk of infection acquisition could also be potentially moved back to regular patient management faster, saving resources that are in demand. However, further work is needed to evaluate the direct implications (ie, clinical and economic) of identifying patients at high risk of infection. In addition to actionable clinical points, a key aspect of this framework is its dynamism and its ability to generate insight on demand. By aggregating complex data sources into single interpretable risk scores, a range of risk sources and their interactions are made accessible to hospital teams. Such data-driven insights, always integrated within human decision making, can enable hospital teams to become more flexible and responsive to complex, rapidly emerging disease threats.
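The ranking and prioritisation step described above might look like the following minimal sketch, assuming each patient already has a predicted risk score between 0 and 1. The scores, the number of isolation beds, the low-risk threshold, and the function name `prioritise` are all hypothetical and serve only to illustrate the idea; any real use would sit within clinical decision making rather than replace it.

```python
def prioritise(risk_scores, isolation_beds, low_risk_threshold):
    """Rank patients by predicted risk and split them into two action lists.

    Returns (isolate, step_down): the highest-risk patients up to the available
    isolation capacity, and the remaining patients whose score falls below the
    low-risk threshold.
    """
    ranked = sorted(risk_scores.items(), key=lambda item: item[1], reverse=True)
    isolate = [patient for patient, _ in ranked[:isolation_beds]]
    step_down = [
        patient
        for patient, score in ranked[isolation_beds:]
        if score < low_risk_threshold
    ]
    return isolate, step_down


# Hypothetical daily risk predictions for five inpatients.
daily_scores = {"p1": 0.82, "p2": 0.10, "p3": 0.47, "p4": 0.03, "p5": 0.65}

isolate, step_down = prioritise(daily_scores, isolation_beds=2, low_risk_threshold=0.2)
print("Prioritise for isolation and enhanced testing:", isolate)
print("Candidates for regular patient management:", step_down)
```

Patients who fall in neither list (here `p3`) would remain under standard monitoring until the next daily prediction.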
indirect transmission over surfaces; non-room, ward, or building contact; or interactions from visitors. However, routinely collected patient bed allocations have been shown to capture implicitly non-patient interactions that align with organisational and speciality hospital structures (Myall et al).
Staff and visitor contact data were not available in our data because of privacy restrictions, but such data should be investigated, in accordance with privacy protections. Second, since our training and testing period occurred largely before the UK’s vaccination rollout, we were unable to include vaccination status as a patient variable. With increasing levels of natural and induced immunity, inclusion of vaccination and recovery status might improve predictions; emerging new variants and incomplete vaccine coverage (Davies et al) make the levels of susceptibility uncertain. Third, patient ethnicity was not available in our study. Because of its contextual complexities, and because it is a previously identified risk factor (Sze et al), ethnicity warrants specific and increased investigation in the future. Fourth, our data did not include ventilation or specific information about room arrangements (appendix p 2), which contribute to COVID-19 transmission. However, even without accounting for ventilation, our models were highly predictive. Finally, various aspects of hospital organisation were altered across the pandemic, including changes in screening practice, personal protective equipment, or bed placement, which were not encoded here as variables.
Overall, our study emphasises that dynamic networks of patient contacts can aid personalised predictions of infection. Our study applies to respiratory virus transmission in hospital, using widely available patient bed records. Further work is needed to extend this framework to other infectious diseases, assessing the types of contact required for transmission, evaluating the implications of identifying a patient at high risk of infection acquisition, and understanding how it could be integrated into IPC more generally.
AM, JRP, RLP, SM, and MB contributed to study concept and design. JRP, MA, SM, NZ, and FR contributed to data acquisition. AM, JRP, MA, and SM contributed to data analysis and accessed and verified the underlying data. AM, JRP, RLP, MA, AH, and MB contributed to the initial manuscript drafting. All authors contributed to data interpretation and final revisions of the manuscript. AH and MB contributed to study supervision. AM, JRP, RLP, MA, SM, SH, AH, and MB contributed to the discussion of the results and reviewed the data. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.