#### Abstract

Within a single country, Spain, with the same cultural aspects and containment policies (without lockdown), why did COVID-19 spread more intensely in some areas than in others at the start of the first wave, once a significant number of infections had been reached? The hypothesis is that meteorological factors, that is, the weather conditions at the outbreak, are relevant and could be used as early indicators of the severity and transmission intensity of the COVID-19 first wave. This paper presents a model that predicts COVID-19 first wave severity and transmission intensity in Spain from early weather information. The weather explanatory variables were threshold average temperature and threshold average absolute humidity, defined as the daily average temperature and daily average absolute humidity averaged over the day on which the number of infections began to grow exponentially and the previous 13 days. Socioeconomic factors were also employed as independent variables. The dependent variables are the maximum daily incidence rate and the incidence rate doubling speed, defined as the speed at which the daily incidence rate doubles once the number of infections begins to grow exponentially. A principal component analysis and a linear regression approach showed that the variables are correlated. Temperature is the most important driver, followed by absolute humidity, and the correlation is negative in both cases. An increase of 0.1 ºC in threshold average temperature or of 1 g/m3 in threshold average absolute humidity is associated with a reduction of 0.219 and 0.193, respectively, in the natural logarithm of the outbreak incidence rate doubling speed, and with a reduction of 0.253 and 0.222, respectively, in the natural logarithm of the maximum daily incidence rate. The results show that the virus had a harder time intensifying and spreading at warmer temperatures and higher absolute humidity during the first wave.

**Keywords:** SARS-CoV-2; Incidence Rate; Outbreak Incidence Rate Doubling Speed; Correlation Analyses; Linear Regression

**Abbreviations:** WHO: World Health Organization; MIR: Maximum daily Incidence Rate; RDS: Rate Doubling Speed; AH: Absolute Humidity; GDP: Gross Domestic Product; PD: Population Density of most Populated Municipality; MOV: Movements; PCA: Principal Component Analysis; TT: Threshold average Temperature

#### Introduction

All of us are immersed in one of the greatest challenges that humanity has faced in recent years. At the end of December 2019, a new disease appeared in Wuhan, Hubei province, China, and changed everything. Named COVID-19 by the World Health Organization (WHO), this new respiratory infectious disease is caused by a novel coronavirus, SARS-CoV-2, that had not previously been identified in humans. On March 11th, 2020, the WHO declared that COVID-19 could be characterized as a pandemic [1,2], and on March 13th Europe was defined as the epicenter of the pandemic [3]. Spain was one of the most affected countries in the world during the COVID-19 first wave, that is, from March to June 2020 [4]. The National Center of Microbiology of the Carlos III Institute of Health declared the first official COVID-19 case in Spain on January 31st in La Gomera, Canary Islands [5]. At the beginning of March the situation worsened with a significant increase in infections, and a nationwide lockdown was imposed on March 14th [2]. Despite the abundance of articles investigating the evolutionary dynamics of the virus, many unknowns remain. The one that caught our attention and motivates this study is this: within a single country, Spain, with the same cultural aspects and containment policies (without lockdown), why did the disease spread more intensely in some areas than in others at the start of the COVID-19 first wave, once a significant number of infections had been reached? The proposed hypothesis is that the weather conditions at the outbreak are relevant factors that could be used as early indicators of the severity and transmission intensity of the COVID-19 first wave. This hypothesis agrees with previous studies pointing out that cities with significant COVID-19 outbreaks have very similar climate patterns, with relatively cool and dry environments [6,7], and with others showing that earlier SARS outbreaks were significantly associated with temperature and its variations [8,9].

Therefore, this study focuses on the initial moments of the disease spread, and more specifically on the moment at which infections began to grow exponentially. In Spain this happened in pre-lockdown conditions, that is, without active social distancing measures and with only a minority of people wearing masks. This reinforces the case for examining weather factors at this specific moment, because they did not yet have to compete with factors such as policy or sanitary measures, which become more significant once the pandemic is under way [10]. The initial hypothesis is also supported by four mechanisms through which meteorological factors favor the spread of COVID-19. First, low temperature and low absolute humidity weaken the immune system, favoring the proliferation of infections [11,12]. Second, they promote the persistence of the virus on surfaces and therefore its spread through fomites or direct contact [13,14], although this is now considered a minor mode of transmission [15]. Third, cold temperatures shift habits towards less healthy routines and towards indoor places, where COVID-19 transmission rates are nearly 20 times higher than outdoors [16-18]. Finally, aerosols are one of the main modes of COVID-19 spread [16,19], and their dispersion in the air is affected by both variables [20].

However, the existence of a correlation between temperature and humidity and COVID-19 transmission is not yet clear [21], although most studies point towards a negative one [22,23]. Three studies carried out in specific regions of Spain obtained different results: a negative correlation [24,25] and no significant association [26] between COVID-19 and temperature. For this reason, the objective of the present study is to analyze whether the transmission intensity and severity of the COVID-19 first wave are correlated with the average temperature and average absolute humidity of the early moments of the disease outbreak. The final goal is to design a model that predicts the magnitude of a COVID-19 outbreak from early weather information. Some socioeconomic factors have also been used as controls, in line with other works [27,28], in order to consolidate the results. The analysis covers the 50 Spanish provinces in order to summarize what happened across the whole country during the COVID-19 first wave.

#### Data and Methods

The health data were extracted from the National Epidemiological Surveillance Network (RENAVE), provided by the National Center of Microbiology of the Carlos III Institute of Health, from February 1st to May 31st, 2020, for the 50 Spanish provinces. Specifically, the data used are the provincial daily incidence rate per 100,000 inhabitants, that is, the number of new daily positive COVID-19 cases in each province divided by the population at risk and multiplied by 100,000. In 99.74% of the data, positive COVID-19 cases were defined by a positive PCR test; the remainder were diagnosed from symptoms compatible with the disease. With these data, the Maximum daily Incidence Rate (MIR) reached in each of the 50 provinces during the first wave was calculated. This variable, which can be understood as a measure of the severity of the COVID-19 first wave, showed a highly differentiated spatial distribution (Figure 1). Some provinces had a MIR more than twenty times higher than others, for example Soria (127.5) compared with Huelva (5.4). This figure was the starting point of the study, since MIR is our first dependent variable.

The second dependent variable was the speed at which the daily incidence rate doubles once the number of infections begins to grow exponentially. This variable, related to the transmission intensity of the COVID-19 first wave, was calculated for each province (Figure 2) and is called the outbreak incidence Rate Doubling Speed (RDS).

The Spanish Meteorological Agency (AEMET) provided daily weather data, namely daily average temperature and daily average relative humidity, from 50 weather stations, each taken as the reference for one of the 50 provinces. The outbreak average temperature was calculated for each province. The outbreak was defined as the moment at which the number of infections began to grow exponentially, identified as the day on which the daily incidence rate per 100,000 inhabitants exceeded the value of 5 [29]. The temperature was averaged over that day and the 13 preceding days, in order to account for the longest COVID-19 incubation period [30]. This averaged value is called the threshold average temperature (TT) and is the first explanatory weather variable. The outbreak threshold was exceeded at different times in each province: the earliest was Álava on February 28th and the latest was Murcia on March 24th. None was later than March 27th (the first day of lockdown plus 13 days), which guarantees that the study was carried out in pre-lockdown conditions. The process was repeated with daily average absolute humidity. For this purpose, the Clausius-Clapeyron equation is used to calculate the daily average Absolute Humidity (AH) in g/m3 from the daily average temperature and relative humidity values [31,32].
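The equation itself is not reproduced in this version of the text. One form that is widely used in COVID-19 weather studies, given here as an assumption rather than as the authors' confirmed expression, is:

```latex
AH = \frac{6.112 \cdot e^{\frac{17.67\,T}{T + 243.5}} \cdot RH \cdot 2.1674}{273.15 + T}
```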

In this equation, T is the daily average temperature in ºC and RH is the daily average relative humidity in %. The daily average absolute humidity was then averaged over 14 days, in the same way as the daily average temperature, to obtain the second independent weather variable, called threshold average absolute humidity (TAH).

The National Institute of Statistics provided several socioeconomic factors, which were used as independent variables: Gross Domestic Product (GDP), percentage of population aged 60 years or older (AGE), Population Density of the most populated municipality (PD) and pre-lockdown intra-provincial Movements (MOV), obtained through mobile phone positioning estimates.

Several simple linear regression models were constructed to explore the individual relationships between the independent weather variables (threshold average temperature and threshold average absolute humidity) and the maximum daily incidence rate and outbreak incidence rate doubling speed. The correlation coefficient gave a measure of the linear association. Subsequently, multiple linear regression models were built incorporating all the factors, both meteorological and socioeconomic. A backward elimination technique was applied: all the variables were initially included, and regressors were removed one at a time, starting with the lowest contribution, until every remaining regressor was significant enough to be retained. To corroborate the multiple linear regression findings, a Principal Component Analysis (PCA) was performed. This analysis makes it possible to control and avoid multicollinearity among the predictors and to drop the least important variables. Finally, linear regression models of the dependent variables against the reduced set of principal components were fitted in order to obtain the best and most stable final model. All data were analyzed using the statistical program Statgraphics©.
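The construction of the two threshold weather variables can be sketched in a few lines. This is a minimal illustration, not the authors' Statgraphics workflow: the helper names are hypothetical, the input series are made up, and the absolute humidity formula is an assumed, commonly used Clausius-Clapeyron-based form. Only the threshold of 5 cases per 100,000 inhabitants and the 14-day window come from the text.

```python
import math
from typing import Optional, Sequence

OUTBREAK_THRESHOLD = 5.0  # daily incidence rate per 100,000 inhabitants (from the text)
WINDOW_DAYS = 14          # outbreak day plus the 13 preceding days (from the text)


def absolute_humidity(temp_c: float, rel_hum_pct: float) -> float:
    """Absolute humidity in g/m3 from temperature (ºC) and relative humidity (%).

    Assumed formula, not confirmed by the source.
    """
    saturation_vp = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))  # hPa
    return saturation_vp * rel_hum_pct * 2.1674 / (273.15 + temp_c)


def outbreak_day(incidence: Sequence[float]) -> Optional[int]:
    """Index of the first day whose incidence rate exceeds the threshold."""
    for day, rate in enumerate(incidence):
        if rate > OUTBREAK_THRESHOLD:
            return day
    return None


def threshold_average(values: Sequence[float], day: int) -> float:
    """Mean of `values` over the outbreak day and the 13 days before it."""
    start = day - (WINDOW_DAYS - 1)
    if start < 0:
        raise ValueError("not enough history before the outbreak day")
    window = values[start:day + 1]
    return sum(window) / len(window)


# Made-up daily series for one hypothetical province.
incidence = [0.0] * 13 + [0.5, 1.2, 2.8, 4.9, 6.3, 9.0]
temps = [10.0 + 0.1 * d for d in range(len(incidence))]
rel_hum = [70.0] * len(incidence)

day = outbreak_day(incidence)
tt = threshold_average(temps, day)                      # threshold average temperature
daily_ah = [absolute_humidity(t, rh) for t, rh in zip(temps, rel_hum)]
tah = threshold_average(daily_ah, day)                  # threshold average absolute humidity
```

Averaging the daily absolute humidity values, rather than computing absolute humidity from averaged temperature and relative humidity, follows the order of operations described in the text.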

#### Results

### Maximum Daily Incidence Rate

**Simple Linear Regression Model:** Table 1 shows the regression and correlation coefficients of the two proposed simple linear regression models with threshold average temperature as the explanatory variable and two targets: the maximum daily incidence rate and its natural logarithm. Both models show a very strong negative correlation, that is, a higher threshold average temperature is associated with a lower maximum daily incidence rate and vice versa. The better model is the one using the natural logarithm (Figure 3a); it explains 65.58% of the variability of the maximum daily incidence rate. A threshold average temperature above 13.02 ºC (99% CI, 11.23 to 14.82 ºC) is associated with a low maximum daily incidence rate, that is, a rate lower than 20. The model with the natural logarithm of the maximum daily incidence rate (Table 1) was used to determine this threshold value.

The same analysis was repeated with threshold average absolute humidity as the explanatory variable. The best model, again with the natural logarithm of the maximum daily incidence rate as target (Figure 3b), explains 55.53% of its variability (Table 1). A strong negative correlation is again found between the variables. A threshold average absolute humidity above 7.37 g/m3 (99% CI, 6.41 to 8.33 g/m3) is associated with a low maximum daily incidence rate, that is, a rate lower than 20. The model with the natural logarithm of the maximum daily incidence rate (Table 1) was used to determine this threshold value.
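Threshold values of this kind follow from inverting a fitted log-linear model. As a minimal sketch, assuming a model of the form ln(MIR) = a + b·TT, the value of TT at which the predicted MIR equals 20 is (ln 20 − a) / b. The data and coefficients below are invented for illustration and are not those of Table 1; `fit_line` and `threshold_for` are hypothetical helper names.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r


def threshold_for(mir_level, a, b):
    """Explanatory value at which the model predicts exactly `mir_level`."""
    return (math.log(mir_level) - a) / b


# Invented, exactly log-linear data: ln(MIR) = 5 - 0.2 * TT.
tt = [8.0, 10.0, 12.0, 14.0, 16.0]
mir = [math.exp(5.0 - 0.2 * t) for t in tt]

a, b, r = fit_line(tt, [math.log(m) for m in mir])
tt_limit = threshold_for(20.0, a, b)  # TT above which predicted MIR < 20
```

Because the slope is negative, values of TT above `tt_limit` give a predicted MIR below the chosen level.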

**Multiple Linear Regression Model:** To confirm and refine these results, a multiple linear regression model incorporating all the variables was built. The dependent variable is the natural logarithm of the maximum daily incidence rate. The first thing to note is that none of the coefficients of the provincial socioeconomic variables was statistically significant, so these variables (GDP, percentage of population aged 60 years or older, population density of the most populated municipality and pre-lockdown intra-provincial movements) were excluded from the final model. A second result is the confirmation of a negative correlation between the maximum daily incidence rate and both threshold average temperature and threshold average absolute humidity, the only two variables in the final model. This model explains 68.72% of the variation in the maximum daily incidence rate at a confidence level of 99%, since the p-value returned by ANOVA is less than 0.01. Table 2 contains the regression coefficients of the model. The model equation obtained is:

where MIR is the maximum daily incidence rate, TT is the threshold average temperature in units of 0.1 ºC and TAH is the threshold average absolute humidity in g/m3. This model improves on the simple linear regression models and makes it possible to anticipate the severity of an outbreak with moderate accuracy.
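The backward elimination described in Data and Methods can be sketched as follows. This is an illustration under stated assumptions, not the Statgraphics procedure used in the study: the stopping rule is a fixed |t| cutoff (`t_crit=2.7`, roughly a two-sided 1% level with about 45 degrees of freedom, chosen here for illustration), the function names are hypothetical, and the data are synthetic.

```python
import numpy as np


def ols_t_stats(X, y):
    """Least-squares fit of y = b0 + X @ b; returns coefficients and t-statistics."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)         # coefficient covariance
    return beta, beta / np.sqrt(np.diag(cov))


def backward_elimination(X, y, names, t_crit=2.7):
    """Start with all regressors; drop the weakest until all have |t| >= t_crit."""
    kept = list(range(X.shape[1]))
    while kept:
        _, t = ols_t_stats(X[:, kept], y)
        abs_t = np.abs(t[1:])                     # ignore the intercept
        weakest = int(np.argmin(abs_t))
        if abs_t[weakest] >= t_crit:
            break
        kept.pop(weakest)
    return [names[i] for i in kept]


# Synthetic stand-in for 50 provinces: only TT and TAH drive the target.
rng = np.random.default_rng(0)
names = ["TT", "TAH", "GDP", "AGE", "PD", "MOV"]
X = rng.normal(size=(50, 6))
ln_mir = 3.0 - 0.25 * X[:, 0] - 0.20 * X[:, 1] + rng.normal(scale=0.05, size=50)

kept = backward_elimination(X, ln_mir, names)
```

With real data the weather variables are correlated with each other, which is the multicollinearity issue that motivates the principal component analysis that follows.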

**Principal Component Analysis:** A PCA was carried out to check for multicollinearity between the predictor variables, and so either confirm the validity of the model found or look for an alternative one. The correlation matrix between the model variables (Table 3) clearly indicates the existence of multicollinearity (correlation values greater than 0.5). PCA indicates that there are two significant components (eigenvalue greater than 1), and these components explain 72.54% of the variability of the data (Table 4). Figure 4 displays the spatial distribution of the two principal components. In the first component (Figure 4a), two areas of different behavior can be observed: the north and center of the country, characterized by low temperature and absolute humidity, and the south, characterized by higher temperature and absolute humidity. This component does not reflect any characteristic related to the socioeconomic variables. The second component (Figure 4b), in contrast, is related to them and shows different behavior in populated areas in the center (Madrid and its area of influence), the north east (including Barcelona and its area of influence, Valencia and the Balearic Islands), the north (Guipúzcoa and Vizcaya), and two small centers around Sevilla and A Coruña. This interpretation of the components can be corroborated by the coefficients of the equations that define the first and second principal components (Table 5). It is important to highlight that the components are mathematically orthogonal, so the correlation between them is zero, that is, they are uncorrelated. PCA shows that the meteorological variables are almost independent of the socioeconomic variables. Finally, the set of 6 initial variables is reduced to 2: the first component, in which the meteorological factors have the greatest weight and which divides the peninsula into two differentiated areas, with temperature as the most relevant factor; and the second component, which carries the socioeconomic factors and therefore has greater weight in large cities. We then explored the relationship between these two new explanatory variables and our target. The results indicate a moderately strong correlation with the first principal component and no evidence of a relationship with the second principal component (Table 6). Our final model, which corrects the effect of multicollinearity and explains 60.72% of the variability of the maximum daily incidence rate, is:

where MIR is the maximum daily incidence rate, PD is the population density of the most populated municipality, MOV is pre-lockdown intra-provincial movements, AGE is the percentage of population aged 60 years or older, TT is the threshold average temperature in units of 0.1 ºC and TAH is the threshold average absolute humidity in g/m3. The factors with the greatest weight are threshold average temperature and threshold average absolute humidity, and those with the least are population density of the most populated municipality and pre-lockdown intra-provincial movements. A 0.1 ºC increase in threshold average temperature is associated with a reduction of 0.253 (99% CI, 0.205 to 0.301) in the natural logarithm of the maximum daily incidence rate. A 1 g/m3 rise in threshold average absolute humidity is associated with a reduction of 0.222 (99% CI, 0.174 to 0.270) in the natural logarithm of the maximum daily incidence rate.
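The principal components regression used to reach this final model can be sketched with NumPy. This is a minimal illustration on synthetic data, not a reproduction of Tables 3 to 6: the function name, the variable layout and every number below are assumptions. The only rules taken from the text are retaining components with eigenvalue greater than 1 and regressing the target on the retained components.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix: eigenvalues, loadings, component scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each column
    R = np.corrcoef(X, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                  # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, Z @ eigvecs


# Synthetic stand-in for 50 provinces and 6 predictors; TT and TAH are correlated.
rng = np.random.default_rng(0)
n = 50
tt = rng.normal(size=n)
tah = 0.8 * tt + 0.6 * rng.normal(size=n)
socio = rng.normal(size=(n, 4))                        # GDP, AGE, PD, MOV (made up)
X = np.column_stack([tt, tah, socio])
ln_mir = 3.0 - 0.5 * tt - 0.3 * tah + rng.normal(scale=0.3, size=n)

eigvals, loadings, scores = pca_correlation(X)
k = int(np.sum(eigvals > 1.0))                         # eigenvalue-greater-than-1 rule
explained = eigvals[:k].sum() / eigvals.sum()          # share of variance retained

# Regress the target on the retained components only.
A = np.column_stack([np.ones(n), scores[:, :k]])
gamma, *_ = np.linalg.lstsq(A, ln_mir, rcond=None)

# Map component coefficients back to the standardized original variables.
beta_std = loadings[:, :k] @ gamma[1:]
```

Because the component scores are mutually uncorrelated, the regression on them is free of the multicollinearity that affects the original predictors, and `beta_std` expresses the fitted model in terms of all six standardized variables.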

### Outbreak Incidence Rate Doubling Speed

**Simple Linear Regression Model:** Table 7 shows the regression and correlation coefficients obtained with threshold average temperature (TT) as the explanatory variable and the outbreak incidence Rate Doubling Speed (RDS) and its natural logarithm as targets. The best model (Figure 4a), obtained after applying the natural logarithm to the outbreak incidence rate doubling speed, explains 57.35% of its variability and indicates a strong negative correlation between the variables. A threshold average temperature above 13.69 ºC (99% CI, 11.75 to 15.63 ºC) is associated with a low outbreak incidence rate doubling speed, that is, a doubling speed equal to or lower than 1. The model with the natural logarithm of the outbreak incidence rate doubling speed (Table 7) was used to determine this threshold value. When threshold average absolute humidity is used as the explanatory variable, the best model (Figure 4b) is again the one obtained after applying the natural logarithm to the outbreak incidence rate doubling speed: it explains 32.85% of its variability (Table 7) and shows a moderately strong negative correlation between the variables. It is important to note that this is the model with the lowest correlation obtained.

**Principal Components Regression:** Since multicollinearity was demonstrated above, the relationship between our target, the outbreak incidence rate doubling speed, and the two principal components was established through a simple linear regression model. The results were very similar to the previous case: there was a moderately strong correlation with the first principal component and no relationship was found with the second principal component (Table 8). Our final model, which corrects the multicollinearity effect and explains 52.55% of the variability of the outbreak incidence rate doubling speed, is:

where RDS is the outbreak incidence rate doubling speed, PD is the population density of the most populated municipality, MOV is pre-lockdown intra-provincial movements, AGE is the percentage of population aged 60 years or older, TT is the threshold average temperature in units of 0.1 ºC and TAH is the threshold average absolute humidity in g/m3. A 0.1 ºC increase in threshold average temperature is associated with a reduction of 0.219 (99% CI, 0.190 to 0.249) in the natural logarithm of the outbreak incidence rate doubling speed. A 1 g/m3 rise in threshold average absolute humidity is associated with a reduction of 0.193 (99% CI, 0.167 to 0.219) in the natural logarithm of the outbreak incidence rate doubling speed.
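Because both final models are fitted to the natural logarithm of the target, a reduction of c in the logarithm corresponds to multiplying the target itself by exp(−c). The short sketch below converts the reported coefficients into approximate percentage changes. The conversion is standard arithmetic rather than a result stated by the authors, and `percent_change` is a hypothetical helper name.

```python
import math


def percent_change(log_reduction: float) -> float:
    """Percent change in a quantity whose natural logarithm drops by `log_reduction`."""
    return (math.exp(-log_reduction) - 1.0) * 100.0


# Reductions in the natural logarithm reported in the text, per unit of each predictor.
reported = {
    "RDS per unit of TT": 0.219,
    "RDS per unit of TAH": 0.193,
    "MIR per unit of TT": 0.253,
    "MIR per unit of TAH": 0.222,
}

changes = {label: percent_change(value) for label, value in reported.items()}
for label, change in changes.items():
    print(f"{label}: {change:.1f}%")
```

On this reading, each reported unit increase corresponds to a drop of roughly one fifth in the untransformed target, with all other variables in the model held fixed.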

#### Conclusion

This work has presented a statistical analysis to evaluate whether outbreak average temperature and average absolute humidity could be used as early indicators of the severity and transmission intensity of the COVID-19 first wave in Spain. The existence of correlation between the two dependent variables and both meteorological and socioeconomic factors has been confirmed. Nevertheless, the socioeconomic factors employed are less important than the weather factors, particularly population density of the most populated municipality and pre-lockdown intra-provincial movements. Temperature is the most important driver, followed by absolute humidity, and the correlation is negative in both cases. An increase of 0.1 ºC in threshold average temperature or of 1 g/m3 in threshold average absolute humidity is associated with a reduction in the natural logarithm of the outbreak incidence rate doubling speed of 0.219 (99% CI, 0.190 to 0.249) and 0.193 (99% CI, 0.167 to 0.219), respectively, and with a reduction in the natural logarithm of the maximum daily incidence rate of 0.253 (99% CI, 0.205 to 0.301) and 0.222 (99% CI, 0.174 to 0.270), respectively. The correlations obtained agree with the majority of studies carried out [33]. Correlation does not imply causality, but there is some evidence that in Spain the virus had a harder time intensifying and spreading at warmer temperatures and higher absolute humidity during the first wave. These results could also suggest a possible seasonal pattern of the COVID-19 disease. This is the first work presenting a model that predicts COVID-19 first wave severity and transmission intensity for the whole of Spain based on early average temperature and absolute humidity. However, this study does not imply that these variables were a primary driver of COVID-19 transmission; more factors must be analyzed.

This methodology can be extrapolated to other mid-latitude countries and can help to show why certain areas had more intense COVID-19 first wave episodes than others. The model obtained could be a useful supplement to help authorities act quickly, taking preventive measures and defining their strategy against COVID-19. Its use is nevertheless limited to future situations in which meteorological factors become relevant again [34,35], that is, when the current political and social restrictions and health measures disappear, the disease becomes endemic and its seasonal pattern shows clearly.

#### Acknowledgement

The authors gratefully acknowledge Project ENPY 221/20 grant from the Carlos III Institute of Health. The authors also wish to thank the Spanish Meteorological Agency and the Spanish Health Ministry for providing the datasets.

#### Disclaimer

The researchers declare that they have no conflict of interest that would compromise the independence of this research work. The views expressed by the authors do not necessarily coincide with those of the institutions they are affiliated with.

#### References

- (2020a) WHO.
- (2020) BOE. Boletín Oficial del Estado España.
- (2020b) WHO.
- (2020) The Lancet. COVID-19 in Spain: a predictable storm? The Lancet Public Health 5(11): e568.
- (2020a) Ministerio de Sanidad de España.
- Sajadi MM, Habibzadeh P, Vintzileos A, Shokouhi S, Miralles-Wilhelm F, et al. (2020) Temperature, Humidity, and Latitude Analysis to Estimate Potential Spread and Seasonality of Coronavirus Disease 2019 (COVID-19). JAMA Netw Open 3(6): e2011834.
- El Khattabi EM, Zouini M, Jamil MO (2020) The Thermal Constituting of the Air Provoking the Spread of (COVID-19). SSRN Electronic Journal.
- Tan J, Mu L, Huang J, Yu S, Chen B, et al. (2005) An initial investigation of the association between the SARS outbreak and weather: with the view of the environmental temperature and its variation. J Epidemiol Community Health 59(3): 186-192.
- Chan KH, Peiris JSM, Lam SY, Poon LLM, Yuen KY, et al. (2011) The effects of temperature and relative humidity on the viability of the SARS coronavirus. Advances in Virology 2011: 734690.
- Jüni P, Rothenbühler M, Bobos P, Thorpe KE, Da Costa BR, et al. (2020) Impact of climate and public health interventions on the COVID-19 pandemic: a prospective cohort study. CMAJ 192(21): E566-E573.
- Kudo E, Song E, Yockey L, Rakib T, Wong P, et al. (2019) Low ambient humidity impairs barrier function and innate resistance against influenza infection. Proceedings of the National Academy of Sciences 116(22): 10905-10910.
- Moriyama M, Hugentobler WJ, Iwasaki A (2020) Seasonality of Respiratory Viral Infections. Annu Rev Virol 7(1): 83-101.
- Chin AWH, Chu JTS, Perera MRA, Hui KPY, Yen HL, et al. (2020) Stability of SARS-CoV-2 in different environmental conditions.
- Casanova LM, Jeon S, Rutala WA, Weber DJ, Sobsey MD (2010) Effects of air temperature and relative humidity on coronavirus survival on surfaces. Appl Environ Microbiol 76(9): 2712-2717.
- (2020) CDC. How COVID-19 Spreads.
- (2020) Ministerio de Ciencia e Innovación de España. Informe científico sobre vías de transmisión.
- Nishiura H, Oshitani H, Kobayashi T, Saito T, Sunagawa T, et al. (2020) Closed environments facilitate secondary transmission of coronavirus disease 2019 (COVID-19). medRxiv.
- Qian H, Miao T, Liu L, Zheng X, Luo D, et al. (2020) Indoor transmission of SARS-CoV-2. medRxiv.
- (2020b) Ministerio de Sanidad de España. Evaluación del riesgo de transmisión de la COVID-19 por aerosoles: Medidas de prevención y recomendaciones. Technical report.
- Zhao L, Qi Y, Luzzatto-Fegiz P, Cui Y, Zhu Y (2020) COVID-19: Effects of Environmental Conditions on the Propagation of Respiratory Droplets. Nano letters 20(10): 7744-7750.
- Briz-Redón A, Serrano-Aroca A (2020a) The effect of climate on the spread of the COVID-19 pandemic: A review of findings, and statistical and modelling techniques. Progress in Physical Geography: Earth and Environment 44(5): 591-604.
- Wang J, Tang K, Feng K, Lin X, Lv W, et al. (2020) High temperature and high humidity reduce the transmission of COVID-19.
- Wu X, Nethery RC, Sabath BM, Baun D, Dominici F (2020) Exposure to air pollution and COVID-19 mortality in the United States. Science Advances.
- Abdollahi A, Rahbaralam M (2020) Effect of temperature on the transmission of COVID-19: A machine learning case study in Spain. medRxiv.
- Tobías A, Molina T (2020) Is temperature reducing the transmission of COVID-19? Environmental Research 186: 109553.
- Briz-Redón A, Serrano-Aroca A (2020b) A spatio-temporal analysis for exploring the effect of temperature on COVID-19 early evolution in Spain. Science of the Total Environment 728: 138811.
- Hamidi S, Sabouri S, Ewing R (2020) Does density aggravate the COVID-19 pandemic? J Am Plann Assoc, p. 1-15.
- Sahoo PS, Powell MA, Mittal S, Garg VK (2020) Is the transmission of novel coronavirus disease (COVID-19) weather dependent? Journal of the Air & Waste Management Association 70(11): 1061-1064.
- (2020) CNE-ISCIII and Ciberesp. Factores de Difusión COVID-19 en España.
- Cowling BJ, Aiello AE (2020) Public Health Measures to Slow Community Spread of Coronavirus Disease. The Journal of Infectious Diseases 221(11): 1749-1751.
- Herrmann H, Bucksch H (2014) Clausius-Clapeyron equation. Dictionary Geotechnical Engineering/Wörterbuch GeoTechnik.
- Gupta S, Raghuwanshi GS, Chanda A (2020) Effect of weather on COVID-19 spread in the US: A prediction model for India in 2020. Sci Total Environ 728: 138860.
- Mecenas P, Moreira Bastos RT, Vallinoto AC, Normando D (2020) Effects of temperature and humidity on the spread of COVID-19: A systematic review. PLOS ONE.
- Baker RE, Wenchang Y, Vecchi GA, Metcalf CJ, Grenfell BT (2020) Susceptible supply limits the role of climate in the early SARS-CoV-2 pandemic. Science 369(6501): 315-319.
- Jolliffe IT (2002) Principal Component Analysis. Springer Series in Statistics. Springer Science.