"Spatial Variability of COVID-19 First Wave Severity and Transmission Intensity in Spain: The Influence of Meteorological Factors"


Introduction
All of us are immersed in one of the greatest challenges that humanity has faced in recent years, and it still poses many unknowns. The one that has caught our attention and justifies this study is this: within the same country, Spain, with the same cultural aspects and containment policies (without lockdown), why did the disease, at the initial moment of the COVID-19 first wave and given a significant number of infections, prosper more intensely in some areas than in others? The proposed hypothesis is that the weather conditions during the outbreak are relevant factors that could be used as early indicators of the COVID-19 first wave severity and transmission intensity. This hypothesis agrees with previous studies pointing out that cities with significant COVID-19 outbreaks have very similar climate patterns, with relatively cool and dry environments [6,7], and with others showing that other SARS virus outbreaks were significantly associated with temperature and its variations [8,9].
Therefore, this study focuses on the initial moments of the disease spread, and more specifically on the moment at which infections begin to grow exponentially. In Spain this happened in pre-lockdown conditions, that is, without active social distancing measures and with only minority mask use. This could reinforce the hypothesis that weather factors are important at this specific moment, because at that time they did not have to compete with other factors, such as policy or sanitary measures, that become more significant once the pandemic is under way [10]. The initial hypothesis is also supported by four mechanisms involving meteorological factors that favored the COVID-19 spread. First, low temperature and low absolute humidity weaken the immune system, favoring the proliferation of infections [11,12]. Second, they promote the persistence of the virus on surfaces and therefore its spread through fomites or direct contact [13,14], although this is now considered a minor mode [15].
Third, cold temperatures change habits towards less healthy routines, favoring indoor places where COVID-19 transmission rates are nearly 20 times higher than outdoors [16-18]. Finally, aerosols are one of the main COVID-19 spread modes [16,19], and their dispersion in the air is affected by both variables [20].
However, the existence of a correlation between temperature and humidity and COVID-19 transmission is not yet clear [21], although most studies point towards a negative one [22,23]. Three studies carried out in specific regions of Spain obtained different results: a negative correlation [24,25] and no significant association [26] between COVID-19 and temperature. For this reason, the objective of the present study is to analyze whether the COVID-19 first wave transmission intensity and severity are correlated with the average temperature and average absolute humidity of the early moments of the disease outbreak. The final goal is to design a model that allows us to predict the importance of a COVID-19 outbreak from early weather information. Some socioeconomic elements have also been used as control factors, in line with other works [27,28], in order to consolidate the results.
The analysis focuses on the 50 Spanish provinces, in order to investigate and summarize what happened across the whole territory of the country during the COVID-19 first wave.

Data and Methods
The health data were extracted from the National Epidemiological Surveillance Network. The second dependent variable was the speed at which the daily incidence rate doubles once the number of infections begins to grow exponentially. This variable, related to the COVID-19 first wave transmission intensity, was calculated for each province (Figure 2) and is called the outbreak incidence Rate Doubling Speed (RDS). The Spanish Meteorological Agency (AEMET) provided daily weather data, namely daily average temperature and daily average relative humidity, from 50 weather stations taken as the reference for the 50 provinces covered by the study. The outbreak average temperature was calculated for each province. The outbreak was taken to be the moment at which the number of infections began to grow exponentially, defined as the day on which the daily incidence rate per 100,000 inhabitants exceeded the value of 5 [29]. The temperature was averaged over the 13 days before that threshold was exceeded plus the threshold day itself, in order to take into account the longest COVID-19 incubation period [30]. This averaged value is called the threshold average temperature (TT) and is the first explanatory weather variable. The outbreak threshold was exceeded at different times in each of the Spanish provinces: the earliest was Álava on February 28th and the latest was Murcia on March 24th. None of them was later than March 27th (the first day of lockdown plus 13 days), which guarantees that our study was carried out in pre-lockdown conditions. The process was repeated with the daily average absolute humidity. For this purpose, the Clausius-Clapeyron equation is used to calculate the daily average Absolute Humidity (AH) in g/m³ from both the daily average temperature and relative humidity values [31,32].
where T is daily average temperature in ºC and RH is daily average relative humidity in %. Once obtained, the daily average absolute humidity has been averaged for 14
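The threshold rule, the humidity conversion and the averaging window described above can be sketched in Python. The conversion formula itself does not appear in this text, so the block assumes a widely used Clausius-Clapeyron-based approximation (constants 6.112, 17.67, 243.5 and 2.1674), which may differ from the exact expression in [31,32]. All function names and the sample series are hypothetical.

```python
import math


def absolute_humidity(t_celsius, rh_percent):
    """Absolute humidity in g/m^3 from temperature (deg C) and relative humidity (%).

    Assumed form, not taken from the paper:
    AH = 6.112 * exp(17.67 * T / (T + 243.5)) * RH * 2.1674 / (273.15 + T)
    """
    saturation_hpa = 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))
    return saturation_hpa * rh_percent * 2.1674 / (273.15 + t_celsius)


def threshold_day(incidence, threshold=5.0):
    """Index of the first day whose incidence rate per 100,000 exceeds the threshold."""
    for day, rate in enumerate(incidence):
        if rate > threshold:
            return day
    return None


def threshold_average(values, day, days_before=13):
    """Mean over the `days_before` days before `day` plus `day` itself."""
    if day is None or day < days_before:
        raise ValueError("not enough days before the threshold day")
    window = values[day - days_before: day + 1]
    return sum(window) / len(window)


# Hypothetical 20-day series for one province (illustrative values only).
incidence = [0.0] * 14 + [1.0, 2.5, 4.0, 6.0, 9.0, 14.0]
temperature = [8.0 + 0.2 * i for i in range(20)]  # daily average, deg C
humidity = [70.0] * 20                            # daily average relative humidity, %

day = threshold_day(incidence)
daily_ah = [absolute_humidity(t, rh) for t, rh in zip(temperature, humidity)]
tt = threshold_average(temperature, day)   # threshold average temperature
tah = threshold_average(daily_ah, day)     # threshold average absolute humidity
print(day, round(tt, 2), round(tah, 2))
```

Averaging the daily absolute humidity values, rather than converting the averaged temperature and relative humidity, follows the order of operations described in the text.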

Maximum Daily Incidence Rate
Simple Linear Regression Model: Table 1 shows the regression and correlation coefficients of the two proposed simple linear regression models, with threshold average temperature as the explanatory variable and two targets: the maximum daily incidence rate and its natural logarithm. Both models show a very strong negative correlation, that is, a higher threshold average temperature is associated with a lower maximum daily incidence rate and vice versa. The best option is the one obtained after applying the natural logarithm (Figure 3a); this model explains 65.58% of the maximum daily incidence rate variability. Threshold average temperatures above 13.02 °C (99% CI, 11.23 to 14.82 °C) are associated with a low maximum daily incidence rate of the pandemic, that is, a maximum daily incidence rate lower than 20. The model with the natural logarithm of the maximum daily incidence rate (Table 1) was used to determine this threshold value. We repeated the same analysis with threshold average absolute humidity as the explanatory variable. The best model, again the one for the natural logarithm of the maximum daily incidence rate (Figure 3b), explains 55.53% of its variability (Table 1). A strong negative correlation is again found between the variables. Threshold average absolute humidity values above 7.37 g/m³ (99% CI, 6.41 to 8.33 g/m³) are associated with a low maximum daily incidence rate of the pandemic, that is, a maximum daily incidence rate lower than 20. The model with the natural logarithm of the maximum daily incidence rate (Table 1) was also used to determine this threshold value.
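As an illustration of how a threshold value can be read off a log-linear model, the sketch below fits ln(MIR) against TT by ordinary least squares and solves the fitted line for the temperature at which the predicted maximum daily incidence rate equals 20. The data points are invented for the example and are not the provincial values behind Table 1. The helper names are hypothetical, and the confidence interval of the threshold is not computed.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


def threshold_for_level(intercept, slope, level):
    """Value of x at which the fitted ln(y) equals ln(level)."""
    return (math.log(level) - intercept) / slope


# Hypothetical data: threshold average temperature (deg C) and maximum
# daily incidence rate for six imaginary provinces.
tt = [6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
mir = [120.0, 75.0, 40.0, 25.0, 14.0, 9.0]

log_mir = [math.log(v) for v in mir]
intercept, slope, r_squared = fit_line(tt, log_mir)
tt_star = threshold_for_level(intercept, slope, 20.0)
print(f"slope={slope:.3f}  R^2={r_squared:.3f}  TT at MIR=20: {tt_star:.2f}")
```

Because the slope is negative, temperatures above the returned value correspond to predicted rates below the chosen level, which is how a statement such as "above 13.02 °C, MIR lower than 20" can be derived from the fitted coefficients.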

Multiple Linear Regression Model:
In order to confirm and refine the results obtained, a multiple linear regression model incorporating all the variables was built. The dependent variable is the natural logarithm of the maximum daily incidence rate. The first thing to note is that none of the coefficients for the provincial socioeconomic variables was statistically significant, so these variables (GDP, % of population aged 60 years or older, population density of the most populated municipality and pre-lockdown intra-provincial movements) were excluded from the final model. A second result is the confirmation of a negative correlation between the maximum daily incidence rate and both threshold average temperature and threshold average absolute humidity, the only two variables that the final model contains. This model explains 68.72% of the maximum daily incidence rate variation and is significant at the 99% confidence level, since the p-value returned by ANOVA is less than 0.01 (Table 2). In this model, MIR is the maximum daily incidence rate, TT is the threshold average temperature in 0.1 °C and TAH is the threshold average absolute humidity in g/m³. The model obtained represents an advance with respect to the proposed simple linear regression models, and it makes it possible to anticipate, with moderate efficiency, the severity of an outbreak.

Principal Component Analysis: A principal component analysis (PCA) was carried out in order to check for multicollinearity between the predictor variables, and so either confirm the validity of the model found or look for an alternative one. The correlation matrix between the model variables (Table 3) clearly indicates the existence of multicollinearity (correlation values greater than 0.5). PCA indicates that there are two significant components (eigenvalue greater than 1), and these components explain 72.54% of the variability of the data (Table 4). Figure 4 displays the spatial distribution of the two principal components.
In the first component (Figure 4a), two areas of different behavior can be observed: the north and center of the country, characterized by low temperature and absolute humidity, and the south, characterized by higher temperature and absolute humidity (Table 5). Here it is important to highlight that the components are mathematically orthogonal, so the correlation between them is zero, that is, they are linearly uncorrelated. PCA shows that the meteorological variables are almost independent of the socioeconomic variables. Finally, the set of 6 initial variables is reduced to 2: the first component is the one in which the meteorological factors have the greatest weight, and it divides the peninsula into two differentiated areas, as temperature is the most relevant factor; the second one carries the socioeconomic factors and therefore has greater weight in large cities. We explored the relationship between these two new explanatory variables and our target. The results indicate a moderately strong correlation with the first principal component and no evidence of a relationship with the second principal component (Table 6).
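The two properties relied on here, the eigenvalue-greater-than-1 criterion and the zero correlation between components, can be checked on a deliberately reduced case. The sketch below is not the six-variable analysis of the study: it assumes only two correlated predictors, for which the PCA of the correlation matrix has a closed form (eigenvalues 1 + r and 1 - r). The data and function names are hypothetical.

```python
import math


def covariance(x, y):
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def standardize(values):
    """Centre to zero mean and scale to unit sample variance."""
    mean = sum(values) / len(values)
    sd = math.sqrt(covariance(values, values))
    return [(v - mean) / sd for v in values]


def pca_two_variables(x, y):
    """PCA on the correlation matrix [[1, r], [r, 1]] of two variables.

    The eigenvectors are (1, 1)/sqrt(2) and (1, -1)/sqrt(2), with
    eigenvalues 1 + r and 1 - r. Returns (eigenvalues, first, second),
    ordered so that the first component has the larger eigenvalue.
    """
    zx, zy = standardize(x), standardize(y)
    r = covariance(zx, zy)  # correlation, since both are standardized
    s = 1.0 / math.sqrt(2.0)
    comp_sum = [s * (a + b) for a, b in zip(zx, zy)]
    comp_diff = [s * (a - b) for a, b in zip(zx, zy)]
    if r >= 0:
        return (1.0 + r, 1.0 - r), comp_sum, comp_diff
    return (1.0 - r, 1.0 + r), comp_diff, comp_sum


# Hypothetical threshold temperature (deg C) and absolute humidity (g/m^3).
tt = [8.1, 9.4, 10.2, 11.8, 13.5, 14.9]
tah = [5.2, 5.9, 5.6, 6.8, 7.4, 7.9]

eigenvalues, first, second = pca_two_variables(tt, tah)
retained = [ev for ev in eigenvalues if ev > 1.0]  # eigenvalue > 1 criterion
print(eigenvalues, len(retained), covariance(first, second))
```

With strongly correlated predictors only the first component passes the criterion, and the covariance between the two component scores is zero up to rounding, which is why regressing on components removes the multicollinearity problem.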
Our final model, which corrects the effect of multicollinearity and explains 60.72% of the maximum daily incidence rate variability, is:
where MIR is the maximum daily incidence rate, PD is the population density of the most populated municipality, MOV is the pre-lockdown intra-provincial movements, AGE is the % of population aged 60 years or older, TT is the threshold average temperature in 0.1 °C and

Outbreak Incidence Rate Doubling Speed
Simple Linear Regression Model: Table 7 shows the regression and correlation coefficients obtained with threshold average temperature (TT) as the explanatory variable and the outbreak incidence Rate Doubling Speed (RDS) and its natural logarithm as targets. The best model (Figure 4a), the one obtained after applying the natural logarithm to the outbreak incidence rate doubling speed, explains 57.35% of its variability. It indicates a strong negative correlation between the variables. Threshold average temperatures above 13.69 °C (99% CI, 11.75 to 15.63 °C) are associated with a low outbreak incidence rate doubling speed, that is, a doubling speed equal to or lower than 1. The model with the natural logarithm of the outbreak incidence rate doubling speed (Table 7) was used to determine this threshold value. When threshold average absolute humidity is used as the explanatory variable, the best model (Figure 4b) is again the one obtained after applying the natural logarithm to the outbreak incidence rate doubling speed: it explains 32.85% of its variability (Table 7) and shows a moderately strong negative correlation between the variables. It is important to note that this is the model with the lowest correlation coefficient obtained.
Table 7: Regression coefficients and results of Simple Linear Regression models (99% confidence interval, CI) with threshold average temperature (TT) and with threshold average absolute humidity (AH) as explanatory variables, and outbreak incidence rate doubling speed (RDS) and its natural logarithm as targets. B0 is the slope and B1 is the intercept.
The correlations obtained are in agreement with the majority of studies carried out [33]. Correlation does not imply causality, but there is some evidence that in Spain the virus had a harder time intensifying and spreading at warmer temperatures and higher absolute humidity during the first wave. These results could also suggest a possible seasonal pattern of the COVID-19 disease.

Conclusion
This is the first work presenting a model that allows the COVID-19 first wave severity and transmission intensity to be predicted for a whole country, Spain, based on early average temperature and absolute humidity. However, this study does not imply that these variables were a primary driver of COVID-19 transmission; more factors must be analyzed. This methodology can be extrapolated to other mid-latitude countries and will help to show why certain areas have had more intense COVID-19 first wave episodes than others. The model obtained could be a useful supplement to help authorities act quickly, taking preventive measures and defining their COVID-19 combat strategy, but its use is limited to future situations in which meteorological factors become relevant again [34,35], that is, when the current political and social restrictions and health measures disappear, the disease becomes endemic and clearly shows its seasonal pattern.