ESTIMATION OF DAILY MEAN TEMPERATURES: AN ACCURATE METHOD FOR THE DOURO VALLEY

Air temperature data from many locations worldwide are available only as series of daily minimum and maximum temperatures. Historically, several approaches have been used to estimate the actual daily mean temperature, because only in the last two or three decades have automatic thermometers been able to compute its actual value. The most common approach is to average the daily minimum and maximum. An alternative approach for the case where only daily minima and maxima are available, proposed by Dall’Amico and Hornsteiner in 2006, uses the two daily extremes together with the next day's minimum temperature and a coefficient related to the local daily astronomical sunset time. The method also uses two optimizable coefficients related to the region’s temperature profile. Optimizing these two unknown parameters requires a dataset containing the maxima, minima, and actual daily mean temperatures for at least one year. In this research, we used three datasets of minimum, maximum and actual mean temperatures for the period 2007-2014, obtained at three automatic meteorological stations located in the Douro Valley, to optimize the two unknown parameters of the Dall’Amico and Hornsteiner approach. We also compared the actual mean daily temperatures from the three datasets with the corresponding values estimated by i) the usual approach of averaging the daily maximum and minimum temperatures and ii) the Dall’Amico and Hornsteiner approach. Results show that the former approach overestimates the daily mean temperatures by 0.5 ºC on average. The Dall’Amico and Hornsteiner approach proved to be a better approximation of mean temperatures for the three meteorological stations used in this research, being unbiased relative to the actual daily mean temperatures.
In conclusion, this research confirms that the Dall’Amico and Hornsteiner approach is a better way to estimate mean daily temperatures and provides the optimized parameters for three sites, one in each of the three sub-regions of the Douro Valley (Baixo Corgo, Cima Corgo and Douro Superior).


INTRODUCTION
Gridded temperature datasets start in 1850 because too few observations are available from before this date (Brohan et al., 2006). However, local meteorological datasets have existed since the 16th century; one example of a long local record is the 1756-1998 daily temperature and pressure dataset for Stockholm (Moberg et al., 2002). Until a few decades ago, temperatures were recorded from several manual daily readings. For the Stockholm dataset, temperature readings during 1756-1760 were made around sunrise and at 13:00; during 1761-1783 they were made around 6:30-8:00, 13:00 and 22:00-23:00; and in 1784 observations at 06:00, 14:00 and 21:00 were introduced for all variables. Currently, effective practices for carrying out meteorological observations and temperature measurements according to internationally agreed standards may be found in World Meteorological Organization (2014). There is still no widespread consensus among meteorological organizations on the definition of mean daily temperature. For example, the World Meteorological Organization defines the mean daily temperature as "Mean of the temperatures observed at 24 equidistant times in the course of a continuous interval of 24 hours (normally the mean solar day, from midnight to midnight, according to the zonal time or the mean solar time of the station); or a combination of temperatures observed at less numerous times, so arranged as to depart as little as possible from the mean defined above" (World Meteorological Organization, 2016). On the other hand, the Government of Canada defines the mean daily temperature as "The average of the maximum and minimum temperature at a location for a specified time interval" (Government of Canada, 2016).
Daily mean temperatures are used as input variables in a number of climatological, physical, ecological, agricultural, biological, and technical processes and applications.
Historically, several different approaches have been used to estimate the actual daily mean temperature, as it was only in the last two or three decades that continuously operating automatic thermometers became able to compute its mathematical value. Several approaches estimate the daily mean temperature by averaging several daily temperature readings taken at different times of the day (Sakellariou and Kambezidis, 2016; Ma and Guttorp, 2013; Weiss and Hays, 2005).
In many countries and in many places, the daily mean temperature is still estimated using the common approach of averaging the maximum and minimum temperatures of a day. In this research, this approach will be called the TAvg method. It has long been known that this method produces poor estimates of the actual mean daily temperature (Kaemtz et al., 1845). In the Douro Valley (see Figure 1), meteorological and climatological studies of its present and future climate, as well as agricultural studies of the region's crops and agricultural products (grapes, olives, almonds, figs, oranges and cherries), use estimates of mean daily temperatures. Most of these studies have used the TAvg method to estimate the unknown real mean temperature (Santos and Malheiro, 2011; Jones, 2012; Jones and Alves, 2012; Jones, 2013; Fraga et al., 2014; C. Real et al., 2014; C. Real et al., 2016; Fraga and Santos, 2017).
Figure 1. Portugal and the location of the Douro Region (relief maps: www.maps-for-free.com).
An alternative estimation approach, proposed by Dall'Amico and Hornsteiner (2006), which we will call the Dall'Amico method, uses the two daily extreme temperatures together with the next day's minimum temperature and a coefficient related to the local daily astronomical sunset time (US Naval Observatory, 2017). Additionally, the method uses two optimizable coefficients related to the temperature profiles of the region. To optimize the two unknown parameters, the authors suggest fitting the coefficients to a dataset containing the maxima, minima, and actual daily mean temperatures for at least one year.
This research used three datasets of maximum, minimum and mean daily temperatures collected at three meteorological stations, one in each of the three sub-regions of the Douro Valley (Baixo Corgo, Cima Corgo and Douro Superior; see Figure 2). As these datasets were obtained at automatic meteorological stations that measure temperatures continuously, the mean temperatures in the datasets are the actual values. We compared the actual mean daily temperatures with the estimates obtained by using the TAvg method and by using the alternative method proposed by Dall'Amico and Hornsteiner (2006). The quality of the estimates obtained by the two methods was assessed.
Figure 2. Location of the meteorological stations.
We are aware that simple methods such as the one proposed by Dall'Amico and Hornsteiner are prone to poorer results at sites with significant spatial variability. At such sites, the local landscape may produce highly variable patterns in the evolution of temperature during each day. This is the case in mountain regions such as the Douro Valley, cut longitudinally by the Douro river valley, where mesoscale weather processes, dynamically and thermally driven wind systems, sun exposure, site-level cloud cover and other factors influence the daily evolution of the weather at the site. Nevertheless, once the coefficients of the Dall'Amico and Hornsteiner expression have been optimized, the expression is intended to estimate the unknown real value of the mean temperature using only the daily extreme temperatures.

Data
Three datasets of maximum, minimum and mean daily temperatures for the period 2007-2014 were used, collected at the Cambres (91 m elevation), Pinhão (107 m elevation) and Vilariça (171 m elevation) stations, one in each of the three sub-regions of the Douro Valley (Baixo Corgo, Cima Corgo and Douro Superior) (Figure 2). Considering a constant lapse rate of 5.1 ºC/km of elevation (Stone and Carlson, 1979), the difference in temperature from station to station that may be attributable to the difference in elevation is about 0.4 ºC at most. A methodology developed by Feng et al. (2004) was used to identify data suspected of being erroneous. The RHtestsV3 software package (Wang, 2011) was used to homogenize the cleaned dataset.
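The size of this elevation effect can be checked with a short worked calculation, taking the largest elevation difference among the three stations (Vilariça at 171 m and Cambres at 91 m) and the constant lapse rate quoted above:

$$\Delta T = 5.1\ ^{\circ}\mathrm{C/km} \times \frac{171 - 91}{1000}\ \mathrm{km} = 5.1 \times 0.080 \approx 0.41\ ^{\circ}\mathrm{C}$$

That is, roughly 0.4 ºC between the lowest and the highest station; the differences between the other station pairs are smaller.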

Methods
Estimates of mean temperatures obtained by two estimation methods (TAvg and Dall'Amico) were compared with the corresponding real mean temperatures.

TAvg method
The TAvg method is the most common and simplest method for estimating the mean daily temperature. The mean daily temperature is estimated by averaging the minimum and maximum daily temperatures, see equation 1.
$$T_{avg} = \frac{T_{min} + T_{max}}{2} \qquad (1)$$
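As a minimal sketch, the TAvg estimate can be written as a one-line function. The function name and the sample readings are illustrative assumptions and are not taken from the Douro datasets.

```python
def tavg_estimate(t_min, t_max):
    """Estimate the daily mean temperature as the midpoint of the daily extremes."""
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    return (t_min + t_max) / 2.0


# Hypothetical readings in degrees Celsius.
print(tavg_estimate(8.0, 21.0))  # 14.5
```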

Dall'Amico method
The Dall'Amico method (Dall'Amico and Hornsteiner, 2006) estimates the mean daily temperature on the basis of the minimum and maximum daily temperatures. The method accounts for the temperature trend by including the minimum temperature on the following day in addition to the extremes on the day in question, see equation 2.
In equation 2:
* $S_{day\,i}$ is the nondimensional astronomical sunset time on day i, $S_{day\,i} = t_{day\,i}(\mathrm{sunset}) / 24\,\mathrm{h}$, where $t_{day\,i}(\mathrm{sunset})$ is the astronomical (1) sunset time on day i. The astronomical sunset times for any location are available on the internet, e.g. http://aa.usno.navy.mil/data/docs/RS_OneYear.php;
* $C_D$ is the proportionality coefficient for the day;
* $C_N$ is the proportionality coefficient for the night.
(1) Dall'Amico and Hornsteiner (2006) tested their methodology at two different sites: one located in flat European terrain and the other in an Alpine valley. They concluded that using the actual sunset times yielded slightly worse results than using the astronomical sunset times.
As suggested in Dall'Amico and Hornsteiner (2006), seasonally varying coefficients $C_D$ and $C_N$ in equation 2 were optimized by partitioning each year into two periods, designated the cold period and the warm period, giving four coefficients: $C_{D,Warm}$, $C_{D,Cold}$, $C_{N,Warm}$ and $C_{N,Cold}$. The optimization consisted of minimizing the Root Mean Square Error (RMSE) of the mean daily temperature estimates obtained with equation 2 at the Cambres, Pinhão and Vilariça stations, relative to their actual mean daily temperatures. Parameter optimization for the Dall'Amico method was performed with the Microsoft Excel (2016) evolutionary solver algorithm. The optimization process also determined the boundaries of the two periods (cold and warm).
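The RMSE minimization can be illustrated with a small sketch. Two things here are assumptions and should not be read as the authors' procedure: the model function `dallamico_like` is only a hypothetical sunset-weighted form (the exact expression should be taken from Dall'Amico and Hornsteiner, 2006), and an exhaustive grid search stands in for the Excel evolutionary solver. The data are synthetic, generated from known coefficients so that the search can be checked.

```python
import math
from itertools import product


def dallamico_like(t_min, t_max, t_min_next, s, c_d, c_n):
    """ASSUMED model form (not copied from the paper): a sunset-weighted blend
    of a daytime term and a night-time term, each with its own coefficient."""
    day = t_min + c_d * (t_max - t_min)
    night = t_min_next + c_n * (t_max - t_min_next)
    return s * day + (1.0 - s) * night


def rmse(estimates, actual):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimates, actual)) / len(actual))


def fit_coefficients(days, actual_means, n_steps=100):
    """Grid search over c_d, c_n in [0, 1]; returns (rmse, c_d, c_n) of the best pair."""
    grid = [k / n_steps for k in range(n_steps + 1)]
    best = None
    for c_d, c_n in product(grid, grid):
        estimates = [dallamico_like(*d, c_d, c_n) for d in days]
        err = rmse(estimates, actual_means)
        if best is None or err < best[0]:
            best = (err, c_d, c_n)
    return best


# Synthetic days: (t_min, t_max, t_min_next, s). Values are hypothetical.
days = [(5.0 + i % 7, 13.0 + i % 7 + i % 5, 5.0 + (i + 1) % 7, 0.70 + 0.005 * i)
        for i in range(20)]
# Synthetic "actual" means generated with known coefficients 0.6 and 0.3.
actual = [dallamico_like(*d, 0.6, 0.3) for d in days]

_, c_d, c_n = fit_coefficients(days, actual)
print(c_d, c_n)  # 0.6 0.3
```

In practice the search would be run separately on the warm-period and cold-period days, and a finer or smarter optimizer could replace the grid.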
For both methods, the respective Root Mean Square Errors (RMSE) and bias have been calculated in order to assess their relative accuracy.
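A minimal sketch of the two accuracy measures follows. The sign convention for bias (estimate minus actual, so that positive values mean overestimation) is an assumption chosen to match the wording used in this paper, and the sample values are hypothetical.

```python
import math


def bias(estimates, actual):
    """Mean signed error; positive values mean the estimates are too high."""
    return sum(e - a for e, a in zip(estimates, actual)) / len(actual)


def rmse(estimates, actual):
    """Root mean square error of the estimates."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimates, actual)) / len(actual))


# Hypothetical daily means in degrees Celsius.
actual = [10.0, 12.0, 15.0, 11.0]
estimates = [10.5, 12.5, 15.5, 11.5]
print(bias(estimates, actual))  # 0.5
print(rmse(estimates, actual))  # 0.5
```

An estimator can have zero bias and still a non-zero RMSE, which is why both measures are reported.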

Assessing model for overfitting
To assess whether the estimates obtained with the Dall'Amico method were overfitting the data, we evaluated the model using cross-validation. To that effect, a 4-fold scheme was defined. The datasets of maximum, minimum and mean daily temperatures from eight years (2007-2014) were separated into four groups (folds), each containing the data from two consecutive years. Six years of data (three folds) were used to estimate the model parameters (training the model) and the two remaining years of data (one fold) were used to validate the accuracy of the model (testing the model).
Additionally, to assess whether the model parameters would remain valid over a period of seven years, we evaluated the quality of the estimates obtained using a single year of data to compute the model parameters (training set) and the remaining seven years of data to test the model's estimates (testing set). We repeated this evaluation using each single year of data in turn as the training set. We present these results under the name "validation 1/7".
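The two validation schemes described above can be sketched as year-based splits. This is only an illustration of how the training and testing years are arranged; the function names are hypothetical and no model fitting is shown.

```python
YEARS = list(range(2007, 2015))  # 2007 to 2014 inclusive


def two_year_folds(years):
    """Group consecutive years in pairs: (2007, 2008), (2009, 2010), ..."""
    return [tuple(years[i:i + 2]) for i in range(0, len(years), 2)]


def k_fold_splits(folds):
    """Yield (training_years, testing_years), holding out one fold at a time."""
    for held_out in folds:
        train = [y for fold in folds if fold != held_out for y in fold]
        yield train, list(held_out)


def one_vs_rest_splits(years):
    """'Validation 1/7': train on a single year, test on the remaining seven."""
    for year in years:
        yield [year], [y for y in years if y != year]


for train, test in k_fold_splits(two_year_folds(YEARS)):
    print("train:", train, "test:", test)
```

Each of the four 4-fold splits trains on six years and tests on two; the "validation 1/7" scheme produces eight splits, one per training year.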

RESULTS AND DISCUSSION
The boundaries of the cold and warm periods resulted from the optimization process. The warm period runs from March 1 to September 30; the cold period runs from January 1 to February 28 and from October 1 to December 31. Additionally, four optimum values for the seasonally varying proportionality coefficients were obtained for the Douro Valley, which are presented in Table II.
Using each of the two pairs of coefficients ($C_{D,Warm}$, $C_{N,Warm}$ or $C_{D,Cold}$, $C_{N,Cold}$) in equation 2, two forms of this equation were obtained. The first uses $C_{D,Warm}$ and $C_{N,Warm}$ and is intended for estimating the mean daily temperature in the warm months (months 3 to 9). The second uses $C_{D,Cold}$ and $C_{N,Cold}$ and is intended for estimating the mean daily temperature in the cold months (months 1, 2, 10, 11 and 12). The bias and Root Mean Square Error (RMSE) obtained with both methods are presented in Table III. The Dall'Amico method estimator is unbiased, while the TAvg method is a biased estimator of the actual mean daily temperature, overestimating the actual value by 0.5 ºC on average.
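Choosing between the two forms of equation 2 reduces to a lookup by calendar month. The sketch below assumes hypothetical helper names; the coefficient values themselves are deliberately left as placeholders to be filled in from Table II.

```python
WARM_MONTHS = range(3, 10)  # March (3) to September (9), as reported above


def period_for_month(month):
    """Return 'warm' or 'cold' for a calendar month number (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return "warm" if month in WARM_MONTHS else "cold"


def select_coefficients(month, coefficients):
    """Pick the (C_D, C_N) pair for the month from a {'warm': ..., 'cold': ...} dict."""
    return coefficients[period_for_month(month)]


# Placeholder labels stand in for the numeric values of Table II.
table = {"warm": ("C_D_Warm", "C_N_Warm"), "cold": ("C_D_Cold", "C_N_Cold")}
print(select_coefficients(7, table))   # ('C_D_Warm', 'C_N_Warm')
print(select_coefficients(11, table))  # ('C_D_Cold', 'C_N_Cold')
```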
We assessed the accuracy of the mean temperature estimates calculated with the Dall'Amico method when the coefficients $C_D$ and $C_N$ are optimized for a site other than the one for which the estimates are computed. We ran several combinations, optimizing $C_D$ and $C_N$ with the data from one single site and then assessing, for the two remaining sites, the accuracy of the estimates computed with those coefficients. The corresponding bias and RMSE were computed and the results are shown in Table IV.
The results of the three combinations in Table IV show that, compared with the TAvg method, the Dall'Amico method has comparable RMSE values and a smaller bias. In a small region such as the Douro Valley, even though the landscape may produce highly variable patterns in the evolution of temperature during each day, the results indicate that the optimized coefficients presented in the "the 3 stations" column of Table II will lead to estimates of the mean daily temperature with comparable variability but a smaller bias, relative to the TAvg method.
Using the Dall'Amico method with the $C_D$ and $C_N$ coefficients optimized on temperature data from the three stations, we compared the differences between the actual mean monthly temperatures (the average of all mean daily temperatures during the corresponding month) and their estimates obtained by the two methods (TAvg and Dall'Amico) at each station (see Figure 3, Figure 4 and Figure 5).
In the 96-month period (Jan 2007 to Dec 2014), at the three stations (Cambres, Pinhão and Vilariça), 79.51% of the mean monthly temperatures estimated with the Dall'Amico method deviate less from the actual monthly mean temperatures than the estimates obtained with the TAvg method.
These results, together with the results obtained using mean daily temperatures (see Table III), show that the Dall'Amico method is a better approximation for both daily and monthly mean temperatures, having a smaller bias and RMSE than the estimates obtained with the TAvg method.
Results for the "validation 1/7" methodology are presented in Table VI. They show that, even when a single year of data is used to train the model, the resulting coefficients enable equation 2 to produce good estimates of the mean daily temperature when new, unseen data are used as input.
These results suggest that, for a slow climate change process, the coefficients will require only sparse recalibration, remaining usable over periods of 7 or more years. Although the average of the eight resulting partial models has an RMSE and a bias both larger than the corresponding values of the complete model, the estimates are still better than those obtained with the TAvg method.
The present research shows that the TAvg estimation method overestimates the actual mean daily air temperatures in the Douro Valley. On average, the overestimation is 0.5 ºC, being larger for the warm months (0.6 ºC) than for the remaining months (0.3 ºC). The alternative estimation method, the Dall'Amico method, is also a simple method for estimating the mean daily temperature from the daily maximum and minimum temperatures. This latter method proved to be closer to the actual values of the mean daily temperature for every month of the year, having a small bias relative to the actual values (0.0 ºC for daily mean temperatures and 0.13 ºC for monthly mean temperatures in the 96-month period analyzed in this research).
Table VI. Results for the approach of using a single year of data to estimate the model parameters.
Several meteorological and agricultural indices based on daily temperatures use mean daily temperatures as input. It is not always clear for which method of calculating mean temperatures they are calibrated. In viticulture, for example, growing degree-days (GDD) are defined as the sum of the mean daily temperature above a threshold from January 1 to a given date:
$$\mathrm{GDD} = \sum_{d} \max\left(T_{avg,d} - T_{base},\ 0\right)$$
where the sum runs over the days d from January 1 to the given date, $T_{avg}$ is the average temperature and $T_{base}$ is a temperature used as threshold. The $T_{avg}$ value used in the calculation of GDD is not the actual mean daily temperature but its estimate obtained with the TAvg method. For example, van Leeuwen et al. (2008) studied the heat requirement (GDD) for each grapevine variety to reach each phenological event, defining heat requirement data for most grape varieties. Using daily average temperatures obtained by a methodology that produces estimates closer to the unknown daily mean temperatures than the TAvg estimates will lead to more precise heat requirements for the grapevine phenological events.
Nowadays, however, more accurate and consistent heat summation indices, such as growing degree hours (GDH) based on hourly temperatures, are replacing GDD for phenological prediction (Gu, 2016).
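To illustrate how a bias in the daily mean propagates into GDD, the sketch below uses the common form in which days below the threshold contribute zero. The base temperature of 10 ºC and the sample temperatures are assumptions for illustration only and are not values from this study.

```python
def growing_degree_days(daily_means, t_base=10.0):
    """Sum the daily mean temperature excess over t_base; colder days add 0."""
    return sum(max(t - t_base, 0.0) for t in daily_means)


# Hypothetical daily means in degrees Celsius.
actual = [9.0, 12.0, 15.0, 18.0]
# The same days with a constant +0.5 degree overestimation.
biased = [t + 0.5 for t in actual]

print(growing_degree_days(actual))  # 15.0
print(growing_degree_days(biased))  # 16.5
```

Each day above the threshold adds the full 0.5 ºC of bias to the total, so over a growing season the accumulated difference can reach tens of degree-days.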
Optimized parameters for three sites, one in each of the three sub-regions of the Douro Valley (Baixo Corgo, Cima Corgo and Douro Superior), for the period 2007-2014, are made available in this research. Interpolating the parameter values will allow them to be used to estimate mean temperatures (with moderate confidence) at sites other than those used in the optimization process. As the three stations are located at sites with an average elevation of about 120 m, using the optimized parameters made available in this research makes it necessary to refer the daily maxima and minima to a reference elevation of 120 m. Rolland (2003) indicates that temperature decreases, on average, by 5 ºC per 1000 m of elevation increase.
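Referring a temperature to the 120 m reference elevation can be sketched as a linear lapse-rate correction. The function names and the sample site are hypothetical; the lapse rate of 5 ºC per 1000 m is the average value attributed to Rolland (2003) above, and a linear correction is itself a simplifying assumption.

```python
def to_reference_elevation(temp_c, elevation_m, ref_elevation_m=120.0,
                           lapse_c_per_km=5.0):
    """Shift a temperature measured at elevation_m to the reference elevation.

    A site above the reference is colder, so its reading is raised."""
    return temp_c + lapse_c_per_km * (elevation_m - ref_elevation_m) / 1000.0


def from_reference_elevation(temp_c, elevation_m, ref_elevation_m=120.0,
                             lapse_c_per_km=5.0):
    """Inverse shift: bring a reference-level estimate back to the site elevation."""
    return temp_c - lapse_c_per_km * (elevation_m - ref_elevation_m) / 1000.0


# Hypothetical site at 520 m with a daily maximum of 20.0 degrees Celsius.
print(to_reference_elevation(20.0, 520.0))  # 22.0
```

Under this sketch, the daily extremes would be shifted to 120 m, the mean estimated with the optimized parameters, and the result shifted back to the site elevation.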
The use of more precise methods for the estimation of the actual mean daily temperature, although providing estimates closer to the actual values, should be followed by a recalibration of the indices and variables dependent on mean temperatures because, as in the GDD example, the indicative values of some indices are based on estimates obtained by using the routine TAvg method.

CONCLUSIONS
In the Douro Valley, for the period 2007-2014 and for the three meteorological stations used in this research, the common approach overestimates the daily mean temperatures by 0.5 ºC on average (0.90 overestimates, 0.06 underestimates, 0.04 exact matches), while the Dall'Amico method has, in the same period, a bias of 0.0 ºC (0.44 overestimates, 0.40 underestimates, 0.16 exact matches). Although 0.5 ºC does not seem a large bias, it is worth noting that in 1980-2009 the mean annual temperature in the Douro Valley increased by 1.0 ºC, from an average value of 14.9 ºC in the early 1980s to 15.9 ºC in 2009 (Sousa, 2015). Estimates that overestimate the actual mean temperature by 0.5 ºC on average therefore correspond, if the 1980-2009 trend continues, to the actual mean temperatures expected in this region by 2032, 15 years from now. In conclusion, this research confirms that the Dall'Amico method should be preferred for estimating mean daily temperatures.
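The 15-year figure can be reproduced with a short calculation, assuming a linear trend and counting 1980-2009 as three decades:

$$\text{rate} = \frac{1.0\ ^{\circ}\mathrm{C}}{30\ \text{years}} \approx 0.033\ ^{\circ}\mathrm{C/year}, \qquad \Delta t = \frac{0.5\ ^{\circ}\mathrm{C}}{1.0\ ^{\circ}\mathrm{C} / 30\ \text{years}} = 15\ \text{years}$$

Counted from 2017, the year implied by "15 years from now", this gives 2032.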

ACKNOWLEDGMENTS
This work was partly funded by: Associação para o Desenvolvimento da Viticultura Duriense (ADVID); the ERDF - European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme within project «POCI-01-0145-FEDER-006961»; and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013.