MULTIVARIATE CLUSTERING OF VITICULTURAL TERROIRS IN THE DOURO WINEMAKING REGION

The Douro Demarcated Region (DDR) is one of the most important winemaking regions in Portugal. Viticulture is historically and culturally tied to the DDR, having a strong impact on the local economy. This mountainous region, characterized by the steep slopes of the Douro Valley, provides a wide range of environmental characteristics for grapevine growth. Different climatic conditions, soil profiles, topography, grapevine varieties and management practices comprise the DDR terroir , resulting in the distinctiveness of the wines produced. In the present study, an assessment of the homogeneous zones of interest for viticultural activities is proposed for the DDR, by integrating different terroir elements (thermal, hydric, soils, topography and vegetation), using state-of-the-art high resolution datasets (1 km) and a large number of variables/indices. A multivariate zoning was carried out using a principal component analysis and a subsequent clustering methodology. A geospatial assessment of the terroir elements was also performed separately for each type of vegetation covering the DDR. The inter-connections of the different terroir elements that exist at a given location, were also innovatively assessed. The zoning may promote a more appropriate selection of vineyard sites, the selection of more locally-adapted varieties and rootstocks and the adoption of appropriate viticultural practices and management planning. After assessing the terroir conditions that exist at the current vineyard locations, possible expansion zones within the DDR were also evaluated. The present study may also be used as an archetypal methodology that can be applied to other winemaking regions worldwide


INTRODUCTION
For viticulture, identifying the uniqueness of the environmental characteristics within a given region is crucial to analyze the winemaking suitability and potential. This zoning assesses the most relevant viticultural characteristics in each winegrowing region, providing a basis for strategic vintage planning (Jones, 2006;Clingeleffer, 2014;Costantini et al., 2016). The regional climate, soil properties, topography, varieties (biodiversity) and management practices of any given winemaking regions are usually described by the terroir concept (OIV, 2010). Although this term also includes other more broad aspects, signifying the distinctiveness and origin of the wines (e.g. branding, wine type and landscape), grapevine yields and berry quality attributes are usually linked to the abovementioned terroir characteristics.
Climate is the main driver for grapevine development and growth (Keller, 2010). Sunlight, heat and water demands influence grapevine yields and are responsible for a balanced grape ripening (e.g. Bindi et al., 1996;Jones and Davis, 2000;Camps and Ramos, 2012;Fraga et al., 2015). Soil is another important terroir element, crucial for water and nutrient uptake, which may also influence berry quality attributes (Morlat and Jacquet, 2003;Renouf et al., 2010). The topographic elements represent yet another key factor that influences viticultural and oenological characteristics of a given region. Amongst the most important topographic elements for viticulture are elevation, slope and solar exposure (Nascimbene et al., 2013;Yau et al., 2013). Additionally, vegetation type (grapevine cultivars, cover crops, intercrops and mixed vegetation systems) and management practices are also important terroir composing elements (Böhm, 2010).
The Douro Demarcated Region (DDR) is one of the oldest regulated winemaking regions in the world, famous for its Port Wine and other high-quality wines (Oliveira et al., 2005;Magalhães, 2008;Cunha and Richter, 2016;Fraga and Santos, 2017). The DDR spans over 250,000 ha in the Douro valley ( Figure 1). It is one of the most important winemaking regions in Portugal, accounting for approximately 20% of the country's vineyard area and wine production (IVV, 2015). The terroir characteristics of the DDR are indeed unique and are not commonly found in other regions. Apart from vineyards, which is the main regional crop (over 20% of the total area), forestry systems occupy nearly 50% of the area, followed by pastures (20%), fruit trees (mostly dry fruits, ~5%) and olive trees (~5%) (EEA, 2002).
Classified as Word Heritage by the UNESCO since 2001, for its natural resources and unique vineyard landscape, it is one of the most important mountainous viticultural regions in the world. The complex topography requires vineyards to be grown mostly on hillsides, requiring large human resources. The varieties planted are mostly autochthonous/native to the country, such as 'Touriga Nacional' or 'Touriga Franca'. The Mediterranean-like climatic conditions are particularly noticeable in this region, with an annual precipitation ranging from 400 to 900 mm (30% from April to September, gradually decreasing eastwards), and annual mean temperatures varying from 12 to 16°C.
Given the distinctness of the DDR, it is important to define its terroir characteristics in a thorough integrated approach. Several studies have addressed the terroir characteristics of a given winemaking region (Storchi et al., 2005;Jones, 2006;Carey et al., 2008;Clingeleffer, 2014;Costantini et al., 2016;Moral et al., 2016). However, for the DDR, few studies have been devoted to this assessment. Given the Mediterranean-type climatic characteristics and the complex topography of the DDR, very high resolution zoning data is required for a better understanding of the interactions and synergies among the different terroir elements. Similar terroir conditions across the regions are likely not only to produce grapes and wines with analogous characteristics, but also to enable crop management planning. Due to its high socioeconomic significance, vineyards are occupying increasing areas within the region. As such, it also becomes important to assess zones with high viticultural potential that are currently used for other less profitable crops, such as annual field crops, pastures and fruit trees ( Figure 1a). Furthermore, climate change is projected to have strong negative impacts on both yields and quality of the wines produced in the DDR (Fraga et al., 2016a). A high resolution viticultural zoning of the DDR may allow developing adaptation measures against the projected warming and drying trends (Jones and Alves, 2012;Cunha and Richter, 2016;Fraga et al., 2016a,b). The present study aims to develop a multivariate zoning approach, through a holistic integration of the main terroir elements, herein illustratively applied to the DDR. A similar methodology can be conducted in other viticultural regions worldwide.
A fourth set of variables was chosen to characterize the topography of the DDR, including elevation (m), slope (º) and solar exposure (number of hours of sun, excluding clouds). Elevation was retrieved from a digital elevation model, DEM (GTOPO30). Slope and solar exposure (photoperiod) were calculated from the DEM dataset using geographical information systems -ArcGIS (version 10.3.1) Spatial Analyst Tool.
A fifth set of variables was included to take into account the effect of vegetation growth and grapevine phenology. The Normalized Difference Vegetation Index (NDVI) was used for the analysis of the vegetative growth. NDVI takes into account the difference between near-infrared and red reflectances, providing a measure of vegetation greenness, where higher NDVI corresponds to higher vegetation greenness (Holm et al., 1987;Roderick et al., 1996;Huete et al., 2002). Applied to viticulture, spectral vegetation indices may assist management activities (Johnson et al., 2003;Fraga et al., 2014a) and have shown links to production and wine attributes (Johnson et al., 2001;Usha and Singh, 2013). In the present study, NDVI from the Moderate Resolution Imaging Spectroradiometer (MODIS) is obtained from the National Aeronautics and Space Administration (NASA) Land Processes Distributed Active Archive Center (LP DAAC; https://lpdaac.usgs.gov/). Two tiles covering all of Portugal (h17v04, h17v05) are retrieved, at a 1 km spatial resolution and on a 16-day temporal resolution, from 2000 to 2015. Furthermore, a set of maps containing the Julian dates of the main grapevine phenological timings (budburst, flowering, veraison and harvest) were also included as variables in the vegetation dynamics set. This data, which has been previously validated, was obtained from Fraga et al. (2016a).

Principal component analysis and clustering methodology
The delineation of homogeneous zones in the DDR was carried out using a principal component analysis (PCA) and a subsequent k-means clustering methodology, using MATLAB (version R2017a). Variables of each of the five terroir elements (thermal, hydric, soils, topography and vegetation) were first normalized by the corresponding mean prior to PCA, as variables have different physical units, also warranting equal weighting in clustering. PCA was used herein as a method to reduce the number of variables (over 40 variables for all terroir elements; Table I) to retain for subsequent clustering (k-means on the subspace of the leading orthogonal modes). After a preliminary sensitivity analysis, the leading two components (PC1 and PC2) of each terroir element were retained, in all cases cumulatively explaining much of the variance. By retaining only the leading two components, most of the variability in the initial variables is retained, while the dimension of the subsequent clustering is significantly lowered.
Including additional components was also tested, though the analysis was not significantly improved at the expanse of a larger number of variables and less statistically robust results. The number of resulting clusters was tested and four clusters were eventually selected for each element. The selection of the number of clusters, though subjective (different number of clusters were also tested), was found to be a balanced compromise solution for capturing the different conditions within the DDR. In fact, a lower number (two or three clusters) is not able to entirely retain the diversity of the region, while a larger number would produce artificial sub-regions. Finally, the resulting clusters were mapped, corresponding to the viticultural terroirs in the DDR at ca. 1 km spatial resolution (total of 3910 1km grid-cells).

Relationships between the different terroir elements
Following the maps of the clusters for each terroir element, their integrated assessment is performed. For this purpose, the highest occurrence between clusters of different terroir elements at the same 1 km gridcell is determined. This assessment was done separately for each of the sub-regions within the DDR (Baixo Corgo: BC; Cima Corgo: CC and Douro Superior: DS). The frequencies of occurrence of each clusters/terroir element were also assessed separately by each land cover class, using the CORINE land cover classification (EEA, 2002) shown in Figure 1b.

Clustering approach
As previously mentioned, the leading two principal components of each terroir element were retained for clustering. Table II shows the percentage of variation explained by these components. Cumulatively, the explained variance of the first two PCs was over 93% for the thermal, hydric and vegetation groups and over 66% for the soil and topography groups. The first principal component represents more than 90% of the total variance for the thermal and hydric groups, while the second component represents much lower variance (3.5% and 1.5%, respectively). For the other three terroir elements, the first PC, while still explaining large fractions of variance, showed comparatively low values (soil: 44.9%; topography; 34.7%; vegetation: 62.8%), while the second PC now acquires higher relative relevance. For these terroir elements, the spatial organization of the region in near-homogeneous areas is much less marked than for the thermal, hydric and vegetation groups, thus explaining the overall lower fractions of represented variance. A description of the four resulting clusters of each terroir element is provided in Table III. While the description of the thermal, hydric, topography and vegetation clusters are self-explanatory, the soil clusters correspond to the underlining 'soil unit name' from FAO (Food and Agriculture Organization of the United Nations) in the original soil dataset (FAO/IIASA/ISRIC/ISSCAS/JRC, 2012). Therefore, the nomenclature used in the HSWD was kept. Regarding the thermal clusters (Fig.2a), a clear distinction is evident in all the corresponding variables, with minimum temperature in the winter playing a key role for the zoning. Also for the hydric conditions (Figure 2c), a strong variable differentiation is apparent, being DI the leading variable. For the soil clustering (Figure 2e), some of the base variables have a more decisive role in zoning. Luvisols have higher gravel levels, Humic Cambisols have higher total exchangeable bases, Eutric/Dystric Cambisols show sand content and higher exchangeable sodium and Dystric Regosols have higher levels of organic matter. Regarding topography (Figure 2g), the description of the clusters is directly based on the weight of each of the three base variables. For the vegetation clustering ( Figure  2i), NDVI acquires a more important role for zoning, particularly its summer values.

Spatial homogeneity in the DDR
After processing data from each of the terroir elements, a preliminary zoning of the DDR was achieved (Figure 2). Different homogeneous areas were established for each of the five terroir elements. The cooler (thermal cluster 1; henceforth Thermal-1) and moderate cooler zones (Thermal-2) are located in the outer/higher elevation areas of the DDR, while the moderate warm (Thermal-3) and warm (Thermal-4) zones are located in the inner/lower elevation areas (Figure 2b). For the whole DDR, the cluster with the highest frequency is Thermal-2, followed by Thermal-3 (Table IV). The DDR is primarily characterized by moderately wet/wet (Hydric-2/Hydric-1) conditions in the western and central areas (BC and CC), and moderately dry/dry (Hydric-3/Hydric-4) conditions in the east (DS) (Figure 2d). The Hydric-2 cluster has the highest occurrence (34% ; Table IV). Regarding the soils of the DDR (Figure 2f), most of the region corresponds to the Soil-3 cluster: Dystric Regosols (63%), while other clusters occur significantly less. Concerning topography, there is a strong association between high elevations, lower solar exposures and lower slopes (Topography-1 and -2, respectively) and between low elevations, high solar exposures and high slopes (Topography-3 and -4, respectively). The predominant cluster in the DDR is Topography-4 (33%) ( Table IV). For the vegetation clusters, a clear distinction between early phenology (Vegetation-1 and -2), in the eastern areas, and late phenology (Vegetation-3 and -4), in the west, is apparent. Higher greenness tends to occur in the higher elevation outer areas, while low greenness occurs in the inner areas of the DDR. The predominant cluster is Vegetation-2: early phenology and low greenness.

Diagrama de radar com os valores normalizados das variáveis utilizadas na metodologia de clustering: (a, b) térmico; (C, d) hídrico; (E, f) solos; (G, h) topografia; (I, j) dinâmica da vegetação. Representação geográfica dos agrupamentos de cada um dos cinco elementos do terroir na
Região Demarcada do Douro. Integrated sub-regional terroir assessment The relationships between clusters of the different terroir elements were also performed on a subregional level (for BC, CC and DS). The highest association between clusters is shown in Figure 3. In all sub-regions Dystric Regosols dominate the associations between soils and other terroir elements.
For the BC sub-region (Figure 3a), the moderately cool and the wet areas are strongly associated. Warm areas are associated with low elevations/high solar exposures and late phenology/low greenness. Wet areas are also tied to late phenology/high greenness and humic cambisols. In this sub-region, moderate wet areas are associated with low elevations/high solar exposures. In the CC sub-region, moderate cool and moderate wet areas are highly associated, being also connected to high elevation areas/low slopes and late phenology/low greenness. In the DS sub-region, moderate warm areas show links to early phenology/low greenness, while warm dry areas with low elevations and high solar exposures largely overlap.

Land cover types
A geospatial assessment of the terroir elements was also performed for each type of vegetation covering the DDR (Figure 4). Vineyards tend to be equally distributed in moderately cool (Thermal-2), moderately warm (Thermal-3) and warm zones (Thermal-4), and in moderately wet areas (Hydric-2). Dystric Regosols are the dominant soil type not only for this crop (Soil-4), but also for all the other vegetation types, since it is a regionally predominant soil type. Vineyards at low elevations and high solar exposures dominate the topography of the DDR (Topography-4). This crop is usually located in low greenness and late phenology zones (Vegetation-3).
Comparing the terroir elements of other vegetation types, it is clear that olive trees tend to be located in areas that show very similar characteristics to vineyard locations, with correlation coefficient r=0.72 (Table V). Pasture locations in the DDR also reveal some similarities to vineyard locations, but to a lesser extent (r=0.64). The areas that provide fewer similarities to the current vineyard locations are those devoted to annual crops (r=0.13), mostly due to hydric conditions in which they are currently grown (dryer lands).

DISCUSSION AND CONCLUSIONS
In the present study, an assessment of the homogeneous zones of interest for viticultural activities has been proposed for the DDR, by integrating different terroir elements, using state-ofthe-art high resolution datasets. Few studies have been dedicated to the assessment of the terroir composing elements over a whole region (Taylor, 2004;van Leeuwen et al., 2004;Carey et al., 2008;Yau et al., 2013;Costantini et al., 2016;Moral et al., 2016). However, the integration of all these factors, quantified by a large number of variables, into a high resolution viticultural zoning (~1km) in the DDR was not previously performed. The present study thereby significantly improves our understanding of the terroir spatial variability in the DDR.
Overall, four classes of each terroir element (thermal, hydric, soils, topography and vegetation dynamics) have been established, by applying a kmeans clustering on the subspace of the leading orthogonal modes. The delimited thermal classes may allow growers to select and grow grapevine varieties based on their optimum thermal requirements (Lopes et al., 2008;Alves et al., 2013). The hydric classes may also assist the selection of suitable rootstocks to improve plant water status (Pavlousek, 2011;Harbertson and Keller, 2012). These issues are particularly relevant taking into account the projected climate change impacts for the region, mainly the drying trend over the next decades (Fraga et al., 2016b;Santos et al., 2016). Furthermore, other terroir elements considered herein, such as slope degree, solar exposure, soil type and vegetation greenness, may assist growers in their management activities and vintage planning throughout the year (Johnson et al., 2001;Mackenzie and Christy, 2005;Judit et al., 2011;Bettiga et al., 2012;Nascimbene et al., 2013). The interconnections of the different terroir elements that exist at a given location was innovatively assessed. Although similar methodologies were already applied by previous studies to other winemaking regions, allowing the identification of the existing terroirs (Douglas et al., 2001;Nunez et al., 2011;Priori et al., 2014;Costantini et al., 2016;Moral et al., 2016), an integration of all these factors was still incipient. These interactions may allow identifying the different environmental processes that account for the winemaking quality potential of the different zones in the DDR.
The economic value of this crop is particularly relevant for the Portuguese economy. Regarding the size of the country, Portugal has already a relatively large coverage of planted vineyards, even when compared to other food crops (Anderson and Aryal, 2013). Nonetheless, given the increasing demand for DDR wines, including the world-famous Port wine, it is expected that the vineyard area continues growing. The results highlight that areas that are currently devoted to olive trees and pastures tend to have the highest suitability to new vineyard plantations. In fact, the strong relationship between vineyard and olive tree land use is already noticeable in the DDR. The multivariate zoning presented herein was able to clearly reveal this relationship. This assessment provides valuable information to future potential winemaking investors, in selecting new areas for vineyard settlements.
The multivariate zoning of viticultural terroirs provided in the present study may also be of great importance to policymakers and stakeholders operating in the DDR, acting as an additional decision support tool. Currently, in this region, a ranking system for vineyard plots is in force, with a classification scheme ranging from the best sites (class A) to the worst (class F) (Instituto dos Vinhos do Douro e Porto -IVDP). This ranking system may also be revised based on the current findings. In addition, taking into account the most recent climate change projections for the region (Fraga et al., 2016b;Santos et al., 2017), this ranking may require substantial adaptations in the future. Lastly, the present study can also be used as an archetypal methodology that can be applied to other winemaking regions in the country and even worldwide.