AMS-VegBank: a new database of vegetation plots for the Italian territory
LONG DATABASE REPORT

The importance of collecting, storing and exchanging georeferenced vegetation-plot data has grown significantly in recent decades, owing to the new opportunities offered by ecoinformatics. In this article we introduce the Alma Mater Studiorum – University of Bologna vegetation database (AMS-VegBank; GIVD code EU-IT-021), which compiles 17,505 georeferenced vegetation-plot observations spanning 90 years. The database includes 337,799 occurrence records of vascular plant species from many different habitat types. Its historical relevance is highlighted by some of the oldest vegetation-plot observations in Europe (1930–1938). The database mostly covers the Italian territory, but it also includes data from other countries. Its thematic focuses are varied and include small Mediterranean islands, the Dolomite Mountains and the Italian National Parks. The uniqueness of this database lies in the large number of historical plots for Italy not previously included in existing databases, combined with the constant effort to improve the georeferencing of existing data and to add new data. AMS-VegBank thus represents an important tool for studying plant biodiversity within the context of continental and global vegetation-plot databases.
Taxonomic reference: All plant names reported in this article follow the nomenclature by Pignatti et al. (2017–2019).
Abbreviations: EVA = European Vegetation Archive; GIVD = Global Index of Vegetation-Plot Databases.
Scope: This database aims to archive vegetation-plot data from published and unpublished sources, to support ecological, biogeographical and conservation research on Italian plant communities, with a focus on specific projects and in open cooperation with the other national databases.
The database started as an internal archive of the vegetation ecologists working at the Alma Mater Studiorum – University of Bologna, the oldest university in the Western world, one of the largest in Italy and one with one of the largest networks of connections in Europe. The specific aim of AMS-VegBank is to collect vegetation data from specific research projects, archiving vegetation-plot data with high-quality spatial and temporal references.
Abstract: The Vegetation Plot Database – Alma Mater Studiorum – University of Bologna (AMS-VegBank) currently comprises more than 17,500 vegetation plots, largely from specific research projects on Italian plant communities carried out by researchers connected with the institution. The database is a collection of plot data with accurate information on vegetation composition and structure and on the physical environment, gathered within projects coordinated by leading vegetation scientists in various parts of Italy. The most relevant projects are: 1) the Mediterranean Islands Project (ISL), with more than 5700 plots, many of them original, on the smaller Mediterranean islands; 2) the Dolomiti (DOL) project, with almost 2400 phytosociological plots, largely collected by Sandro and Erika Pignatti over several decades, which have been accurately georeferenced and properly archived; 3) the complete collection of all the plots sampled in the Foreste Casentinesi National Park (INP), including the first phytosociological relevés (…) to vegetation mapping or collecting data for (…); 4) the Vegetation database of the Nature Reserve Network of Romagna Region (RER), with almost 1700 original plots; 5) the Vegetation of serpentine outcrops of the Italian peninsula (SER), with over 200 plots, almost all original; 6) the Segetal Vegetation project (SEV) plots; 7) the Monitoring project of the Park (MNP), with plots with …


Introduction
In recent years, biodiversity studies and ecological informatics have grown steadily, thanks to ever-increasing storage and analysis capacities and to the growing relevance of biodiversity data (Franklin et al. 2017). Technical progress has made it easier to share and integrate large quantities of biodiversity data, stimulating the creation of large data repositories. Over the last 100 years, vegetation scientists have collected and published impressive amounts of plant co-occurrence data by sampling vegetation plots, or relevés (Chytrý et al. 2016; Bruelheide et al. 2019). Various continental or global initiatives currently coordinate the aggregation of vegetation databases, such as: i) the Global Index of Vegetation-Plot Databases (GIVD; see Dengler et al. 2011), which contains metadata on many vegetation databases worldwide; ii) the European Vegetation Archive (EVA), a centralised database of European vegetation plots developed by the European Vegetation Survey working group of the International Association for Vegetation Science (IAVS) (Chytrý et al. 2016); iii) the sPlot Consortium, which is developing a global vegetation-plot database to investigate plant trait–environment relationships across biomes and has recently released an open-access version (Bruelheide et al. 2019; Sabatini et al. 2021). Besides these generalist, large-scale initiatives, others focus on a particular environment or biogeographical region: these are the so-called thematic databases, for example the GrassPlot database of multi-scale plant diversity in Palaearctic grasslands, which also has specific data quality requirements (Dengler et al. 2018). At the national scale, two main approaches can be observed in Europe: the development of national databases in some countries, such as the Netherlands or the Czech Republic (Chytrý and Rafajová 2003; Schaminée et al. 2012), and the independent development of local or specialised databases in other countries (e.g. France, Germany and Italy). On a wider scale, there are also supranational initiatives such as the SIVIM database, a free-access archive for Spain and Portugal (Font et al. 2012).
Once stored and made reusable, these data are a fundamental source of information for addressing ecological and biogeographical questions, such as vegetation and habitat classification, ecological modelling, plant species invasions, biodiversity patterns from a macroecological perspective, or tests of island biogeography theory (Chytrý et al. 2020; Biurrun et al. 2021; Chiarucci et al. 2021; Wagner et al. 2021). The large number of vegetation plots estimated to exist for Europe by Schaminée et al. (2009) underlines the great potential of these data and, therefore, the need to make them easily available. Dedicated software for managing vegetation databases, such as Turboveg (Hennekens and Schaminée 2001), has greatly simplified the collection and storage of large vegetation databases, and Turboveg is currently recognized as the official software of EVA (Chytrý et al. 2016). Increasing collaboration among research centres in different countries has also encouraged the collection and digitization of vegetation plots, enabling mutual data integration and, therefore, scientifically more robust results. Because large amounts of both historical, not yet digitized, and recent vegetation plots are available for Italy, new efforts in vegetation data collection and storage are needed. In Italy, 13 vegetation databases are currently registered in the GIVD. In this article we present the main features of the new AMS-VegBank database, the vegetation-plot database of the Alma Mater Studiorum – University of Bologna (EU-IT-021). In particular, we illustrate its main collections of plot-based vegetation data and its potential for further improvement.

Data collection
Data collection started in early 2005, with data stored on a local server, and the data were merged into a single database only in 2018. The first nucleus of AMS-VegBank was a collection of vegetation-plot observations made on the small circum-Italian islands, later expanded by merging other data collections from projects carried out in different biogeographic regions (Alps, Apennines and Po valley). Vegetation-plot data were collected from formal publications (articles, books, monographs), but also from a large amount of «grey literature» (i.e. degree theses, doctoral theses, technical reports etc.), taking care to avoid sources already included in other Italian databases so as to limit duplications as much as possible. At present, the data in AMS-VegBank have been extracted from 245 sources. Almost all the data were collected according to Braun-Blanquet's phytosociological method, but other approaches were also considered (e.g. estimation of percentage cover for all occurring species). Data are stored in Turboveg 2.140b (Hennekens and Schaminée 2001).
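Because the database mixes Braun-Blanquet cover-abundance codes with direct percentage estimates, users comparing plots usually convert both to a common scale. The following is a minimal sketch of such a conversion, not the procedure used by AMS-VegBank or Turboveg. The values for classes 1–5 are assumed midpoints of the usual cover ranges, the values for "r" and "+" are assumed small covers, and the names `BB_TO_PERCENT`, `to_percent_cover` and `harmonise_plot` are hypothetical.

```python
# HYPOTHETICAL conversion table: classes 1-5 use the midpoints of the usual
# Braun-Blanquet cover ranges; the values for "r" and "+" are assumed.
# This is not the table used by Turboveg or by AMS-VegBank.
BB_TO_PERCENT = {
    "r": 0.1,   # assumed: solitary individual, negligible cover
    "+": 0.5,   # assumed: few individuals, small cover
    "1": 2.5,   # cover < 5 %
    "2": 15.0,  # cover 5-25 %
    "3": 37.5,  # cover 25-50 %
    "4": 62.5,  # cover 50-75 %
    "5": 87.5,  # cover 75-100 %
}


def to_percent_cover(value: str, scale: str) -> float:
    """Convert one recorded cover value to percentage cover.

    `scale` states how the plot was recorded, so that "1" in a Braun-Blanquet
    plot (cover < 5 %) is not confused with "1" meaning 1 % cover.
    """
    code = value.strip()
    if scale == "braun-blanquet":
        if code not in BB_TO_PERCENT:
            # e.g. extended-scale codes such as "2m" would need their own rows
            raise ValueError(f"unknown Braun-Blanquet code: {value!r}")
        return BB_TO_PERCENT[code]
    if scale == "percent":
        percent = float(code)
        if not 0.0 <= percent <= 100.0:
            raise ValueError(f"cover out of range: {value!r}")
        return percent
    raise ValueError(f"unknown cover scale: {scale!r}")


def harmonise_plot(species_cover: dict, scale: str) -> dict:
    """Convert the cover of every species in one plot to percentage cover."""
    return {sp: to_percent_cover(v, scale) for sp, v in species_cover.items()}


if __name__ == "__main__":
    releve = {"Quercus ilex": "4", "Pistacia lentiscus": "2", "Smilax aspera": "+"}
    print(harmonise_plot(releve, "braun-blanquet"))
```

Passing the recording scale explicitly, rather than guessing it from the values, avoids the ambiguity between a Braun-Blanquet class and a percentage that happens to share the same digit.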
For each plot, all the site data are provided as reported in the original sources. We added new fields to improve the administrative and geographical information in the database, namely country, region, province, municipality and macroarea (i.e. protected area, island, mountain range etc.). Not all sources provide coordinates, in particular those published before the 2000s; for plots without coordinates, geographical coordinates are inferred during digitization, when possible, from the locality name, with particular care taken to translate the authors' locality descriptions into the best possible approximation of the coordinates, and with various degrees of accuracy estimated in metres. Accuracy is estimated through constant geographic and topographic checks in a GIS environment, using the elevation, slope and aspect reported in the source together with remote sensing information. When the original sources report only the geographical coordinates but not their accuracy, we do not estimate it. Plot locations on maps are always cross-checked. We are currently devoting significant effort to contacting the original authors in order to locate coordinates and improve the quality of plot geolocation.
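The accuracy check described above can be illustrated with a simplified, hypothetical sketch; it is not the authors' GIS workflow. It assumes the area matching the locality name has already been sampled as DEM cells (projected coordinates in metres, with elevation, slope and aspect). Cells consistent with the reported values are kept, their centroid is taken as the inferred position, and the distance to the farthest kept cell is reported as the accuracy. The tolerances, the `cell_size` floor and all names (`Cell`, `estimate_location`) are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    """One cell of a hypothetical DEM grid in a projected CRS (metres)."""
    x: float
    y: float
    elevation: float  # m a.s.l.
    slope: float      # degrees
    aspect: float     # degrees clockwise from north


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two compass bearings (0-180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def estimate_location(cells, elevation, slope, aspect,
                      elev_tol=50.0, slope_tol=10.0, aspect_tol=45.0,
                      cell_size=100.0):
    """Return (x, y, accuracy_m), or None if no cell fits the reported values.

    `cells` must already be restricted to the area matching the locality name.
    Cells agreeing with the reported elevation, slope and aspect are kept; their
    centroid is the inferred plot position, and the accuracy is the distance to
    the farthest kept cell, never less than half a cell.
    """
    matches = [
        c for c in cells
        if abs(c.elevation - elevation) <= elev_tol
        and abs(c.slope - slope) <= slope_tol
        and angular_difference(c.aspect, aspect) <= aspect_tol
    ]
    if not matches:
        return None
    cx = sum(c.x for c in matches) / len(matches)
    cy = sum(c.y for c in matches) / len(matches)
    distances = [math.hypot(c.x - cx, c.y - cy) for c in matches]
    accuracy = max([cell_size / 2.0] + distances)
    return cx, cy, accuracy
```

If the matching cells form two separate patches, the centroid may fall between them. In that case the large accuracy value correctly signals that the location is poorly constrained, but a manual map check, as described above, remains necessary.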
Header information provided by the original publication (e.g. locality, elevation, slope, aspect and other environmental data) is also stored. Elevation, slope and aspect data are never corrected, even when they may be wrong in the original source.
The original names of the taxa recorded in each plot are always retained, which has so far produced a list of 20,682 taxon names (some identified at the genus level, but most at the species or subspecies level), including vascular plants, bryophytes and lichens. For vascular plants, the nomenclature is largely based on Conti et al. (2005, 2007), but other nomenclatural sources also occur, depending on the historical origin of the data (Fiori 1923–1929; Zangheri 1976). We retained the original names so that taxonomic harmonization remains traceable, leaving name standardization to the researchers using the data and avoiding the accumulation of taxonomic interpretations. For most vegetation plots, the original syntaxonomic assignment, or a classification into broad physiognomic categories (e.g. beech forest, arid grassland, humid grassland etc.), is also recorded.
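Keeping original names while leaving standardization to data users implies that harmonization is a separate, replaceable layer on top of the stored records. The following minimal sketch shows that idea under stated assumptions. The record type `Occurrence`, the function `harmonise_names` and the synonym-table entries are hypothetical. The table stands in for whatever checklist a user chooses and does not reproduce any real synonymy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """A species record; frozen, so the original name cannot be overwritten."""
    plot_id: int
    original_name: str  # taxon name exactly as given in the source


def harmonise_names(occurrences, synonym_table):
    """Link original names to accepted names from a user-chosen checklist.

    Returns (mapping, unmatched). The occurrence records are left untouched;
    the mapping is a separate, replaceable layer, and `unmatched` lists the
    names that the table cannot resolve and that need manual checking.
    """
    mapping = {}
    unmatched = set()
    for occ in occurrences:
        accepted = synonym_table.get(occ.original_name)
        if accepted is None:
            unmatched.add(occ.original_name)
        else:
            mapping[occ.original_name] = accepted
    return mapping, sorted(unmatched)


if __name__ == "__main__":
    records = [Occurrence(1, "Name as printed A"), Occurrence(2, "Name as printed B")]
    table = {"Name as printed A": "Accepted name A"}  # hypothetical entries
    print(harmonise_names(records, table))
```

Swapping in a different checklist only changes `synonym_table`. The stored names stay as published, which is the traceability property described above.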

Database content
AMS-VegBank is now one of the main Italian vegetation databases in terms of data storage and geographic coverage. It currently contains 17,505 vegetation plots, sampled in the period 1930–2021 mostly (92.4%) according to Braun-Blanquet's phytosociological method and subdivided into various thematic collections (described below). These plots contain 337,799 plant species occurrence records. The data derive from formally published papers (71.4%), BSc, MSc and PhD theses (11.9%), unpublished data from specific vegetation-sampling projects in areas of particular naturalistic relevance (14.4%), and current fieldwork carried out by members of the BIOME Lab of the University of Bologna. In total, 5015 vegetation plots (28.6%) are still unpublished (Table 1). The number of vegetation plots per decade increases progressively over time, with the lowest values in the first half of the 20th century and the highest in the 1990s (Table 2). Most vegetation plots were sampled in Italy; a marginal share (7.0%) comes from other countries, especially Malta (Table 3). Among the Italian regions, vegetation plots come mainly from Veneto, Tuscany, Emilia-Romagna and Sicily (68.0% of the plots from the Italian territory; see Figure 1a). The highest density of plots is recorded at low altitudes (0–400 m a.s.l.), in particular in the plains and low hills (Figure 1b). The precision of georeferencing ranges from very accurate coordinates obtained with high-precision GPS to a maximum of 30,000 m for some old vegetation plots lacking detailed information on the sampling location; 502 vegetation plots (2.9% of the total) were not georeferenced because the information provided by the authors was insufficient to assign credible coordinates. Precision values above 10,000 m are exceptional (ca. 1.4% of all plots), whereas a precision between 0 and 50 m is much more frequent (22.8% of plots); the most common precision range is 201–1000 m (32.4% of plots). Precision values between 51 and 200 m and between 1001 and 10,000 m characterise 15.5% and 15.7% of plots, respectively. In addition, 2124 vegetation plots (12.1%) lack an accuracy estimate for their geographical coordinates, because this value was not reported in the original sources or could not be estimated credibly. Species richness per plot ranges from 1 to 103, with a mean of 19.3 ± 12.2 (s.d.); the most frequent value is 16 species per plot (Figure 1c). Plot area ranges from 0.01 m² to 1000 m²; the most frequent plot area is 100 m², one of the standard sizes used by our team members during field sampling. For 2111 plots (12.1% of the total), the original sources give no indication of plot area.
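The georeferencing precision classes and richness statistics reported above can be reproduced from plot-level values with a few lines of code. The following is a sketch under two assumptions: accuracy is stored in whole metres, and "s.d." is taken to be the sample standard deviation, since the text does not specify which. The function names are hypothetical.

```python
import statistics
from collections import Counter

# Precision classes used in the text (metres); upper bounds are inclusive,
# assuming accuracy values are whole metres.
PRECISION_CLASSES = [(50, "0-50 m"), (200, "51-200 m"), (1000, "201-1000 m"),
                     (10_000, "1001-10,000 m")]


def precision_class(georeferenced, accuracy_m):
    """Assign one plot to a georeferencing precision class."""
    if not georeferenced:
        return "not georeferenced"
    if accuracy_m is None:
        return "no accuracy estimate"
    for upper, label in PRECISION_CLASSES:
        if accuracy_m <= upper:
            return label
    return "> 10,000 m"


def precision_shares(plots):
    """Percentage of plots per class; `plots` yields (georeferenced, accuracy_m)."""
    counts = Counter(precision_class(g, a) for g, a in plots)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def richness_summary(richness):
    """Range, mean, sample s.d. and most frequent value of species per plot."""
    return {
        "min": min(richness),
        "max": max(richness),
        "mean": round(statistics.mean(richness), 1),
        "sd": round(statistics.stdev(richness), 1),  # sample s.d. (assumed)
        "mode": statistics.mode(richness),
    }
```

Keeping "not georeferenced" and "no accuracy estimate" as separate classes mirrors the text, which reports the 502 plots without coordinates separately from the 2124 plots whose coordinates lack an accuracy value.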
A brief description of the various collections of plot data making up the AMS-VegBank is given in the following paragraphs.

Dataset «Small Mediterranean Islands»
This dataset includes 5723 vegetation plots, 1048 of which are still unpublished (18.3% of the total), sampled over the period 1950–2021 on 54 islands and islets of the central Mediterranean area. The archipelagos with the highest concentration of plot data are, in descending order, the Tuscan Archipelago, the Maltese Islands, the Aeolian Islands, the Pelagian Islands and the Tremiti Islands. The Tuscan Archipelago accounts for 1616 plots (1975–2010), 331 of which are still unpublished (20.5%). These plot data derive largely from regional vegetation-description projects carried out during the 1990s and 2000s, and they have been used to test fundamental biogeographical theories, such as the island species–area relationship and the importance of habitats as independent predictors of species richness patterns and species composition within islands. The Maltese Islands account for 916 vegetation plots (1973–2017), all already published in a monographic study of the vegetation of this archipelago (Brullo et al. 2020). The Aeolian Islands account for 794 vegetation plots (1966–2020), 143 of which are still unpublished (18.0%) and are currently being analysed for a dedicated article. They derive from numerous vegetation and phytogeographical studies of the circum-Sicilian islands and, more generally, of the Mediterranean coasts. The Pelagian Islands account for 568 vegetation plots (1975–1993), all already published. The Tremiti Islands account for 474 vegetation plots (1966–2020), 303 of which are still unpublished (63.9%) and are currently being used for a dedicated article.
The territory covered by this dataset extends over 968.4 km², with an altitudinal range of 0–1000 m a.s.l. and a clear predominance of plots within 0–150 m. The vegetation comprises Mediterranean chasmophytic and xerophilous (at times halophilous) communities typical of both calcareous and volcanic substrates, Mediterranean maquis and, occasionally, residual Mediterranean forests dominated by Quercus ilex or, rarely, Pinus halepensis.

Dataset «Dolomites»
This dataset contains an important collection of 2387 vegetation plots (221 still unpublished, i.e. 9.3%), collected in the period 1955–2019 in the Dolomite Mountains. Most of these plots (1962, i.e. 82.2%) were recorded by Sandro Pignatti and Erika Wikus Pignatti over several decades and were largely used to produce a monographic book (Pignatti and Pignatti 2014). The Pignatti family kindly provided us with the original data files, and part of two BSc theses was devoted to georeferencing each plot. Since 2009, the Dolomites have been a UNESCO World Heritage Site for their renowned environmental, landscape and biological value: indeed, they host various endemic or subendemic species, such as Alchemilla lasenii, Campanula morettiana, Delphinium dubium, Minuartia graminifolia, Rhizobotrya alpina etc. The plot data were recorded in the altitudinal range 350–2950 m a.s.l., with the highest concentration between 1500 and 2500 m a.s.l.; 100 plots were recorded between 2500 and 2950 m a.s.l. The vegetation described by these plots is montane and alpine, including broadleaved forests, dry and steppe grasslands, alpine grasslands, alpine taiga with conifers and Rhododendron, rupicolous and petrophilous formations on dolomitic and calcareous screes, and pioneer vegetation at the highest altitudes.
Table 2. Number of vegetation plots per decade.
Decade       Plots
1921–1930    1
1931–1940    41
1941–1950    25
1951–1960    186
1961–1970    578
1971–1980    2340
1981–1990    2132
1991–2000    5010
2001–2010    3388
2011–2020    3674
2021–2030    130
Total        17,505

Dataset «Italian National Parks»
This dataset includes 998 vegetation plots (1930–2021), 415 of which are still unpublished (41.6%). Many of them were recorded in the Foreste Casentinesi National Park, one of the most important forest areas in Europe, with extensive old-growth mixed forests of Fagus sylvatica and Abies alba (Blasi et al. 2010), part of which is now recognized as a UNESCO World Heritage Site. The 12 vegetation plots recorded before 1950 (the oldest dating to 5 August 1930) are likely the oldest data of this type in Italy and among the oldest in Europe. The plot data in this collection derive from old works (Zangheri 1966), degree theses discussed at the University of Bologna under the supervision of distinguished scholars, a doctoral thesis (Lelli et al. 2018, 2021) and other data collection projects. The study area covers 364 km², with an altitudinal range of 400–1658 m a.s.l. and most plots recorded between 600 and 1300 m. The vegetation described by these plots consists mainly of Quercus spp. woodlands in the hill belt and Fagus sylvatica forests in the mountain belt, sometimes mixed with Abies alba; coniferous reforestations with Picea abies, Pseudotsuga menziesii, Pinus nigra and P. calabrica are relatively common on degraded and eroded slopes. In the old-growth forest areas, generally located between 1200 and 1550 m a.s.l., A. alba and F. sylvatica are accompanied by Fraxinus excelsior, Tilia platyphyllos, Acer platanoides, A. pseudoplatanus, A. opulifolium and Ulmus glabra, a relict vegetation type that is very rare in the Apennines (Viciani et al. 2010). The first integral nature reserve in Italy, the Sasso Fratino Forest, is located in this national park, and its vegetation data are stored here.
This dataset is now being expanded to cover all the national parks of Italy, to provide a useful analytical tool for assessing the biodiversity of this important network of protected areas.

Dataset «Protected Areas of Emilia-Romagna»
This dataset includes 1681 vegetation plots, recorded in the period 1940–2007 in various areas of great naturalistic interest that were proposed for the establishment of protected areas (many of them are now part of the Natura 2000 network as Sites of Community Importance); 1505 of these plots are available at a dedicated data centre of the regional administration (Regione Emilia-Romagna 2022). All these areas fall within the Emilia-Romagna region, a transitional area between the continental and Mediterranean parts of Italy (Ferrari 1980). The study area covers nearly 2000 km², with an altitudinal range of 0–2020 m a.s.l. and a clear predominance of plots recorded between 500 and 1500 m. The vegetation is highly variable: halophytic communities typical of littoral environments or brackish marshes, submediterranean xerophilous communities on gypsum outcrops and badlands, mixed Quercus spp.-dominated and Fagus sylvatica forests, high-altitude grasslands along the Apennine watershed etc.

Dataset «Serpentine outcrops of the Italian peninsula»
This dataset comprises 231 vegetation plots (1994–1998), all already published, recorded on mafic and ultramafic rocky substrates in Tuscany and eastern Liguria (e.g. Chiarucci et al. 1998; Chiarucci 2003). The vegetation is characterised by xerophilous and metallophilous species forming sparse grasslands resembling garrigue, but also by shrub formations with Juniperus oxycedrus or by conifer reforestations. Many endemic and highly specialised species, such as Alyssum bertolonii or Armeria denticulata, are restricted to this type of vegetation.

Dataset «Segetal vegetation»
This dataset comprises 779 vegetation plots (1952–2019), all already published, recorded mainly in northern and central Italy in areas cultivated with cereals, maize, legumes and other vegetables. The vegetation is characterised by segetal and commensal species, weeds and species with a broad ecological spectrum. Part of this group of vegetation plots was already used for a comprehensive study of the vegetation of arable habitats in central Europe (Glaser et al. 2022). However, there is no data overlap between the resulting European Weed Database and the database presented here.

Dataset «Maremma Regional Park»
This dataset comprises 90 vegetation plots (2007–2009), all unpublished, from a dedicated vegetation monitoring project in the Maremma Regional Park, a coastal area of great naturalistic value that is still largely undisturbed. Although small, this dataset is relevant for the spatial representativeness of its data and for the accuracy of their geolocation. The plots of this project were recorded according to a probabilistic sampling design and are therefore spatially representative of the conditions in the park, and all plots have a highly accurate spatial location (< 1 m). The vegetation represented here is Mediterranean, essentially maquis, Quercus ilex or Pinus spp. forests, garrigues, and coastal or halophilous communities.
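The text does not specify which probabilistic design was used in the Maremma Regional Park. The following sketch therefore assumes the simplest such design, simple random sampling, to show why it yields spatially representative plots: every location inside the study area has the same probability of being selected. Plot centres are drawn uniformly inside a polygon by rejection sampling, with a ray-casting point-in-polygon test. The polygon, seed and function names are hypothetical, and a stratified design would be an equally plausible alternative.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices in metres."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed,
        # so y2 != y1 here and the division is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_plot_locations(polygon, n, seed=None):
    """Draw n plot centres uniformly at random inside the polygon."""
    rng = random.Random(seed)  # seeded generator makes the design reproducible
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < n:
        # Sample the bounding box and keep only points that fall inside.
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points
```

Recording the seed alongside the plots makes the sample reproducible. High-precision positioning, such as the < 1 m accuracy reported above, is what allows plots drawn this way to be relocated in the field.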

Dataset «Various Italian plots»
This dataset comprises 5616 vegetation plots (1951–2021), 1736 of which are still unpublished (30.9%). It is a miscellaneous collection of plots not included in the above-mentioned datasets, recorded in various environments (quarries, springs, river beds, rocky hills, forests, marine coasts etc.) and published in different contexts (technical reports, degree theses, academic studies); the vegetation represented here therefore belongs to many different physiognomic types.
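The dataset sizes reported in the descriptions above can be tallied as a worked check: they sum to the 17,505 plots stated for the whole database, and they yield each dataset's share of the total and its percentage of unpublished plots. In the sketch below, `None` marks the Emilia-Romagna dataset, for which no unpublished count is given. The dictionary and the function name `summarise` are illustrative.

```python
# (plots, unpublished plots) per dataset, as reported in the descriptions above.
# None = unpublished count not stated for that dataset.
DATASETS = {
    "Small Mediterranean Islands": (5723, 1048),
    "Dolomites": (2387, 221),
    "Italian National Parks": (998, 415),
    "Protected Areas of Emilia-Romagna": (1681, None),
    "Serpentine outcrops": (231, 0),
    "Segetal vegetation": (779, 0),
    "Maremma Regional Park": (90, 90),
    "Various Italian plots": (5616, 1736),
}


def summarise(datasets):
    """Total plots, plus each dataset's share of the total and % unpublished."""
    total = sum(n for n, _ in datasets.values())
    rows = {}
    for name, (n, unpublished) in datasets.items():
        share = round(100 * n / total, 1)
        pct_unpub = None if unpublished is None else round(100 * unpublished / n, 1)
        rows[name] = (n, share, pct_unpub)
    return total, rows


if __name__ == "__main__":
    total, rows = summarise(DATASETS)
    print("Total plots:", total)
    for name, (n, share, pct_unpub) in rows.items():
        print(f"{name}: {n} plots ({share}% of total), unpublished %: {pct_unpub}")
```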

Future perspectives and ongoing projects
AMS-VegBank is a large, consolidated database of vegetation-plot data that focuses largely, but not exclusively, on Italy. It stores a large amount of high-quality data for some specific geographic areas (e.g. small Mediterranean islands, Dolomite Mountains) and for some important protected areas (e.g. Italian National Parks, protected areas of Emilia-Romagna), and it also includes vegetation-plot data on other thematic subjects, such as serpentine outcrops and segetal vegetation. This makes AMS-VegBank a multi-thematic database covering different habitats and ecosystems of Italy and surrounding areas.
As indicated above, data collection and storage in AMS-VegBank are still ongoing, and the number of plots in the database is growing continuously and quickly. Currently active projects include the digitization of studies carried out by members of the research group, such as a large project carried out in 2015 on the gypsum outcrops of the Emilia-Romagna region (ca. 160 plots), a project on Apennine chestnut groves (ca. 500 plots), a project on the drainage canals of the Po valley (ca. 120 plots; these will be a valuable distinctive feature of AMS-VegBank, since drainage canals are generally overlooked by vegetation and plant ecologists, so vegetation data for these environments are very rare and often confined to the grey literature; Montanari et al. 2020), a project on the Natura 2000 protected areas in the province of Siena (Tuscany; ca. 600 plots; Chiarucci et al. 2012), and other projects focused on resurveying historical plots. Furthermore, a joint effort is under way with the major Italian vegetation databases to avoid data duplication and taxonomic and syntaxonomic inconsistencies. AMS-VegBank is now part of collaborative collections and studies of diversity observations that aim to boost scientific research, nature conservation, open science and dissemination, namely EVA and the inter-university centre PlantData. Within EVA, data access is semi-restricted (code 2), so a formal request including a project proposal must be submitted to EVA. AMS-VegBank is not versioned.
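Avoiding duplication across databases usually means flagging pairs of plots that probably record the same relevé. The following hypothetical sketch, not the procedure agreed among the Italian databases, flags a pair when the sampling dates match, the coordinates lie within `max_dist_m`, and the species lists have a Jaccard similarity of at least `min_jaccard`. Both thresholds are assumed, and species names must be harmonised first. Plots without an exact date would need a looser rule.

```python
import math
from dataclasses import dataclass


@dataclass
class Plot:
    plot_id: str
    date: str          # ISO sampling date, e.g. "1998-06-12"
    x: float           # projected coordinates, metres
    y: float
    species: frozenset  # harmonised species names


def jaccard(a, b):
    """Shared species divided by all species in either plot."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def likely_duplicates(ours, theirs, max_dist_m=1000.0, min_jaccard=0.8):
    """Pairs (our_id, their_id) that probably record the same relevé."""
    pairs = []
    for p in ours:
        for q in theirs:
            if p.date != q.date:
                continue
            if math.hypot(p.x - q.x, p.y - q.y) > max_dist_m:
                continue
            if jaccard(p.species, q.species) >= min_jaccard:
                pairs.append((p.plot_id, q.plot_id))
    return pairs
```

The distance threshold should exceed the coordinate accuracy of old plots, which can reach several kilometres, so flagged pairs are candidates for manual review rather than automatic deletions.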
In summary, AMS-VegBank is now an important resource for making the vegetation-plot data of many Italian and other Mediterranean areas available to the global scientific community.
Beyond the figures provided here, the number of entries in AMS-VegBank is constantly growing owing to several ongoing digitization projects. All data reported here are updated to 24 June 2022.

Authors' contributions
AC, NA, FB and VB conceptualised the paper. FB wrote a first draft, NA and AC contributed to this draft with data elaboration, figures and critical comments, VB managed the database. MC, ME, CL, IM, JN, GV, GP and PZ provided vegetation plot data and useful suggestions. All authors critically read and approved the final version of the article.